HW Target

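| Device   | PVC    | MTL    | DG2    | LNL/BMG (TODO) | ARL (TODO) |
|----------|--------|--------|--------|----------------|------------|
| ISA      | Xe     | Xe-lpg | Xe-hpg | Xe2            | Xe-lpg+    |
| DPAS     | 8,8,16 | NA     | 8,8,8  | 8,4,16         | 8,8,8      |
| 2D Block | 32, 64 | NA     | NA     | 32, 64         | NA         |
| 1D Block | 64     | 32     | 32     | 64             | 32         |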

How to Add a new HW

+template <>
+struct arch_attr_t<gpu_arch::Xe2> {
+  template <msg_type message_type = msg_type::block_2d>
+  using load_store_attr = load_store_attr_t<message_type, gpu_arch::Xe2>;
+
+  template <grf_mode grf_num_mode = grf_mode::double_grf>
+  using register_attr = register_attr_t<grf_num_mode, gpu_arch::Xe2>;
+
+  using dpas_attr = dpas_attr_t<gpu_arch::Xe2>;
+
+  static constexpr uint32_t max_wg_num = 16;
+  static constexpr uint32_t local_mem_size = 128 * 1024;
+};
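
To make the specialization above easier to try out in isolation, here is a minimal sketch that compiles on its own. The enums and the empty `load_store_attr_t`, `register_attr_t`, and `dpas_attr_t` templates are stand-ins written for this example (the real library definitions carry more enumerators and fields), and `fits_in_slm` is a hypothetical helper, not part of XeTLA.

```cpp
#include <cstdint>

// Stand-in declarations, reduced to the names used by the specialization.
// Assumption: the real library enums have more enumerators than shown here.
enum class gpu_arch : uint8_t { Xe2 };
enum class msg_type : uint8_t { block_2d };
enum class grf_mode : uint8_t { double_grf };

// Empty placeholders for the per-architecture attribute tables.
template <msg_type message_type, gpu_arch arch>
struct load_store_attr_t {};
template <grf_mode grf_num_mode, gpu_arch arch>
struct register_attr_t {};
template <gpu_arch arch>
struct dpas_attr_t {};

// Primary template; each supported architecture adds a specialization.
template <gpu_arch arch>
struct arch_attr_t {};

template <>
struct arch_attr_t<gpu_arch::Xe2> {
  template <msg_type message_type = msg_type::block_2d>
  using load_store_attr = load_store_attr_t<message_type, gpu_arch::Xe2>;

  template <grf_mode grf_num_mode = grf_mode::double_grf>
  using register_attr = register_attr_t<grf_num_mode, gpu_arch::Xe2>;

  using dpas_attr = dpas_attr_t<gpu_arch::Xe2>;

  static constexpr uint32_t max_wg_num = 16;
  static constexpr uint32_t local_mem_size = 128 * 1024;
};

// Hypothetical helper: does a tile of `bytes` bytes fit in shared local memory?
template <gpu_arch arch>
constexpr bool fits_in_slm(uint32_t bytes) {
  return bytes <= arch_attr_t<arch>::local_mem_size;
}

// Kernel code can then query the new target at compile time.
static_assert(arch_attr_t<gpu_arch::Xe2>::max_wg_num == 16, "max_wg_num");
static_assert(fits_in_slm<gpu_arch::Xe2>(64 * 1024), "64 KiB fits in SLM");
```

Keeping every per-target constant behind `arch_attr_t<arch>` means that kernels templated on `gpu_arch` pick up a new target without further changes, provided the specialization supplies the members they read.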

Workflow of INT4 GEMM

graph TD
    H[INT4 type] --> B
    I[Block size] --> B
    J[Layout] --> B
    A[Perf tuning] --> D[compute policy]
    B[Quantization info] --> D
    C[MMA engine] --> D
    Q[Arch] --> D
    D --> G[micro GEMM kernel]
    G --> E[GEMM kernel]
    O[Epilogue] --> E
    P[Group Dispatch] --> E
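
The composition in the graph above can be sketched as nested C++ templates, one type per node. Every name here (`quant_info_t`, `perf_tuning_t`, `compute_policy_t`, `micro_gemm_t`, `gemm_kernel_t`, the enums, and the two empty tag structs) is a placeholder chosen to mirror the graph, not the library's API, and the knob values in the example are arbitrary.

```cpp
#include <cstdint>

// Placeholder enums mirroring the graph inputs (not XeTLA's definitions).
enum class int4_kind { signed_s4, unsigned_u4 };
enum class weight_layout { row_major, col_major };
enum class mma_engine { xmx, fpu };
enum class gpu_arch { Xe2 };

// "INT4 type", "Block size" and "Layout" feed "Quantization info".
template <int4_kind Kind, uint32_t BlockSize, weight_layout Layout>
struct quant_info_t {
  // Assumption of this sketch: two INT4 values are packed per byte.
  static_assert(BlockSize > 0 && BlockSize % 2 == 0,
      "this sketch requires a positive, even block size");
  static constexpr int4_kind kind = Kind;
  static constexpr uint32_t block_size = BlockSize;
  static constexpr weight_layout layout = Layout;
};

// "Perf tuning": assumed knobs, a K step per iteration and a stage count.
template <uint32_t KStride, uint32_t Stages>
struct perf_tuning_t {
  static constexpr uint32_t k_stride = KStride;
  static constexpr uint32_t stages = Stages;
};

// "compute policy" gathers perf tuning, quantization info, MMA engine, arch.
template <typename PerfTuning, typename QuantInfo, mma_engine Engine,
    gpu_arch Arch>
struct compute_policy_t {
  using perf_tuning = PerfTuning;
  using quant_info = QuantInfo;
  static constexpr mma_engine engine = Engine;
  static constexpr gpu_arch arch = Arch;
};

// "micro GEMM kernel" is parameterized by the compute policy.
template <typename ComputePolicy>
struct micro_gemm_t {
  using compute_policy = ComputePolicy;
};

// Empty tags standing in for "Epilogue" and "Group Dispatch".
struct bias_add_epilogue {};
struct linear_group_dispatch {};

// "GEMM kernel" combines the micro kernel, epilogue, and group dispatch.
template <typename MicroGemm, typename Epilogue, typename GroupDispatch>
struct gemm_kernel_t {
  using micro_gemm = MicroGemm;
  using epilogue = Epilogue;
  using group_dispatch = GroupDispatch;
};

// Example instantiation with arbitrary values.
using example_quant =
    quant_info_t<int4_kind::signed_s4, 128, weight_layout::col_major>;
using example_policy = compute_policy_t<perf_tuning_t<32, 3>, example_quant,
    mma_engine::xmx, gpu_arch::Xe2>;
using example_kernel = gemm_kernel_t<micro_gemm_t<example_policy>,
    bias_add_epilogue, linear_group_dispatch>;
```

Each layer exposes the layer beneath it as a nested alias, so `example_kernel::micro_gemm::compute_policy::quant_info::block_size` recovers the block size chosen at the top of the graph.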

graph LR
   A[Activation] -- PrologueA --- C
   C[GetActivation in SLM] --> D[GemmCore]
   E[Weight in HBM] -- PrologueB --- G
   G[GetWeight in SLM] --> D
   D --> H[Accumulator]
   H --> I[Epilogue]
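
As a plain CPU reference for the data flow above, the sketch below walks the same stages: PrologueB dequantizes the INT4 weight, GemmCore accumulates, and the epilogue post-processes the accumulator. It rests on several assumptions that the graph does not state: signed INT4 values packed two per byte with the low nibble first, a row-major K x N weight, one scale per block of `block_size` rows along K for each column, and a bias add as the epilogue. PrologueA is a pass-through here because there is no SLM to stage into on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Unpack one signed 4-bit value (assumed packing: low nibble holds even idx).
inline int unpack_s4(const std::vector<uint8_t>& packed, std::size_t idx) {
  const uint8_t byte = packed[idx / 2];
  const int nibble = (idx % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
  return nibble >= 8 ? nibble - 16 : nibble;  // sign-extend from 4 bits
}

// PrologueB: dequantize the K x N weight with per-block, per-column scales.
std::vector<float> prologue_b(const std::vector<uint8_t>& packed,
    const std::vector<float>& scales, std::size_t K, std::size_t N,
    std::size_t block_size) {
  assert(block_size > 0 && K % block_size == 0);
  assert(packed.size() * 2 >= K * N);
  assert(scales.size() == (K / block_size) * N);
  std::vector<float> w(K * N);
  for (std::size_t k = 0; k < K; ++k)
    for (std::size_t n = 0; n < N; ++n)
      w[k * N + n] = static_cast<float>(unpack_s4(packed, k * N + n)) *
          scales[(k / block_size) * N + n];
  return w;
}

// GemmCore: acc[M x N] = a[M x K] * w[K x N], accumulated in float.
std::vector<float> gemm_core(const std::vector<float>& a,
    const std::vector<float>& w, std::size_t M, std::size_t K,
    std::size_t N) {
  assert(a.size() == M * K && w.size() == K * N);
  std::vector<float> acc(M * N, 0.0f);
  for (std::size_t m = 0; m < M; ++m)
    for (std::size_t k = 0; k < K; ++k)
      for (std::size_t n = 0; n < N; ++n)
        acc[m * N + n] += a[m * K + k] * w[k * N + n];
  return acc;
}

// Epilogue (illustrative choice): add a per-column bias to the accumulator.
void epilogue(std::vector<float>& acc, const std::vector<float>& bias,
    std::size_t M, std::size_t N) {
  assert(acc.size() == M * N && bias.size() == N);
  for (std::size_t m = 0; m < M; ++m)
    for (std::size_t n = 0; n < N; ++n)
      acc[m * N + n] += bias[n];
}

// Full pipeline: PrologueB -> GemmCore -> Epilogue.
std::vector<float> int4_gemm(const std::vector<float>& a,
    const std::vector<uint8_t>& packed, const std::vector<float>& scales,
    const std::vector<float>& bias, std::size_t M, std::size_t K,
    std::size_t N, std::size_t block_size) {
  const std::vector<float> w = prologue_b(packed, scales, K, N, block_size);
  std::vector<float> acc = gemm_core(a, w, M, K, N);
  epilogue(acc, bias, M, N);
  return acc;
}
```

A reference like this is mainly useful for checking a tuned kernel's output on small shapes. It makes no attempt to model SLM staging, tiling, or the MMA engine.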

airMeng/XeTLA.md
