Device | PVC | MTL | DG2 | LNL/BMG(TODO) | ARL(TODO) |
---|---|---|---|---|---|
ISA | Xe | Xe-lpg | Xe-hpg | Xe2 | Xe-lpg+ |
DPAS | 8,8,16 | NA | 8,8,8 | 8,4,16 | 8,8,8 |
2D Block | 32, 64 | NA | NA | 32, 64 | NA |
1D Block | 64 | 32 | 32 | 64 | 32 |
+template <>
+struct arch_attr_t<gpu_arch::Xe2> {
+ template <msg_type message_type = msg_type::block_2d>
+ using load_store_attr = load_store_attr_t<message_type, gpu_arch::Xe2>;
+
+ template <grf_mode grf_num_mode = grf_mode::double_grf>
+ using register_attr = register_attr_t<grf_num_mode, gpu_arch::Xe2>;
+
+ using dpas_attr = dpas_attr_t<gpu_arch::Xe2>;
+
+ static constexpr uint32_t max_wg_num = 16;
+ static constexpr uint32_t local_mem_size = 128 * 1024;
+};
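
As a rough illustration of how the specialization above might be consumed, the following self-contained sketch redeclares minimal stand-ins for `gpu_arch`, `msg_type`, `grf_mode`, and the three `*_attr_t` templates so that it compiles on its own. Those stand-ins are empty placeholders, not the library's real definitions, and `fits_in_slm` is a hypothetical helper introduced only for this example. The two constants are copied from the specialization above.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in declarations (assumptions): just enough for the sketch to compile.
// The real headers define these with more enumerators and members.
enum class gpu_arch : uint8_t { Xe2 /* other architectures omitted */ };
enum class msg_type : uint8_t { block_2d, block_1d };
enum class grf_mode : uint8_t { normal_grf, double_grf };

template <msg_type message_type, gpu_arch arch>
struct load_store_attr_t {};
template <grf_mode grf_num_mode, gpu_arch arch>
struct register_attr_t {};
template <gpu_arch arch>
struct dpas_attr_t {};

// Primary template; each supported architecture provides a specialization.
template <gpu_arch arch>
struct arch_attr_t {};

template <>
struct arch_attr_t<gpu_arch::Xe2> {
  template <msg_type message_type = msg_type::block_2d>
  using load_store_attr = load_store_attr_t<message_type, gpu_arch::Xe2>;

  template <grf_mode grf_num_mode = grf_mode::double_grf>
  using register_attr = register_attr_t<grf_num_mode, gpu_arch::Xe2>;

  using dpas_attr = dpas_attr_t<gpu_arch::Xe2>;

  static constexpr uint32_t max_wg_num = 16;
  static constexpr uint32_t local_mem_size = 128 * 1024;  // 128 KiB
};

// Hypothetical helper: do two staged tiles fit in local memory together?
template <gpu_arch arch>
constexpr bool fits_in_slm(uint32_t bytes_a, uint32_t bytes_b) {
  return static_cast<uint64_t>(bytes_a) + bytes_b <=
         arch_attr_t<arch>::local_mem_size;
}

// Compile-time checks against the values in the specialization.
static_assert(arch_attr_t<gpu_arch::Xe2>::max_wg_num == 16, "max_wg_num");
static_assert(fits_in_slm<gpu_arch::Xe2>(64 * 1024, 64 * 1024),
              "two 64 KiB tiles exactly fill 128 KiB");
static_assert(!fits_in_slm<gpu_arch::Xe2>(64 * 1024, 64 * 1024 + 1),
              "one byte over the limit must not fit");
```

Keeping the per-architecture constants behind a single trait means kernel templates can take a `gpu_arch` parameter and query limits at compile time instead of hard-coding them.
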
graph TD
H[INT4 type] --> B
I[Block size] --> B
J[Layout] --> B
A[Perf tuning] --> D[compute policy]
B[Quantization info] --> D
C[MMA engine] --> D
Q[Arch] --> D
D --> G[micro GEMM kernel]
G --> E[GEMM kernel]
O[Epilogue] --> E
P[Group Dispatch] --> E
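
The composition in the graph above can be mirrored with nested template parameters. The sketch below is only one possible reading of the diagram: every type name in it (`quant_info_t`, `perf_tuning_t`, `compute_policy_t`, `micro_gemm_t`, `gemm_kernel_t`, and the enums) is a hypothetical stand-in, and the tuning values and block size are arbitrary example numbers, not recommended settings.

```cpp
#include <cassert>
#include <cstdint>

// All names below are hypothetical stand-ins that mirror the boxes in the
// diagram; they are not the library's actual declarations.
enum class gpu_arch : uint8_t { Xe2 };
enum class mma_engine : uint8_t { xmx, fpu };
enum class int4_type : uint8_t { s4, u4 };
enum class mem_layout : uint8_t { row_major, col_major };

// Quantization info = INT4 type + block size + layout.
template <int4_type type_, uint32_t block_size_, mem_layout layout_>
struct quant_info_t {
  static constexpr int4_type type = type_;
  static constexpr uint32_t block_size = block_size_;
  static constexpr mem_layout layout = layout_;
};

// Perf tuning knobs (illustrative fields only).
template <uint32_t stages_, uint32_t sync_freq_>
struct perf_tuning_t {
  static constexpr uint32_t stages = stages_;
  static constexpr uint32_t sync_freq = sync_freq_;
};

// Compute policy = perf tuning + quantization info + MMA engine + arch.
template <typename perf_tuning_, typename quant_info_, mma_engine engine_,
          gpu_arch arch_>
struct compute_policy_t {
  using perf_tuning = perf_tuning_;
  using quant_info = quant_info_;
  static constexpr mma_engine engine = engine_;
  static constexpr gpu_arch arch = arch_;
};

// The micro GEMM kernel is parameterized by the compute policy.
template <typename compute_policy_>
struct micro_gemm_t {
  using compute_policy = compute_policy_;
};

struct epilogue_t {};
struct group_dispatch_t {};

// GEMM kernel = micro GEMM kernel + epilogue + group dispatch.
template <typename micro_gemm_, typename epilogue_, typename dispatch_>
struct gemm_kernel_t {
  using micro_gemm = micro_gemm_;
  using epilogue = epilogue_;
  using group_dispatch = dispatch_;
};

// Example instantiation with arbitrary values.
using my_quant_info = quant_info_t<int4_type::u4, 128, mem_layout::col_major>;
using my_policy = compute_policy_t<perf_tuning_t<3, 8>, my_quant_info,
                                   mma_engine::xmx, gpu_arch::Xe2>;
using my_gemm = gemm_kernel_t<micro_gemm_t<my_policy>, epilogue_t,
                              group_dispatch_t>;

// Number of quantization blocks along K (rounded up) for a given kernel.
template <typename kernel>
constexpr uint32_t blocks_along_k(uint32_t k) {
  return (k + kernel::micro_gemm::compute_policy::quant_info::block_size - 1) /
         kernel::micro_gemm::compute_policy::quant_info::block_size;
}
```

Because every choice is a template argument, the quantization settings stay reachable from the outermost kernel type, as `blocks_along_k<my_gemm>(4096)` shows by reading the block size through three levels of nesting.
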
graph LR
A[Activation] -- PrologueA --- C
C[GetActivation in SLM] --> D[GemmCore]
E[Weight in HBM] -- PrologueB --- G
G[GetWeight in SLM] --> D
D --> H[Accumulator]
H --> I[Epilogue]
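
The data flow in the graph above can be followed with a small host-side reference in plain C++. This is a sketch under several assumptions that the diagram does not state: PrologueB is taken to be the stage that unpacks INT4 weights (two unsigned 4-bit values per byte, low nibble first, zero point 8, one scale for the whole tile), SLM is modelled as an ordinary `std::vector`, and the epilogue simply adds a scalar bias. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// PrologueA: stage the activation tile. Here this is a plain copy into a
// buffer that stands in for SLM.
std::vector<float> prologue_a(const std::vector<float>& activation) {
  return activation;
}

// PrologueB: stage the weight tile. Assumed scheme: each byte holds two
// unsigned 4-bit values (low nibble first), zero point 8, single scale.
std::vector<float> prologue_b(const std::vector<uint8_t>& packed, float scale) {
  std::vector<float> weight;
  weight.reserve(packed.size() * 2);
  for (uint8_t byte : packed) {
    weight.push_back(static_cast<float>(static_cast<int>(byte & 0x0F) - 8) * scale);
    weight.push_back(static_cast<float>(static_cast<int>(byte >> 4) - 8) * scale);
  }
  return weight;
}

// GemmCore: acc[i][j] += sum over p of a[i][p] * b[p][j], all row-major.
void gemm_core(const std::vector<float>& a, const std::vector<float>& b,
               std::vector<float>& acc, std::size_t m, std::size_t n,
               std::size_t k) {
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t p = 0; p < k; ++p)
        acc[i * n + j] += a[i * k + p] * b[p * n + j];
}

// Epilogue: post-process the accumulator. Here: add a scalar bias.
std::vector<float> epilogue(const std::vector<float>& acc, float bias) {
  std::vector<float> out(acc.size());
  for (std::size_t i = 0; i < acc.size(); ++i) out[i] = acc[i] + bias;
  return out;
}

// Full pipeline: PrologueA + PrologueB -> GemmCore -> Accumulator -> Epilogue.
std::vector<float> run_pipeline(const std::vector<float>& activation,
                                const std::vector<uint8_t>& packed_weight,
                                float scale, float bias, std::size_t m,
                                std::size_t n, std::size_t k) {
  std::vector<float> a = prologue_a(activation);            // m x k
  std::vector<float> b = prologue_b(packed_weight, scale);  // k x n
  assert(a.size() == m * k && b.size() == k * n);
  std::vector<float> acc(m * n, 0.0f);
  gemm_core(a, b, acc, m, n, k);
  return epilogue(acc, bias);
}
```

For a 1x2 activation `{1, 2}` and packed weights `{0x98, 0xA7}` with scale 1, the unpacked 2x2 weight is `{0, 1, -1, 2}`, so the accumulator holds `{-2, 5}` before the epilogue runs.
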