Full Benchmark Summary

Data-Tiling Comparison Table

| Name | No-DT (baseline) | DT-Only | DT-UK |
| --- | --- | --- | --- |
| BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 733.968 (1.0X) | N/A | 219.955 (3.3X) |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 7.000 (1.0X) | N/A | 8.531 (0.8X) |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 36.126 (1.0X) | N/A | 34.211 (1.1X) |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.862 (1.0X) | N/A | 5.030 (1.2X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 9.176 (1.0X) | N/A | 8.494 (1.1X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.136 (1.0X) | N/A | 8.970 (1.2X) |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.961 (1.0X) | N/A | 13.841 (0.9X) |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 33.527 (1.0X) | N/A | 61.049 (0.5X) |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.119 (1.0X) | N/A | 61.344 (0.6X) |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 69.104 (1.0X) | N/A | 64.549 (1.1X) |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.057 (1.0X) | N/A | 4.601 (1.1X) |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 3.770 (1.0X) | N/A | 4.932 (0.8X) |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.902 (1.0X) | N/A | 5.430 (1.1X) |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.873 (1.0X) | N/A | 2.822 (1.0X) |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.502 (1.0X) | N/A | 9.930 (0.9X) |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.788 (1.0X) | N/A | 0.657 (1.2X) |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.208 (1.0X) | N/A | 5.299 (0.8X) |
| matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 7.547 (1.0X) | N/A | 7.580 (1.0X) |
| matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 6.611 (1.0X) | N/A | 1.807 (3.7X) |
| BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 214.945 (1.0X) | N/A | 107.796 (2.0X) |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 32.243 (1.0X) | N/A | 30.034 (1.1X) |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 276.804 (1.0X) | N/A | 229.538 (1.2X) |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 26.841 (1.0X) | N/A | 13.044 (2.1X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 70.669 (1.0X) | N/A | 39.049 (1.8X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 88.791 (1.0X) | N/A | 41.602 (2.1X) |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 79.335 (1.0X) | N/A | 59.345 (1.3X) |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 179.632 (1.0X) | N/A | 186.517 (1.0X) |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 181.154 (1.0X) | N/A | 191.026 (0.9X) |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 520.569 (1.0X) | N/A | 244.485 (2.1X) |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 25.127 (1.0X) | N/A | 17.864 (1.4X) |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.753 (1.0X) | N/A | 11.325 (1.0X) |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 21.548 (1.0X) | N/A | 11.847 (1.8X) |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.770 (1.0X) | N/A | 2.695 (1.0X) |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.167 (1.0X) | N/A | 31.051 (1.1X) |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.715 (1.0X) | N/A | 0.580 (1.2X) |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 17.637 (1.0X) | N/A | 19.533 (0.9X) |
| matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.055 (1.0X) | N/A | 0.055 (1.0X) |
| matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.043 (1.0X) | N/A | 0.022 (2.0X) |
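
The multipliers in parentheses appear to be the baseline (No-DT) latency divided by the configuration's latency, rounded to one decimal place, so values above 1.0X mean faster than baseline and values below 1.0X mean slower. The report does not state this formula; it is inferred from the numbers in the table. A minimal sketch under that assumption follows, where `speedup_label` is a hypothetical helper name, not part of any benchmark tooling:

```python
def speedup_label(baseline_ms: float, candidate_ms: float) -> str:
    """Format the speedup of a candidate over the baseline, e.g. '3.3X'.

    Assumed formula (inferred from the table): baseline / candidate,
    rounded to one decimal place.
    """
    return f"{baseline_ms / candidate_ms:.1f}X"


# (No-DT latency, DT-UK latency) pairs copied from the table above.
rows = {
    "BertLargeTF, 30-thread": (733.968, 219.955),
    "DeepLabV3_fp32, 8-thread": (7.000, 8.531),
    "MobileBertSquad_fp16, 15-thread": (33.527, 61.049),
}

for name, (no_dt, dt_uk) in rows.items():
    print(f"{name}: {speedup_label(no_dt, dt_uk)}")
```

Read this way, the 15-thread MobileBertSquad fp16 and fp32 rows (0.5X and 0.6X) are the largest DT-UK slowdowns in the table, while BertLargeTF and the 256x256x2048 i8/i8 matmul (3.3X and 3.7X) are the largest speedups.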

Similar Latencies

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 214.945 (vs. 231.192, 7.03%↓) | 207.401 | 15.803 |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.057 (vs. 4.844, 4.40%↑) | 5.073 | 0.068 |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 17.637 (vs. 17.978, 1.89%↓) | 17.646 | 0.111 |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.873 (vs. 2.924, 1.76%↓) | 2.870 | 0.018 |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 39.049 (vs. 39.693, 1.62%↓) | 38.412 | 1.407 |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 31.051 (vs. 31.482, 1.37%↓) | 30.949 | 0.301 |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 191.026 (vs. 193.429, 1.24%↓) | 189.510 | 3.889 |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.822 (vs. 2.793, 1.04%↑) | 2.824 | 0.018 |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 25.127 (vs. 24.868, 1.04%↑) | 25.139 | 0.266 |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 79.335 (vs. 78.565, 0.98%↑) | 79.185 | 1.012 |
| MobileNetV2_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local_task(vmvx_module)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 7816.167 (vs. 7741.125, 0.97%↑) | 7831.459 | 27.373 |
| BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 107.796 (vs. 108.821, 0.94%↓) | 107.541 | 1.172 |
| matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.022 (vs. 0.022, 0.94%↓) | 0.022 | 0.000 |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 7.000 (vs. 7.065, 0.91%↓) | 7.007 | 0.029 |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.695 (vs. 2.719, 0.89%↓) | 2.693 | 0.018 |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 181.154 (vs. 182.712, 0.85%↓) | 180.158 | 3.170 |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.167 (vs. 34.456, 0.84%↓) | 34.163 | 0.280 |
| BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 219.955 (vs. 221.694, 0.78%↓) | 219.832 | 1.557 |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.136 (vs. 11.049, 0.78%↑) | 10.833 | 0.616 |
| matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.055 (vs. 0.055, 0.76%↑) | 0.055 | 0.000 |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.961 (vs. 12.053, 0.76%↓) | 11.792 | 0.295 |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.502 (vs. 8.567, 0.76%↓) | 8.479 | 0.049 |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.119 (vs. 33.868, 0.74%↑) | 33.577 | 0.925 |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 3.770 (vs. 3.798, 0.74%↓) | 3.772 | 0.015 |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 19.533 (vs. 19.673, 0.71%↓) | 19.487 | 0.125 |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 13.841 (vs. 13.747, 0.68%↑) | 13.862 | 0.136 |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.753 (vs. 11.834, 0.68%↓) | 11.735 | 0.075 |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 13.044 (vs. 13.132, 0.67%↓) | 13.033 | 0.038 |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.208 (vs. 4.232, 0.58%↓) | 4.208 | 0.016 |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 21.548 (vs. 21.671, 0.57%↓) | 21.546 | 0.082 |
| matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.043 (vs. 0.043, 0.56%↓) | 0.043 | 0.001 |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 36.126 (vs. 35.929, 0.55%↑) | 35.381 | 1.057 |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.494 (vs. 8.449, 0.54%↑) | 8.460 | 0.084 |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 88.791 (vs. 88.324, 0.53%↑) | 88.226 | 3.759 |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.847 (vs. 11.909, 0.52%↓) | 11.854 | 0.059 |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 276.804 (vs. 278.211, 0.51%↓) | 276.215 | 3.823 |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.430 (vs. 5.403, 0.50%↑) | 5.426 | 0.014 |
| BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 733.968 (vs. 730.402, 0.49%↑) | 716.974 | 45.002 |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.932 (vs. 4.910, 0.45%↑) | 4.929 | 0.029 |
| matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 7.547 (vs. 7.580, 0.43%↓) | 7.547 | 0.019 |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 9.176 (vs. 9.214, 0.41%↓) | 8.936 | 0.452 |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.325 (vs. 11.370, 0.40%↓) | 11.323 | 0.098 |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 41.602 (vs. 41.766, 0.39%↓) | 40.724 | 1.738 |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.211 (vs. 34.322, 0.32%↓) | 34.071 | 0.362 |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.970 (vs. 8.942, 0.31%↑) | 8.934 | 0.101 |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.902 (vs. 5.920, 0.31%↓) | 5.904 | 0.016 |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 70.669 (vs. 70.462, 0.29%↑) | 70.977 | 2.663 |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 61.344 (vs. 61.166, 0.29%↑) | 61.027 | 0.609 |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 30.034 (vs. 30.121, 0.29%↓) | 30.082 | 0.150 |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.715 (vs. 0.717, 0.28%↓) | 0.715 | 0.001 |
| matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 1.807 (vs. 1.812, 0.27%↓) | 1.807 | 0.004 |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 9.930 (vs. 9.904, 0.26%↑) | 9.924 | 0.036 |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.770 (vs. 2.777, 0.26%↓) | 2.767 | 0.034 |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.657 (vs. 0.658, 0.25%↓) | 0.656 | 0.003 |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 229.538 (vs. 230.115, 0.25%↓) | 227.799 | 4.102 |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 59.345 (vs. 59.201, 0.24%↑) | 58.879 | 1.025 |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.862 (vs. 5.876, 0.24%↓) | 5.857 | 0.015 |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 64.549 (vs. 64.396, 0.24%↑) | 64.406 | 0.368 |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 186.517 (vs. 186.850, 0.18%↓) | 185.125 | 3.134 |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.580 (vs. 0.579, 0.17%↑) | 0.580 | 0.001 |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.531 (vs. 8.516, 0.17%↑) | 8.534 | 0.026 |
| matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 7.580 (vs. 7.567, 0.17%↑) | 7.588 | 0.033 |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.788 (vs. 0.789, 0.16%↓) | 0.788 | 0.001 |
| matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 6.611 (vs. 6.621, 0.15%↓) | 6.623 | 0.056 |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 69.104 (vs. 69.210, 0.15%↓) | 68.785 | 0.571 |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 33.527 (vs. 33.575, 0.14%↓) | 32.986 | 0.948 |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 520.569 (vs. 519.870, 0.13%↑) | 519.147 | 2.781 |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.601 (vs. 4.595, 0.13%↑) | 4.595 | 0.023 |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 32.243 (vs. 32.280, 0.11%↓) | 32.210 | 0.121 |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.030 (vs. 5.026, 0.09%↑) | 5.024 | 0.019 |
| MobileNetV3Small_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local_task(vmvx_module)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 1748.490 (vs. 1747.624, 0.05%↑) | 1748.791 | 0.567 |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.299 (vs. 5.297, 0.05%↑) | 5.301 | 0.011 |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 26.841 (vs. 26.829, 0.04%↑) | 26.841 | 0.108 |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 244.485 (vs. 244.573, 0.04%↓) | 244.103 | 1.225 |
| matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.055 (vs. 0.055, 0.03%↑) | 0.055 | 0.000 |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 61.049 (vs. 61.032, 0.03%↑) | 60.773 | 0.528 |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 179.632 (vs. 179.660, 0.02%↓) | 178.241 | 2.828 |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 17.864 (vs. 17.863, 0.00%↑) | 17.799 | 0.220 |
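
Each "(vs. X, p%↓)" annotation appears to compare the current measurement against a previous value X and report the relative change, with ↓ for a decrease and ↑ for an increase. The report does not spell out the formula, so the sketch below is an assumption inferred from the listed numbers, and `change_label` is a hypothetical helper name:

```python
def change_label(current: float, previous: float) -> str:
    """Format the relative change of `current` vs. `previous`, e.g. '7.03%↓'.

    Assumed formula (inferred from the report): (current - previous) / previous,
    shown as an unsigned percentage with two decimals plus a direction arrow.
    """
    if current == previous:
        # Identical values are shown without an arrow in the report.
        return "0.00%"
    pct = (current - previous) / previous * 100.0
    arrow = "↑" if pct > 0 else "↓"
    return f"{abs(pct):.2f}%{arrow}"


# Values copied from the first two rows of the table above.
print(change_label(214.945, 231.192))  # BertForMaskedLMTF, no-dt, 30-thread
print(change_label(5.057, 4.844))      # MobileNetV1_fp32, no-dt, 8-thread
```

Because the table prints latencies rounded to three decimals, recomputing a percentage from the displayed values can differ in the last digit from the printed one. This is also why a row such as `0.022 (vs. 0.022, 0.94%↓)` shows a non-zero change between two values that look identical.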

All Compilation Metrics

| Benchmark Name | Compilation Time (ms) | Total Dispatch Size (bytes) | Total Artifact Size (bytes) | Stream IR Dispatch Count (# of cmd.dispatch ops) |
| --- | --- | --- | --- | --- |
| matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 1082 (vs. 1178, 8.15%↓) | 4400 (vs. 4400, 0.00%) | 273657 (vs. 273657, 0.00%) | 1 (vs. 1, 0.00%) |
| matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 1617 (vs. 1882, 14.08%↓) | 9680 (vs. 9680, 0.00%) | 278905 (vs. 278905, 0.00%) | 1 (vs. 1, 0.00%) |
| matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 967 (vs. 1158, 16.49%↓) | 3328 (vs. 3328, 0.00%) | 534713 (vs. 534713, 0.00%) | 1 (vs. 1, 0.00%) |
| matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 1734 (vs. 1665, 4.14%↑) | 8480 (vs. 8480, 0.00%) | 539833 (vs. 539833, 0.00%) | 1 (vs. 1, 0.00%) |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 23721 (vs. 23340, 1.63%↑) | 144544 (vs. 144544, 0.00%) | 399493 (vs. 399493, 0.00%) | 60 (vs. 60, 0.00%) |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 45440 (vs. 39401, 15.33%↑) | 238656 (vs. 238656, 0.00%) | 10455045 (vs. 10455045, 0.00%) | 97 (vs. 97, 0.00%) |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 35867 (vs. 35285, 1.65%↑) | 177696 (vs. 177696, 0.00%) | 2957509 (vs. 2957509, 0.00%) | 79 (vs. 79, 0.00%) |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 55464 (vs. 61628, 10.00%↓) | 682752 (vs. 682752, 0.00%) | 5603845 (vs. 5603845, 0.00%) | 89 (vs. 89, 0.00%) |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 21989 (vs. 23908, 8.03%↓) | 175008 (vs. 175008, 0.00%) | 17092293 (vs. 17092293, 0.00%) | 51 (vs. 51, 0.00%) |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 34381 (vs. 33397, 2.95%↑) | 190512 (vs. 190512, 0.00%) | 14172293 (vs. 14172293, 0.00%) | 74 (vs. 74, 0.00%) |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 62256 (vs. 75200, 17.21%↓) | 568880 (vs. 568880, 0.00%) | 4216837 (vs. 4216837, 0.00%) | 144 (vs. 144, 0.00%) |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 47985 (vs. 45540, 5.37%↑) | 287728 (vs. 287728, 0.00%) | 18226245 (vs. 18226245, 0.00%) | 124 (vs. 124, 0.00%) |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 24990 (vs. 21082, 18.54%↑) | 142464 (vs. 142464, 0.00%) | 5195333 (vs. 5195333, 0.00%) | 48 (vs. 48, 0.00%) |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 66231 (vs. 65933, 0.45%↑) | 91888 (vs. 91888, 0.00%) | 99892293 (vs. 99892293, 0.00%) | 703 (vs. 703, 0.00%) |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 64075 (vs. 66136, 3.12%↓) | 100448 (vs. 100448, 0.00%) | 98413445 (vs. 98413445, 0.00%) | 679 (vs. 679, 0.00%) |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 327691 (vs. 333884, 1.85%↓) | 6817872 (vs. 6817872, 0.00%) | 33068869 (vs. 33068869, 0.00%) | 1053 (vs. 1053, 0.00%) |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 105063 (vs. 113267, 7.24%↓) | 216688 (vs. 216688, 0.00%) | 164493804 (vs. 164493804, 0.00%) | 276 (vs. 276, 0.00%) |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 34101 (vs. 33211, 2.68%↑) | 67120 (vs. 67120, 0.00%) | 133993839 (vs. 133993839, 0.00%) | 185 (vs. 185, 0.00%) |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 33631 (vs. 34544, 2.64%↓) | 27504 (vs. 27504, 0.00%) | 652742996 (vs. 652742996, 0.00%) | 221 (vs. 221, 0.00%) |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 34752 (vs. 35252, 1.42%↓) | 15008 (vs. 15008, 0.00%) | 652726804 (vs. 652726804, 0.00%) | 246 (vs. 246, 0.00%) |
| BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 32126 (vs. 38176, 15.85%↓) | 76768 (vs. 76768, 0.00%) | 533839615 (vs. 533839615, 0.00%) | 188 (vs. 188, 0.00%) |
| BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] | 29919 (vs. 36892, 18.90%↓) | 58176 (vs. 58176, 0.00%) | 1336009791 (vs. 1336009791, 0.00%) | 365 (vs. 365, 0.00%) |
| matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] | 1033 (vs. 1061, 2.64%↓) | 4400 (vs. 4400, 0.00%) | 273657 (vs. 273657, 0.00%) | 1 (vs. 1, 0.00%) |
| matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] | 1686 (vs. 1847, 8.72%↓) | 9680 (vs. 9680, 0.00%) | 278969 (vs. 278969, 0.00%) | 1 (vs. 1, 0.00%) |
| matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] | 2142 (vs. 2329, 8.03%↓) | 2976 (vs. 2976, 0.00%) | 534329 (vs. 534329, 0.00%) | 1 (vs. 1, 0.00%) |
| matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] | 2452 (vs. 2466, 0.57%↓) | 6432 (vs. 6432, 0.00%) | 538501 (vs. 538501, 0.00%) | 3 (vs. 3, 0.00%) |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] | 23523 (vs. 25826, 8.92%↓) | 106672 (vs. 106672, 0.00%) | 368069 (vs. 368069, 0.00%) | 87 (vs. 87, 0.00%) |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] | 32367 (vs. 39608, 18.28%↓) | 96992 (vs. 96992, 0.00%) | 10394437 (vs. 10394437, 0.00%) | 147 (vs. 147, 0.00%) |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] | 30064 (vs. 35230, 14.66%↓) | 114416 (vs. 114416, 0.00%) | 2917637 (vs. 2917637, 0.00%) | 144 (vs. 144, 0.00%) |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] | 44425 (vs. 48643, 8.67%↓) | 269312 (vs. 269312, 0.00%) | 5215301 (vs. 5215301, 0.00%) | 148 (vs. 148, 0.00%) |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] | 17623 (vs. 16733, 5.32%↑) | 60352 (vs. 60352, 0.00%) | 17014213 (vs. 17014213, 0.00%) | 79 (vs. 79, 0.00%) |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] | 30448 (vs. 26529, 14.77%↑) | 95328 (vs. 95328, 0.00%) | 14129477 (vs. 14129477, 0.00%) | 136 (vs. 136, 0.00%) |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] | 57004 (vs. 59495, 4.19%↓) | 329856 (vs. 329856, 0.00%) | 3999301 (vs. 3999301, 0.00%) | 213 (vs. 213, 0.00%) |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] | 38825 (vs. 40196, 3.41%↓) | 135552 (vs. 135552, 0.00%) | 18353669 (vs. 18353669, 0.00%) | 223 (vs. 223, 0.00%) |
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 14537 (vs. 17033, 14.65%↓) 44256 (vs. 44256, 0.00%) 5146821 (vs. 5146821, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 86806 (vs. 81616, 6.36%↑) 48496 (vs. 48496, 0.00%) 99982021 (vs. 99982021, 0.00%) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 79538 (vs. 78007, 1.96%↑) 48880 (vs. 48880, 0.00%) 98493893 (vs. 98493893, 0.00%) 1762 (vs. 1762, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 281842 (vs. 288731, 2.39%↓) 3422160 (vs. 3422160, 0.00%) 29827141 (vs. 29827141, 0.00%) 2136 (vs. 2136, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 103361 (vs. 120869, 14.49%↓) 155328 (vs. 155328, 0.00%) 169900268 (vs. 169900268, 0.00%) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 35272 (vs. 41865, 15.75%↓) 36576 (vs. 36576, 0.00%) 219461551 (vs. 219461551, 0.00%) 342 (vs. 342, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 42331 (vs. 45495, 6.95%↓) 17344 (vs. 17344, 0.00%) 992526804 (vs. 992526804, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 40261 (vs. 43495, 7.44%↓) 10272 (vs. 10272, 0.00%) 992522068 (vs. 992522068, 0.00%) 355 (vs. 355, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 43528 (vs. 50042, 13.02%↓) 38224 (vs. 38224, 0.00%) 875850367 (vs. 875850367, 0.00%) 346 (vs. 346, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 53352 (vs. 53843, 0.91%↓) 41568 (vs. 41568, 0.00%) 1336020671 (vs. 1336020671, 0.00%) 678 (vs. 678, 0.00%)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 1191 (vs. 1186, 0.42%↑) 4400 (vs. 4400, 0.00%) 273657 (vs. 273657, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 1602 (vs. 1888, 15.15%↓) 9680 (vs. 9680, 0.00%) 278905 (vs. 278905, 0.00%) 1 (vs. 1, 0.00%)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 2559 (vs. 2691, 4.91%↓) 2464 (vs. 2464, 0.00%) 533817 (vs. 533817, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 2896 (vs. 2874, 0.77%↑) 4800 (vs. 4800, 0.00%) 536837 (vs. 536837, 0.00%) 3 (vs. 3, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 25422 (vs. 25568, 0.57%↓) 100464 (vs. 100464, 0.00%) 361861 (vs. 361861, 0.00%) 87 (vs. 87, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 32630 (vs. 36202, 9.87%↓) 99152 (vs. 99152, 0.00%) 10396549 (vs. 10396549, 0.00%) 147 (vs. 147, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 32305 (vs. 37038, 12.78%↓) 119792 (vs. 119792, 0.00%) 2923013 (vs. 2923013, 0.00%) 144 (vs. 144, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 48141 (vs. 51197, 5.97%↓) 260256 (vs. 260256, 0.00%) 5206277 (vs. 5206277, 0.00%) 148 (vs. 148, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 17959 (vs. 21161, 15.13%↓) 66256 (vs. 66256, 0.00%) 17020101 (vs. 17020101, 0.00%) 79 (vs. 79, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 34217 (vs. 28472, 20.18%↑) 100048 (vs. 100048, 0.00%) 14134149 (vs. 14134149, 0.00%) 136 (vs. 136, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 57043 (vs. 56819, 0.39%↑) 325280 (vs. 325280, 0.00%) 3994693 (vs. 3994693, 0.00%) 213 (vs. 213, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 40200 (vs. 42272, 4.90%↓) 123312 (vs. 123312, 0.00%) 18341381 (vs. 18341381, 0.00%) 223 (vs. 223, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 16889 (vs. 15077, 12.02%↑) 47808 (vs. 47808, 0.00%) 5150341 (vs. 5150341, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 73332 (vs. 85879, 14.61%↓) 39616 (vs. 39616, 0.00%) 99973125 (vs. 99973125, 0.00%) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 82684 (vs. 78296, 5.60%↑) 40016 (vs. 40016, 0.00%) 98484933 (vs. 98484933, 0.00%) 1762 (vs. 1762, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 260983 (vs. 263865, 1.09%↓) 3414288 (vs. 3414288, 0.00%) 29819269 (vs. 29819269, 0.00%) 2136 (vs. 2136, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 104802 (vs. 112586, 6.91%↓) 141760 (vs. 141760, 0.00%) 169886636 (vs. 169886636, 0.00%) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 35142 (vs. 41298, 14.91%↓) 33600 (vs. 33600, 0.00%) 219458607 (vs. 219458607, 0.00%) 342 (vs. 342, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 48185 (vs. 50277, 4.16%↓) 18784 (vs. 18784, 0.00%) 992528276 (vs. 992528276, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 43702 (vs. 46085, 5.17%↓) 11312 (vs. 11312, 0.00%) 992523092 (vs. 992523092, 0.00%) 355 (vs. 355, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 45025 (vs. 55063, 18.23%↓) 33968 (vs. 33968, 0.00%) 875846143 (vs. 875846143, 0.00%) 346 (vs. 346, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 56279 (vs. 50152, 12.22%↑) 38112 (vs. 38112, 0.00%) 1336017151 (vs. 1336017151, 0.00%) 678 (vs. 678, 0.00%)
EfficientNetV2STF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 87036 (vs. 105098, 17.19%↓) 841528 (vs. 841528, 0.00%) 165164692 (vs. 165164692, 0.00%) 276 (vs. 276, 0.00%)
MiniLML12H384Uncased(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 26528 (vs. 28455, 6.77%↓) 184528 (vs. 184528, 0.00%) 134119041 (vs. 134119041, 0.00%) 185 (vs. 185, 0.00%)
BertForMaskedLMTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 36161 (vs. 34137, 5.93%↑) 264548 (vs. 264548, 0.00%) 534033972 (vs. 534033972, 0.00%) 188 (vs. 188, 0.00%)
BertLargeTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 33812 (vs. 36239, 6.70%↓) 168052 (vs. 168052, 0.00%) 1336126133 (vs. 1336126133, 0.00%) 365 (vs. 365, 0.00%)
matmul_3456x1024x2048_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1755 (vs. 1834, 4.31%↓) 30384 (vs. 30384, 0.00%) 42579 (vs. 42579, 0.00%) 1 (vs. 1, 0.00%)
matmul_3456x1024x2048_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1834 (vs. 2056, 10.80%↓) 44912 (vs. 44912, 0.00%) 57107 (vs. 57107, 0.00%) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1625 (vs. 1765, 7.93%↓) 28344 (vs. 28344, 0.00%) 40475 (vs. 40475, 0.00%) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1933 (vs. 2108, 8.30%↓) 41284 (vs. 41284, 0.00%) 53415 (vs. 53415, 0.00%) 1 (vs. 1, 0.00%)
matmul_2564x2564x2564_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2356 (vs. 2459, 4.19%↓) 85956 (vs. 85956, 0.00%) 98087 (vs. 98087, 0.00%) 1 (vs. 1, 0.00%)
matmul_2562x2564x2562_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2475 (vs. 2533, 2.29%↓) 88832 (vs. 88832, 0.00%) 101027 (vs. 101027, 0.00%) 1 (vs. 1, 0.00%)
matmul_2562x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 3162 (vs. 2864, 10.41%↑) 84036 (vs. 84036, 0.00%) 96231 (vs. 96231, 0.00%) 1 (vs. 1, 0.00%)
matmul_123x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2365 (vs. 2419, 2.23%↓) 51016 (vs. 51016, 0.00%) 63210 (vs. 63210, 0.00%) 1 (vs. 1, 0.00%)
matmul_128x256x8192_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 935 (vs. 1063, 12.04%↓) 9004 (vs. 9004, 0.00%) 28085 (vs. 28085, 0.00%) 2 (vs. 2, 0.00%)
matmul_128x256x8192_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 922 (vs. 1055, 12.61%↓) 9748 (vs. 9748, 0.00%) 28833 (vs. 28833, 0.00%) 2 (vs. 2, 0.00%)
MiniLML12H384Uncased(stablehlo) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 33989 (vs. 37526, 9.43%↓) 55792 (vs. 55792, 0.00%) 133982193 (vs. 133982193, 0.00%) 185 (vs. 185, 0.00%)
DeepLabV3_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 19523 (vs. 19468, 0.28%↑) 42768 (vs. 42768, 0.00%) 2822215 (vs. 2822215, 0.00%) 79 (vs. 79, 0.00%)
EfficientNet_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 37914 (vs. 45164, 16.05%↓) 188816 (vs. 188816, 0.00%) 5109575 (vs. 5109575, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 69789 (vs. 68332, 2.13%↑) 50432 (vs. 50432, 0.00%) 98363079 (vs. 98363079, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 369181 (vs. 386965, 4.60%↓) 3537584 (vs. 3537584, 0.00%) 29788295 (vs. 29788295, 0.00%) 1053 (vs. 1053, 0.00%)
MobileNetV1_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 14174 (vs. 16437, 13.77%↓) 51488 (vs. 51488, 0.00%) 16971207 (vs. 16971207, 0.00%) 65 (vs. 65, 0.00%)
MobileNetV2_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 49482 (vs. 57015, 13.21%↓) 217328 (vs. 217328, 0.00%) 3864967 (vs. 3864967, 0.00%) 144 (vs. 144, 0.00%)
PersonDetect_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 19078 (vs. 17304, 10.25%↑) 59520 (vs. 59520, 0.00%) 314183 (vs. 314183, 0.00%) 60 (vs. 60, 0.00%)
EfficientNet_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 56914 (vs. 59883, 4.96%↓) 393668 (vs. 393668, 0.00%) 5314439 (vs. 5314439, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 442232 (vs. 450071, 1.74%↓) 3825444 (vs. 3825444, 0.00%) 30076167 (vs. 30076167, 0.00%) 1053 (vs. 1053, 0.00%)
PersonDetect_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 21792 (vs. 22197, 1.82%↓) 125236 (vs. 125236, 0.00%) 379847 (vs. 379847, 0.00%) 60 (vs. 60, 0.00%)
MobileNetV2_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 52771 (vs. 55937, 5.66%↓) 330176 (vs. 330176, 0.00%) 3977799 (vs. 3977799, 0.00%) 144 (vs. 144, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 20915 (vs. 19218, 8.83%↑) 56384 (vs. 56384, 0.00%) 2835845 (vs. 2835845, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 57455 (vs. 70063, 18.00%↓) 33856 (vs. 33856, 0.00%) 98346437 (vs. 98346437, 0.00%) 679 (vs. 679, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 33827 (vs. 36162, 6.46%↓) 20208 (vs. 20208, 0.00%) 652735380 (vs. 652735380, 0.00%) 221 (vs. 221, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 29929 (vs. 30076, 0.49%↓) 8992 (vs. 8992, 0.00%) 652720468 (vs. 652720468, 0.00%) 246 (vs. 246, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 296781 (vs. 310712, 4.48%↓) 4939360 (vs. 4939360, 0.00%) 31190085 (vs. 31190085, 0.00%) 1053 (vs. 1053, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 104427 (vs. 101678, 2.70%↑) 819952 (vs. 819952, 0.00%) 88844933 (vs. 88844933, 0.00%) 255 (vs. 255, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 31241 (vs. 32344, 3.41%↓) 50848 (vs. 50848, 0.00%) 2844741 (vs. 2844741, 0.00%) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 83253 (vs. 79977, 4.10%↑) 21840 (vs. 21840, 0.00%) 98466437 (vs. 98466437, 0.00%) 1762 (vs. 1762, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 40339 (vs. 46817, 13.84%↓) 11536 (vs. 11536, 0.00%) 992496148 (vs. 992496148, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 42890 (vs. 51568, 16.83%↓) 9136 (vs. 9136, 0.00%) 992496020 (vs. 992496020, 0.00%) 355 (vs. 355, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 236565 (vs. 242445, 2.43%↓) 1800576 (vs. 1800576, 0.00%) 28205189 (vs. 28205189, 0.00%) 2136 (vs. 2136, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 76028 (vs. 76495, 0.61%↓) 121824 (vs. 121824, 0.00%) 88135429 (vs. 88135429, 0.00%) 375 (vs. 375, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 26230 (vs. 24188, 8.44%↑) 41008 (vs. 41008, 0.00%) 2834885 (vs. 2834885, 0.00%) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 64978 (vs. 68084, 4.56%↓) 22064 (vs. 22064, 0.00%) 98466693 (vs. 98466693, 0.00%) 1762 (vs. 1762, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 43421 (vs. 38417, 13.03%↑) 10736 (vs. 10736, 0.00%) 992495316 (vs. 992495316, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 40530 (vs. 44046, 7.98%↓) 8672 (vs. 8672, 0.00%) 992495572 (vs. 992495572, 0.00%) 355 (vs. 355, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 234814 (vs. 240704, 2.45%↓) 1808112 (vs. 1808112, 0.00%) 28212741 (vs. 28212741, 0.00%) 2136 (vs. 2136, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 63888 (vs. 70255, 9.06%↓) 124384 (vs. 124384, 0.00%) 88137989 (vs. 88137989, 0.00%) 375 (vs. 375, 0.00%)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 941 (vs. 1090, 13.67%↓) 2464 (vs. 2464, 0.00%) 271353 (vs. 271353, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 1462 (vs. 1088, 34.38%↑) 3872 (vs. 3872, 0.00%) 272761 (vs. 272761, 0.00%) 1 (vs. 1, 0.00%)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 919 (vs. 1134, 18.96%↓) 2368 (vs. 2368, 0.00%) 533433 (vs. 533433, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 965 (vs. 1187, 18.70%↓) 3088 (vs. 3088, 0.00%) 534137 (vs. 534137, 0.00%) 1 (vs. 1, 0.00%)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 1920 (vs. 2124, 9.60%↓) 3504 (vs. 3504, 0.00%) 273093 (vs. 273093, 0.00%) 2 (vs. 2, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 2375 (vs. 2554, 7.01%↓) 4528 (vs. 4528, 0.00%) 274693 (vs. 274693, 0.00%) 4 (vs. 4, 0.00%)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 2555 (vs. 2528, 1.07%↑) 2368 (vs. 2368, 0.00%) 533433 (vs. 533433, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 2740 (vs. 2768, 1.01%↓) 3360 (vs. 3360, 0.00%) 535109 (vs. 535109, 0.00%) 3 (vs. 3, 0.00%)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 1363 (vs. 1582, 13.84%↓) 4144 (vs. 4144, 0.00%) 273733 (vs. 273733, 0.00%) 2 (vs. 2, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 1896 (vs. 2199, 13.78%↓) 6496 (vs. 6496, 0.00%) 276677 (vs. 276677, 0.00%) 4 (vs. 4, 0.00%)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 2029 (vs. 2093, 3.06%↓) 2640 (vs. 2640, 0.00%) 533689 (vs. 533689, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 2232 (vs. 2421, 7.81%↓) 4352 (vs. 4352, 0.00%) 536069 (vs. 536069, 0.00%) 3 (vs. 3, 0.00%)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 55937 (vs. 59593, 6.13%↓) 257480 (vs. 257480, 0.00%) 98583085 (vs. 98583085, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,compile-stats] 67141 (vs. 64610, 3.92%↑) 257480 (vs. 257480, 0.00%) 98583085 (vs. 98583085, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 67505 (vs. 62952, 7.23%↑) 148236 (vs. 148236, 0.00%) 98473901 (vs. 98473901, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,demote-f32-to-f16,compile-stats] 105987 (vs. 114323, 7.29%↓) 3178840 (vs. 3178840, 0.00%) 53160559 (vs. 53160559, 0.00%) 703 (vs. 703, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 198114 (vs. 209226, 5.31%↓) 7145960 (vs. 7145960, 0.00%) 33672074 (vs. 33672074, 0.00%) 1053 (vs. 1053, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 67229 (vs. 76425, 12.03%↓) 148220 (vs. 148220, 0.00%) 98475629 (vs. 98475629, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16,compile-stats] 74486 (vs. 79236, 5.99%↓) 3180856 (vs. 3180856, 0.00%) 53169775 (vs. 53169775, 0.00%) 703 (vs. 703, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 188389 (vs. 209481, 10.07%↓) 7144540 (vs. 7144540, 0.00%) 33657802 (vs. 33657802, 0.00%) 1053 (vs. 1053, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 106685 (vs. 116996, 8.81%↓) 148220 (vs. 148220, 0.00%) 99696557 (vs. 99696557, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,demote-f32-to-f16,compile-stats] 123124 (vs. 128889, 4.47%↓) 3180856 (vs. 3180856, 0.00%) 54433775 (vs. 54433775, 0.00%) 703 (vs. 703, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 331124 (vs. 359107, 7.79%↓) 7144540 (vs. 7144540, 0.00%) 35551114 (vs. 35551114, 0.00%) 1053 (vs. 1053, 0.00%)
MobileNetV2_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 21750 (vs. 20976, 3.69%↑) 189941 (vs. 189941, 0.00%) 14185150 (vs. 14185150, 0.00%) 171 (vs. 171, 0.00%)
MobileNetV3Small_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 26859 (vs. 29568, 9.16%↓) 277813 (vs. 277813, 0.00%) 10508286 (vs. 10508286, 0.00%) 208 (vs. 208, 0.00%)
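
Each cell in the rows above has the form `value (vs. baseline, pct%↑/↓)`. The percentages appear to be the change of the first number relative to the second; for example, 64075 vs. 66136 gives (64075 - 66136) / 66136 ≈ -3.12%, matching the reported `3.12%↓`. The following is a minimal sketch for checking a cell against that reading. The function names and the regular expression are illustrative assumptions, not part of the tooling that generated this report.

```python
import re

# Hypothetical pattern for one cell, e.g. "64075 (vs. 66136, 3.12%↓)".
# The arrow is absent when the reported change is 0.00%.
CELL = re.compile(r"(\d+) \(vs\. (\d+), (\d+\.\d+)%([↑↓]?)\)")


def relative_change(current: int, baseline: int) -> float:
    """Signed percentage change of `current` relative to `baseline`."""
    return (current - baseline) / baseline * 100.0


def is_consistent(cell: str, tolerance: float = 0.006) -> bool:
    """Check that the reported percentage and arrow match the two raw numbers.

    Assumes the second number in the cell is the baseline, which is
    consistent with the arithmetic in the rows above.
    """
    match = CELL.fullmatch(cell.strip())
    if match is None:
        return False
    current, baseline = int(match.group(1)), int(match.group(2))
    reported, arrow = float(match.group(3)), match.group(4)

    change = relative_change(current, baseline)
    # The magnitude must agree with the reported value up to rounding.
    if abs(abs(change) - reported) > tolerance:
        return False
    # The arrow, when present, must agree with the sign of the change.
    if arrow == "↓" and change >= 0:
        return False
    if arrow == "↑" and change <= 0:
        return False
    return True


if __name__ == "__main__":
    for cell in ("64075 (vs. 66136, 3.12%↓)", "34101 (vs. 33211, 2.68%↑)"):
        print(cell, is_consistent(cell))
```

A check like this can be handy when copying rows out of the table by hand, since a transposed digit or a flipped arrow will make the cell fail.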