Full Benchmark Summary

Data-Tiling Comparison Table

Name No-DT (baseline) DT-Only DT-UK (average latency in ms; speedup over No-DT in parentheses)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 750.499 (1.0X) N/A 223.911 (3.4X)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 6.977 (1.0X) N/A 8.607 (0.8X)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 36.091 (1.0X) N/A 34.339 (1.1X)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.834 (1.0X) N/A 5.053 (1.2X)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.304 (1.0X) N/A 8.603 (1.1X)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.243 (1.0X) N/A 9.057 (1.2X)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 12.020 (1.0X) N/A 13.928 (0.9X)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.193 (1.0X) N/A 61.286 (0.6X)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 33.783 (1.0X) N/A 61.411 (0.6X)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 69.265 (1.0X) N/A 64.393 (1.1X)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.738 (1.0X) N/A 4.619 (1.0X)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 3.767 (1.0X) N/A 4.991 (0.8X)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.874 (1.0X) N/A 5.456 (1.1X)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 2.870 (1.0X) N/A 2.856 (1.0X)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.563 (1.0X) N/A 9.951 (0.9X)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.788 (1.0X) N/A 0.660 (1.2X)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.184 (1.0X) N/A 5.268 (0.8X)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.596 (1.0X) N/A 7.566 (1.0X)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 6.603 (1.0X) N/A 1.811 (3.6X)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 217.947 (1.0X) N/A 108.150 (2.0X)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 32.405 (1.0X) N/A 30.164 (1.1X)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 276.525 (1.0X) N/A 229.794 (1.2X)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 26.961 (1.0X) N/A 13.183 (2.0X)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 71.273 (1.0X) N/A 39.480 (1.8X)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 91.251 (1.0X) N/A 41.914 (2.2X)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 79.759 (1.0X) N/A 59.271 (1.3X)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 180.011 (1.0X) N/A 187.447 (1.0X)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 180.808 (1.0X) N/A 190.906 (0.9X)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 520.433 (1.0X) N/A 244.176 (2.1X)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 24.380 (1.0X) N/A 17.990 (1.4X)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.965 (1.0X) N/A 11.260 (1.1X)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 21.697 (1.0X) N/A 11.875 (1.8X)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 2.818 (1.0X) N/A 2.711 (1.0X)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.447 (1.0X) N/A 33.358 (1.0X)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.717 (1.0X) N/A 0.580 (1.2X)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 17.921 (1.0X) N/A 19.722 (0.9X)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.054 (1.0X) N/A 0.055 (1.0X)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.043 (1.0X) N/A 0.022 (2.0X)
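
The multiplier in parentheses appears to be the No-DT latency divided by the latency in that column, rounded to one decimal place, so values above 1.0X mean the variant is faster than the baseline. A minimal sketch of that arithmetic follows; the function name `speedup` is hypothetical and is not taken from the benchmark tooling.

```python
def speedup(baseline_ms: float, candidate_ms: float) -> str:
    """Format the baseline/candidate latency ratio as it appears in the table.

    Assumption: the report divides the No-DT latency by the candidate
    latency and rounds to one decimal place.
    """
    if candidate_ms <= 0:
        raise ValueError("candidate latency must be positive")
    return f"{baseline_ms / candidate_ms:.1f}X"


# Latencies copied from rows of the table above.
print(speedup(750.499, 223.911))  # BertLargeTF, 30-thread, DT-UK
print(speedup(6.977, 8.607))      # DeepLabV3_fp32, 8-thread, DT-UK
```

Under this reading, a row such as DeepLabV3_fp32 at 8 threads (0.8X) is a slowdown with DT-UK, while BertLargeTF at 30 threads (3.4X) is a large improvement.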

Similar Latencies

Benchmark Name Average Latency (ms) Median Latency (ms) Latency Standard Deviation (ms)
BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 750.499 (vs. 797.276, 5.87%↓) 735.927 44.226
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 33.358 (vs. 31.750, 5.07%↑) 33.288 0.419
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 12.020 (vs. 12.570, 4.38%↓) 11.799 0.343
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 19.722 (vs. 19.067, 3.43%↑) 19.681 0.111
matmul\_1x256x2048\_i8\_i8\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.022 (vs. 0.023, 3.37%↓) 0.022 0.000
BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 217.947 (vs. 213.997, 1.85%↑) 212.147 15.705
BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 223.911 (vs. 228.098, 1.84%↓) 223.938 2.429
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 36.091 (vs. 36.683, 1.61%↓) 35.373 1.054
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 61.411 (vs. 62.307, 1.44%↓) 61.090 0.603
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 190.906 (vs. 193.604, 1.39%↓) 188.895 3.288
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.965 (vs. 12.126, 1.33%↓) 11.945 0.139
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 61.286 (vs. 62.090, 1.30%↓) 61.062 0.532
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.339 (vs. 34.782, 1.27%↓) 34.147 0.360
matmul\_1x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.055 (vs. 0.055, 1.25%↑) 0.055 0.000
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 17.990 (vs. 17.769, 1.24%↑) 17.927 0.163
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.603 (vs. 8.708, 1.20%↓) 8.571 0.083
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 91.251 (vs. 90.214, 1.15%↑) 91.801 3.398
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 3.767 (vs. 3.808, 1.09%↓) 3.765 0.020
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.193 (vs. 33.844, 1.03%↑) 33.834 0.779
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.738 (vs. 4.786, 1.01%↓) 4.741 0.064
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 2.818 (vs. 2.790, 1.01%↑) 2.813 0.027
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 6.977 (vs. 7.048, 1.00%↓) 6.982 0.039
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 24.380 (vs. 24.623, 0.99%↓) 24.296 0.226
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.447 (vs. 34.775, 0.94%↓) 34.416 0.268
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 2.711 (vs. 2.687, 0.91%↑) 2.716 0.026
matmul\_256x256x2048\_i8\_i8\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 6.603 (vs. 6.663, 0.91%↓) 6.611 0.017
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 71.273 (vs. 70.655, 0.87%↑) 71.462 2.666
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.304 (vs. 9.225, 0.86%↑) 9.072 0.459
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.951 (vs. 10.035, 0.84%↓) 9.952 0.048
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.834 (vs. 5.882, 0.81%↓) 5.833 0.016
matmul\_1x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.054 (vs. 0.055, 0.78%↓) 0.054 0.000
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 64.393 (vs. 64.896, 0.77%↓) 64.261 0.466
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.260 (vs. 11.340, 0.70%↓) 11.245 0.115
BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 108.150 (vs. 107.428, 0.67%↑) 107.958 1.203
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 32.405 (vs. 32.190, 0.67%↑) 32.444 0.185
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.874 (vs. 5.913, 0.67%↓) 5.872 0.016
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.596 (vs. 7.547, 0.65%↑) 7.591 0.009
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 2.870 (vs. 2.852, 0.62%↑) 2.871 0.015
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.268 (vs. 5.236, 0.61%↑) 5.266 0.018
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 13.183 (vs. 13.103, 0.60%↑) 13.154 0.075
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 30.164 (vs. 29.990, 0.58%↑) 30.175 0.144
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 21.697 (vs. 21.576, 0.56%↑) 21.689 0.055
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.053 (vs. 5.081, 0.55%↓) 5.053 0.014
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.580 (vs. 0.583, 0.52%↓) 0.580 0.001
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 33.783 (vs. 33.623, 0.48%↑) 33.367 0.874
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.991 (vs. 4.968, 0.47%↑) 4.991 0.018
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 17.921 (vs. 17.839, 0.46%↑) 17.878 0.106
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 26.961 (vs. 26.859, 0.38%↑) 26.947 0.090
MobileNetV2\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 7347.834 (vs. 7320.634, 0.37%↑) 7349.913 4.745
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 69.265 (vs. 69.523, 0.37%↓) 69.131 0.545
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 2.856 (vs. 2.847, 0.31%↑) 2.853 0.017
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.566 (vs. 7.543, 0.31%↑) 7.562 0.018
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.619 (vs. 4.632, 0.28%↓) 4.617 0.020
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 276.525 (vs. 277.296, 0.28%↓) 275.293 4.511
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 180.011 (vs. 180.502, 0.27%↓) 179.101 2.746
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 229.794 (vs. 230.379, 0.25%↓) 227.891 3.851
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.875 (vs. 11.902, 0.23%↓) 11.841 0.074
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.184 (vs. 4.193, 0.21%↓) 4.190 0.025
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 41.914 (vs. 41.995, 0.19%↓) 41.033 1.664
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 59.271 (vs. 59.384, 0.19%↓) 58.845 0.970
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.243 (vs. 11.223, 0.18%↑) 10.867 0.648
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.788 (vs. 0.787, 0.14%↑) 0.788 0.001
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.057 (vs. 9.046, 0.12%↑) 9.022 0.088
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.660 (vs. 0.660, 0.12%↑) 0.660 0.003
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.563 (vs. 8.573, 0.11%↓) 8.555 0.066
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 520.433 (vs. 519.886, 0.11%↑) 519.417 2.752
MobileNetV3Small\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 1645.632 (vs. 1644.081, 0.09%↑) 1645.628 0.301
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 13.928 (vs. 13.940, 0.08%↓) 13.901 0.105
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 39.480 (vs. 39.450, 0.08%↑) 38.941 1.404
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 244.176 (vs. 244.355, 0.07%↓) 243.613 1.366
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 180.808 (vs. 180.686, 0.07%↑) 179.734 2.780
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.717 (vs. 0.716, 0.05%↑) 0.717 0.001
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.456 (vs. 5.454, 0.04%↑) 5.455 0.012
matmul\_256x256x2048\_i8\_i8\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 1.811 (vs. 1.811, 0.03%↓) 1.810 0.006
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.607 (vs. 8.608, 0.01%↓) 8.606 0.053
matmul\_1x256x2048\_i8\_i8\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.043 (vs. 0.043, 0.00%↓) 0.043 0.000
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 187.447 (vs. 187.450, 0.00%↓) 186.599 3.001
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 79.759 (vs. 79.759, 0.00%↑) 79.463 1.135
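
The "(vs. X, P%↑/↓)" annotations look like the relative change of the current average against a previous value X. The sketch below reproduces that calculation under that assumption; `relative_change` is a hypothetical helper, not part of the benchmark tooling. Because the report presumably works from unrounded measurements, recomputing from the three-decimal values shown here can differ in the last digit, and the arrow direction for changes near 0.00% cannot be recovered from the rounded numbers.

```python
def relative_change(current: float, previous: float) -> str:
    """Return the percent change of `current` relative to `previous`.

    Assumption: the report uses (current - previous) / previous and
    marks increases with an up arrow and decreases with a down arrow.
    """
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    pct = (current - previous) / previous * 100.0
    arrow = "↑" if pct >= 0 else "↓"
    return f"{abs(pct):.2f}%{arrow}"


# Average latencies copied from rows of the table above.
print(relative_change(750.499, 797.276))  # BertLargeTF, no-dt, 30-thread
print(relative_change(217.947, 213.997))  # BertForMaskedLMTF, no-dt, 30-thread
```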

All Compilation Metrics

Benchmark Name Compilation Time (ms) Total Dispatch Size (bytes) Total Artifact Size (bytes) Stream IR Dispatch Count (# of cmd.dispatch ops)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 961 (vs. 1175, 18.21%↓) 4400 (vs. 4400, 0.00%) 273657 (vs. 273657, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 1634 (vs. 1688, 3.20%↓) 9680 (vs. 9680, 0.00%) 278905 (vs. 278905, 0.00%) 1 (vs. 1, 0.00%)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 915 (vs. 1010, 9.41%↓) 3328 (vs. 3328, 0.00%) 534713 (vs. 534713, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 1440 (vs. 1645, 12.46%↓) 8480 (vs. 8480, 0.00%) 539833 (vs. 539833, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 23365 (vs. 23714, 1.47%↓) 144544 (vs. 144544, 0.00%) 399621 (vs. 399621, 0.00%) 60 (vs. 60, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 40274 (vs. 38509, 4.58%↑) 238656 (vs. 238656, 0.00%) 10455045 (vs. 10455045, 0.00%) 97 (vs. 97, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 32601 (vs. 31158, 4.63%↑) 177696 (vs. 177696, 0.00%) 2957509 (vs. 2957509, 0.00%) 79 (vs. 79, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 57183 (vs. 62393, 8.35%↓) 682752 (vs. 682752, 0.00%) 5604037 (vs. 5604037, 0.00%) 89 (vs. 89, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 27051 (vs. 23468, 15.27%↑) 175008 (vs. 175008, 0.00%) 17092357 (vs. 17092357, 0.00%) 51 (vs. 51, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 28243 (vs. 31469, 10.25%↓) 190512 (vs. 190512, 0.00%) 14172293 (vs. 14172293, 0.00%) 74 (vs. 74, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 62841 (vs. 63166, 0.51%↓) 568880 (vs. 568880, 0.00%) 4216965 (vs. 4216965, 0.00%) 144 (vs. 144, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 45229 (vs. 42086, 7.47%↑) 287728 (vs. 287728, 0.00%) 18226373 (vs. 18226373, 0.00%) 124 (vs. 124, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 20259 (vs. 19324, 4.84%↑) 142464 (vs. 142464, 0.00%) 5195333 (vs. 5195333, 0.00%) 48 (vs. 48, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 63484 (vs. 67373, 5.77%↓) 91888 (vs. 91888, 0.00%) 99892421 (vs. 99892421, 0.00%) 703 (vs. 703, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 63604 (vs. 71309, 10.81%↓) 100448 (vs. 100448, 0.00%) 98413509 (vs. 98413509, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 320328 (vs. 325240, 1.51%↓) 6817872 (vs. 6817872, 0.00%) 33068933 (vs. 33068933, 0.00%) 1053 (vs. 1053, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 97655 (vs. 101888, 4.15%↓) 216688 (vs. 216688, 0.00%) 164493804 (vs. 164493804, 0.00%) 276 (vs. 276, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 27439 (vs. 28119, 2.42%↓) 67120 (vs. 67120, 0.00%) 133993839 (vs. 133993839, 0.00%) 185 (vs. 185, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 33696 (vs. 34627, 2.69%↓) 27504 (vs. 27504, 0.00%) 652743124 (vs. 652743124, 0.00%) 221 (vs. 221, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 29793 (vs. 35999, 17.24%↓) 15008 (vs. 15008, 0.00%) 652726868 (vs. 652726868, 0.00%) 246 (vs. 246, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 35185 (vs. 35349, 0.46%↓) 76768 (vs. 76768, 0.00%) 533839679 (vs. 533839679, 0.00%) 188 (vs. 188, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 31511 (vs. 33395, 5.64%↓) 58176 (vs. 58176, 0.00%) 1336009791 (vs. 1336009791, 0.00%) 365 (vs. 365, 0.00%)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 961 (vs. 1197, 19.72%↓) 4400 (vs. 4400, 0.00%) 273657 (vs. 273657, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 1527 (vs. 1686, 9.43%↓) 9680 (vs. 9680, 0.00%) 278969 (vs. 278969, 0.00%) 1 (vs. 1, 0.00%)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 1983 (vs. 2122, 6.55%↓) 2976 (vs. 2976, 0.00%) 534329 (vs. 534329, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 2210 (vs. 2448, 9.72%↓) 6432 (vs. 6432, 0.00%) 538565 (vs. 538565, 0.00%) 3 (vs. 3, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 25393 (vs. 22825, 11.25%↑) 106672 (vs. 106672, 0.00%) 368197 (vs. 368197, 0.00%) 87 (vs. 87, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 34205 (vs. 35238, 2.93%↓) 96992 (vs. 96992, 0.00%) 10394437 (vs. 10394437, 0.00%) 147 (vs. 147, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 28937 (vs. 27164, 6.53%↑) 114416 (vs. 114416, 0.00%) 2917765 (vs. 2917765, 0.00%) 144 (vs. 144, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 45182 (vs. 47893, 5.66%↓) 269312 (vs. 269312, 0.00%) 5215557 (vs. 5215557, 0.00%) 148 (vs. 148, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 18643 (vs. 18345, 1.62%↑) 60352 (vs. 60352, 0.00%) 17014277 (vs. 17014277, 0.00%) 79 (vs. 79, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 26516 (vs. 29336, 9.61%↓) 95328 (vs. 95328, 0.00%) 14129477 (vs. 14129477, 0.00%) 136 (vs. 136, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 54353 (vs. 52566, 3.40%↑) 329856 (vs. 329856, 0.00%) 3999429 (vs. 3999429, 0.00%) 213 (vs. 213, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 37713 (vs. 42338, 10.92%↓) 135552 (vs. 135552, 0.00%) 18353861 (vs. 18353861, 0.00%) 223 (vs. 223, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 16159 (vs. 14098, 14.62%↑) 44256 (vs. 44256, 0.00%) 5146949 (vs. 5146949, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 80595 (vs. 84624, 4.76%↓) 48496 (vs. 48496, 0.00%) 99982149 (vs. 99982149, 0.00%) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 74622 (vs. 78766, 5.26%↓) 48880 (vs. 48880, 0.00%) 98494021 (vs. 98494021, 0.00%) 1762 (vs. 1762, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 268127 (vs. 273839, 2.09%↓) 3422160 (vs. 3422160, 0.00%) 29827269 (vs. 29827269, 0.00%) 2136 (vs. 2136, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 105801 (vs. 107572, 1.65%↓) 155328 (vs. 155328, 0.00%) 169900268 (vs. 169900268, 0.00%) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 38717 (vs. 35009, 10.59%↑) 36576 (vs. 36576, 0.00%) 219461551 (vs. 219461551, 0.00%) 342 (vs. 342, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 44353 (vs. 43867, 1.11%↑) 17344 (vs. 17344, 0.00%) 992526932 (vs. 992526932, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 39329 (vs. 39531, 0.51%↓) 10272 (vs. 10272, 0.00%) 992522132 (vs. 992522132, 0.00%) 355 (vs. 355, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 44993 (vs. 41698, 7.90%↑) 38224 (vs. 38224, 0.00%) 875850431 (vs. 875850431, 0.00%) 346 (vs. 346, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 47298 (vs. 51167, 7.56%↓) 41568 (vs. 41568, 0.00%) 1336020735 (vs. 1336020735, 0.00%) 678 (vs. 678, 0.00%)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 970 (vs. 1233, 21.33%↓) 4400 (vs. 4400, 0.00%) 273657 (vs. 273657, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 1577 (vs. 1745, 9.63%↓) 9680 (vs. 9680, 0.00%) 278969 (vs. 278969, 0.00%) 1 (vs. 1, 0.00%)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 2397 (vs. 2743, 12.61%↓) 2464 (vs. 2464, 0.00%) 533817 (vs. 533817, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 2657 (vs. 2951, 9.96%↓) 4800 (vs. 4800, 0.00%) 536901 (vs. 536901, 0.00%) 3 (vs. 3, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 23784 (vs. 22788, 4.37%↑) 100464 (vs. 100464, 0.00%) 361989 (vs. 361989, 0.00%) 87 (vs. 87, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 33412 (vs. 33650, 0.71%↓) 99152 (vs. 99152, 0.00%) 10396549 (vs. 10396549, 0.00%) 147 (vs. 147, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 30163 (vs. 28437, 6.07%↑) 119792 (vs. 119792, 0.00%) 2923141 (vs. 2923141, 0.00%) 144 (vs. 144, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 46267 (vs. 46676, 0.88%↓) 260256 (vs. 260256, 0.00%) 5206533 (vs. 5206533, 0.00%) 148 (vs. 148, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 18270 (vs. 17957, 1.74%↑) 66256 (vs. 66256, 0.00%) 17020165 (vs. 17020165, 0.00%) 79 (vs. 79, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 29392 (vs. 32648, 9.97%↓) 100048 (vs. 100048, 0.00%) 14134149 (vs. 14134149, 0.00%) 136 (vs. 136, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 51055 (vs. 52665, 3.06%↓) 325280 (vs. 325280, 0.00%) 3994885 (vs. 3994885, 0.00%) 213 (vs. 213, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 41486 (vs. 38978, 6.43%↑) 123312 (vs. 123312, 0.00%) 18341637 (vs. 18341637, 0.00%) 223 (vs. 223, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 14501 (vs. 16010, 9.43%↓) 47808 (vs. 47808, 0.00%) 5150469 (vs. 5150469, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 72964 (vs. 77003, 5.25%↓) 39616 (vs. 39616, 0.00%) 99973253 (vs. 99973253, 0.00%) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 73602 (vs. 76975, 4.38%↓) 40016 (vs. 40016, 0.00%) 98485125 (vs. 98485125, 0.00%) 1762 (vs. 1762, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 252817 (vs. 253023, 0.08%↓) 3414288 (vs. 3414288, 0.00%) 29819397 (vs. 29819397, 0.00%) 2136 (vs. 2136, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 104648 (vs. 105587, 0.89%↓) 141760 (vs. 141760, 0.00%) 169886636 (vs. 169886636, 0.00%) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 34227 (vs. 33656, 1.70%↑) 33600 (vs. 33600, 0.00%) 219458607 (vs. 219458607, 0.00%) 342 (vs. 342, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 43757 (vs. 43959, 0.46%↓) 18784 (vs. 18784, 0.00%) 992528404 (vs. 992528404, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 44286 (vs. 44565, 0.63%↓) 11312 (vs. 11312, 0.00%) 992523156 (vs. 992523156, 0.00%) 355 (vs. 355, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 45064 (vs. 42588, 5.81%↑) 33968 (vs. 33968, 0.00%) 875846207 (vs. 875846207, 0.00%) 346 (vs. 346, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 49846 (vs. 52194, 4.50%↓) 38112 (vs. 38112, 0.00%) 1336017279 (vs. 1336017279, 0.00%) 678 (vs. 678, 0.00%)
EfficientNetV2STF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 90213 (vs. 94320, 4.35%↓) 828172 (vs. 828172, 0.00%) 165151316 (vs. 165151316, 0.00%) 276 (vs. 276, 0.00%)
MiniLML12H384Uncased(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 26042 (vs. 23947, 8.75%↑) 184528 (vs. 184528, 0.00%) 134119041 (vs. 134119041, 0.00%) 185 (vs. 185, 0.00%)
BertForMaskedLMTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 33630 (vs. 32538, 3.36%↑) 178332 (vs. 178332, 0.00%) 533947828 (vs. 533947828, 0.00%) 188 (vs. 188, 0.00%)
BertLargeTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 30886 (vs. 30488, 1.31%↑) 168052 (vs. 168052, 0.00%) 1336126133 (vs. 1336126133, 0.00%) 365 (vs. 365, 0.00%)
matmul_3456x1024x2048_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1630 (vs. 1674, 2.63%↓) 30384 (vs. 30384, 0.00%) 42579 (vs. 42579, 0.00%) 1 (vs. 1, 0.00%)
matmul_3456x1024x2048_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1847 (vs. 1976, 6.53%↓) 44912 (vs. 44912, 0.00%) 57107 (vs. 57107, 0.00%) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1589 (vs. 1620, 1.91%↓) 28344 (vs. 28344, 0.00%) 40475 (vs. 40475, 0.00%) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1799 (vs. 1793, 0.33%↑) 41284 (vs. 41284, 0.00%) 53415 (vs. 53415, 0.00%) 1 (vs. 1, 0.00%)
matmul_2564x2564x2564_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2249 (vs. 2137, 5.24%↑) 85956 (vs. 85956, 0.00%) 98087 (vs. 98087, 0.00%) 1 (vs. 1, 0.00%)
matmul_2562x2564x2562_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2191 (vs. 2210, 0.86%↓) 88832 (vs. 88832, 0.00%) 101027 (vs. 101027, 0.00%) 1 (vs. 1, 0.00%)
matmul_2562x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2734 (vs. 2856, 4.27%↓) 84036 (vs. 84036, 0.00%) 96231 (vs. 96231, 0.00%) 1 (vs. 1, 0.00%)
matmul_123x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2021 (vs. 2113, 4.35%↓) 51016 (vs. 51016, 0.00%) 63210 (vs. 63210, 0.00%) 1 (vs. 1, 0.00%)
matmul_128x256x8192_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 850 (vs. 974, 12.73%↓) 9004 (vs. 9004, 0.00%) 28085 (vs. 28085, 0.00%) 2 (vs. 2, 0.00%)
matmul_128x256x8192_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 835 (vs. 973, 14.18%↓) 9748 (vs. 9748, 0.00%) 28833 (vs. 28833, 0.00%) 2 (vs. 2, 0.00%)
MiniLML12H384Uncased(stablehlo) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 30824 (vs. 32688, 5.70%↓) 55792 (vs. 55792, 0.00%) 133982193 (vs. 133982193, 0.00%) 185 (vs. 185, 0.00%)
DeepLabV3_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 17273 (vs. 20614, 16.21%↓) 42768 (vs. 42768, 0.00%) 2822215 (vs. 2822215, 0.00%) 79 (vs. 79, 0.00%)
EfficientNet_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 40658 (vs. 37547, 8.29%↑) 188816 (vs. 188816, 0.00%) 5109767 (vs. 5109767, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 63611 (vs. 65603, 3.04%↓) 50432 (vs. 50432, 0.00%) 98363143 (vs. 98363143, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 365882 (vs. 374612, 2.33%↓) 3537584 (vs. 3537584, 0.00%) 29788423 (vs. 29788423, 0.00%) 1053 (vs. 1053, 0.00%)
MobileNetV1_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 16714 (vs. 15347, 8.91%↑) 51488 (vs. 51488, 0.00%) 16971271 (vs. 16971271, 0.00%) 65 (vs. 65, 0.00%)
MobileNetV2_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 47315 (vs. 46139, 2.55%↑) 217328 (vs. 217328, 0.00%) 3865095 (vs. 3865095, 0.00%) 144 (vs. 144, 0.00%)
PersonDetect_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 17352 (vs. 18653, 6.97%↓) 59520 (vs. 59520, 0.00%) 314311 (vs. 314311, 0.00%) 60 (vs. 60, 0.00%)
EfficientNet_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 50344 (vs. 53604, 6.08%↓) 393668 (vs. 393668, 0.00%) 5314631 (vs. 5314631, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 428886 (vs. 437838, 2.04%↓) 3825444 (vs. 3825444, 0.00%) 30076295 (vs. 30076295, 0.00%) 1053 (vs. 1053, 0.00%)
PersonDetect_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 19927 (vs. 20027, 0.50%↓) 125236 (vs. 125236, 0.00%) 379975 (vs. 379975, 0.00%) 60 (vs. 60, 0.00%)
MobileNetV2_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 53167 (vs. 58336, 8.86%↓) 330176 (vs. 330176, 0.00%) 3977927 (vs. 3977927, 0.00%) 144 (vs. 144, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 17949 (vs. 19041, 5.73%↓) 56384 (vs. 56384, 0.00%) 2835845 (vs. 2835845, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 60690 (vs. 61685, 1.61%↓) 33856 (vs. 33856, 0.00%) 98346565 (vs. 98346565, 0.00%) 679 (vs. 679, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 33368 (vs. 35134, 5.03%↓) 20208 (vs. 20208, 0.00%) 652735444 (vs. 652735444, 0.00%) 221 (vs. 221, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 28814 (vs. 31840, 9.50%↓) 8992 (vs. 8992, 0.00%) 652720532 (vs. 652720532, 0.00%) 246 (vs. 246, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 296721 (vs. 298017, 0.43%↓) 4939360 (vs. 4939360, 0.00%) 31190149 (vs. 31190149, 0.00%) 1053 (vs. 1053, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 97102 (vs. 103188, 5.90%↓) 819952 (vs. 819952, 0.00%) 88844933 (vs. 88844933, 0.00%) 255 (vs. 255, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 27342 (vs. 24185, 13.05%↑) 50848 (vs. 50848, 0.00%) 2844869 (vs. 2844869, 0.00%) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 74034 (vs. 73884, 0.20%↑) 21840 (vs. 21840, 0.00%) 98466565 (vs. 98466565, 0.00%) 1762 (vs. 1762, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 45356 (vs. 44231, 2.54%↑) 11536 (vs. 11536, 0.00%) 992496276 (vs. 992496276, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 43288 (vs. 40977, 5.64%↑) 9136 (vs. 9136, 0.00%) 992496084 (vs. 992496084, 0.00%) 355 (vs. 355, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 221906 (vs. 227715, 2.55%↓) 1800576 (vs. 1800576, 0.00%) 28205381 (vs. 28205381, 0.00%) 2136 (vs. 2136, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 74521 (vs. 79535, 6.30%↓) 121824 (vs. 121824, 0.00%) 88135429 (vs. 88135429, 0.00%) 375 (vs. 375, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 21503 (vs. 21538, 0.16%↓) 41008 (vs. 41008, 0.00%) 2835013 (vs. 2835013, 0.00%) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 68695 (vs. 72309, 5.00%↓) 22064 (vs. 22064, 0.00%) 98466821 (vs. 98466821, 0.00%) 1762 (vs. 1762, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 37643 (vs. 47206, 20.26%↓) 10736 (vs. 10736, 0.00%) 992495444 (vs. 992495444, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 36433 (vs. 37645, 3.22%↓) 8672 (vs. 8672, 0.00%) 992495636 (vs. 992495636, 0.00%) 355 (vs. 355, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 228346 (vs. 230927, 1.12%↓) 1808112 (vs. 1808112, 0.00%) 28212933 (vs. 28212933, 0.00%) 2136 (vs. 2136, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 65220 (vs. 70549, 7.55%↓) 124384 (vs. 124384, 0.00%) 88137989 (vs. 88137989, 0.00%) 375 (vs. 375, 0.00%)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 842 (vs. 974, 13.55%↓) 2464 (vs. 2464, 0.00%) 271353 (vs. 271353, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 945 (vs. 1367, 30.87%↓) 3872 (vs. 3872, 0.00%) 272761 (vs. 272761, 0.00%) 1 (vs. 1, 0.00%)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 819 (vs. 906, 9.60%↓) 2368 (vs. 2368, 0.00%) 533433 (vs. 533433, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 893 (vs. 941, 5.10%↓) 3088 (vs. 3088, 0.00%) 534137 (vs. 534137, 0.00%) 1 (vs. 1, 0.00%)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 1998 (vs. 2049, 2.49%↓) 3504 (vs. 3504, 0.00%) 273093 (vs. 273093, 0.00%) 2 (vs. 2, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 2198 (vs. 2133, 3.05%↑) 4528 (vs. 4528, 0.00%) 274693 (vs. 274693, 0.00%) 4 (vs. 4, 0.00%)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 2396 (vs. 2379, 0.71%↑) 2368 (vs. 2368, 0.00%) 533433 (vs. 533433, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 2585 (vs. 2561, 0.94%↑) 3360 (vs. 3360, 0.00%) 535109 (vs. 535109, 0.00%) 3 (vs. 3, 0.00%)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 991 (vs. 1434, 30.89%↓) 4144 (vs. 4144, 0.00%) 273733 (vs. 273733, 0.00%) 2 (vs. 2, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 1620 (vs. 1719, 5.76%↓) 6496 (vs. 6496, 0.00%) 276677 (vs. 276677, 0.00%) 4 (vs. 4, 0.00%)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 2026 (vs. 1908, 6.18%↑) 2640 (vs. 2640, 0.00%) 533689 (vs. 533689, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 2118 (vs. 2176, 2.67%↓) 4352 (vs. 4352, 0.00%) 536069 (vs. 536069, 0.00%) 3 (vs. 3, 0.00%)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 58474 (vs. 60827, 3.87%↓) 257480 (vs. 257480, 0.00%) 98583213 (vs. 98583213, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,compile-stats] 61430 (vs. 66838, 8.09%↓) 257480 (vs. 257480, 0.00%) 98583213 (vs. 98583213, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 61625 (vs. 64712, 4.77%↓) 148236 (vs. 148236, 0.00%) 98473965 (vs. 98473965, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,demote-f32-to-f16,compile-stats] 95508 (vs. 101228, 5.65%↓) 3178840 (vs. 3178840, 0.00%) 53160687 (vs. 53160687, 0.00%) 703 (vs. 703, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 187110 (vs. 187860, 0.40%↓) 7145960 (vs. 7145960, 0.00%) 33672202 (vs. 33672202, 0.00%) 1053 (vs. 1053, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 60687 (vs. 63261, 4.07%↓) 148220 (vs. 148220, 0.00%) 98475757 (vs. 98475757, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16,compile-stats] 72638 (vs. 74760, 2.84%↓) 3180856 (vs. 3180856, 0.00%) 53169839 (vs. 53169839, 0.00%) 703 (vs. 703, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 187338 (vs. 193401, 3.13%↓) 7144540 (vs. 7144540, 0.00%) 33657930 (vs. 33657930, 0.00%) 1053 (vs. 1053, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 102031 (vs. 107405, 5.00%↓) 148220 (vs. 148220, 0.00%) 99696621 (vs. 99696621, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,demote-f32-to-f16,compile-stats] 119131 (vs. 127106, 6.27%↓) 3180856 (vs. 3180856, 0.00%) 54433903 (vs. 54433903, 0.00%) 703 (vs. 703, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 332221 (vs. 336922, 1.40%↓) 7144540 (vs. 7144540, 0.00%) 35551242 (vs. 35551242, 0.00%) 1053 (vs. 1053, 0.00%)
MobileNetV2_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 18654 (vs. 17778, 4.93%↑) 197045 (vs. 197045, 0.00%) 14192638 (vs. 14192638, 0.00%) 171 (vs. 171, 0.00%)
MobileNetV3Small_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 25159 (vs. 25725, 2.20%↓) 288181 (vs. 288181, 0.00%) 10519166 (vs. 10519166, 0.00%) 208 (vs. 208, 0.00%)