Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save iree-github-actions-bot/7f1434247f5cf405ac5c7642606c37c1 to your computer and use it in GitHub Desktop.
Save iree-github-actions-bot/7f1434247f5cf405ac5c7642606c37c1 to your computer and use it in GitHub Desktop.

Full Benchmark Summary

Data-Tiling Comparison Table

Name No-DT (baseline) DT-Only DT-UK
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 783.326 (1.0X) N/A 224.143 (3.5X)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 7.011 (1.0X) N/A 8.555 (0.8X)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 36.049 (1.0X) N/A 34.330 (1.1X)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.848 (1.0X) N/A 5.051 (1.2X)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.242 (1.0X) N/A 8.511 (1.1X)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.103 (1.0X) N/A 8.954 (1.2X)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.931 (1.0X) N/A 13.949 (0.9X)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.026 (1.0X) N/A 61.019 (0.6X)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 33.937 (1.0X) N/A 61.367 (0.6X)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 68.973 (1.0X) N/A 64.331 (1.1X)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.894 (1.0X) N/A 4.593 (1.1X)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 3.751 (1.0X) N/A 4.943 (0.8X)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.915 (1.0X) N/A 5.455 (1.1X)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 2.849 (1.0X) N/A 2.823 (1.0X)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.468 (1.0X) N/A 9.929 (0.9X)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.788 (1.0X) N/A 0.657 (1.2X)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.203 (1.0X) N/A 5.346 (0.8X)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.554 (1.0X) N/A 7.594 (1.0X)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 6.609 (1.0X) N/A 1.805 (3.7X)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 230.708 (1.0X) N/A 107.258 (2.2X)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 32.397 (1.0X) N/A 29.975 (1.1X)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 276.133 (1.0X) N/A 229.388 (1.2X)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 27.015 (1.0X) N/A 13.053 (2.1X)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 69.601 (1.0X) N/A 38.601 (1.8X)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 87.706 (1.0X) N/A 40.520 (2.2X)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 79.567 (1.0X) N/A 58.683 (1.4X)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 180.083 (1.0X) N/A 186.013 (1.0X)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 183.108 (1.0X) N/A 190.917 (1.0X)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 520.347 (1.0X) N/A 243.969 (2.1X)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 25.241 (1.0X) N/A 17.862 (1.4X)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.867 (1.0X) N/A 11.520 (1.0X)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 21.560 (1.0X) N/A 11.823 (1.8X)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 2.951 (1.0X) N/A 2.705 (1.1X)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 35.232 (1.0X) N/A 31.333 (1.1X)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.715 (1.0X) N/A 0.582 (1.2X)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 18.100 (1.0X) N/A 19.907 (0.9X)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.055 (1.0X) N/A 0.055 (1.0X)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.042 (1.0X) N/A 0.022 (2.0X)

Similar Latencies

Benchmark Name Average Latency (ms) Median Latency (ms) Latency Standard Deviation (ms)
BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 783.326 (vs. 739.843, 5.88%↑) 765.101 50.537
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 2.951 (vs. 2.828, 4.36%↑) 2.949 0.021
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.954 (vs. 9.162, 2.27%↓) 8.904 0.099
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.931 (vs. 12.207, 2.26%↓) 11.778 0.291
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.867 (vs. 12.105, 1.96%↓) 11.883 0.091
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 2.705 (vs. 2.756, 1.87%↓) 2.701 0.029
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.894 (vs. 4.987, 1.86%↓) 4.888 0.042
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 3.751 (vs. 3.819, 1.80%↓) 3.746 0.026
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 29.975 (vs. 30.463, 1.60%↓) 29.978 0.102
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.520 (vs. 11.345, 1.54%↑) 11.528 0.105
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 2.849 (vs. 2.893, 1.52%↓) 2.849 0.019
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 31.333 (vs. 30.872, 1.49%↑) 31.199 0.305
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 13.053 (vs. 13.235, 1.37%↓) 13.036 0.035
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 40.520 (vs. 41.028, 1.24%↓) 39.835 1.785
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 19.907 (vs. 19.671, 1.20%↑) 19.920 0.091
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 183.108 (vs. 181.310, 0.99%↑) 181.980 2.884
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.346 (vs. 5.295, 0.96%↑) 5.347 0.015
BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 107.258 (vs. 108.293, 0.96%↓) 107.148 1.168
MobileNetV2\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 7741.684 (vs. 7815.257, 0.94%↓) 7740.872 2.828
matmul\_1x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.055 (vs. 0.055, 0.88%↑) 0.055 0.000
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.468 (vs. 8.542, 0.87%↓) 8.444 0.052
MobileNetV3Small\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 1772.294 (vs. 1757.234, 0.86%↑) 1771.414 14.820
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 35.232 (vs. 35.518, 0.81%↓) 35.258 0.437
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 2.823 (vs. 2.844, 0.74%↓) 2.827 0.024
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.593 (vs. 4.627, 0.74%↓) 4.593 0.017
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 7.011 (vs. 7.058, 0.67%↓) 7.004 0.022
matmul\_1x256x2048\_i8\_i8\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.042 (vs. 0.043, 0.66%↓) 0.043 0.000
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.455 (vs. 5.421, 0.62%↑) 5.458 0.016
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.242 (vs. 9.299, 0.61%↓) 9.015 0.454
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.848 (vs. 5.883, 0.60%↓) 5.843 0.011
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.594 (vs. 7.548, 0.60%↑) 7.586 0.042
matmul\_1x256x2048\_i8\_i8\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.022 (vs. 0.022, 0.59%↑) 0.022 0.000
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 180.083 (vs. 179.035, 0.59%↑) 178.778 2.782
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 61.019 (vs. 61.360, 0.55%↓) 60.694 0.538
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 64.331 (vs. 64.635, 0.47%↓) 64.207 0.348
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.203 (vs. 4.222, 0.45%↓) 4.210 0.021
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.582 (vs. 0.579, 0.43%↑) 0.582 0.001
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 38.601 (vs. 38.765, 0.42%↓) 37.974 1.501
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 18.100 (vs. 18.024, 0.42%↑) 18.084 0.162
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 33.937 (vs. 34.074, 0.40%↓) 33.665 0.713
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 21.560 (vs. 21.644, 0.39%↓) 21.579 0.160
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 229.388 (vs. 228.602, 0.34%↑) 227.762 4.759
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 79.567 (vs. 79.303, 0.33%↑) 79.078 0.961
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.715 (vs. 0.717, 0.33%↓) 0.715 0.001
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 17.862 (vs. 17.914, 0.29%↓) 17.822 0.168
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.026 (vs. 34.121, 0.28%↓) 33.681 0.784
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 520.347 (vs. 518.947, 0.27%↑) 519.162 2.485
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.103 (vs. 11.075, 0.26%↑) 10.739 0.635
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.943 (vs. 4.930, 0.26%↑) 4.940 0.019
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 32.397 (vs. 32.317, 0.25%↑) 32.355 0.154
BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 224.143 (vs. 224.562, 0.19%↓) 224.159 1.856
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.915 (vs. 5.926, 0.18%↓) 5.910 0.018
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 186.013 (vs. 185.693, 0.17%↑) 184.633 2.990
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 13.949 (vs. 13.925, 0.17%↑) 13.858 0.262
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 243.969 (vs. 243.558, 0.17%↑) 243.380 1.384
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.657 (vs. 0.656, 0.17%↑) 0.657 0.003
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.554 (vs. 7.567, 0.16%↓) 7.559 0.017
matmul\_256x256x2048\_i8\_i8\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 6.609 (vs. 6.619, 0.14%↓) 6.614 0.015
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 61.367 (vs. 61.450, 0.13%↓) 61.046 0.609
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.823 (vs. 11.838, 0.13%↓) 11.804 0.047
matmul\_256x256x2048\_i8\_i8\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 1.805 (vs. 1.808, 0.13%↓) 1.804 0.006
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.511 (vs. 8.521, 0.12%↓) 8.445 0.123
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.330 (vs. 34.290, 0.12%↑) 34.123 0.347
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 27.015 (vs. 26.984, 0.12%↑) 26.978 0.088
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.788 (vs. 0.787, 0.11%↑) 0.788 0.001
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.929 (vs. 9.917, 0.11%↑) 9.916 0.046
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 68.973 (vs. 69.050, 0.11%↓) 68.699 0.573
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 190.917 (vs. 191.124, 0.11%↓) 189.061 3.228
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 25.241 (vs. 25.214, 0.11%↑) 25.220 0.175
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 36.049 (vs. 36.085, 0.10%↓) 35.425 1.076
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 58.683 (vs. 58.642, 0.07%↑) 58.255 1.082
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.555 (vs. 8.561, 0.07%↓) 8.544 0.035
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.051 (vs. 5.054, 0.06%↓) 5.050 0.009
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 276.133 (vs. 275.972, 0.06%↑) 275.215 3.247
BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 230.708 (vs. 230.821, 0.05%↓) 227.682 14.227
matmul\_1x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.055 (vs. 0.055, 0.04%↑) 0.055 0.000
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 87.706 (vs. 87.734, 0.03%↓) 87.904 3.616
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 69.601 (vs. 69.595, 0.01%↑) 69.901 2.759

All Compilation Metrics

Benchmark Name Compilation Time (ms) Total Dispatch Size (bytes) Total Artifact Size (bytes) Stream IR Dispatch Count (# of cmd.dispatch ops)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 1261 (vs. 1011, 24.73%↑) 4400 (vs. 4400, 0.00%) 273657 (vs. 273657, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 1860 (vs. 1883, 1.22%↓) 9680 (vs. 9680, 0.00%) 278905 (vs. 278905, 0.00%) 1 (vs. 1, 0.00%)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 1091 (vs. 1023, 6.65%↑) 3328 (vs. 3328, 0.00%) 534713 (vs. 534713, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 1712 (vs. 1573, 8.84%↑) 8480 (vs. 8480, 0.00%) 539833 (vs. 539833, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 27366 (vs. 23485, 16.53%↑) 144544 (vs. 144544, 0.00%) 399493 (vs. 399493, 0.00%) 60 (vs. 60, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 42153 (vs. 42157, 0.01%↓) 238656 (vs. 238656, 0.00%) 10455045 (vs. 10455045, 0.00%) 97 (vs. 97, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 38353 (vs. 34602, 10.84%↑) 177696 (vs. 177696, 0.00%) 2957509 (vs. 2957509, 0.00%) 79 (vs. 79, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 67908 (vs. 61229, 10.91%↑) 682752 (vs. 682752, 0.00%) 5603845 (vs. 5603845, 0.00%) 89 (vs. 89, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 21202 (vs. 24709, 14.19%↓) 175008 (vs. 175008, 0.00%) 17092293 (vs. 17092293, 0.00%) 51 (vs. 51, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 36618 (vs. 28383, 29.01%↑) 190512 (vs. 190512, 0.00%) 14172293 (vs. 14172293, 0.00%) 74 (vs. 74, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 80050 (vs. 66505, 20.37%↑) 568880 (vs. 568880, 0.00%) 4216837 (vs. 4216837, 0.00%) 144 (vs. 144, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 53742 (vs. 47234, 13.78%↑) 287728 (vs. 287728, 0.00%) 18226245 (vs. 18226245, 0.00%) 124 (vs. 124, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 23468 (vs. 21165, 10.88%↑) 142464 (vs. 142464, 0.00%) 5195333 (vs. 5195333, 0.00%) 48 (vs. 48, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 71030 (vs. 64864, 9.51%↑) 91888 (vs. 91888, 0.00%) 99892293 (vs. 99892293, 0.00%) 703 (vs. 703, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 68895 (vs. 68177, 1.05%↑) 100448 (vs. 100448, 0.00%) 98413445 (vs. 98413445, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 338114 (vs. 318786, 6.06%↑) 6817872 (vs. 6817872, 0.00%) 33068869 (vs. 33068869, 0.00%) 1053 (vs. 1053, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 109177 (vs. 100583, 8.54%↑) 216688 (vs. 216688, 0.00%) 164493804 (vs. 164493804, 0.00%) 276 (vs. 276, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 36186 (vs. 30511, 18.60%↑) 67120 (vs. 67120, 0.00%) 133993839 (vs. 133993839, 0.00%) 185 (vs. 185, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 34854 (vs. 37144, 6.17%↓) 27504 (vs. 27504, 0.00%) 652742996 (vs. 652742996, 0.00%) 221 (vs. 221, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 38573 (vs. 31113, 23.98%↑) 15008 (vs. 15008, 0.00%) 652726804 (vs. 652726804, 0.00%) 246 (vs. 246, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 42848 (vs. 38642, 10.88%↑) 76768 (vs. 76768, 0.00%) 533839615 (vs. 533839615, 0.00%) 188 (vs. 188, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 38757 (vs. 33286, 16.44%↑) 58176 (vs. 58176, 0.00%) 1336009791 (vs. 1336009791, 0.00%) 365 (vs. 365, 0.00%)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 1106 (vs. 948, 16.67%↑) 4400 (vs. 4400, 0.00%) 273657 (vs. 273657, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 2020 (vs. 1729, 16.83%↑) 9680 (vs. 9680, 0.00%) 278969 (vs. 278969, 0.00%) 1 (vs. 1, 0.00%)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 2424 (vs. 2039, 18.88%↑) 2976 (vs. 2976, 0.00%) 534329 (vs. 534329, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 2647 (vs. 2552, 3.72%↑) 6432 (vs. 6432, 0.00%) 538501 (vs. 538501, 0.00%) 3 (vs. 3, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 27863 (vs. 23727, 17.43%↑) 106672 (vs. 106672, 0.00%) 368069 (vs. 368069, 0.00%) 87 (vs. 87, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 36708 (vs. 32212, 13.96%↑) 96992 (vs. 96992, 0.00%) 10394437 (vs. 10394437, 0.00%) 147 (vs. 147, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 35951 (vs. 30595, 17.51%↑) 114416 (vs. 114416, 0.00%) 2917637 (vs. 2917637, 0.00%) 144 (vs. 144, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 60160 (vs. 47888, 25.63%↑) 269312 (vs. 269312, 0.00%) 5215301 (vs. 5215301, 0.00%) 148 (vs. 148, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 16592 (vs. 18714, 11.34%↓) 60352 (vs. 60352, 0.00%) 17014213 (vs. 17014213, 0.00%) 79 (vs. 79, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 31840 (vs. 28852, 10.36%↑) 95328 (vs. 95328, 0.00%) 14129477 (vs. 14129477, 0.00%) 136 (vs. 136, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 62287 (vs. 53531, 16.36%↑) 329856 (vs. 329856, 0.00%) 3999301 (vs. 3999301, 0.00%) 213 (vs. 213, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 45330 (vs. 42914, 5.63%↑) 135552 (vs. 135552, 0.00%) 18353669 (vs. 18353669, 0.00%) 223 (vs. 223, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 17281 (vs. 14135, 22.26%↑) 44256 (vs. 44256, 0.00%) 5146821 (vs. 5146821, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 95334 (vs. 83482, 14.20%↑) 48496 (vs. 48496, 0.00%) 99982021 (vs. 99982021, 0.00%) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 89434 (vs. 79521, 12.47%↑) 48880 (vs. 48880, 0.00%) 98493893 (vs. 98493893, 0.00%) 1762 (vs. 1762, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 293889 (vs. 268876, 9.30%↑) 3422160 (vs. 3422160, 0.00%) 29827141 (vs. 29827141, 0.00%) 2136 (vs. 2136, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 119029 (vs. 108708, 9.49%↑) 155328 (vs. 155328, 0.00%) 169900268 (vs. 169900268, 0.00%) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 39003 (vs. 36575, 6.64%↑) 36576 (vs. 36576, 0.00%) 219461551 (vs. 219461551, 0.00%) 342 (vs. 342, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 51571 (vs. 45517, 13.30%↑) 17344 (vs. 17344, 0.00%) 992526804 (vs. 992526804, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 41606 (vs. 38660, 7.62%↑) 10272 (vs. 10272, 0.00%) 992522068 (vs. 992522068, 0.00%) 355 (vs. 355, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 47646 (vs. 47882, 0.49%↓) 38224 (vs. 38224, 0.00%) 875850367 (vs. 875850367, 0.00%) 346 (vs. 346, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 59724 (vs. 52122, 14.59%↑) 41568 (vs. 41568, 0.00%) 1336020671 (vs. 1336020671, 0.00%) 678 (vs. 678, 0.00%)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 1119 (vs. 986, 13.49%↑) 4400 (vs. 4400, 0.00%) 273657 (vs. 273657, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 1995 (vs. 1802, 10.71%↑) 9680 (vs. 9680, 0.00%) 278905 (vs. 278905, 0.00%) 1 (vs. 1, 0.00%)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 2821 (vs. 2635, 7.06%↑) 2464 (vs. 2464, 0.00%) 533817 (vs. 533817, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 2989 (vs. 2758, 8.38%↑) 4800 (vs. 4800, 0.00%) 536837 (vs. 536837, 0.00%) 3 (vs. 3, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 29195 (vs. 22658, 28.85%↑) 100464 (vs. 100464, 0.00%) 361861 (vs. 361861, 0.00%) 87 (vs. 87, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 38326 (vs. 35592, 7.68%↑) 99152 (vs. 99152, 0.00%) 10396549 (vs. 10396549, 0.00%) 147 (vs. 147, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 34716 (vs. 33439, 3.82%↑) 119792 (vs. 119792, 0.00%) 2923013 (vs. 2923013, 0.00%) 144 (vs. 144, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 50410 (vs. 50155, 0.51%↑) 260256 (vs. 260256, 0.00%) 5206277 (vs. 5206277, 0.00%) 148 (vs. 148, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 25085 (vs. 20798, 20.61%↑) 66256 (vs. 66256, 0.00%) 17020101 (vs. 17020101, 0.00%) 79 (vs. 79, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 34946 (vs. 32265, 8.31%↑) 100048 (vs. 100048, 0.00%) 14134149 (vs. 14134149, 0.00%) 136 (vs. 136, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 65435 (vs. 59274, 10.39%↑) 325280 (vs. 325280, 0.00%) 3994693 (vs. 3994693, 0.00%) 213 (vs. 213, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 50791 (vs. 45048, 12.75%↑) 123312 (vs. 123312, 0.00%) 18341381 (vs. 18341381, 0.00%) 223 (vs. 223, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 19247 (vs. 16551, 16.29%↑) 47808 (vs. 47808, 0.00%) 5150341 (vs. 5150341, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 90687 (vs. 77651, 16.79%↑) 39616 (vs. 39616, 0.00%) 99973125 (vs. 99973125, 0.00%) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 93561 (vs. 79108, 18.27%↑) 40016 (vs. 40016, 0.00%) 98484933 (vs. 98484933, 0.00%) 1762 (vs. 1762, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 278015 (vs. 254426, 9.27%↑) 3414288 (vs. 3414288, 0.00%) 29819269 (vs. 29819269, 0.00%) 2136 (vs. 2136, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 128388 (vs. 110414, 16.28%↑) 141760 (vs. 141760, 0.00%) 169886636 (vs. 169886636, 0.00%) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 40564 (vs. 40473, 0.22%↑) 33600 (vs. 33600, 0.00%) 219458607 (vs. 219458607, 0.00%) 342 (vs. 342, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 46280 (vs. 43922, 5.37%↑) 18784 (vs. 18784, 0.00%) 992528276 (vs. 992528276, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 51205 (vs. 49524, 3.39%↑) 11312 (vs. 11312, 0.00%) 992523092 (vs. 992523092, 0.00%) 355 (vs. 355, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 47479 (vs. 48822, 2.75%↓) 33968 (vs. 33968, 0.00%) 875846143 (vs. 875846143, 0.00%) 346 (vs. 346, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 66434 (vs. 53220, 24.83%↑) 38112 (vs. 38112, 0.00%) 1336017151 (vs. 1336017151, 0.00%) 678 (vs. 678, 0.00%)
EfficientNetV2STF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 109815 (vs. 92837, 18.29%↑) 841528 (vs. 841528, 0.00%) 165164692 (vs. 165164692, 0.00%) 276 (vs. 276, 0.00%)
MiniLML12H384Uncased(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 34415 (vs. 29094, 18.29%↑) 184528 (vs. 184528, 0.00%) 134119041 (vs. 134119041, 0.00%) 185 (vs. 185, 0.00%)
BertForMaskedLMTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 37760 (vs. 32366, 16.67%↑) 264548 (vs. 264548, 0.00%) 534033972 (vs. 534033972, 0.00%) 188 (vs. 188, 0.00%)
BertLargeTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 33929 (vs. 34789, 2.47%↓) 168052 (vs. 168052, 0.00%) 1336126133 (vs. 1336126133, 0.00%) 365 (vs. 365, 0.00%)
matmul_3456x1024x2048_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1894 (vs. 1747, 8.41%↑) 30384 (vs. 30384, 0.00%) 42579 (vs. 42579, 0.00%) 1 (vs. 1, 0.00%)
matmul_3456x1024x2048_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2222 (vs. 2002, 10.99%↑) 44912 (vs. 44912, 0.00%) 57107 (vs. 57107, 0.00%) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1809 (vs. 1743, 3.79%↑) 28344 (vs. 28344, 0.00%) 40475 (vs. 40475, 0.00%) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2302 (vs. 2131, 8.02%↑) 41284 (vs. 41284, 0.00%) 53415 (vs. 53415, 0.00%) 1 (vs. 1, 0.00%)
matmul_2564x2564x2564_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2660 (vs. 2193, 21.30%↑) 85956 (vs. 85956, 0.00%) 98087 (vs. 98087, 0.00%) 1 (vs. 1, 0.00%)
matmul_2562x2564x2562_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2714 (vs. 2377, 14.18%↑) 88832 (vs. 88832, 0.00%) 101027 (vs. 101027, 0.00%) 1 (vs. 1, 0.00%)
matmul_2562x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 3198 (vs. 2998, 6.67%↑) 84036 (vs. 84036, 0.00%) 96231 (vs. 96231, 0.00%) 1 (vs. 1, 0.00%)
matmul_123x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2570 (vs. 2538, 1.26%↑) 51016 (vs. 51016, 0.00%) 63210 (vs. 63210, 0.00%) 1 (vs. 1, 0.00%)
matmul_128x256x8192_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 1065 (vs. 889, 19.80%↑) 9004 (vs. 9004, 0.00%) 28085 (vs. 28085, 0.00%) 2 (vs. 2, 0.00%)
matmul_128x256x8192_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 1116 (vs. 939, 18.85%↑) 9748 (vs. 9748, 0.00%) 28833 (vs. 28833, 0.00%) 2 (vs. 2, 0.00%)
MiniLML12H384Uncased(stablehlo) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 36782 (vs. 34407, 6.90%↑) 55792 (vs. 55792, 0.00%) 133982193 (vs. 133982193, 0.00%) 185 (vs. 185, 0.00%)
DeepLabV3_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 22653 (vs. 20581, 10.07%↑) 42768 (vs. 42768, 0.00%) 2822215 (vs. 2822215, 0.00%) 79 (vs. 79, 0.00%)
EfficientNet_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 44069 (vs. 46014, 4.23%↓) 188816 (vs. 188816, 0.00%) 5109575 (vs. 5109575, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 78877 (vs. 69112, 14.13%↑) 50432 (vs. 50432, 0.00%) 98363079 (vs. 98363079, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 379681 (vs. 359835, 5.52%↑) 3537584 (vs. 3537584, 0.00%) 29788295 (vs. 29788295, 0.00%) 1053 (vs. 1053, 0.00%)
MobileNetV1_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 18483 (vs. 16839, 9.76%↑) 51488 (vs. 51488, 0.00%) 16971207 (vs. 16971207, 0.00%) 65 (vs. 65, 0.00%)
MobileNetV2_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 50965 (vs. 52213, 2.39%↓) 217328 (vs. 217328, 0.00%) 3864967 (vs. 3864967, 0.00%) 144 (vs. 144, 0.00%)
PersonDetect_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 20642 (vs. 17857, 15.60%↑) 59520 (vs. 59520, 0.00%) 314183 (vs. 314183, 0.00%) 60 (vs. 60, 0.00%)
EfficientNet_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 53230 (vs. 49585, 7.35%↑) 393668 (vs. 393668, 0.00%) 5314439 (vs. 5314439, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 433241 (vs. 420529, 3.02%↑) 3825444 (vs. 3825444, 0.00%) 30076167 (vs. 30076167, 0.00%) 1053 (vs. 1053, 0.00%)
PersonDetect_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 23612 (vs. 21133, 11.73%↑) 125236 (vs. 125236, 0.00%) 379847 (vs. 379847, 0.00%) 60 (vs. 60, 0.00%)
MobileNetV2_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 60578 (vs. 55110, 9.92%↑) 330176 (vs. 330176, 0.00%) 3977799 (vs. 3977799, 0.00%) 144 (vs. 144, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 21580 (vs. 20712, 4.19%↑) 56384 (vs. 56384, 0.00%) 2835845 (vs. 2835845, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 77183 (vs. 62923, 22.66%↑) 33856 (vs. 33856, 0.00%) 98346437 (vs. 98346437, 0.00%) 679 (vs. 679, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 44652 (vs. 35290, 26.53%↑) 20208 (vs. 20208, 0.00%) 652735380 (vs. 652735380, 0.00%) 221 (vs. 221, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 30112 (vs. 31827, 5.39%↓) 8992 (vs. 8992, 0.00%) 652720468 (vs. 652720468, 0.00%) 246 (vs. 246, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 314711 (vs. 290027, 8.51%↑) 4939360 (vs. 4939360, 0.00%) 31190085 (vs. 31190085, 0.00%) 1053 (vs. 1053, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 106632 (vs. 101422, 5.14%↑) 819952 (vs. 819952, 0.00%) 88844933 (vs. 88844933, 0.00%) 255 (vs. 255, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 27591 (vs. 29064, 5.07%↓) 50848 (vs. 50848, 0.00%) 2844741 (vs. 2844741, 0.00%) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 94153 (vs. 79676, 18.17%↑) 21840 (vs. 21840, 0.00%) 98466437 (vs. 98466437, 0.00%) 1762 (vs. 1762, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 53184 (vs. 41640, 27.72%↑) 11536 (vs. 11536, 0.00%) 992496148 (vs. 992496148, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 54454 (vs. 44182, 23.25%↑) 9136 (vs. 9136, 0.00%) 992496020 (vs. 992496020, 0.00%) 355 (vs. 355, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 250915 (vs. 233393, 7.51%↑) 1800576 (vs. 1800576, 0.00%) 28205189 (vs. 28205189, 0.00%) 2136 (vs. 2136, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 82585 (vs. 81681, 1.11%↑) 121824 (vs. 121824, 0.00%) 88135429 (vs. 88135429, 0.00%) 375 (vs. 375, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 27314 (vs. 23452, 16.47%↑) 41008 (vs. 41008, 0.00%) 2834885 (vs. 2834885, 0.00%) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 83809 (vs. 69813, 20.05%↑) 22064 (vs. 22064, 0.00%) 98466693 (vs. 98466693, 0.00%) 1762 (vs. 1762, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 52692 (vs. 41427, 27.19%↑) 10736 (vs. 10736, 0.00%) 992495316 (vs. 992495316, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 41225 (vs. 36482, 13.00%↑) 8672 (vs. 8672, 0.00%) 992495572 (vs. 992495572, 0.00%) 355 (vs. 355, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 244225 (vs. 234491, 4.15%↑) 1808112 (vs. 1808112, 0.00%) 28212741 (vs. 28212741, 0.00%) 2136 (vs. 2136, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 79973 (vs. 69765, 14.63%↑) 124384 (vs. 124384, 0.00%) 88137989 (vs. 88137989, 0.00%) 375 (vs. 375, 0.00%)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 1011 (vs. 867, 16.61%↑) 2464 (vs. 2464, 0.00%) 271353 (vs. 271353, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 1151 (vs. 1127, 2.13%↑) 3872 (vs. 3872, 0.00%) 272761 (vs. 272761, 0.00%) 1 (vs. 1, 0.00%)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 1000 (vs. 893, 11.98%↑) 2368 (vs. 2368, 0.00%) 533433 (vs. 533433, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 1069 (vs. 938, 13.97%↑) 3088 (vs. 3088, 0.00%) 534137 (vs. 534137, 0.00%) 1 (vs. 1, 0.00%)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 2184 (vs. 2002, 9.09%↑) 3504 (vs. 3504, 0.00%) 273093 (vs. 273093, 0.00%) 2 (vs. 2, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 2469 (vs. 2347, 5.20%↑) 4528 (vs. 4528, 0.00%) 274693 (vs. 274693, 0.00%) 4 (vs. 4, 0.00%)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 2745 (vs. 2387, 15.00%↑) 2368 (vs. 2368, 0.00%) 533433 (vs. 533433, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 2901 (vs. 2746, 5.64%↑) 3360 (vs. 3360, 0.00%) 535109 (vs. 535109, 0.00%) 3 (vs. 3, 0.00%)
matmul_1x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 1515 (vs. 1276, 18.73%↑) 4144 (vs. 4144, 0.00%) 273733 (vs. 273733, 0.00%) 2 (vs. 2, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 2106 (vs. 1981, 6.31%↑) 6496 (vs. 6496, 0.00%) 276677 (vs. 276677, 0.00%) 4 (vs. 4, 0.00%)
matmul_1x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 2346 (vs. 2071, 13.28%↑) 2640 (vs. 2640, 0.00%) 533689 (vs. 533689, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i8_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 2386 (vs. 2321, 2.80%↑) 4352 (vs. 4352, 0.00%) 536069 (vs. 536069, 0.00%) 3 (vs. 3, 0.00%)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 71198 (vs. 58233, 22.26%↑) 257480 (vs. 257480, 0.00%) 98583085 (vs. 98583085, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,compile-stats] 71481 (vs. 66561, 7.39%↑) 257480 (vs. 257480, 0.00%) 98583085 (vs. 98583085, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 66195 (vs. 66610, 0.62%↓) 148236 (vs. 148236, 0.00%) 98473901 (vs. 98473901, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,demote-f32-to-f16,compile-stats] 119011 (vs. 103727, 14.73%↑) 3178840 (vs. 3178840, 0.00%) 53160559 (vs. 53160559, 0.00%) 703 (vs. 703, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 208759 (vs. 193425, 7.93%↑) 7145960 (vs. 7145960, 0.00%) 33672074 (vs. 33672074, 0.00%) 1053 (vs. 1053, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 74800 (vs. 61357, 21.91%↑) 148220 (vs. 148220, 0.00%) 98475629 (vs. 98475629, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16,compile-stats] 83758 (vs. 72868, 14.94%↑) 3180856 (vs. 3180856, 0.00%) 53169775 (vs. 53169775, 0.00%) 703 (vs. 703, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 210188 (vs. 195242, 7.66%↑) 7144540 (vs. 7144540, 0.00%) 33657802 (vs. 33657802, 0.00%) 1053 (vs. 1053, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 121387 (vs. 106664, 13.80%↑) 148220 (vs. 148220, 0.00%) 99696557 (vs. 99696557, 0.00%) 679 (vs. 679, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,demote-f32-to-f16,compile-stats] 140841 (vs. 131158, 7.38%↑) 3180856 (vs. 3180856, 0.00%) 54433775 (vs. 54433775, 0.00%) 703 (vs. 703, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 359468 (vs. 332755, 8.03%↑) 7144540 (vs. 7144540, 0.00%) 35551114 (vs. 35551114, 0.00%) 1053 (vs. 1053, 0.00%)
MobileNetV2_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 20488 (vs. 20245, 1.20%↑) 189941 (vs. 189941, 0.00%) 14185150 (vs. 14185150, 0.00%) 171 (vs. 171, 0.00%)
MobileNetV3Small_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 31350 (vs. 29413, 6.59%↑) 277813 (vs. 277813, 0.00%) 10508286 (vs. 10508286, 0.00%) 208 (vs. 208, 0.00%)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment