@zamazan4ik
Created September 21, 2024 22:15
raptorq: PGO-optimized build compared to Release
Running benches/codec_benchmark.rs (target/x86_64-unknown-linux-gnu/release/deps/codec_benchmark-14169a542b3a3089)
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, eithe
Benchmarking Symbol mulassign_scalar()/
Benchmarking Symbol mulassign_scalar()/: Warming up for 3.0000 s
Benchmarking Symbol mulassign_scalar()/: Collecting 100 samples in estimated 5.0001 s (197M iterations)
Benchmarking Symbol mulassign_scalar()/: Analyzing
Symbol mulassign_scalar()/
time: [25.532 ns 25.695 ns 25.868 ns]
thrpt: [18.434 GiB/s 18.558 GiB/s 18.676 GiB/s]
change:
time: [+4.2815% +4.8353% +5.4044%] (p = 0.00 < 0.05)
thrpt: [-5.1273% -4.6122% -4.1057%]
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
Benchmarking Symbol +=/
Benchmarking Symbol +=/: Warming up for 3.0000 s
Benchmarking Symbol +=/: Collecting 100 samples in estimated 5.0000 s (212M iterations)
Benchmarking Symbol +=/: Analyzing
Symbol +=/ time: [23.546 ns 23.575 ns 23.603 ns]
thrpt: [20.202 GiB/s 20.227 GiB/s 20.251 GiB/s]
change:
time: [-1.2118% -0.8179% -0.5025%] (p = 0.00 < 0.05)
thrpt: [+0.5051% +0.8246% +1.2266%]
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
Benchmarking Symbol FMA/
Benchmarking Symbol FMA/: Warming up for 3.0000 s
Benchmarking Symbol FMA/: Collecting 100 samples in estimated 5.0000 s (183M iterations)
Benchmarking Symbol FMA/: Analyzing
Symbol FMA/ time: [27.313 ns 27.321 ns 27.330 ns]
thrpt: [17.448 GiB/s 17.453 GiB/s 17.458 GiB/s]
change:
time: [+1.4084% +1.5004% +1.5770%] (p = 0.00 < 0.05)
thrpt: [-1.5525% -1.4782% -1.3889%]
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high severe
Benchmarking encode 10KB/
Benchmarking encode 10KB/: Warming up for 3.0000 s
Benchmarking encode 10KB/: Collecting 100 samples in estimated 5.1749 s (131k iterations)
Benchmarking encode 10KB/: Analyzing
encode 10KB/ time: [39.294 µs 39.301 µs 39.308 µs]
thrpt: [248.44 MiB/s 248.49 MiB/s 248.53 MiB/s]
change:
time: [-7.3076% -7.1243% -6.9828%] (p = 0.00 < 0.05)
thrpt: [+7.5070% +7.6707% +7.8837%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
Benchmarking roundtrip 10KB/
Benchmarking roundtrip 10KB/: Warming up for 3.0000 s
Benchmarking roundtrip 10KB/: Collecting 100 samples in estimated 5.1495 s (126k iterations)
Benchmarking roundtrip 10KB/: Analyzing
roundtrip 10KB/ time: [40.865 µs 40.874 µs 40.883 µs]
thrpt: [238.87 MiB/s 238.92 MiB/s 238.97 MiB/s]
change:
time: [-7.3409% -7.2531% -7.1825%] (p = 0.00 < 0.05)
thrpt: [+7.7383% +7.8203% +7.9224%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
Benchmarking roundtrip repair 10KB/
Benchmarking roundtrip repair 10KB/: Warming up for 3.0000 s
Benchmarking roundtrip repair 10KB/: Collecting 100 samples in estimated 5.0356 s (56k iterations)
Benchmarking roundtrip repair 10KB/: Analyzing
roundtrip repair 10KB/ time: [90.860 µs 90.969 µs 91.179 µs]
thrpt: [107.10 MiB/s 107.35 MiB/s 107.48 MiB/s]
change:
time: [-6.8234% -6.6312% -6.4014%] (p = 0.00 < 0.05)
thrpt: [+6.8392% +7.1022% +7.3230%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe
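
A note on reading the `change:` blocks above: the `thrpt` bounds look like the reciprocal of the `time` bounds, with the order swapped. The sketch below checks that against the `Symbol mulassign_scalar()` numbers. It assumes the amount of work per iteration is fixed, so throughput scales as 1/time; `throughput_change` is a hypothetical helper written for this illustration and is not part of Criterion or raptorq.

```rust
/// Convert a relative change in time (e.g. 0.054044 for +5.4044%) into the
/// implied relative change in throughput, assuming the work per iteration
/// stays constant (throughput is proportional to 1 / time).
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

fn main() {
    // Lower and upper time-change bounds reported for `Symbol mulassign_scalar()`.
    for t in [0.042815_f64, 0.054044_f64] {
        println!(
            "time {:+.4}% -> thrpt {:+.4}%",
            t * 100.0,
            throughput_change(t) * 100.0
        );
    }
}
```

Under that assumption, the upper time bound (+5.4044%) corresponds to the lower throughput bound (-5.1273%), and the lower time bound (+4.2815%) to the upper throughput bound (-4.1057%).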
Running benches/decode_benchmark.rs (target/x86_64-unknown-linux-gnu/release/deps/decode_benchmark-e20c3ef0e020a1ac)
Symbol size: 1280 bytes
symbol count = 10, decoded 127 MB in 0.440secs using 0.0% overhead, throughput: 2327.1Mbit/s
symbol count = 100, decoded 127 MB in 0.339secs using 0.0% overhead, throughput: 3019.0Mbit/s
symbol count = 250, decoded 127 MB in 0.319secs using 0.0% overhead, throughput: 3206.7Mbit/s
symbol count = 500, decoded 127 MB in 0.306secs using 0.0% overhead, throughput: 3335.0Mbit/s
symbol count = 1000, decoded 126 MB in 0.317secs using 0.0% overhead, throughput: 3203.9Mbit/s
symbol count = 2000, decoded 126 MB in 0.340secs using 0.0% overhead, throughput: 2987.1Mbit/s
symbol count = 5000, decoded 122 MB in 0.366secs using 0.0% overhead, throughput: 2668.2Mbit/s
symbol count = 10000, decoded 122 MB in 0.432secs using 0.0% overhead, throughput: 2260.6Mbit/s
symbol count = 20000, decoded 122 MB in 0.590secs using 0.0% overhead, throughput: 1655.2Mbit/s
symbol count = 50000, decoded 122 MB in 0.838secs using 0.0% overhead, throughput: 1165.3Mbit/s
symbol count = 10, decoded 127 MB in 0.434secs using 5.0% overhead, throughput: 2359.3Mbit/s
symbol count = 100, decoded 127 MB in 0.343secs using 5.0% overhead, throughput: 2983.8Mbit/s
symbol count = 250, decoded 127 MB in 0.324secs using 5.0% overhead, throughput: 3157.3Mbit/s
symbol count = 500, decoded 127 MB in 0.308secs using 5.0% overhead, throughput: 3313.3Mbit/s
symbol count = 1000, decoded 126 MB in 0.325secs using 5.0% overhead, throughput: 3125.0Mbit/s
symbol count = 2000, decoded 126 MB in 0.337secs using 5.0% overhead, throughput: 3013.7Mbit/s
symbol count = 5000, decoded 122 MB in 0.374secs using 5.0% overhead, throughput: 2611.1Mbit/s
symbol count = 10000, decoded 122 MB in 0.472secs using 5.0% overhead, throughput: 2069.0Mbit/s
symbol count = 20000, decoded 122 MB in 0.634secs using 5.0% overhead, throughput: 1540.3Mbit/s
symbol count = 50000, decoded 122 MB in 1.023secs using 5.0% overhead, throughput: 954.6Mbit/s
Running benches/encode_benchmark.rs (target/x86_64-unknown-linux-gnu/release/deps/encode_benchmark-c98ccc72b678f27c)
Symbol size: 1280 bytes (without pre-built plan)
symbol count = 10, encoded 127 MB in 0.297secs, throughput: 3447.6Mbit/s
symbol count = 100, encoded 127 MB in 0.224secs, throughput: 4568.9Mbit/s
symbol count = 250, encoded 127 MB in 0.252secs, throughput: 4059.3Mbit/s
symbol count = 500, encoded 127 MB in 0.253secs, throughput: 4033.6Mbit/s
symbol count = 1000, encoded 126 MB in 0.260secs, throughput: 3906.2Mbit/s
symbol count = 2000, encoded 126 MB in 0.277secs, throughput: 3666.5Mbit/s
symbol count = 5000, encoded 122 MB in 0.296secs, throughput: 3299.2Mbit/s
symbol count = 10000, encoded 122 MB in 0.367secs, throughput: 2660.9Mbit/s
symbol count = 20000, encoded 122 MB in 0.485secs, throughput: 2013.5Mbit/s
symbol count = 50000, encoded 122 MB in 0.661secs, throughput: 1477.4Mbit/s
Symbol size: 1280 bytes (with pre-built plan)
symbol count = 10, encoded 127 MB in 0.151secs, throughput: 6781.0Mbit/s
symbol count = 100, encoded 127 MB in 0.104secs, throughput: 9840.7Mbit/s
symbol count = 250, encoded 127 MB in 0.114secs, throughput: 8973.2Mbit/s
symbol count = 500, encoded 127 MB in 0.115secs, throughput: 8874.0Mbit/s
symbol count = 1000, encoded 126 MB in 0.119secs, throughput: 8534.7Mbit/s
symbol count = 2000, encoded 126 MB in 0.126secs, throughput: 8060.5Mbit/s
symbol count = 5000, encoded 122 MB in 0.135secs, throughput: 7233.8Mbit/s
symbol count = 10000, encoded 122 MB in 0.164secs, throughput: 5954.6Mbit/s
symbol count = 20000, encoded 122 MB in 0.210secs, throughput: 4650.3Mbit/s
symbol count = 50000, encoded 122 MB in 0.364secs, throughput: 2682.9Mbit/s
Running benches/matrix_sparsity.rs (target/x86_64-unknown-linux-gnu/release/deps/matrix_sparsity-f6b45b7f73e4b7e8)
Row density for 27x27: min=0 max=18 p50=7 p80=18 p90=18 p95=18 p99=18
Original density for 27x27: 231 of 729 (31.687%)
Initial memory usage: 3KB
Optimized decoder mul ops: 269 (26.9 per symbol), add ops: 414 (41.4 per symbol)
By phase mul ops: [129, 140, 0, 0, 0], add ops: [175, 152, 21, 45, 21]
Row density for 128x128: min=0 max=119 p50=5 p80=20 p90=20 p95=119 p99=119
Original density for 128x128: 2077 of 16384 (12.677%)
Initial memory usage: 14KB
Optimized decoder mul ops: 1253 (12.4 per symbol), add ops: 2918 (28.9 per symbol)
By phase mul ops: [1016, 237, 0, 0, 0], add ops: [1585, 356, 350, 277, 350]
Row density for 1071x1071: min=0 max=1062 p50=5 p80=9 p90=29 p95=52 p99=53
Original density for 1071x1071: 20933 of 1147041 (1.825%)
Initial memory usage: 125KB
Optimized decoder mul ops: 10600 (10.6 per symbol), add ops: 30302 (30.2 per symbol)
By phase mul ops: [9937, 663, 0, 0, 0], add ops: [16799, 2435, 4305, 2458, 4305]
Row density for 10269x10269: min=0 max=10251 p50=5 p80=8 p90=15 p95=32 p99=126
Original density for 10269x10269: 214641 of 105452361 (0.204%)
Initial memory usage: 1373KB
Optimized decoder mul ops: 112049 (11.2 per symbol), add ops: 328540 (32.8 per symbol)
By phase mul ops: [109861, 2188, 0, 0, 0], add ops: [178357, 19887, 53259, 23778, 53259]
Row density for 41104x41104: min=0 max=41019 p50=5 p80=8 p90=14 p95=29 p99=177
Original density for 41104x41104: 1025396 of 1689538816 (0.061%)
Initial memory usage: 6305KB
Optimized decoder mul ops: 612014 (15.1 per symbol), add ops: 1554467 (38.5 per symbol)
By phase mul ops: [605623, 6391, 0, 0, 0], add ops: [881919, 87897, 244445, 95761, 244445]
Row density for 57326x57326: min=0 max=57163 p50=5 p80=8 p90=13 p95=27 p99=189
Original density for 57326x57326: 1484776 of 3286270276 (0.045%)
Initial memory usage: 9285KB
Optimized decoder mul ops: 910038 (16.1 per symbol), add ops: 2223234 (39.4 per symbol)
By phase mul ops: [901621, 8417, 0, 0, 0], add ops: [1285203, 137388, 333568, 133507, 333568]