Created
September 21, 2024 22:15
raptorq: PGO optimized compared to Release
Running benches/codec_benchmark.rs (target/x86_64-unknown-linux-gnu/release/deps/codec_benchmark-14169a542b3a3089)
WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, eithe
Benchmarking Symbol mulassign_scalar()/
Benchmarking Symbol mulassign_scalar()/: Warming up for 3.0000 s
Benchmarking Symbol mulassign_scalar()/: Collecting 100 samples in estimated 5.0001 s (197M iterations)
Benchmarking Symbol mulassign_scalar()/: Analyzing
Symbol mulassign_scalar()/
time: [25.532 ns 25.695 ns 25.868 ns]
thrpt: [18.434 GiB/s 18.558 GiB/s 18.676 GiB/s]
change:
time: [+4.2815% +4.8353% +5.4044%] (p = 0.00 < 0.05)
thrpt: [-5.1273% -4.6122% -4.1057%]
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
Benchmarking Symbol +=/
Benchmarking Symbol +=/: Warming up for 3.0000 s
Benchmarking Symbol +=/: Collecting 100 samples in estimated 5.0000 s (212M iterations)
Benchmarking Symbol +=/: Analyzing
Symbol +=/ time: [23.546 ns 23.575 ns 23.603 ns]
thrpt: [20.202 GiB/s 20.227 GiB/s 20.251 GiB/s]
change:
time: [-1.2118% -0.8179% -0.5025%] (p = 0.00 < 0.05)
thrpt: [+0.5051% +0.8246% +1.2266%]
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
Benchmarking Symbol FMA/
Benchmarking Symbol FMA/: Warming up for 3.0000 s
Benchmarking Symbol FMA/: Collecting 100 samples in estimated 5.0000 s (183M iterations)
Benchmarking Symbol FMA/: Analyzing
Symbol FMA/ time: [27.313 ns 27.321 ns 27.330 ns]
thrpt: [17.448 GiB/s 17.453 GiB/s 17.458 GiB/s]
change:
time: [+1.4084% +1.5004% +1.5770%] (p = 0.00 < 0.05)
thrpt: [-1.5525% -1.4782% -1.3889%]
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high severe
Benchmarking encode 10KB/
Benchmarking encode 10KB/: Warming up for 3.0000 s
Benchmarking encode 10KB/: Collecting 100 samples in estimated 5.1749 s (131k iterations)
Benchmarking encode 10KB/: Analyzing
encode 10KB/ time: [39.294 µs 39.301 µs 39.308 µs]
thrpt: [248.44 MiB/s 248.49 MiB/s 248.53 MiB/s]
change:
time: [-7.3076% -7.1243% -6.9828%] (p = 0.00 < 0.05)
thrpt: [+7.5070% +7.6707% +7.8837%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
Benchmarking roundtrip 10KB/
Benchmarking roundtrip 10KB/: Warming up for 3.0000 s
Benchmarking roundtrip 10KB/: Collecting 100 samples in estimated 5.1495 s (126k iterations)
Benchmarking roundtrip 10KB/: Analyzing
roundtrip 10KB/ time: [40.865 µs 40.874 µs 40.883 µs]
thrpt: [238.87 MiB/s 238.92 MiB/s 238.97 MiB/s]
change:
time: [-7.3409% -7.2531% -7.1825%] (p = 0.00 < 0.05)
thrpt: [+7.7383% +7.8203% +7.9224%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
Benchmarking roundtrip repair 10KB/
Benchmarking roundtrip repair 10KB/: Warming up for 3.0000 s
Benchmarking roundtrip repair 10KB/: Collecting 100 samples in estimated 5.0356 s (56k iterations)
Benchmarking roundtrip repair 10KB/: Analyzing
roundtrip repair 10KB/ time: [90.860 µs 90.969 µs 91.179 µs]
thrpt: [107.10 MiB/s 107.35 MiB/s 107.48 MiB/s]
change:
time: [-6.8234% -6.6312% -6.4014%] (p = 0.00 < 0.05)
thrpt: [+6.8392% +7.1022% +7.3230%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe
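
Note on reading the "change" blocks above: for a fixed amount of work per iteration, throughput is inversely proportional to time, so each thrpt change appears to follow from the matching time change as 1 / (1 + dt) - 1. The lower time bound maps to the upper throughput bound and vice versa. The sketch below only illustrates that relationship with point estimates copied from the log. The function name `throughput_change` is made up for this note and is not part of Criterion.rs or raptorq.

```rust
/// Convert a relative change in time per iteration into the implied
/// relative change in throughput. Both values are fractions (0.05 == 5%).
/// Hypothetical helper for illustration; not a Criterion.rs API.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

fn main() {
    // Point estimates of the time change, copied from the log above.
    let cases = [
        ("Symbol mulassign_scalar()", 0.048353),
        ("encode 10KB", -0.071243),
        ("roundtrip 10KB", -0.072531),
    ];
    for (name, dt) in cases {
        // Print both changes as signed percentages for comparison with the log.
        println!(
            "{}: time {:+.4}% -> thrpt {:+.4}%",
            name,
            dt * 100.0,
            throughput_change(dt) * 100.0
        );
    }
}
```

The printed throughput changes can be compared by eye with the thrpt lines in the log; small differences in the last digit are expected because the log values are already rounded.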
Running benches/decode_benchmark.rs (target/x86_64-unknown-linux-gnu/release/deps/decode_benchmark-e20c3ef0e020a1ac)
Symbol size: 1280 bytes
symbol count = 10, decoded 127 MB in 0.440secs using 0.0% overhead, throughput: 2327.1Mbit/s
symbol count = 100, decoded 127 MB in 0.339secs using 0.0% overhead, throughput: 3019.0Mbit/s
symbol count = 250, decoded 127 MB in 0.319secs using 0.0% overhead, throughput: 3206.7Mbit/s
symbol count = 500, decoded 127 MB in 0.306secs using 0.0% overhead, throughput: 3335.0Mbit/s
symbol count = 1000, decoded 126 MB in 0.317secs using 0.0% overhead, throughput: 3203.9Mbit/s
symbol count = 2000, decoded 126 MB in 0.340secs using 0.0% overhead, throughput: 2987.1Mbit/s
symbol count = 5000, decoded 122 MB in 0.366secs using 0.0% overhead, throughput: 2668.2Mbit/s
symbol count = 10000, decoded 122 MB in 0.432secs using 0.0% overhead, throughput: 2260.6Mbit/s
symbol count = 20000, decoded 122 MB in 0.590secs using 0.0% overhead, throughput: 1655.2Mbit/s
symbol count = 50000, decoded 122 MB in 0.838secs using 0.0% overhead, throughput: 1165.3Mbit/s
symbol count = 10, decoded 127 MB in 0.434secs using 5.0% overhead, throughput: 2359.3Mbit/s
symbol count = 100, decoded 127 MB in 0.343secs using 5.0% overhead, throughput: 2983.8Mbit/s
symbol count = 250, decoded 127 MB in 0.324secs using 5.0% overhead, throughput: 3157.3Mbit/s
symbol count = 500, decoded 127 MB in 0.308secs using 5.0% overhead, throughput: 3313.3Mbit/s
symbol count = 1000, decoded 126 MB in 0.325secs using 5.0% overhead, throughput: 3125.0Mbit/s
symbol count = 2000, decoded 126 MB in 0.337secs using 5.0% overhead, throughput: 3013.7Mbit/s
symbol count = 5000, decoded 122 MB in 0.374secs using 5.0% overhead, throughput: 2611.1Mbit/s
symbol count = 10000, decoded 122 MB in 0.472secs using 5.0% overhead, throughput: 2069.0Mbit/s
symbol count = 20000, decoded 122 MB in 0.634secs using 5.0% overhead, throughput: 1540.3Mbit/s
symbol count = 50000, decoded 122 MB in 1.023secs using 5.0% overhead, throughput: 954.6Mbit/s
Running benches/encode_benchmark.rs (target/x86_64-unknown-linux-gnu/release/deps/encode_benchmark-c98ccc72b678f27c)
Symbol size: 1280 bytes (without pre-built plan)
symbol count = 10, encoded 127 MB in 0.297secs, throughput: 3447.6Mbit/s
symbol count = 100, encoded 127 MB in 0.224secs, throughput: 4568.9Mbit/s
symbol count = 250, encoded 127 MB in 0.252secs, throughput: 4059.3Mbit/s
symbol count = 500, encoded 127 MB in 0.253secs, throughput: 4033.6Mbit/s
symbol count = 1000, encoded 126 MB in 0.260secs, throughput: 3906.2Mbit/s
symbol count = 2000, encoded 126 MB in 0.277secs, throughput: 3666.5Mbit/s
symbol count = 5000, encoded 122 MB in 0.296secs, throughput: 3299.2Mbit/s
symbol count = 10000, encoded 122 MB in 0.367secs, throughput: 2660.9Mbit/s
symbol count = 20000, encoded 122 MB in 0.485secs, throughput: 2013.5Mbit/s
symbol count = 50000, encoded 122 MB in 0.661secs, throughput: 1477.4Mbit/s
Symbol size: 1280 bytes (with pre-built plan)
symbol count = 10, encoded 127 MB in 0.151secs, throughput: 6781.0Mbit/s
symbol count = 100, encoded 127 MB in 0.104secs, throughput: 9840.7Mbit/s
symbol count = 250, encoded 127 MB in 0.114secs, throughput: 8973.2Mbit/s
symbol count = 500, encoded 127 MB in 0.115secs, throughput: 8874.0Mbit/s
symbol count = 1000, encoded 126 MB in 0.119secs, throughput: 8534.7Mbit/s
symbol count = 2000, encoded 126 MB in 0.126secs, throughput: 8060.5Mbit/s
symbol count = 5000, encoded 122 MB in 0.135secs, throughput: 7233.8Mbit/s
symbol count = 10000, encoded 122 MB in 0.164secs, throughput: 5954.6Mbit/s
symbol count = 20000, encoded 122 MB in 0.210secs, throughput: 4650.3Mbit/s
symbol count = 50000, encoded 122 MB in 0.364secs, throughput: 2682.9Mbit/s
Running benches/matrix_sparsity.rs (target/x86_64-unknown-linux-gnu/release/deps/matrix_sparsity-f6b45b7f73e4b7e8)
Row density for 27x27: min=0 max=18 p50=7 p80=18 p90=18 p95=18 p99=18
Original density for 27x27: 231 of 729 (31.687%)
Initial memory usage: 3KB
Optimized decoder mul ops: 269 (26.9 per symbol), add ops: 414 (41.4 per symbol)
By phase mul ops: [129, 140, 0, 0, 0], add ops: [175, 152, 21, 45, 21]
Row density for 128x128: min=0 max=119 p50=5 p80=20 p90=20 p95=119 p99=119
Original density for 128x128: 2077 of 16384 (12.677%)
Initial memory usage: 14KB
Optimized decoder mul ops: 1253 (12.4 per symbol), add ops: 2918 (28.9 per symbol)
By phase mul ops: [1016, 237, 0, 0, 0], add ops: [1585, 356, 350, 277, 350]
Row density for 1071x1071: min=0 max=1062 p50=5 p80=9 p90=29 p95=52 p99=53
Original density for 1071x1071: 20933 of 1147041 (1.825%)
Initial memory usage: 125KB
Optimized decoder mul ops: 10600 (10.6 per symbol), add ops: 30302 (30.2 per symbol)
By phase mul ops: [9937, 663, 0, 0, 0], add ops: [16799, 2435, 4305, 2458, 4305]
Row density for 10269x10269: min=0 max=10251 p50=5 p80=8 p90=15 p95=32 p99=126
Original density for 10269x10269: 214641 of 105452361 (0.204%)
Initial memory usage: 1373KB
Optimized decoder mul ops: 112049 (11.2 per symbol), add ops: 328540 (32.8 per symbol)
By phase mul ops: [109861, 2188, 0, 0, 0], add ops: [178357, 19887, 53259, 23778, 53259]
Row density for 41104x41104: min=0 max=41019 p50=5 p80=8 p90=14 p95=29 p99=177
Original density for 41104x41104: 1025396 of 1689538816 (0.061%)
Initial memory usage: 6305KB
Optimized decoder mul ops: 612014 (15.1 per symbol), add ops: 1554467 (38.5 per symbol)
By phase mul ops: [605623, 6391, 0, 0, 0], add ops: [881919, 87897, 244445, 95761, 244445]
Row density for 57326x57326: min=0 max=57163 p50=5 p80=8 p90=13 p95=27 p99=189
Original density for 57326x57326: 1484776 of 3286270276 (0.045%)
Initial memory usage: 9285KB
Optimized decoder mul ops: 910038 (16.1 per symbol), add ops: 2223234 (39.4 per symbol)
By phase mul ops: [901621, 8417, 0, 0, 0], add ops: [1285203, 137388, 333568, 133507, 333568]
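
Note on the "Original density" lines above: each percentage appears to be the count of nonzero entries divided by the total number of entries in the square matrix (for example, 231 of 27 x 27 = 729). The sketch below recomputes a few of them from the numbers in the log. The helper `density_percent` is a hypothetical name used only for this note and is not taken from the raptorq benchmark source.

```rust
/// Percentage of nonzero entries in a square `dimension` x `dimension` matrix.
/// Hypothetical helper for illustration; not from the raptorq crate.
fn density_percent(nonzeros: u64, dimension: u64) -> f64 {
    // u64 is wide enough for the largest case in the log (57326 squared).
    100.0 * nonzeros as f64 / (dimension * dimension) as f64
}

fn main() {
    // (nonzero count, matrix dimension) pairs copied from the log above.
    let cases: [(u64, u64); 3] = [(231, 27), (2077, 128), (1484776, 57326)];
    for (nonzeros, dimension) in cases {
        println!(
            "{}x{}: {} of {} ({:.3}%)",
            dimension,
            dimension,
            nonzeros,
            dimension * dimension,
            density_percent(nonzeros, dimension)
        );
    }
}
```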