We measure the runtimes of simple compute shaders that read from one 3D texture through a small filter and write the result to another 3D texture. The local work group size of the compute shader is varied over a fixed, somewhat arbitrary set of sizes, and the effect of different internal texture formats is studied.
All tests are performed using 512x512x512 3D textures. At this size, memory throughput and latency will be the primary bottleneck, so any extra calculations should have negligible impact on the timings.
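To put the texture size in perspective, the sketch below computes the storage needed for one 512x512x512 texture in each internal format used in the tests. The bytes-per-texel values follow from the format names (channel count times 16-bit or 32-bit floats); the helper name `texture_bytes` is made up for this illustration, and any padding or tiling the driver adds is ignored.

```python
# Bytes per texel: number of channels x component width in bytes.
BYTES_PER_TEXEL = {
    "R16F": 2, "R32F": 4, "RG16F": 4,
    "RG32F": 8, "RGBA16F": 8, "RGBA32F": 16,
}

def texture_bytes(fmt, size=512):
    """Storage of one size^3 3D texture, ignoring driver padding or tiling."""
    return size ** 3 * BYTES_PER_TEXEL[fmt]

for fmt in BYTES_PER_TEXEL:
    print(f"{fmt:8s} {texture_bytes(fmt) // 2**20:5d} MiB")
```

The smallest format needs 256 MiB per texture and the largest 2 GiB, which is far beyond any on-chip cache and is why the tests are expected to be memory bound.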
All timings are obtained by averaging the frame time over 128 frames, after a 128-frame warm-up, with vsync disabled. Measuring with GPU timer queries instead of frame times might give more stable numbers.
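The measurement procedure can be sketched as follows. This is only an illustration of the averaging step: the function name `summarize_frame_times` is hypothetical, the frame times are assumed to be already collected in a list, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics

def summarize_frame_times(frame_times, warmup=128, measured=128):
    """Drop the warm-up frames, then return (mean, standard deviation)."""
    samples = frame_times[warmup:warmup + measured]
    if len(samples) < measured:
        raise ValueError("not enough frames recorded")
    # Assumption: population standard deviation over the measured frames.
    return statistics.mean(samples), statistics.pstdev(samples)

# Usage with synthetic data: slow warm-up frames, then alternating 1.0 / 3.0.
times = [100.0] * 128 + [1.0, 3.0] * 64
mean, std = summarize_frame_times(times)
print(mean, std)
```

Because the warm-up frames are discarded before averaging, the slow initial frames in the synthetic data do not affect the result.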
The work group sizes are:
[8, 8, 8]
[32, 32, 1]
[32, 1, 32]
[16, 16, 1]
[16, 1, 16]
[16, 16, 4]
[16, 4, 16]
[4, 16, 16]
[4, 2, 16]
[16, 2, 4]
[16, 4, 2]
[4, 16, 2]
[8, 2, 16]
[8, 16, 2]
[128, 4, 1]
[256, 4, 1]
[128, 1, 4]
[256, 1, 4]
[128, 1, 1]
[256, 1, 1]
[512, 1, 1]
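As a sanity check on the list above, the following sketch computes how many work groups would have to be dispatched per axis and how many invocations each work group contains. It assumes one invocation per texel of the 512x512x512 texture, which matches the shaders' use of `gl_GlobalInvocationID` as the texel index; the actual dispatch code is not shown in this document, and `dispatch_counts` is a name chosen for this example.

```python
WORK_GROUP_SIZES = [
    (8, 8, 8), (32, 32, 1), (32, 1, 32), (16, 16, 1), (16, 1, 16),
    (16, 16, 4), (16, 4, 16), (4, 16, 16), (4, 2, 16), (16, 2, 4),
    (16, 4, 2), (4, 16, 2), (8, 2, 16), (8, 16, 2), (128, 4, 1),
    (256, 4, 1), (128, 1, 4), (256, 1, 4), (128, 1, 1), (256, 1, 1),
    (512, 1, 1),
]

def dispatch_counts(local_size, texture_size=512):
    """Work groups per axis so that every texel gets exactly one invocation."""
    if any(texture_size % n for n in local_size):
        raise ValueError("local size must divide the texture size")
    return tuple(texture_size // n for n in local_size)

def invocations(local_size):
    """Number of shader invocations in one work group."""
    x, y, z = local_size
    return x * y * z

for size in WORK_GROUP_SIZES:
    print(size, dispatch_counts(size), invocations(size))
```

Every size divides 512 evenly, and the work groups contain between 128 and 1024 invocations, so none exceeds the 1024 invocations per work group that OpenGL 4.3 and later guarantee as a minimum limit.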
And the internal texture formats used are:
R16F
R32F
RG16F
RG32F
RGBA16F
RGBA32F
Formats with the same texel size (for example R32F and RG16F, 4 bytes each) are expected to give roughly the same timings, and the timings should scale roughly linearly with the texel size. In an ideal world R16F (2 bytes per texel) would be 8x faster than RGBA32F (16 bytes per texel), although this is not the case in practice. In particular, timings for small texels (2 and 4 bytes) seem to vary much more than those for larger texels (8 and 16 bytes).
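The expected scaling can be compared against measured numbers with a few lines of arithmetic. The sketch below uses the texture write timings for work group size [8, 8, 8], copied from the results further down; the function name `scaling_vs_r16f` is made up for this example. Only ratios are computed, so the time unit does not matter.

```python
BYTES_PER_TEXEL = {
    "R16F": 2, "R32F": 4, "RG16F": 4,
    "RG32F": 8, "RGBA16F": 8, "RGBA32F": 16,
}

# Texture write, work group size [8, 8, 8] (average frame times from the results).
WRITE_TIMES = {
    "R16F": 1.898, "R32F": 2.619, "RG16F": 2.617,
    "RG32F": 5.007, "RGBA16F": 5.021, "RGBA32F": 9.875,
}

def scaling_vs_r16f(times):
    """Map each format to (ideal ratio, measured ratio) relative to R16F."""
    base_bytes = BYTES_PER_TEXEL["R16F"]
    base_time = times["R16F"]
    return {
        fmt: (BYTES_PER_TEXEL[fmt] / base_bytes, t / base_time)
        for fmt, t in times.items()
    }

for fmt, (ideal, measured) in scaling_vs_r16f(WRITE_TIMES).items():
    print(f"{fmt:8s} ideal {ideal:4.1f}x  measured {measured:4.2f}x")
```

For this particular test, RGBA32F comes out roughly 5.2x slower than R16F rather than the ideal 8x, while formats of equal texel size land within a fraction of a percent of each other.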
Note: In each results row, the first column is the average frame time and the second column is the standard deviation of the frame times (the variation between individual measurements), followed by the internal format and the work group size. Some measurements are more uncertain than others. Better timings will be provided in the future.
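For anyone who wants to post-process the tables, a results row can be parsed as sketched below. The names `parse_row` and `is_noisy` are hypothetical helpers, and the threshold used to flag uncertain measurements (standard deviation larger than the mean) is an arbitrary choice for illustration.

```python
import re

# mean, standard deviation, internal format, [x, y, z] work group size
ROW = re.compile(r"([\d.]+)\s+([\d.]+)\s+(\w+)\s+\[(\d+),\s*(\d+),\s*(\d+)\]")

def parse_row(line):
    """Split one results row into (mean, std, format, (x, y, z))."""
    m = ROW.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"not a results row: {line!r}")
    size = tuple(int(m.group(i)) for i in (4, 5, 6))
    return float(m.group(1)), float(m.group(2)), m.group(3), size

def is_noisy(row, threshold=1.0):
    """Flag rows whose standard deviation exceeds threshold x mean."""
    mean, std, _fmt, _size = row
    return std > threshold * mean

print(parse_row("1.898 0.156 R16F [8, 8, 8]"))
print(is_noisy(parse_row("4.193 6.952 R16F [128, 1, 4]")))
```

Rows flagged this way, such as the texture copy measurement for R16F with work group size [128, 1, 4], probably contain a few outlier frames and are candidates for re-measurement.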
Note: Explicitly stating the format of the image write variable did not seem to matter much in practice, so I kept it at rgba32f for all instances for simplicity.
The following tests are run, in this order:
- Texture write
- Texture copy
- 3-stencil 1D filter X
- 3-stencil 1D filter Y
- 3-stencil 1D filter Z
- 5-stencil 2D filter
- 9-stencil 2D filter
- 7-stencil 3D filter
- 27-stencil 3D filter
Texture write:
#version 440 core
layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;
void main() {
    ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
    imageStore(write_image, index, vec4(index, 1.0));
}
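The `%d` placeholders in the `layout` line suggest that the shader source is a printf-style template that is filled in once per work group size. How the original host program does this is not shown; the sketch below is one possible way, with `SHADER_TEMPLATE` and `shader_source` as hypothetical names.

```python
# Assumption: the shader text is kept as a printf-style template on the host.
SHADER_TEMPLATE = """#version 440 core
layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;
void main() {
    ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
    imageStore(write_image, index, vec4(index, 1.0));
}
"""

def shader_source(local_size):
    """Fill the three local size placeholders with one work group size."""
    x, y, z = local_size
    return SHADER_TEMPLATE % (x, y, z)

print(shader_source((16, 4, 2)))
```

Generating the source per configuration means the shader has to be recompiled for every work group size, which is acceptable here because compilation happens outside the timed frames' steady state only if the warm-up covers it.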
1.898 0.156 R16F [8, 8, 8]
2.065 0.124 R16F [32, 32, 1]
2.059 0.133 R16F [32, 1, 32]
1.732 0.124 R16F [16, 16, 1]
1.729 0.122 R16F [16, 1, 16]
2.062 0.161 R16F [16, 16, 4]
2.076 0.149 R16F [16, 4, 16]
2.073 0.155 R16F [4, 16, 16]
1.903 0.105 R16F [4, 2, 16]
1.903 0.107 R16F [16, 2, 4]
1.904 0.109 R16F [16, 4, 2]
1.907 0.129 R16F [4, 16, 2]
1.712 0.102 R16F [8, 2, 16]
1.711 0.103 R16F [8, 16, 2]
1.848 0.111 R16F [128, 4, 1]
2.046 0.111 R16F [256, 4, 1]
1.853 0.105 R16F [128, 1, 4]
2.040 0.113 R16F [256, 1, 4]
1.908 0.112 R16F [128, 1, 1]
1.718 0.121 R16F [256, 1, 1]
1.851 0.118 R16F [512, 1, 1]
2.619 0.144 R32F [8, 8, 8]
2.620 0.149 R32F [32, 32, 1]
2.613 0.143 R32F [32, 1, 32]
2.616 0.145 R32F [16, 16, 1]
2.617 0.144 R32F [16, 1, 16]
2.616 0.143 R32F [16, 16, 4]
2.614 0.143 R32F [16, 4, 16]
2.615 0.142 R32F [4, 16, 16]
2.614 0.138 R32F [4, 2, 16]
2.612 0.141 R32F [16, 2, 4]
2.615 0.146 R32F [16, 4, 2]
2.616 0.143 R32F [4, 16, 2]
2.614 0.143 R32F [8, 2, 16]
2.615 0.157 R32F [8, 16, 2]
2.616 0.145 R32F [128, 4, 1]
2.621 0.144 R32F [256, 4, 1]
2.617 0.141 R32F [128, 1, 4]
2.616 0.144 R32F [256, 1, 4]
2.625 0.156 R32F [128, 1, 1]
2.619 0.146 R32F [256, 1, 1]
2.620 0.139 R32F [512, 1, 1]
2.617 0.143 RG16F [8, 8, 8]
2.622 0.140 RG16F [32, 32, 1]
2.618 0.142 RG16F [32, 1, 32]
2.618 0.141 RG16F [16, 16, 1]
2.619 0.143 RG16F [16, 1, 16]
2.628 0.140 RG16F [16, 16, 4]
2.619 0.142 RG16F [16, 4, 16]
2.619 0.146 RG16F [4, 16, 16]
2.620 0.149 RG16F [4, 2, 16]
2.617 0.144 RG16F [16, 2, 4]
2.617 0.139 RG16F [16, 4, 2]
2.618 0.141 RG16F [4, 16, 2]
2.616 0.139 RG16F [8, 2, 16]
2.621 0.140 RG16F [8, 16, 2]
2.618 0.135 RG16F [128, 4, 1]
2.622 0.143 RG16F [256, 4, 1]
2.617 0.138 RG16F [128, 1, 4]
2.613 0.136 RG16F [256, 1, 4]
2.618 0.140 RG16F [128, 1, 1]
2.621 0.143 RG16F [256, 1, 1]
2.618 0.140 RG16F [512, 1, 1]
5.007 0.167 RG32F [8, 8, 8]
5.010 0.168 RG32F [32, 32, 1]
5.017 0.175 RG32F [32, 1, 32]
5.092 0.206 RG32F [16, 16, 1]
5.031 0.177 RG32F [16, 1, 16]
5.036 0.182 RG32F [16, 16, 4]
5.035 0.181 RG32F [16, 4, 16]
5.030 0.174 RG32F [4, 16, 16]
5.048 0.185 RG32F [4, 2, 16]
5.030 0.180 RG32F [16, 2, 4]
5.034 0.182 RG32F [16, 4, 2]
5.029 0.185 RG32F [4, 16, 2]
5.052 0.191 RG32F [8, 2, 16]
5.052 0.201 RG32F [8, 16, 2]
5.035 0.187 RG32F [128, 4, 1]
5.062 0.216 RG32F [256, 4, 1]
5.046 0.192 RG32F [128, 1, 4]
5.013 0.172 RG32F [256, 1, 4]
5.094 0.206 RG32F [128, 1, 1]
5.094 0.215 RG32F [256, 1, 1]
5.096 0.210 RG32F [512, 1, 1]
5.021 0.171 RGBA16F [8, 8, 8]
5.033 0.184 RGBA16F [32, 32, 1]
5.034 0.179 RGBA16F [32, 1, 32]
5.027 0.172 RGBA16F [16, 16, 1]
5.020 0.177 RGBA16F [16, 1, 16]
5.027 0.174 RGBA16F [16, 16, 4]
5.016 0.176 RGBA16F [16, 4, 16]
5.004 0.166 RGBA16F [4, 16, 16]
5.018 0.175 RGBA16F [4, 2, 16]
5.006 0.169 RGBA16F [16, 2, 4]
5.017 0.169 RGBA16F [16, 4, 2]
5.013 0.170 RGBA16F [4, 16, 2]
5.017 0.174 RGBA16F [8, 2, 16]
5.010 0.174 RGBA16F [8, 16, 2]
5.093 0.257 RGBA16F [128, 4, 1]
5.049 0.229 RGBA16F [256, 4, 1]
5.002 0.205 RGBA16F [128, 1, 4]
4.999 0.214 RGBA16F [256, 1, 4]
5.009 0.206 RGBA16F [128, 1, 1]
5.014 0.217 RGBA16F [256, 1, 1]
5.011 0.211 RGBA16F [512, 1, 1]
9.875 0.290 RGBA32F [8, 8, 8]
9.867 0.275 RGBA32F [32, 32, 1]
10.159 0.239 RGBA32F [32, 1, 32]
9.861 0.263 RGBA32F [16, 16, 1]
9.886 0.278 RGBA32F [16, 1, 16]
9.860 0.277 RGBA32F [16, 16, 4]
9.872 0.274 RGBA32F [16, 4, 16]
9.854 0.277 RGBA32F [4, 16, 16]
9.828 0.277 RGBA32F [4, 2, 16]
9.876 0.271 RGBA32F [16, 2, 4]
9.875 0.276 RGBA32F [16, 4, 2]
9.793 0.244 RGBA32F [4, 16, 2]
9.800 0.246 RGBA32F [8, 2, 16]
9.809 0.244 RGBA32F [8, 16, 2]
9.872 0.280 RGBA32F [128, 4, 1]
9.858 0.262 RGBA32F [256, 4, 1]
9.852 0.258 RGBA32F [128, 1, 4]
9.861 0.277 RGBA32F [256, 1, 4]
9.916 0.290 RGBA32F [128, 1, 1]
9.879 0.262 RGBA32F [256, 1, 1]
9.876 0.251 RGBA32F [512, 1, 1]
Texture copy:
#version 440 core
layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;
uniform layout(binding=0) sampler3D read_sampler;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;
void main() {
    ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
    vec4 t = texelFetch(read_sampler, index, 0);
    imageStore(write_image, index, t);
}
3.573 0.178 R16F [8, 8, 8]
3.510 0.148 R16F [32, 32, 1]
3.208 0.195 R16F [32, 1, 32]
3.271 0.170 R16F [16, 16, 1]
3.580 0.174 R16F [16, 1, 16]
3.625 0.173 R16F [16, 16, 4]
3.529 0.188 R16F [16, 4, 16]
3.704 0.201 R16F [4, 16, 16]
3.230 0.152 R16F [4, 2, 16]
3.710 0.173 R16F [16, 2, 4]
3.378 0.145 R16F [16, 4, 2]
3.393 0.157 R16F [4, 16, 2]
3.318 0.158 R16F [8, 2, 16]
3.431 0.150 R16F [8, 16, 2]
3.282 0.145 R16F [128, 4, 1]
3.459 0.159 R16F [256, 4, 1]
4.193 6.952 R16F [128, 1, 4]
3.770 0.148 R16F [256, 1, 4]
3.230 0.131 R16F [128, 1, 1]
3.281 0.128 R16F [256, 1, 1]
3.349 0.151 R16F [512, 1, 1]
5.517 0.176 R32F [8, 8, 8]
5.366 0.176 R32F [32, 32, 1]
5.212 0.171 R32F [32, 1, 32]
5.366 0.194 R32F [16, 16, 1]
5.338 0.196 R32F [16, 1, 16]
5.535 0.171 R32F [16, 16, 4]
5.331 0.180 R32F [16, 4, 16]
5.675 0.182 R32F [4, 16, 16]
5.311 0.173 R32F [4, 2, 16]
5.463 0.168 R32F [16, 2, 4]
5.463 0.234 R32F [16, 4, 2]
5.519 0.246 R32F [4, 16, 2]
5.321 0.194 R32F [8, 2, 16]
5.462 0.175 R32F [8, 16, 2]
5.404 0.188 R32F [128, 4, 1]
5.387 0.178 R32F [256, 4, 1]
5.533 0.233 R32F [128, 1, 4]
5.450 0.204 R32F [256, 1, 4]
5.398 0.188 R32F [128, 1, 1]
5.384 0.193 R32F [256, 1, 1]
5.378 0.186 R32F [512, 1, 1]
5.583 0.231 RG16F [8, 8, 8]
5.409 0.197 RG16F [32, 32, 1]
5.279 0.233 RG16F [32, 1, 32]
5.383 0.198 RG16F [16, 16, 1]
5.332 0.182 RG16F [16, 1, 16]
5.547 0.187 RG16F [16, 16, 4]
5.346 0.179 RG16F [16, 4, 16]
5.700 0.183 RG16F [4, 16, 16]
5.338 0.210 RG16F [4, 2, 16]
5.484 0.174 RG16F [16, 2, 4]
5.449 0.175 RG16F [16, 4, 2]
5.452 0.184 RG16F [4, 16, 2]
5.364 0.211 RG16F [8, 2, 16]
5.479 0.206 RG16F [8, 16, 2]
5.380 0.170 RG16F [128, 4, 1]
5.382 0.169 RG16F [256, 4, 1]
5.448 0.168 RG16F [128, 1, 4]
5.410 0.177 RG16F [256, 1, 4]
5.360 0.162 RG16F [128, 1, 1]
5.382 0.232 RG16F [256, 1, 1]
5.375 0.178 RG16F [512, 1, 1]
10.770 0.178 RG32F [8, 8, 8]
10.535 0.188 RG32F [32, 32, 1]
10.440 0.218 RG32F [32, 1, 32]
10.530 0.185 RG32F [16, 16, 1]
10.414 0.182 RG32F [16, 1, 16]
10.929 0.194 RG32F [16, 16, 4]
10.509 0.207 RG32F [16, 4, 16]
10.664 0.188 RG32F [4, 16, 16]
10.470 0.189 RG32F [4, 2, 16]
10.605 0.179 RG32F [16, 2, 4]
10.594 0.178 RG32F [16, 4, 2]
10.790 0.199 RG32F [4, 16, 2]
10.475 0.183 RG32F [8, 2, 16]
10.775 0.189 RG32F [8, 16, 2]
10.531 0.229 RG32F [128, 4, 1]
10.521 0.182 RG32F [256, 4, 1]
10.662 0.176 RG32F [128, 1, 4]
10.606 0.182 RG32F [256, 1, 4]
10.538 0.206 RG32F [128, 1, 1]
10.562 0.207 RG32F [256, 1, 1]
10.562 0.204 RG32F [512, 1, 1]
10.719 0.177 RGBA16F [8, 8, 8]
10.534 0.185 RGBA16F [32, 32, 1]
10.364 0.180 RGBA16F [32, 1, 32]
10.526 0.189 RGBA16F [16, 16, 1]
10.415 0.178 RGBA16F [16, 1, 16]
10.874 0.182 RGBA16F [16, 16, 4]
10.476 0.182 RGBA16F [16, 4, 16]
10.658 0.182 RGBA16F [4, 16, 16]
10.467 0.184 RGBA16F [4, 2, 16]
10.604 0.188 RGBA16F [16, 2, 4]
10.593 0.189 RGBA16F [16, 4, 2]
10.756 0.371 RGBA16F [4, 16, 2]
10.476 0.181 RGBA16F [8, 2, 16]
10.761 0.178 RGBA16F [8, 16, 2]
10.516 0.172 RGBA16F [128, 4, 1]
10.523 0.186 RGBA16F [256, 4, 1]
10.649 0.185 RGBA16F [128, 1, 4]
10.618 0.183 RGBA16F [256, 1, 4]
10.525 0.201 RGBA16F [128, 1, 1]
10.518 0.177 RGBA16F [256, 1, 1]
10.526 0.193 RGBA16F [512, 1, 1]
20.926 0.073 RGBA32F [8, 8, 8]
20.789 0.110 RGBA32F [32, 32, 1]
31.603 0.284 RGBA32F [32, 1, 32]
20.785 0.117 RGBA32F [16, 16, 1]
20.831 0.118 RGBA32F [16, 1, 16]
21.077 0.102 RGBA32F [16, 16, 4]
20.707 0.098 RGBA32F [16, 4, 16]
20.835 0.266 RGBA32F [4, 16, 16]
20.837 0.263 RGBA32F [4, 2, 16]
20.705 0.086 RGBA32F [16, 2, 4]
20.901 0.209 RGBA32F [16, 4, 2]
20.890 0.074 RGBA32F [4, 16, 2]
20.690 0.085 RGBA32F [8, 2, 16]
20.843 0.064 RGBA32F [8, 16, 2]
20.704 0.106 RGBA32F [128, 4, 1]
20.699 0.083 RGBA32F [256, 4, 1]
20.799 0.076 RGBA32F [128, 1, 4]
20.912 0.310 RGBA32F [256, 1, 4]
20.696 0.187 RGBA32F [128, 1, 1]
20.661 0.116 RGBA32F [256, 1, 1]
20.702 0.125 RGBA32F [512, 1, 1]
3-stencil 1D filter X:
#version 440 core
layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;
uniform layout(binding=0) sampler3D read_sampler;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;
void main() {
    ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
    vec4 t1 = texelFetch(read_sampler, index-ivec3(1, 0, 0), 0);
    vec4 t2 = texelFetch(read_sampler, index, 0);
    vec4 t3 = texelFetch(read_sampler, index+ivec3(1, 0, 0), 0);
    imageStore(write_image, index, 0.25*(t1 + 2*t2 + t3));
}
4.003 0.147 R16F [8, 8, 8]
4.327 0.163 R16F [32, 32, 1]
4.192 0.163 R16F [32, 1, 32]
3.680 0.145 R16F [16, 16, 1]
4.170 0.166 R16F [16, 1, 16]
4.383 0.161 R16F [16, 16, 4]
4.270 0.153 R16F [16, 4, 16]
4.336 0.165 R16F [4, 16, 16]
3.674 0.332 R16F [4, 2, 16]
4.185 0.263 R16F [16, 2, 4]
3.748 0.260 R16F [16, 4, 2]
3.720 0.164 R16F [4, 16, 2]
3.797 0.230 R16F [8, 2, 16]
3.841 0.260 R16F [8, 16, 2]
3.811 0.272 R16F [128, 4, 1]
4.113 0.232 R16F [256, 4, 1]
4.218 0.161 R16F [128, 1, 4]
4.380 0.151 R16F [256, 1, 4]
3.643 0.167 R16F [128, 1, 1]
3.728 0.245 R16F [256, 1, 1]
3.840 0.167 R16F [512, 1, 1]
5.713 0.271 R32F [8, 8, 8]
5.544 0.270 R32F [32, 32, 1]
6.379 6.625 R32F [32, 1, 32]
5.505 0.255 R32F [16, 16, 1]
5.969 0.209 R32F [16, 1, 16]
5.636 0.204 R32F [16, 16, 4]
5.442 0.253 R32F [16, 4, 16]
6.047 0.436 R32F [4, 16, 16]
5.434 0.284 R32F [4, 2, 16]
5.503 0.350 R32F [16, 2, 4]
5.509 0.233 R32F [16, 4, 2]
5.567 0.251 R32F [4, 16, 2]
5.493 0.250 R32F [8, 2, 16]
5.572 0.225 R32F [8, 16, 2]
5.449 0.246 R32F [128, 4, 1]
5.434 0.217 R32F [256, 4, 1]
5.505 0.214 R32F [128, 1, 4]
5.492 0.196 R32F [256, 1, 4]
5.443 0.198 R32F [128, 1, 1]
5.440 0.250 R32F [256, 1, 1]
5.438 0.229 R32F [512, 1, 1]
5.658 0.244 RG16F [8, 8, 8]
5.468 0.204 RG16F [32, 32, 1]
5.785 0.202 RG16F [32, 1, 32]
5.381 0.176 RG16F [16, 16, 1]
5.954 0.175 RG16F [16, 1, 16]
5.599 0.192 RG16F [16, 16, 4]
5.384 0.185 RG16F [16, 4, 16]
5.954 0.173 RG16F [4, 16, 16]
5.348 0.177 RG16F [4, 2, 16]
5.467 0.182 RG16F [16, 2, 4]
5.451 0.176 RG16F [16, 4, 2]
5.507 0.178 RG16F [4, 16, 2]
5.408 0.185 RG16F [8, 2, 16]
5.501 0.177 RG16F [8, 16, 2]
5.395 0.191 RG16F [128, 4, 1]
5.454 0.216 RG16F [256, 4, 1]
5.490 0.211 RG16F [128, 1, 4]
5.482 0.199 RG16F [256, 1, 4]
5.389 0.187 RG16F [128, 1, 1]
5.387 0.189 RG16F [256, 1, 1]
5.394 0.263 RG16F [512, 1, 1]
10.817 0.175 RG32F [8, 8, 8]
10.584 0.234 RG32F [32, 32, 1]
10.287 0.206 RG32F [32, 1, 32]
10.533 0.346 RG32F [16, 16, 1]
10.449 0.251 RG32F [16, 1, 16]
11.136 0.287 RG32F [16, 16, 4]
10.482 0.198 RG32F [16, 4, 16]
10.730 0.303 RG32F [4, 16, 16]
10.337 0.220 RG32F [4, 2, 16]
10.630 0.231 RG32F [16, 2, 4]
10.576 0.194 RG32F [16, 4, 2]
10.783 0.214 RG32F [4, 16, 2]
10.414 0.243 RG32F [8, 2, 16]
10.794 0.208 RG32F [8, 16, 2]
10.558 0.232 RG32F [128, 4, 1]
10.537 0.208 RG32F [256, 4, 1]
10.672 0.203 RG32F [128, 1, 4]
10.659 0.210 RG32F [256, 1, 4]
10.512 0.194 RG32F [128, 1, 1]
10.523 0.210 RG32F [256, 1, 1]
10.527 0.216 RG32F [512, 1, 1]
10.812 0.180 RGBA16F [8, 8, 8]
10.570 0.189 RGBA16F [32, 32, 1]
10.312 0.253 RGBA16F [32, 1, 32]
10.510 0.210 RGBA16F [16, 16, 1]
10.415 0.196 RGBA16F [16, 1, 16]
11.011 0.210 RGBA16F [16, 16, 4]
10.460 0.183 RGBA16F [16, 4, 16]
10.655 0.186 RGBA16F [4, 16, 16]
10.374 0.216 RGBA16F [4, 2, 16]
10.616 0.208 RGBA16F [16, 2, 4]
10.605 0.197 RGBA16F [16, 4, 2]
10.788 0.191 RGBA16F [4, 16, 2]
10.445 0.209 RGBA16F [8, 2, 16]
10.786 0.186 RGBA16F [8, 16, 2]
10.500 0.213 RGBA16F [128, 4, 1]
10.502 0.188 RGBA16F [256, 4, 1]
10.680 0.189 RGBA16F [128, 1, 4]
10.640 0.179 RGBA16F [256, 1, 4]
10.506 0.214 RGBA16F [128, 1, 1]
10.504 0.196 RGBA16F [256, 1, 1]
10.505 0.194 RGBA16F [512, 1, 1]
20.974 0.115 RGBA32F [8, 8, 8]
20.833 0.073 RGBA32F [32, 32, 1]
30.154 0.122 RGBA32F [32, 1, 32]
20.728 0.093 RGBA32F [16, 16, 1]
20.582 0.086 RGBA32F [16, 1, 16]
21.177 0.143 RGBA32F [16, 16, 4]
20.817 0.145 RGBA32F [16, 4, 16]
20.885 0.109 RGBA32F [4, 16, 16]
20.650 0.107 RGBA32F [4, 2, 16]
20.722 0.103 RGBA32F [16, 2, 4]
20.828 0.072 RGBA32F [16, 4, 2]
20.781 0.177 RGBA32F [4, 16, 2]
20.519 0.155 RGBA32F [8, 2, 16]
20.689 0.076 RGBA32F [8, 16, 2]
20.660 0.143 RGBA32F [128, 4, 1]
20.610 0.106 RGBA32F [256, 4, 1]
20.821 0.099 RGBA32F [128, 1, 4]
20.838 0.132 RGBA32F [256, 1, 4]
20.699 0.131 RGBA32F [128, 1, 1]
20.604 0.098 RGBA32F [256, 1, 1]
20.625 0.474 RGBA32F [512, 1, 1]
3-stencil 1D filter Y:
#version 440 core
layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;
uniform layout(binding=0) sampler3D read_sampler;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;
void main() {
    ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
    vec4 t1 = texelFetch(read_sampler, index-ivec3(0, 1, 0), 0);
    vec4 t2 = texelFetch(read_sampler, index, 0);
    vec4 t3 = texelFetch(read_sampler, index+ivec3(0, 1, 0), 0);
    imageStore(write_image, index, 0.25*(t1 + 2*t2 + t3));
}
4.079 0.165 R16F [8, 8, 8]
4.338 0.154 R16F [32, 32, 1]
4.209 0.163 R16F [32, 1, 32]
3.726 0.160 R16F [16, 16, 1]
4.210 0.149 R16F [16, 1, 16]
4.437 0.151 R16F [16, 16, 4]
4.377 0.158 R16F [16, 4, 16]
4.461 0.169 R16F [4, 16, 16]
4.174 0.156 R16F [4, 2, 16]
4.093 0.144 R16F [16, 2, 4]
3.729 0.154 R16F [16, 4, 2]
3.802 0.149 R16F [4, 16, 2]
4.198 0.146 R16F [8, 2, 16]
3.852 0.158 R16F [8, 16, 2]
3.847 0.161 R16F [128, 4, 1]
4.195 0.171 R16F [256, 4, 1]
4.259 0.155 R16F [128, 1, 4]
4.950 6.184 R16F [256, 1, 4]
3.699 0.156 R16F [128, 1, 1]
3.759 0.152 R16F [256, 1, 1]
3.941 0.166 R16F [512, 1, 1]
5.489 0.203 R32F [8, 8, 8]
5.476 0.183 R32F [32, 32, 1]
5.430 0.184 R32F [32, 1, 32]
5.419 0.186 R32F [16, 16, 1]
5.736 0.178 R32F [16, 1, 16]
5.672 0.188 R32F [16, 16, 4]
5.417 0.173 R32F [16, 4, 16]
5.774 0.204 R32F [4, 16, 16]
5.515 0.183 R32F [4, 2, 16]
5.504 0.176 R32F [16, 2, 4]
5.499 0.183 R32F [16, 4, 2]
5.564 0.175 R32F [4, 16, 2]
5.408 0.177 R32F [8, 2, 16]
5.532 0.176 R32F [8, 16, 2]
5.416 0.185 R32F [128, 4, 1]
5.460 0.164 R32F [256, 4, 1]
5.513 0.246 R32F [128, 1, 4]
5.498 0.177 R32F [256, 1, 4]
5.409 0.158 R32F [128, 1, 1]
5.408 0.173 R32F [256, 1, 1]
5.420 0.176 R32F [512, 1, 1]
5.476 0.182 RG16F [8, 8, 8]
5.471 0.188 RG16F [32, 32, 1]
5.445 0.766 RG16F [32, 1, 32]
5.407 0.199 RG16F [16, 16, 1]
5.763 0.269 RG16F [16, 1, 16]
5.676 0.259 RG16F [16, 16, 4]
5.401 0.196 RG16F [16, 4, 16]
5.772 0.223 RG16F [4, 16, 16]
5.528 0.242 RG16F [4, 2, 16]
5.497 0.182 RG16F [16, 2, 4]
5.488 0.200 RG16F [16, 4, 2]
5.657 0.332 RG16F [4, 16, 2]
5.411 0.223 RG16F [8, 2, 16]
5.536 0.195 RG16F [8, 16, 2]
5.422 0.184 RG16F [128, 4, 1]
5.453 0.202 RG16F [256, 4, 1]
5.485 0.203 RG16F [128, 1, 4]
5.499 0.197 RG16F [256, 1, 4]
5.408 0.196 RG16F [128, 1, 1]
5.409 0.206 RG16F [256, 1, 1]
5.417 0.208 RG16F [512, 1, 1]
10.699 0.240 RG32F [8, 8, 8]
10.565 0.218 RG32F [32, 32, 1]
12.087 0.210 RG32F [32, 1, 32]
10.550 0.219 RG32F [16, 16, 1]
10.539 0.282 RG32F [16, 1, 16]
10.772 0.205 RG32F [16, 16, 4]
10.806 0.305 RG32F [16, 4, 16]
12.837 0.192 RG32F [4, 16, 16]
10.520 0.263 RG32F [4, 2, 16]
10.684 0.246 RG32F [16, 2, 4]
10.611 0.213 RG32F [16, 4, 2]
10.811 0.430 RG32F [4, 16, 2]
10.533 0.249 RG32F [8, 2, 16]
10.638 0.215 RG32F [8, 16, 2]
10.541 0.206 RG32F [128, 4, 1]
10.543 0.263 RG32F [256, 4, 1]
10.709 0.199 RG32F [128, 1, 4]
10.661 0.260 RG32F [256, 1, 4]
10.533 0.308 RG32F [128, 1, 1]
10.549 0.267 RG32F [256, 1, 1]
10.572 0.222 RG32F [512, 1, 1]
10.697 0.214 RGBA16F [8, 8, 8]
10.617 0.283 RGBA16F [32, 32, 1]
12.152 0.233 RGBA16F [32, 1, 32]
10.515 0.213 RGBA16F [16, 16, 1]
10.517 0.274 RGBA16F [16, 1, 16]
10.746 0.196 RGBA16F [16, 16, 4]
10.753 0.207 RGBA16F [16, 4, 16]
12.835 0.207 RGBA16F [4, 16, 16]
10.519 0.257 RGBA16F [4, 2, 16]
10.663 0.192 RGBA16F [16, 2, 4]
10.627 0.203 RGBA16F [16, 4, 2]
10.729 0.302 RGBA16F [4, 16, 2]
10.592 0.310 RGBA16F [8, 2, 16]
10.729 0.261 RGBA16F [8, 16, 2]
10.568 0.260 RGBA16F [128, 4, 1]
10.582 0.225 RGBA16F [256, 4, 1]
10.746 0.266 RGBA16F [128, 1, 4]
10.643 0.211 RGBA16F [256, 1, 4]
10.547 0.243 RGBA16F [128, 1, 1]
10.513 0.195 RGBA16F [256, 1, 1]
10.571 0.261 RGBA16F [512, 1, 1]
23.959 0.150 RGBA32F [8, 8, 8]
20.694 0.109 RGBA32F [32, 32, 1]
57.408 0.156 RGBA32F [32, 1, 32]
20.672 0.087 RGBA32F [16, 16, 1]
24.954 0.125 RGBA32F [16, 1, 16]
21.663 0.163 RGBA32F [16, 16, 4]
34.343 0.135 RGBA32F [16, 4, 16]
25.197 0.198 RGBA32F [4, 16, 16]
28.447 0.141 RGBA32F [4, 2, 16]
20.743 0.228 RGBA32F [16, 2, 4]
20.786 0.072 RGBA32F [16, 4, 2]
20.714 0.060 RGBA32F [4, 16, 2]
28.122 0.094 RGBA32F [8, 2, 16]
20.760 0.073 RGBA32F [8, 16, 2]
20.731 0.085 RGBA32F [128, 4, 1]
20.739 0.069 RGBA32F [256, 4, 1]
20.926 0.073 RGBA32F [128, 1, 4]
20.871 0.080 RGBA32F [256, 1, 4]
20.672 0.100 RGBA32F [128, 1, 1]
20.686 0.074 RGBA32F [256, 1, 1]
20.704 0.069 RGBA32F [512, 1, 1]
3-stencil 1D filter Z:
#version 440 core
layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;
uniform layout(binding=0) sampler3D read_sampler;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;
void main() {
    ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
    vec4 t1 = texelFetch(read_sampler, index-ivec3(0, 0, 1), 0);
    vec4 t2 = texelFetch(read_sampler, index, 0);
    vec4 t3 = texelFetch(read_sampler, index+ivec3(0, 0, 1), 0);
    imageStore(write_image, index, 0.25*(t1 + 2*t2 + t3));
}
4.167 0.165 R16F [8, 8, 8]
4.688 0.188 R16F [32, 32, 1]
4.184 0.150 R16F [32, 1, 32]
4.225 0.208 R16F [16, 16, 1]
4.111 0.148 R16F [16, 1, 16]
4.664 0.147 R16F [16, 16, 4]
4.366 0.140 R16F [16, 4, 16]
4.447 0.146 R16F [4, 16, 16]
3.800 0.144 R16F [4, 2, 16]
4.220 0.133 R16F [16, 2, 4]
4.266 0.156 R16F [16, 4, 2]
4.310 0.165 R16F [4, 16, 2]
3.945 0.148 R16F [8, 2, 16]
4.383 0.175 R16F [8, 16, 2]
4.405 0.212 R16F [128, 4, 1]
4.705 0.176 R16F [256, 4, 1]
4.341 0.163 R16F [128, 1, 4]
4.601 0.177 R16F [256, 1, 4]
4.183 0.181 R16F [128, 1, 1]
4.233 0.193 R16F [256, 1, 1]
4.358 0.173 R16F [512, 1, 1]
6.155 0.195 R32F [8, 8, 8]
10.654 6.695 R32F [32, 32, 1]
5.712 0.182 R32F [32, 1, 32]
10.185 0.192 R32F [16, 16, 1]
6.187 0.184 R32F [16, 1, 16]
6.740 0.196 R32F [16, 16, 4]
5.676 0.200 R32F [16, 4, 16]
6.160 0.258 R32F [4, 16, 16]
5.753 0.219 R32F [4, 2, 16]
6.676 0.208 R32F [16, 2, 4]
7.772 0.207 R32F [16, 4, 2]
7.742 0.185 R32F [4, 16, 2]
5.659 0.172 R32F [8, 2, 16]
7.745 0.167 R32F [8, 16, 2]
10.152 0.167 R32F [128, 4, 1]
10.158 0.175 R32F [256, 4, 1]
6.618 0.182 R32F [128, 1, 4]
6.620 0.175 R32F [256, 1, 4]
10.193 0.173 R32F [128, 1, 1]
10.204 0.178 R32F [256, 1, 1]
10.216 0.180 R32F [512, 1, 1]
6.136 0.181 RG16F [8, 8, 8]
10.212 0.177 RG16F [32, 32, 1]
5.693 0.162 RG16F [32, 1, 32]
10.154 0.181 RG16F [16, 16, 1]
6.180 0.169 RG16F [16, 1, 16]
6.724 0.185 RG16F [16, 16, 4]
5.662 0.183 RG16F [16, 4, 16]
6.089 0.187 RG16F [4, 16, 16]
5.686 0.185 RG16F [4, 2, 16]
6.620 0.179 RG16F [16, 2, 4]
7.719 0.194 RG16F [16, 4, 2]
7.751 0.181 RG16F [4, 16, 2]
5.687 0.204 RG16F [8, 2, 16]
7.760 0.194 RG16F [8, 16, 2]
10.156 0.178 RG16F [128, 4, 1]
10.166 0.181 RG16F [256, 4, 1]
6.624 0.188 RG16F [128, 1, 4]
6.624 0.185 RG16F [256, 1, 4]
10.299 0.218 RG16F [128, 1, 1]
10.266 0.200 RG16F [256, 1, 1]
10.227 0.183 RG16F [512, 1, 1]
11.890 0.172 RG32F [8, 8, 8]
20.151 0.057 RG32F [32, 32, 1]
11.462 0.189 RG32F [32, 1, 32]
20.045 0.044 RG32F [16, 16, 1]
11.589 0.187 RG32F [16, 1, 16]
13.064 0.156 RG32F [16, 16, 4]
11.018 0.176 RG32F [16, 4, 16]
11.176 0.803 RG32F [4, 16, 16]
10.986 0.188 RG32F [4, 2, 16]
12.855 0.159 RG32F [16, 2, 4]
15.148 0.124 RG32F [16, 4, 2]
15.235 0.137 RG32F [4, 16, 2]
11.072 0.188 RG32F [8, 2, 16]
15.298 0.128 RG32F [8, 16, 2]
20.046 0.080 RG32F [128, 4, 1]
19.978 0.043 RG32F [256, 4, 1]
12.891 0.149 RG32F [128, 1, 4]
12.853 0.156 RG32F [256, 1, 4]
20.136 0.070 RG32F [128, 1, 1]
20.132 0.073 RG32F [256, 1, 1]
20.083 0.057 RG32F [512, 1, 1]
11.919 0.192 RGBA16F [8, 8, 8]
20.137 0.047 RGBA16F [32, 32, 1]
11.386 0.179 RGBA16F [32, 1, 32]
20.005 0.084 RGBA16F [16, 16, 1]
11.558 0.189 RGBA16F [16, 1, 16]
13.074 0.160 RGBA16F [16, 16, 4]
11.014 0.175 RGBA16F [16, 4, 16]
11.194 0.164 RGBA16F [4, 16, 16]
11.006 0.198 RGBA16F [4, 2, 16]
12.883 0.167 RGBA16F [16, 2, 4]
15.135 0.112 RGBA16F [16, 4, 2]
15.184 0.105 RGBA16F [4, 16, 2]
11.035 0.186 RGBA16F [8, 2, 16]
15.227 0.120 RGBA16F [8, 16, 2]
20.003 0.054 RGBA16F [128, 4, 1]
20.018 0.072 RGBA16F [256, 4, 1]
12.938 0.169 RGBA16F [128, 1, 4]
12.893 0.164 RGBA16F [256, 1, 4]
20.024 0.049 RGBA16F [128, 1, 1]
20.062 0.083 RGBA16F [256, 1, 1]
20.088 0.071 RGBA16F [512, 1, 1]
23.258 0.081 RGBA32F [8, 8, 8]
39.570 0.086 RGBA32F [32, 32, 1]
34.507 0.144 RGBA32F [32, 1, 32]
39.456 0.074 RGBA32F [16, 16, 1]
21.960 0.053 RGBA32F [16, 1, 16]
25.446 0.211 RGBA32F [16, 16, 4]
21.855 0.152 RGBA32F [16, 4, 16]
21.957 0.187 RGBA32F [4, 16, 16]
22.197 0.153 RGBA32F [4, 2, 16]
25.247 0.120 RGBA32F [16, 2, 4]
29.737 0.219 RGBA32F [16, 4, 2]
29.744 0.062 RGBA32F [4, 16, 2]
21.855 0.082 RGBA32F [8, 2, 16]
29.758 0.067 RGBA32F [8, 16, 2]
39.328 0.074 RGBA32F [128, 4, 1]
39.696 0.090 RGBA32F [256, 4, 1]
25.533 0.085 RGBA32F [128, 1, 4]
25.449 0.089 RGBA32F [256, 1, 4]
39.519 0.107 RGBA32F [128, 1, 1]
39.498 0.076 RGBA32F [256, 1, 1]
39.499 0.099 RGBA32F [512, 1, 1]
5-stencil 2D filter:
#version 440 core
layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;
uniform layout(binding=0) sampler3D read_sampler;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;
void main() {
    ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
    vec4 t1 = texelFetch(read_sampler, index-ivec3(0, 1, 0), 0);
    vec4 t2 = texelFetch(read_sampler, index-ivec3(1, 0, 0), 0);
    vec4 t3 = texelFetch(read_sampler, index, 0);
    vec4 t4 = texelFetch(read_sampler, index+ivec3(1, 0, 0), 0);
    vec4 t5 = texelFetch(read_sampler, index+ivec3(0, 1, 0), 0);
    imageStore(write_image, index, 0.125*(t1 + t2 + 4*t3 + t4 + t5));
}
5.104 0.158 R16F [8, 8, 8]
5.452 0.164 R16F [32, 32, 1]
5.334 0.154 R16F [32, 1, 32]
4.813 0.152 R16F [16, 16, 1]
5.121 0.164 R16F [16, 1, 16]
5.664 0.157 R16F [16, 16, 4]
5.739 0.156 R16F [16, 4, 16]
5.706 0.150 R16F [4, 16, 16]
4.997 0.144 R16F [4, 2, 16]
5.037 0.155 R16F [16, 2, 4]
4.861 0.152 R16F [16, 4, 2]
4.886 0.149 R16F [4, 16, 2]
4.998 0.154 R16F [8, 2, 16]
4.906 0.150 R16F [8, 16, 2]
4.960 0.151 R16F [128, 4, 1]
5.264 0.168 R16F [256, 4, 1]
5.155 0.156 R16F [128, 1, 4]
5.484 0.168 R16F [256, 1, 4]
4.761 0.148 R16F [128, 1, 1]
4.791 0.164 R16F [256, 1, 1]
4.992 0.170 R16F [512, 1, 1]
6.137 0.168 R32F [8, 8, 8]
6.512 0.184 R32F [32, 32, 1]
6.382 0.180 R32F [32, 1, 32]
5.851 0.180 R32F [16, 16, 1]
6.826 0.199 R32F [16, 1, 16]
6.664 0.176 R32F [16, 16, 4]
6.219 0.183 R32F [16, 4, 16]
7.171 0.187 R32F [4, 16, 16]
6.213 0.175 R32F [4, 2, 16]
5.818 0.166 R32F [16, 2, 4]
5.878 0.172 R32F [16, 4, 2]
6.141 0.180 R32F [4, 16, 2]
6.157 0.178 R32F [8, 2, 16]
6.084 0.175 R32F [8, 16, 2]
6.028 0.171 R32F [128, 4, 1]
6.429 0.174 R32F [256, 4, 1]
6.031 0.173 R32F [128, 1, 4]
6.309 0.181 R32F [256, 1, 4]
5.972 0.172 R32F [128, 1, 1]
6.012 0.168 R32F [256, 1, 1]
6.220 0.172 R32F [512, 1, 1]
6.131 0.174 RG16F [8, 8, 8]
6.506 0.188 RG16F [32, 32, 1]
6.388 0.178 RG16F [32, 1, 32]
5.850 0.181 RG16F [16, 16, 1]
6.827 0.189 RG16F [16, 1, 16]
6.667 0.185 RG16F [16, 16, 4]
6.224 0.180 RG16F [16, 4, 16]
7.172 0.180 RG16F [4, 16, 16]
6.212 0.169 RG16F [4, 2, 16]
5.817 0.174 RG16F [16, 2, 4]
5.879 0.176 RG16F [16, 4, 2]
6.150 0.179 RG16F [4, 16, 2]
6.163 0.172 RG16F [8, 2, 16]
6.091 0.196 RG16F [8, 16, 2]
6.028 0.186 RG16F [128, 4, 1]
6.434 0.176 RG16F [256, 4, 1]
6.034 0.180 RG16F [128, 1, 4]
6.309 0.173 RG16F [256, 1, 4]
5.975 0.179 RG16F [128, 1, 1]
6.012 0.171 RG16F [256, 1, 1]
6.237 0.185 RG16F [512, 1, 1]
10.669 0.191 RG32F [8, 8, 8]
10.685 0.190 RG32F [32, 32, 1]
11.355 0.186 RG32F [32, 1, 32]
10.548 0.184 RG32F [16, 16, 1]
11.391 0.172 RG32F [16, 1, 16]
11.129 0.175 RG32F [16, 16, 4]
10.524 0.184 RG32F [16, 4, 16]
12.825 0.210 RG32F [4, 16, 16]
10.668 0.198 RG32F [4, 2, 16]
10.590 0.185 RG32F [16, 2, 4]
10.584 0.187 RG32F [16, 4, 2]
10.677 0.186 RG32F [4, 16, 2]
10.508 0.192 RG32F [8, 2, 16]
10.716 0.184 RG32F [8, 16, 2]
10.545 0.189 RG32F [128, 4, 1]
10.533 0.182 RG32F [256, 4, 1]
10.668 0.174 RG32F [128, 1, 4]
10.608 0.202 RG32F [256, 1, 4]
10.531 0.185 RG32F [128, 1, 1]
10.528 0.180 RG32F [256, 1, 1]
10.516 0.183 RG32F [512, 1, 1]
10.661 0.183 RGBA16F [8, 8, 8]
10.675 0.198 RGBA16F [32, 32, 1]
11.355 0.202 RGBA16F [32, 1, 32]
10.548 0.189 RGBA16F [16, 16, 1]
11.382 0.174 RGBA16F [16, 1, 16]
11.129 0.179 RGBA16F [16, 16, 4]
10.528 0.205 RGBA16F [16, 4, 16]
12.825 0.153 RGBA16F [4, 16, 16]
10.667 0.187 RGBA16F [4, 2, 16]
10.590 0.200 RGBA16F [16, 2, 4]
10.585 0.180 RGBA16F [16, 4, 2]
10.678 0.196 RGBA16F [4, 16, 2]
10.506 0.189 RGBA16F [8, 2, 16]
10.709 0.188 RGBA16F [8, 16, 2]
10.542 0.194 RGBA16F [128, 4, 1]
10.532 0.188 RGBA16F [256, 4, 1]
10.664 0.190 RGBA16F [128, 1, 4]
10.607 0.180 RGBA16F [256, 1, 4]
10.534 0.178 RGBA16F [128, 1, 1]
10.519 0.188 RGBA16F [256, 1, 1]
10.519 0.196 RGBA16F [512, 1, 1]
23.525 0.068 RGBA32F [8, 8, 8]
20.762 0.057 RGBA32F [32, 32, 1]
55.123 0.086 RGBA32F [32, 1, 32]
20.667 0.099 RGBA32F [16, 16, 1]
22.857 0.067 RGBA32F [16, 1, 16]
21.478 0.063 RGBA32F [16, 16, 4]
33.865 0.088 RGBA32F [16, 4, 16]
25.081 0.052 RGBA32F [4, 16, 16]
26.124 0.088 RGBA32F [4, 2, 16]
20.719 0.065 RGBA32F [16, 2, 4]
20.753 0.047 RGBA32F [16, 4, 2]
20.654 0.059 RGBA32F [4, 16, 2]
26.513 0.099 RGBA32F [8, 2, 16]
20.726 0.051 RGBA32F [8, 16, 2]
20.635 0.053 RGBA32F [128, 4, 1]
20.632 0.057 RGBA32F [256, 4, 1]
20.902 0.065 RGBA32F [128, 1, 4]
23.612 0.068 RGBA32F [256, 1, 4]
20.714 0.051 RGBA32F [128, 1, 1]
20.798 0.046 RGBA32F [256, 1, 1]
24.009 0.031 RGBA32F [512, 1, 1]
9-stencil 2D filter:
#version 440 core
layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;
uniform layout(binding=0) sampler3D read_sampler;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;
void main() {
    ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
    vec4 t1 = texelFetch(read_sampler, index+ivec3(-1, -1, 0), 0);
    vec4 t2 = texelFetch(read_sampler, index+ivec3( 0, -1, 0), 0);
    vec4 t3 = texelFetch(read_sampler, index+ivec3(+1, -1, 0), 0);
    vec4 t4 = texelFetch(read_sampler, index+ivec3(-1,  0, 0), 0);
    vec4 t5 = texelFetch(read_sampler, index+ivec3( 0,  0, 0), 0);
    vec4 t6 = texelFetch(read_sampler, index+ivec3(+1,  0, 0), 0);
    vec4 t7 = texelFetch(read_sampler, index+ivec3(-1, +1, 0), 0);
    vec4 t8 = texelFetch(read_sampler, index+ivec3( 0, +1, 0), 0);
    vec4 t9 = texelFetch(read_sampler, index+ivec3(+1, +1, 0), 0);
    imageStore(write_image, index, (1.0/16.0)*(t1 + 2*t2 + t3 + 2*t4 + 4*t5 + 2*t6 + t7 + 2*t8 + t9));
}
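The weights in the shader above are the outer product of the [1, 2, 1] weights used by the 1D filters with themselves, so the 3x3 filter is the 1D filter applied along X and along Y. The short check below reconstructs the weight matrix; `outer`, `WEIGHTS_1D` and `WEIGHTS_2D` are names chosen for this illustration and do not appear in the original code.

```python
def outer(u, v):
    """Outer product of two weight vectors as a nested list."""
    return [[a * b for b in v] for a in u]

WEIGHTS_1D = [1, 2, 1]                        # 1D filter weights before the 0.25 factor
WEIGHTS_2D = outer(WEIGHTS_1D, WEIGHTS_1D)    # 2D weights before the 1/16 factor

for row in WEIGHTS_2D:
    print(row)
print("sum of weights:", sum(map(sum, WEIGHTS_2D)))
```

The weights sum to 16, which is why the shader scales the result by 1.0/16.0 to keep the filter normalized.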
7.612 0.183 R16F [8, 8, 8]
7.798 0.190 R16F [32, 32, 1]
7.812 0.171 R16F [32, 1, 32]
7.331 0.171 R16F [16, 16, 1]
7.333 0.168 R16F [16, 1, 16]
7.890 0.182 R16F [16, 16, 4]
7.478 0.166 R16F [16, 4, 16]
7.861 0.185 R16F [4, 16, 16]
7.265 0.182 R16F [4, 2, 16]
7.329 0.172 R16F [16, 2, 4]
7.289 0.170 R16F [16, 4, 2]
7.327 0.190 R16F [4, 16, 2]
7.355 0.198 R16F [8, 2, 16]
7.445 0.219 R16F [8, 16, 2]
7.601 0.229 R16F [128, 4, 1]
7.860 0.215 R16F [256, 4, 1]
7.648 0.221 R16F [128, 1, 4]
7.858 0.193 R16F [256, 1, 4]
7.222 0.195 R16F [128, 1, 1]
7.322 0.203 R16F [256, 1, 1]
7.585 0.173 R16F [512, 1, 1]
8.426 0.184 R32F [8, 8, 8]
8.720 0.284 R32F [32, 32, 1]
8.240 0.227 R32F [32, 1, 32]
8.703 0.195 R32F [16, 16, 1]
8.322 0.264 R32F [16, 1, 16]
8.721 0.227 R32F [16, 16, 4]
8.334 0.230 R32F [16, 4, 16]
9.333 0.197 R32F [4, 16, 16]
8.538 0.183 R32F [4, 2, 16]
8.160 0.201 R32F [16, 2, 4]
8.633 0.294 R32F [16, 4, 2]
8.577 0.241 R32F [4, 16, 2]
8.471 0.261 R32F [8, 2, 16]
8.558 0.231 R32F [8, 16, 2]
8.625 0.163 R32F [128, 4, 1]
8.710 0.272 R32F [256, 4, 1]
8.515 0.223 R32F [128, 1, 4]
8.527 0.194 R32F [256, 1, 4]
8.676 0.308 R32F [128, 1, 1]
8.662 0.171 R32F [256, 1, 1]
8.646 0.191 R32F [512, 1, 1]
8.403 0.175 RG16F [8, 8, 8]
8.692 0.202 RG16F [32, 32, 1]
8.186 0.170 RG16F [32, 1, 32]
8.742 0.259 RG16F [16, 16, 1]
8.329 0.220 RG16F [16, 1, 16]
8.746 0.273 RG16F [16, 16, 4]
8.248 0.196 RG16F [16, 4, 16]
9.294 0.157 RG16F [4, 16, 16]
8.565 0.201 RG16F [4, 2, 16]
8.189 0.213 RG16F [16, 2, 4]
8.667 0.253 RG16F [16, 4, 2]
8.592 0.283 RG16F [4, 16, 2]
8.397 0.186 RG16F [8, 2, 16]
8.552 0.234 RG16F [8, 16, 2]
8.671 0.193 RG16F [128, 4, 1]
8.698 0.199 RG16F [256, 4, 1]
8.519 0.227 RG16F [128, 1, 4]
8.475 0.161 RG16F [256, 1, 4]
8.635 0.151 RG16F [128, 1, 1]
8.637 0.166 RG16F [256, 1, 1]
8.701 0.223 RG16F [512, 1, 1]
11.206 0.149 RG32F [8, 8, 8]
11.417 0.169 RG32F [32, 32, 1]
12.442 0.227 RG32F [32, 1, 32]
11.812 0.202 RG32F [16, 16, 1]
13.048 0.223 RG32F [16, 1, 16]
12.869 0.186 RG32F [16, 16, 4]
11.033 0.212 RG32F [16, 4, 16]
13.954 0.225 RG32F [4, 16, 16]
13.155 0.212 RG32F [4, 2, 16]
11.075 0.186 RG32F [16, 2, 4]
11.504 0.226 RG32F [16, 4, 2]
11.302 0.236 RG32F [4, 16, 2]
12.346 0.215 RG32F [8, 2, 16]
11.305 0.297 RG32F [8, 16, 2]
11.294 0.246 RG32F [128, 4, 1]
11.014 0.303 RG32F [256, 4, 1]
11.692 0.205 RG32F [128, 1, 4]
11.754 0.263 RG32F [256, 1, 4]
11.551 0.313 RG32F [128, 1, 1]
11.716 0.239 RG32F [256, 1, 1]
11.441 0.231 RG32F [512, 1, 1]
11.318 0.266 RGBA16F [8, 8, 8]
11.446 0.247 RGBA16F [32, 32, 1]
12.464 0.231 RGBA16F [32, 1, 32]
11.842 0.245 RGBA16F [16, 16, 1]
13.018 0.212 RGBA16F [16, 1, 16]
12.840 0.231 RGBA16F [16, 16, 4]
11.098 0.280 RGBA16F [16, 4, 16]
13.943 0.268 RGBA16F [4, 16, 16]
13.171 0.238 RGBA16F [4, 2, 16]
11.148 0.239 RGBA16F [16, 2, 4]
11.528 0.251 RGBA16F [16, 4, 2]
11.326 0.317 RGBA16F [4, 16, 2]
12.289 0.198 RGBA16F [8, 2, 16]
11.269 0.252 RGBA16F [8, 16, 2]
11.247 0.209 RGBA16F [128, 4, 1]
10.885 0.220 RGBA16F [256, 4, 1]
11.721 0.277 RGBA16F [128, 1, 4]
11.773 0.295 RGBA16F [256, 1, 4]
11.439 0.161 RGBA16F [128, 1, 1]
11.609 0.160 RGBA16F [256, 1, 1]
11.391 0.189 RGBA16F [512, 1, 1]
21.437 0.120 RGBA32F [8, 8, 8]
24.676 0.177 RGBA32F [32, 32, 1]
52.367 0.278 RGBA32F [32, 1, 32]
21.062 0.258 RGBA32F [16, 16, 1]
26.020 0.295 RGBA32F [16, 1, 16]
24.131 0.223 RGBA32F [16, 16, 4]
30.333 0.228 RGBA32F [16, 4, 16]
25.640 0.194 RGBA32F [4, 16, 16]
23.129 0.212 RGBA32F [4, 2, 16]
21.113 0.317 RGBA32F [16, 2, 4]
21.084 0.315 RGBA32F [16, 4, 2]
20.929 0.174 RGBA32F [4, 16, 2]
23.290 0.240 RGBA32F [8, 2, 16]
21.013 0.232 RGBA32F [8, 16, 2]
23.660 0.171 RGBA32F [128, 4, 1]
23.203 0.111 RGBA32F [256, 4, 1]
26.930 0.112 RGBA32F [128, 1, 4]
35.940 0.146 RGBA32F [256, 1, 4]
25.518 0.135 RGBA32F [128, 1, 1]
29.479 0.154 RGBA32F [256, 1, 1]
35.506 0.163 RGBA32F [512, 1, 1]
#version 440 core
// The %d placeholders stand for the local work group size under test.
layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;
uniform layout(binding=0) sampler3D read_sampler;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;
void main() {
    ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
    // 7-stencil 3D filter: the centre texel plus its six face neighbours.
    // No clamping is done, so fetches at the volume border fall outside the texture.
    vec4 t1 = texelFetch(read_sampler, index+ivec3( 0, 0, -1), 0);
    vec4 t2 = texelFetch(read_sampler, index+ivec3( 0, -1, 0), 0);
    vec4 t3 = texelFetch(read_sampler, index+ivec3(-1, 0, 0), 0);
    vec4 t4 = texelFetch(read_sampler, index+ivec3( 0, 0, 0), 0);
    vec4 t5 = texelFetch(read_sampler, index+ivec3(+1, 0, 0), 0);
    vec4 t6 = texelFetch(read_sampler, index+ivec3( 0, +1, 0), 0);
    vec4 t7 = texelFetch(read_sampler, index+ivec3( 0, 0, +1), 0);
    // Unweighted sum of all seven fetches, written to the output texture.
    imageStore(write_image, index, t1+t2+t3+t4+t5+t6+t7);
}
6.435 0.188 R16F [8, 8, 8]
6.583 0.252 R16F [32, 32, 1]
6.533 0.170 R16F [32, 1, 32]
6.142 0.192 R16F [16, 16, 1]
6.290 0.171 R16F [16, 1, 16]
6.914 0.207 R16F [16, 16, 4]
6.407 0.205 R16F [16, 4, 16]
6.729 0.162 R16F [4, 16, 16]
6.175 0.178 R16F [4, 2, 16]
6.226 0.206 R16F [16, 2, 4]
6.296 0.299 R16F [16, 4, 2]
6.326 0.264 R16F [4, 16, 2]
6.174 0.212 R16F [8, 2, 16]
6.415 0.187 R16F [8, 16, 2]
6.340 0.195 R16F [128, 4, 1]
6.500 0.198 R16F [256, 4, 1]
6.361 0.198 R16F [128, 1, 4]
6.739 0.154 R16F [256, 1, 4]
6.094 0.193 R16F [128, 1, 1]
6.140 0.218 R16F [256, 1, 1]
6.365 0.189 R16F [512, 1, 1]
7.382 0.214 R32F [8, 8, 8]
10.918 0.187 R32F [32, 32, 1]
7.897 0.391 R32F [32, 1, 32]
10.409 0.283 R32F [16, 16, 1]
7.891 0.265 R32F [16, 1, 16]
8.387 0.191 R32F [16, 16, 4]
7.356 0.189 R32F [16, 4, 16]
8.501 0.238 R32F [4, 16, 16]
7.503 0.236 R32F [4, 2, 16]
7.399 0.265 R32F [16, 2, 4]
8.215 0.241 R32F [16, 4, 2]
8.559 0.251 R32F [4, 16, 2]
7.418 0.197 R32F [8, 2, 16]
8.491 0.218 R32F [8, 16, 2]
10.319 0.195 R32F [128, 4, 1]
10.725 0.229 R32F [256, 4, 1]
7.564 0.206 R32F [128, 1, 4]
8.122 0.178 R32F [256, 1, 4]
10.342 0.230 R32F [128, 1, 1]
10.356 0.254 R32F [256, 1, 1]
10.475 0.220 R32F [512, 1, 1]
7.367 0.214 RG16F [8, 8, 8]
10.905 0.269 RG16F [32, 32, 1]
7.739 0.215 RG16F [32, 1, 32]
10.310 0.193 RG16F [16, 16, 1]
7.867 0.238 RG16F [16, 1, 16]
8.405 0.215 RG16F [16, 16, 4]
7.397 0.235 RG16F [16, 4, 16]
8.487 0.187 RG16F [4, 16, 16]
7.474 0.170 RG16F [4, 2, 16]
7.388 0.198 RG16F [16, 2, 4]
8.197 0.194 RG16F [16, 4, 2]
8.526 0.174 RG16F [4, 16, 2]
7.450 0.210 RG16F [8, 2, 16]
8.502 0.203 RG16F [8, 16, 2]
10.348 0.199 RG16F [128, 4, 1]
10.749 0.170 RG16F [256, 4, 1]
7.605 0.174 RG16F [128, 1, 4]
8.149 0.193 RG16F [256, 1, 4]
10.344 0.183 RG16F [128, 1, 1]
10.361 0.188 RG16F [256, 1, 1]
10.471 0.173 RG16F [512, 1, 1]
11.884 0.185 RG32F [8, 8, 8]
20.197 0.195 RG32F [32, 32, 1]
14.985 0.129 RG32F [32, 1, 32]
20.009 0.443 RG32F [16, 16, 1]
12.605 0.324 RG32F [16, 1, 16]
13.655 0.187 RG32F [16, 16, 4]
11.118 0.233 RG32F [16, 4, 16]
13.773 0.191 RG32F [4, 16, 16]
11.213 0.185 RG32F [4, 2, 16]
12.893 0.254 RG32F [16, 2, 4]
15.266 0.273 RG32F [16, 4, 2]
15.211 0.223 RG32F [4, 16, 2]
11.204 0.232 RG32F [8, 2, 16]
15.443 0.313 RG32F [8, 16, 2]
20.023 0.152 RG32F [128, 4, 1]
20.105 0.214 RG32F [256, 4, 1]
12.929 0.219 RG32F [128, 1, 4]
13.282 0.164 RG32F [256, 1, 4]
20.029 0.136 RG32F [128, 1, 1]
20.044 0.189 RG32F [256, 1, 1]
20.220 0.261 RG32F [512, 1, 1]
12.052 0.239 RGBA16F [8, 8, 8]
20.296 0.274 RGBA16F [32, 32, 1]
14.948 0.169 RGBA16F [32, 1, 32]
20.002 0.099 RGBA16F [16, 16, 1]
12.571 0.194 RGBA16F [16, 1, 16]
13.628 0.148 RGBA16F [16, 16, 4]
11.086 0.212 RGBA16F [16, 4, 16]
13.747 0.157 RGBA16F [4, 16, 16]
11.219 0.203 RGBA16F [4, 2, 16]
12.843 0.181 RGBA16F [16, 2, 4]
15.167 0.175 RGBA16F [16, 4, 2]
15.283 0.233 RGBA16F [4, 16, 2]
11.423 0.259 RGBA16F [8, 2, 16]
15.398 0.253 RGBA16F [8, 16, 2]
20.011 0.119 RGBA16F [128, 4, 1]
20.186 0.208 RGBA16F [256, 4, 1]
13.105 0.268 RGBA16F [128, 1, 4]
13.279 0.151 RGBA16F [256, 1, 4]
20.110 0.245 RGBA16F [128, 1, 1]
20.139 0.151 RGBA16F [256, 1, 1]
20.232 0.258 RGBA16F [512, 1, 1]
26.664 0.247 RGBA32F [8, 8, 8]
40.048 0.298 RGBA32F [32, 32, 1]
58.822 0.215 RGBA32F [32, 1, 32]
39.535 0.238 RGBA32F [16, 16, 1]
26.678 0.220 RGBA32F [16, 1, 16]
26.197 0.112 RGBA32F [16, 16, 4]
31.270 0.111 RGBA32F [16, 4, 16]
26.354 0.242 RGBA32F [4, 16, 16]
27.780 0.219 RGBA32F [4, 2, 16]
25.236 0.207 RGBA32F [16, 2, 4]
29.670 0.099 RGBA32F [16, 4, 2]
29.739 0.130 RGBA32F [4, 16, 2]
27.502 0.156 RGBA32F [8, 2, 16]
29.772 0.228 RGBA32F [8, 16, 2]
39.432 0.086 RGBA32F [128, 4, 1]
39.470 0.136 RGBA32F [256, 4, 1]
26.273 0.087 RGBA32F [128, 1, 4]
32.123 0.093 RGBA32F [256, 1, 4]
40.648 0.215 RGBA32F [128, 1, 1]
40.724 0.213 RGBA32F [256, 1, 1]
41.236 0.318 RGBA32F [512, 1, 1]
#version 440 core
// The %d placeholders stand for the local work group size under test.
layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;
uniform layout(binding=0) sampler3D read_sampler;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;
void main() {
    ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
    // Full 3x3x3 neighbourhood (27 fetches), one z-plane at a time.
    // No clamping is done, so fetches at the volume border fall outside the texture.
    // Plane z-1
    vec4 t1 = texelFetch(read_sampler, index+ivec3(-1, -1, -1), 0);
    vec4 t2 = texelFetch(read_sampler, index+ivec3( 0, -1, -1), 0);
    vec4 t3 = texelFetch(read_sampler, index+ivec3(+1, -1, -1), 0);
    vec4 t4 = texelFetch(read_sampler, index+ivec3(-1, 0, -1), 0);
    vec4 t5 = texelFetch(read_sampler, index+ivec3( 0, 0, -1), 0);
    vec4 t6 = texelFetch(read_sampler, index+ivec3(+1, 0, -1), 0);
    vec4 t7 = texelFetch(read_sampler, index+ivec3(-1, +1, -1), 0);
    vec4 t8 = texelFetch(read_sampler, index+ivec3( 0, +1, -1), 0);
    vec4 t9 = texelFetch(read_sampler, index+ivec3(+1, +1, -1), 0);
    // Plane z
    vec4 t10 = texelFetch(read_sampler, index+ivec3(-1, -1, 0), 0);
    vec4 t11 = texelFetch(read_sampler, index+ivec3( 0, -1, 0), 0);
    vec4 t12 = texelFetch(read_sampler, index+ivec3(+1, -1, 0), 0);
    vec4 t13 = texelFetch(read_sampler, index+ivec3(-1, 0, 0), 0);
    vec4 t14 = texelFetch(read_sampler, index+ivec3( 0, 0, 0), 0);
    vec4 t15 = texelFetch(read_sampler, index+ivec3(+1, 0, 0), 0);
    vec4 t16 = texelFetch(read_sampler, index+ivec3(-1, +1, 0), 0);
    vec4 t17 = texelFetch(read_sampler, index+ivec3( 0, +1, 0), 0);
    vec4 t18 = texelFetch(read_sampler, index+ivec3(+1, +1, 0), 0);
    // Plane z+1
    vec4 t19 = texelFetch(read_sampler, index+ivec3(-1, -1, +1), 0);
    vec4 t20 = texelFetch(read_sampler, index+ivec3( 0, -1, +1), 0);
    vec4 t21 = texelFetch(read_sampler, index+ivec3(+1, -1, +1), 0);
    vec4 t22 = texelFetch(read_sampler, index+ivec3(-1, 0, +1), 0);
    vec4 t23 = texelFetch(read_sampler, index+ivec3( 0, 0, +1), 0);
    vec4 t24 = texelFetch(read_sampler, index+ivec3(+1, 0, +1), 0);
    vec4 t25 = texelFetch(read_sampler, index+ivec3(-1, +1, +1), 0);
    vec4 t26 = texelFetch(read_sampler, index+ivec3( 0, +1, +1), 0);
    vec4 t27 = texelFetch(read_sampler, index+ivec3(+1, +1, +1), 0);
    // Unweighted sum of all 27 fetches, written to the output texture.
    imageStore(write_image, index,
        t1+t2+t3+t4+t5+t6+t7+t8+t9
        +t10+t11+t12+t13+t14+t15+t16+t17+t18
        +t19+t20+t21+t22+t23+t24+t25+t26+t27);
}
19.471 0.053 R16F [8, 8, 8]
19.214 0.085 R16F [32, 32, 1]
19.010 0.054 R16F [32, 1, 32]
18.250 0.056 R16F [16, 16, 1]
18.527 0.060 R16F [16, 1, 16]
19.610 0.180 R16F [16, 16, 4]
19.346 0.092 R16F [16, 4, 16]
19.547 0.095 R16F [4, 16, 16]
18.286 0.095 R16F [4, 2, 16]
18.202 0.043 R16F [16, 2, 4]
18.296 0.066 R16F [16, 4, 2]
18.397 0.040 R16F [4, 16, 2]
18.627 0.055 R16F [8, 2, 16]
18.861 0.256 R16F [8, 16, 2]
18.548 0.057 R16F [128, 4, 1]
19.192 0.094 R16F [256, 4, 1]
18.985 0.051 R16F [128, 1, 4]
19.231 0.042 R16F [256, 1, 4]
18.103 0.040 R16F [128, 1, 1]
18.236 0.060 R16F [256, 1, 1]
18.677 0.150 R16F [512, 1, 1]
20.099 0.075 R32F [8, 8, 8]
20.574 0.073 R32F [32, 32, 1]
19.790 0.055 R32F [32, 1, 32]
20.051 0.055 R32F [16, 16, 1]
19.224 0.219 R32F [16, 1, 16]
20.196 0.556 R32F [16, 16, 4]
19.587 0.070 R32F [16, 4, 16]
20.668 0.085 R32F [4, 16, 16]
19.444 0.356 R32F [4, 2, 16]
18.888 0.072 R32F [16, 2, 4]
18.934 0.052 R32F [16, 4, 2]
19.253 0.065 R32F [4, 16, 2]
19.262 0.051 R32F [8, 2, 16]
19.929 0.047 R32F [8, 16, 2]
20.520 0.051 R32F [128, 4, 1]
20.436 0.040 R32F [256, 4, 1]
19.771 0.056 R32F [128, 1, 4]
19.976 0.076 R32F [256, 1, 4]
19.350 0.073 R32F [128, 1, 1]
19.989 0.083 R32F [256, 1, 1]
20.525 0.085 R32F [512, 1, 1]
20.305 0.055 RG16F [8, 8, 8]
20.827 0.085 RG16F [32, 32, 1]
19.885 0.074 RG16F [32, 1, 32]
20.289 0.197 RG16F [16, 16, 1]
19.302 0.044 RG16F [16, 1, 16]
20.386 0.068 RG16F [16, 16, 4]
19.803 0.068 RG16F [16, 4, 16]
20.741 0.050 RG16F [4, 16, 16]
19.272 0.053 RG16F [4, 2, 16]
18.801 0.061 RG16F [16, 2, 4]
18.953 0.092 RG16F [16, 4, 2]
19.186 0.059 RG16F [4, 16, 2]
19.220 0.087 RG16F [8, 2, 16]
19.948 0.102 RG16F [8, 16, 2]
20.488 0.060 RG16F [128, 4, 1]
20.426 0.058 RG16F [256, 4, 1]
19.782 0.062 RG16F [128, 1, 4]
19.972 0.137 RG16F [256, 1, 4]
19.305 0.039 RG16F [128, 1, 1]
20.000 0.063 RG16F [256, 1, 1]
20.296 0.085 RG16F [512, 1, 1]
22.543 0.093 RG32F [8, 8, 8]
25.244 0.064 RG32F [32, 32, 1]
38.323 0.076 RG32F [32, 1, 32]
24.397 0.071 RG32F [16, 16, 1]
30.501 0.071 RG32F [16, 1, 16]
23.389 0.100 RG32F [16, 16, 4]
21.724 0.172 RG32F [16, 4, 16]
24.561 0.137 RG32F [4, 16, 16]
28.405 0.179 RG32F [4, 2, 16]
21.194 0.254 RG32F [16, 2, 4]
21.453 0.063 RG32F [16, 4, 2]
22.132 0.110 RG32F [4, 16, 2]
22.337 0.080 RG32F [8, 2, 16]
22.534 0.052 RG32F [8, 16, 2]
23.815 0.081 RG32F [128, 4, 1]
24.331 0.068 RG32F [256, 4, 1]
25.852 0.053 RG32F [128, 1, 4]
32.502 0.095 RG32F [256, 1, 4]
25.990 0.064 RG32F [128, 1, 1]
27.928 0.105 RG32F [256, 1, 1]
27.420 0.067 RG32F [512, 1, 1]
22.620 0.062 RGBA16F [8, 8, 8]
25.231 0.065 RGBA16F [32, 32, 1]
38.369 0.094 RGBA16F [32, 1, 32]
24.485 0.062 RGBA16F [16, 16, 1]
30.553 0.073 RGBA16F [16, 1, 16]
23.312 0.052 RGBA16F [16, 16, 4]
21.613 0.054 RGBA16F [16, 4, 16]
24.438 0.067 RGBA16F [4, 16, 16]
28.292 0.059 RGBA16F [4, 2, 16]
21.006 0.060 RGBA16F [16, 2, 4]
21.573 0.073 RGBA16F [16, 4, 2]
22.178 0.088 RGBA16F [4, 16, 2]
22.279 0.217 RGBA16F [8, 2, 16]
22.647 0.059 RGBA16F [8, 16, 2]
23.864 0.078 RGBA16F [128, 4, 1]
24.394 0.072 RGBA16F [256, 4, 1]
25.863 0.074 RGBA16F [128, 1, 4]
32.399 0.092 RGBA16F [256, 1, 4]
26.014 0.067 RGBA16F [128, 1, 1]
27.845 0.067 RGBA16F [256, 1, 1]
27.354 0.051 RGBA16F [512, 1, 1]
263.142 20.231 RGBA32F [8, 8, 8]
539.418 27.280 RGBA32F [32, 32, 1]
896.605 32.594 RGBA32F [32, 1, 32]
538.636 28.211 RGBA32F [16, 16, 1]
245.417 20.118 RGBA32F [16, 1, 16]
280.745 19.339 RGBA32F [16, 16, 4]
335.301 22.527 RGBA32F [16, 4, 16]
318.864 22.657 RGBA32F [4, 16, 16]
258.222 20.055 RGBA32F [4, 2, 16]
269.406 20.556 RGBA32F [16, 2, 4]
358.750 24.258 RGBA32F [16, 4, 2]
378.604 24.653 RGBA32F [4, 16, 2]
241.925 19.392 RGBA32F [8, 2, 16]
365.230 23.817 RGBA32F [8, 16, 2]
530.502 26.973 RGBA32F [128, 4, 1]
532.646 27.821 RGBA32F [256, 4, 1]
267.446 19.633 RGBA32F [128, 1, 4]
268.398 19.261 RGBA32F [256, 1, 4]
532.141 28.114 RGBA32F [128, 1, 1]
531.302 27.661 RGBA32F [256, 1, 1]
531.910 29.231 RGBA32F [512, 1, 1]
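Each result row above has the form `mean std FORMAT [x, y, z]`. The following is a minimal sketch, in Python and not part of the original benchmark, of how such rows could be parsed to find the fastest work group size for each format. The names `Row`, `parse_rows` and `fastest_per_format` are hypothetical helpers, and the sample rows are copied from the 3x3x3 filter results above.

```python
import re
from typing import NamedTuple


class Row(NamedTuple):
    mean: float   # average frame time (first column)
    std: float    # standard deviation of the frame times (second column)
    fmt: str      # internal texture format, e.g. "RGBA32F"
    group: tuple  # local work group size as (x, y, z)


# Matches rows such as "19.471 0.053 R16F [8, 8, 8]".
ROW_RE = re.compile(
    r"^\s*([\d.]+)\s+([\d.]+)\s+(\w+)\s+\[(\d+),\s*(\d+),\s*(\d+)\]\s*$"
)


def parse_rows(text):
    """Return a Row for every line that looks like a result row."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m is None:
            continue  # skip shader source, prose and blank lines
        mean, std, fmt, x, y, z = m.groups()
        rows.append(Row(float(mean), float(std), fmt, (int(x), int(y), int(z))))
    return rows


def fastest_per_format(rows):
    """Map each format to the row with the lowest average frame time."""
    best = {}
    for row in rows:
        if row.fmt not in best or row.mean < best[row.fmt].mean:
            best[row.fmt] = row
    return best


SAMPLE = """\
19.471 0.053 R16F [8, 8, 8]
18.250 0.056 R16F [16, 16, 1]
18.103 0.040 R16F [128, 1, 1]
263.142 20.231 RGBA32F [8, 8, 8]
241.925 19.392 RGBA32F [8, 2, 16]
896.605 32.594 RGBA32F [32, 1, 32]
"""

if __name__ == "__main__":
    for fmt, row in fastest_per_format(parse_rows(SAMPLE)).items():
        print(f"{fmt}: {row.mean} +/- {row.std} at {list(row.group)}")
```

Comparing only the means ignores the standard deviation column, so rows whose means differ by less than their spread should be treated as ties.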