Created March 24, 2016 20:41
Profiler Output without local_subtensor_merge on GoogleNet using Lasagne
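A profile like this one is typically requested through Theano configuration flags. The sketch below is an assumption about how such a run could be set up, not the command actually used for this gist: it builds a `THEANO_FLAGS` string that turns on `profile` and `profile_optimizer` (the warning in the log notes that the second requires the first) and excludes `local_subtensor_merge` through `optimizer_excluding`. The helper name `build_theano_flags` is hypothetical.

```python
import os

# Assumed flag set for a run like this one; the exact flags used for the
# original profile are not recorded in the log.
FLAGS = [
    ("profile", "True"),
    ("profile_optimizer", "True"),
    ("optimizer_excluding", "local_subtensor_merge"),
]


def build_theano_flags(options):
    """Join (name, value) pairs into the comma-separated THEANO_FLAGS format."""
    return ",".join("{}={}".format(name, value) for name, value in options)


# The variable has to be set before `import theano`, because Theano reads
# its configuration at import time.
os.environ["THEANO_FLAGS"] = build_theano_flags(FLAGS)
print(os.environ["THEANO_FLAGS"])
```

The same string can be passed on the command line instead, as in `THEANO_FLAGS=... python lasagne_googlenet.py`.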
/home/gokul/Theano/theano/compile/pfunc.py:479: UserWarning: config.profile_optimizer requires config.profile to be set to True as well
  output_keys=output_keys)
Using gpu device 0: GeForce GT 750M (CNMeM is disabled, CuDNN 4004)
/home/gokul/Theano/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
  "downsample module has been moved to the theano.tensor.signal.pool module.")
Function profiling
==================
Message: lasagne_googlenet.py:135
Time in 0 calls to Function.__call__: 0.000000e+00s
Total compile time: 3.531861e+02s
Number of Apply nodes: 1788
Theano Optimizer time: 4.029457e+01s
Theano validate time: 4.504317e+00s
Theano Linker time (includes C, CUDA code generation/compiling): 3.111209e+02s
Import time 2.162204e-01s
Time in all call to theano.grad() 7.573941e-01s
Time since theano import 371.493s
Optimizer Profile | |
----------------- | |
SeqOptimizer time 40.294s for 6324/1788 nodes before/after optimization | |
10.777s for callback | |
4.504s for fgraph.validate() | |
time - (name, class, index) - validate time | |
10.399793s - ('canonicalize', 'EquilibriumOptimizer', 4) - 0.066s | |
EquilibriumOptimizer canonicalize | |
time 10.399s for 5 passes | |
nb nodes (start, end, max) 4784 3345 4784 | |
time io_toposort 0.381s | |
time in local optimizers 4.540s | |
time in global optimizers 0.000s | |
time in final optimizers 4.288s | |
time in cleanup optimizers 0.921s | |
0 - 6.933s 2291 (4.081s in global opts, 0.044s io_toposort) - 4784 nodes - ('MergeOptimizer', 822) ('local_dimshuffle_lift', 364) ('local_upcast_elemwise_constant_inputs', 319) ('local_add_canonizer', 272) ('local_cut_gpu_host_gpu', 163) ... | |
1 - 1.621s 1214 (0.091s in global opts, 0.045s io_toposort) - 4558 nodes - ('local_subtensor_make_vector', 520) ('MergeOptimizer', 249) ('local_add_canonizer', 138) ('local_mul_canonizer', 137) ('local_intdiv_by_one', 130) ... | |
2 - 0.590s 296 (0.041s in global opts, 0.056s io_toposort) - 3728 nodes - ('local_subtensor_make_vector', 194) ('local_add_canonizer', 48) ('MergeOptimizer', 45) ('local_cut_gpu_host_gpu', 9) | |
3 - 0.732s 241 (0.040s in global opts, 0.037s io_toposort) - 3385 nodes - ('local_add_canonizer', 124) ('MergeOptimizer', 117) | |
4 - 0.524s 0 (0.037s in global opts, 0.199s io_toposort) - 3345 nodes - | |
times - times applied - nb node created - name: | |
4.288s - 2 - 0 - topo_constant_folding | |
1.157s - 582 - 882 - local_add_canonizer | |
0.921s - 1233 - 5 - MergeOptimizer | |
0.733s - 254 - 499 - local_mul_canonizer | |
0.431s - 319 - 957 - local_upcast_elemwise_constant_inputs | |
0.418s - 364 - 674 - local_dimshuffle_lift | |
0.261s - 39 - 0 - local_useless_switch | |
0.242s - 720 - 97 - local_subtensor_make_vector | |
0.190s - 172 - 0 - local_cut_gpu_host_gpu | |
0.176s - 137 - 812 - local_shape_to_shape_i | |
0.120s - 1 - 6 - local_greedy_distributor | |
0.093s - 130 - 0 - local_intdiv_by_one | |
0.082s - 5 - 7 - local_fill_sink | |
0.079s - 58 - 0 - local_useless_elemwise | |
0.075s - 12 - 12 - local_useless_slice | |
0.004s - 2 - 4 - local_subtensor_lift | |
0.003s - 4 - 6 - local_neg_to_mul | |
0.003s - 8 - 0 - local_useless_fill | |
0.473s - in 63 optimization that where not used (display only those with a runtime > 0) | |
0.092s - local_mul_zero | |
0.092s - local_one_minus_erf2 | |
0.061s - local_func_inv | |
0.036s - local_track_shape_i | |
0.035s - local_one_minus_erf | |
0.024s - local_useless_elemwise_comparison | |
0.021s - local_mul_switch_sink | |
0.020s - local_IncSubtensor_serialize | |
0.019s - local_expm1 | |
0.019s - local_fill_cut | |
0.017s - local_useless_subtensor | |
0.016s - local_cast_cast | |
0.004s - local_div_switch_sink | |
0.002s - local_abs_lift | |
0.002s - local_subtensor_of_alloc | |
0.002s - local_subtensor_of_dot | |
0.002s - local_sum_prod_all_to_none | |
0.002s - local_sum_prod_div_dimshuffle | |
0.001s - local_incsubtensor_of_zeros | |
0.001s - local_useless_inc_subtensor | |
0.001s - local_lift_transpose_through_dot | |
0.001s - local_cut_useless_reduce | |
0.001s - local_op_of_op | |
0.001s - local_reduce_join | |
0.000s - local_join_empty | |
0.000s - local_dimshuffle_no_inplace_at_canonicalize | |
0.000s - local_pow_canonicalize | |
0.000s - local_0_dot_x | |
0.000s - local_setsubtensor_of_constants | |
0.000s - local_useless_inc_subtensor_alloc | |
0.000s - local_join_make_vector | |
0.000s - local_scalar_tensor_scalar | |
0.000s - local_join_1 | |
0.000s - local_useless_split | |
0.000s - f | |
0.000s - local_useless_alloc | |
0.000s - local_merge_alloc | |
0.000s - local_useless_reshape | |
0.000s - local_reshape_lift | |
0.000s - local_tensor_scalar_tensor | |
Global, final and clean up optimizers | |
Iter 0 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (4607, 4558, 49) | |
init io_toposort 0.19864988327 | |
loop time 3.88232302666 | |
callback_time 0.321128368378 | |
MergeOptimizer | |
nb fail= 0 merged= 1610 constant= 726 | |
time replace=0.60 validate=0.02 callback=0.44 | |
Iter 1 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (3733, 3728, 5) | |
init io_toposort 0.0340428352356 | |
loop time 0.0554389953613 | |
callback_time 0.0345137119293 | |
MergeOptimizer | |
nb fail= 0 merged= 344 constant= 228 | |
time replace=0.22 validate=0.00 callback=0.17 | |
Iter 2 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (3385, 3385, 0) | |
init io_toposort 0.0363101959229 | |
loop time 0.00421690940857 | |
callback_time 0.0 | |
MergeOptimizer | |
nb fail= 0 merged= 45 constant= 45 | |
time replace=0.02 validate=0.00 callback=0.01 | |
Iter 3 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (3345, 3345, 0) | |
init io_toposort 0.0359070301056 | |
loop time 0.00407600402832 | |
callback_time 0.0 | |
MergeOptimizer | |
nb fail= 0 merged= 224 constant= 114 | |
time replace=0.08 validate=0.00 callback=0.06 | |
Iter 4 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (3345, 3345, 0) | |
init io_toposort 0.0328199863434 | |
loop time 0.00390100479126 | |
callback_time 0.0 | |
MergeOptimizer | |
nb fail= 0 merged= 0 constant= 0 | |
time replace=0.00 validate=0.00 callback=0.00 | |
9.525620s - ('gpu_opt', 'SeqOptimizer', 14) - 0.048s | |
SeqOptimizer gpu_opt time 9.526s for 3341/2586 nodes before/after optimization | |
2.406s for callback | |
0.048s for fgraph.validate() | |
9.083087s - ('gpu_local_optimizations', 'EquilibriumOptimizer', 1) - 0.046s | |
EquilibriumOptimizer gpu_local_optimizations | |
time 9.083s for 7 passes | |
nb nodes (start, end, max) 3343 2962 3343 | |
time io_toposort 0.411s | |
time in local optimizers 8.191s | |
time in global optimizers 0.000s | |
time in final optimizers 0.303s | |
time in cleanup optimizers 0.000s | |
0 - 6.006s 1322 (0.192s in global opts, 0.034s io_toposort) - 3343 nodes - ('constant_folding', 462) ('local_gpu_elemwise_1', 444) ('local_gpu_elemwise_0', 279) ('local_gpu_dimshuffle_0', 59) ('local_dnn_convw_alpha_merge', 57) ... | |
1 - 0.762s 270 (0.008s in global opts, 0.221s io_toposort) - 3175 nodes - ('local_gpu_elemwise_1', 156) ('local_gpu_elemwise_0', 97) ('local_gpu_split', 6) ('constant_folding', 3) ('local_pool_dnn_grad_stride', 3) ... | |
2 - 1.468s 408 (0.086s in global opts, 0.029s io_toposort) - 2988 nodes - ('constant_folding', 163) ('local_gpu_elemwise_0', 104) ('local_gpu_elemwise_1', 63) ('local_gpu_careduce', 60) ('local_gpu_incsubtensor', 5) ... | |
3 - 0.477s 108 (0.013s in global opts, 0.039s io_toposort) - 3079 nodes - ('local_gpu_elemwise_1', 63) ('constant_folding', 21) ('local_gpu_join', 9) ('local_gpu_subtensor', 5) ('local_gpu_incsubtensor', 4) ... | |
4 - 0.127s 19 (0.002s in global opts, 0.027s io_toposort) - 2953 nodes - ('local_gpu_incsubtensor', 6) ('constant_folding', 3) ('local_pool_dnn_alternative', 3) ('local_gpu_contiguous_gpu_contiguous', 3) ('local_gpualloc', 2) ... | |
5 - 0.143s 21 (0.002s in global opts, 0.028s io_toposort) - 2962 nodes - ('constant_folding', 6) ('local_gpu_subtensor', 6) ('local_gpu_elemwise_1', 6) ('local_gpualloc_memset_0', 2) ('MergeOptimizer', 1) | |
6 - 0.099s 0 (0.000s in global opts, 0.032s io_toposort) - 2962 nodes - | |
times - times applied - nb node created - name: | |
5.108s - 658 - 0 - constant_folding | |
1.199s - 733 - 1698 - local_gpu_elemwise_1 | |
0.842s - 484 - 1428 - local_gpu_elemwise_0 | |
0.303s - 6 - 0 - MergeOptimizer | |
0.170s - 57 - 456 - local_dnn_convw_alpha_merge | |
0.146s - 60 - 160 - local_gpu_careduce | |
0.126s - 9 - 46 - local_gpu_split | |
0.072s - 64 - 70 - local_gpu_dimshuffle_0 | |
0.060s - 11 - 18 - local_gpu_subtensor | |
0.042s - 18 - 36 - local_dnn_convi_output_merge | |
0.041s - 9 - 18 - local_gpu_join | |
0.023s - 15 - 37 - local_gpu_incsubtensor | |
0.020s - 2 - 4 - local_gpu_reshape | |
0.011s - 3 - 12 - local_pool_dnn_alternative | |
0.007s - 3 - 21 - local_pool_dnn_grad_stride | |
0.007s - 2 - 6 - local_gpu_dot22scalar | |
0.006s - 2 - 6 - local_gpu_dot22 | |
0.005s - 4 - 12 - local_gpualloc | |
0.003s - 3 - 0 - local_gpu_contiguous_gpu_contiguous | |
0.002s - 1 - 5 - local_gpu_crossentorpy_softmax_argmax_1hot_with_bias | |
0.002s - 3 - 3 - local_gpualloc_memset_0 | |
0.002s - 1 - 5 - local_gpu_crossentorpy_softmax_1hot_with_bias_dx | |
0.297s - in 51 optimization that where not used (display only those with a runtime > 0) | |
0.039s - local_elemwise_alloc | |
0.037s - local_log_softmax_dnn | |
0.036s - local_track_shape_i | |
0.027s - local_dnn_conv_output_merge | |
0.026s - local_dnn_conv_alpha_merge | |
0.020s - local_useless_elemwise | |
0.017s - local_dnn_convw_output_merge | |
0.017s - local_dnn_convi_alpha_merge | |
0.010s - gpu_sparse_block_gemv_opt | |
0.006s - gpu_sparse_block_outer_opt | |
0.005s - local_gpu_dot_to_dot22 | |
0.005s - local_gpu_ger | |
0.005s - local_gpu_gemv | |
0.004s - local_gpu_batched_dot | |
0.004s - local_gpu_conv | |
0.004s - local_gpu_lazy_ifelse | |
0.004s - local_gpu_gemm | |
0.004s - gpuScanOptimization | |
0.004s - local_gpu_specifyShape_0 | |
0.004s - local_gpu_solve | |
0.004s - local_gpu_eye | |
0.003s - local_conv2d_gpu_conv | |
0.003s - local_gpu_flatten | |
0.003s - local_gpu_advanced_incsubtensor1 | |
0.003s - local_gpu_advanced_subtensor1 | |
0.003s - local_gpu_allocempty | |
0.001s - local_gpu_elemwise_careduce | |
0.000s - local_subtensor_make_vector | |
0.000s - f | |
0.000s - local_gpujoin_1 | |
0.000s - local_gpu_downsample_factor_max | |
0.000s - local_gpu_downsample_factor_max_grad | |
Global, final and clean up optimizers | |
Iter 0 | |
MergeOptimizer | |
nb fail= 0 merged= 854 constant= 473 | |
time replace=0.19 validate=0.01 callback=0.15 | |
Iter 1 | |
MergeOptimizer | |
nb fail= 0 merged= 80 constant= 9 | |
time replace=0.01 validate=0.00 callback=0.00 | |
Iter 2 | |
MergeOptimizer | |
nb fail= 0 merged= 202 constant= 158 | |
time replace=0.09 validate=0.00 callback=0.07 | |
Iter 3 | |
MergeOptimizer | |
nb fail= 0 merged= 42 constant= 28 | |
time replace=0.01 validate=0.00 callback=0.01 | |
Iter 4 | |
MergeOptimizer | |
nb fail= 0 merged= 12 constant= 11 | |
time replace=0.00 validate=0.00 callback=0.00 | |
Iter 5 | |
MergeOptimizer | |
nb fail= 0 merged= 6 constant= 6 | |
time replace=0.00 validate=0.00 callback=0.00 | |
Iter 6 | |
MergeOptimizer | |
nb fail= 0 merged= 0 constant= 0 | |
time replace=0.00 validate=0.00 callback=0.00 | |
0.441609s - ('gpu_cut_transfers', 'EquilibriumOptimizer', 2) - 0.003s | |
EquilibriumOptimizer gpu_cut_transfers | |
time 0.442s for 2 passes | |
nb nodes (start, end, max) 2962 2586 2962 | |
time io_toposort 0.254s | |
time in local optimizers 0.170s | |
time in global optimizers 0.000s | |
time in final optimizers 0.000s | |
time in cleanup optimizers 0.000s | |
0 - 0.406s 189 (0.000s in global opts, 0.230s io_toposort) - 2962 nodes - ('local_cut_gpu_host_gpu', 189) | |
1 - 0.035s 0 (0.000s in global opts, 0.024s io_toposort) - 2586 nodes - | |
times - times applied - nb node created - name: | |
0.161s - 189 - 0 - local_cut_gpu_host_gpu | |
0.009s - in 1 optimization that where not used (display only those with a runtime > 0) | |
0.009s - constant_folding | |
0.000908s - ('InputToGpuOptimizer', 'InputToGpuOptimizer', 0) - 0.000s | |
9.196867s - ('inplace_elemwise_optimizer', 'FromFunctionOptimizer', 38) - 1.416s | |
3.167935s - ('local_dnn_conv_inplace', 'TopoOptimizer', 33) - 2.829s | |
TopoOptimizer local_dnn_conv_inplace | |
nb_node (start, end, changed) (1760, 1788, 170) | |
init io_toposort 0.0148651599884 | |
loop time 3.15292692184 | |
callback_time 2.94574427605 | |
1.487668s - ('gpu_elemwise_fusion', 'FusionOptimizer', 18) - 0.004s | |
FusionOptimizer | |
nb_iter 3 | |
nb_replacement 352 | |
nb_inconsistency_replace 0 | |
validate_time 0.00405836105347 | |
callback_time 0.229281663895 | |
time_toposort 0.0550458431244 | |
1.233950s - ('specialize', 'EquilibriumOptimizer', 11) - 0.000s | |
EquilibriumOptimizer specialize | |
time 1.234s for 3 passes | |
nb nodes (start, end, max) 3349 3341 3349 | |
time io_toposort 0.109s | |
time in local optimizers 0.413s | |
time in global optimizers 0.165s | |
time in final optimizers 0.476s | |
time in cleanup optimizers 0.000s | |
0 - 0.615s 7 (0.424s in global opts, 0.031s io_toposort) - 3349 nodes - ('local_mul_to_sqr', 2) ('local_div_to_inv', 1) ('local_softmax_grad_to_crossentropy_with_softmax_grad', 1) ('local_softmax_with_bias', 1) ('topo_constant_folding', 1) ... | |
1 - 0.335s 2 (0.134s in global opts, 0.039s io_toposort) - 3342 nodes - ('crossentropy_to_crossentropy_with_softmax_with_bias', 1) ('local_useless_crossentropy_softmax_1hot_with_bias_dx_alloc', 1) | |
2 - 0.284s 0 (0.083s in global opts, 0.039s io_toposort) - 3341 nodes - | |
times - times applied - nb node created - name: | |
0.476s - 1 - 0 - topo_constant_folding | |
0.165s - 1 - 1 - crossentropy_to_crossentropy_with_softmax_with_bias | |
0.007s - 2 - 2 - local_mul_to_sqr | |
0.001s - 1 - 1 - local_div_to_inv | |
0.001s - 1 - 1 - local_useless_crossentropy_softmax_1hot_with_bias_dx_alloc | |
0.001s - 1 - 1 - local_softmax_grad_to_crossentropy_with_softmax_grad | |
0.001s - 1 - 1 - local_softmax_with_bias | |
0.001s - 1 - 0 - local_subtensor_make_vector | |
0.401s - in 62 optimization that where not used (display only those with a runtime > 0) | |
0.102s - local_add_specialize | |
0.047s - local_mul_specialize | |
0.045s - local_one_minus_erf2 | |
0.031s - local_elemwise_alloc | |
0.028s - local_useless_elemwise | |
0.027s - local_func_inv | |
0.017s - local_one_minus_erf | |
0.015s - local_track_shape_i | |
0.015s - local_abs_merge | |
0.012s - local_mul_switch_sink | |
0.011s - local_useless_elemwise_comparison | |
0.008s - local_expm1 | |
0.008s - local_elemwise_sub_zeros | |
0.007s - local_logsoftmax | |
0.007s - local_cast_cast | |
0.007s - local_useless_switch | |
0.006s - local_alloc_unary | |
0.001s - local_useless_subtensor | |
0.001s - local_sum_prod_mul_by_scalar | |
0.001s - local_pow_specialize | |
0.001s - local_reduce_broadcastable | |
0.001s - local_sum_prod_div_dimshuffle | |
0.001s - local_useless_inc_subtensor | |
0.001s - local_useless_slice | |
0.001s - local_dimshuffle_lift | |
0.000s - local_opt_alloc | |
0.000s - local_join_empty | |
0.000s - local_grad_log_erfc_neg | |
0.000s - local_useless_alloc | |
0.000s - local_join_make_vector | |
0.000s - local_useless_inc_subtensor_alloc | |
0.000s - local_subtensor_of_alloc | |
0.000s - local_scalar_tensor_scalar | |
0.000s - local_subtensor_of_dot | |
0.000s - local_useless_split | |
0.000s - local_join_1 | |
0.000s - local_merge_alloc | |
0.000s - local_logsoftmax_grad | |
Global, final and clean up optimizers | |
Iter 0 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (3344, 3343, 1) | |
init io_toposort 0.0301430225372 | |
loop time 0.355483055115 | |
callback_time 0.000345945358276 | |
Iter 1 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (3341, 3341, 0) | |
init io_toposort 0.0428228378296 | |
loop time 0.00455713272095 | |
callback_time 0.0 | |
Iter 2 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (3341, 3341, 0) | |
init io_toposort 0.0389051437378 | |
loop time 0.00414204597473 | |
callback_time 0.0 | |
0.615536s - ('merge1', 'MergeOptimizer', 0) - 0.029s | |
MergeOptimizer | |
nb fail= 0 merged= 3298 constant= 1703 | |
time replace=0.38 validate=0.03 callback=0.19 | |
0.532832s - ('elemwise_fusion', 'SeqOptimizer', 17) - 0.001s | |
SeqOptimizer elemwise_fusion time 0.533s for 2586/2420 nodes before/after optimization | |
0.137s for callback | |
0.001s for fgraph.validate() | |
0.417839s - ('composite_elemwise_fusion', 'FusionOptimizer', 1) - 0.001s | |
FusionOptimizer | |
nb_iter 2 | |
nb_replacement 107 | |
nb_inconsistency_replace 0 | |
validate_time 0.00120878219604 | |
callback_time 0.110847473145 | |
time_toposort 0.0463998317719 | |
0.114981s - ('local_add_mul_fusion', 'FusionOptimizer', 0) - 0.000s | |
FusionOptimizer | |
nb_iter 3 | |
nb_replacement 18 | |
nb_inconsistency_replace 0 | |
validate_time 0.000219821929932 | |
callback_time 0.0261061191559 | |
time_toposort 0.0736041069031 | |
0.475244s - ('scan_eqopt2', 'EquilibriumOptimizer', 9) - 0.000s | |
EquilibriumOptimizer scan_eqopt2 | |
time 0.475s for 1 passes | |
nb nodes (start, end, max) 3349 3349 3349 | |
time io_toposort 0.037s | |
time in local optimizers 0.000s | |
time in global optimizers 0.433s | |
time in final optimizers 0.000s | |
time in cleanup optimizers 0.000s | |
0 - 0.475s 0 (0.433s in global opts, 0.037s io_toposort) - 3349 nodes - | |
Global, final and clean up optimizers | |
Iter 0 | |
TopoOptimizer constant_folding_for_scan2 | |
nb_node (start, end, changed) (3349, 3349, 0) | |
init io_toposort 0.0356850624084 | |
loop time 0.00393390655518 | |
callback_time 0.0 | |
TopoOptimizer scanOp_remove_constants_and_unused_inputs1 | |
nb_node (start, end, changed) (3349, 3349, 0) | |
init io_toposort 0.188410043716 | |
loop time 0.00373697280884 | |
callback_time 0.0 | |
TopoOptimizer scanop_remove_constants_and_unused_inputs2 | |
nb_node (start, end, changed) (3349, 3349, 0) | |
init io_toposort 0.0367538928986 | |
loop time 0.00363206863403 | |
callback_time 0.0 | |
TopoOptimizer scanOp_merge_inouts | |
nb_node (start, end, changed) (3349, 3349, 0) | |
init io_toposort 0.0370938777924 | |
loop time 0.00395393371582 | |
callback_time 0.0 | |
TopoOptimizer scanOp_remove_constants_and_unused_inputs3 | |
nb_node (start, end, changed) (3349, 3349, 0) | |
init io_toposort 0.0391750335693 | |
loop time 0.00397610664368 | |
callback_time 0.0 | |
0.466038s - ('BlasOpt', 'SeqOptimizer', 10) - 0.000s | |
SeqOptimizer BlasOpt time 0.466s for 3349/3349 nodes before/after optimization | |
0.001s for callback | |
0.000s for fgraph.validate() | |
0.205265s - ('use_c_blas', 'TopoOptimizer', 4) - 0.000s | |
TopoOptimizer use_c_blas | |
nb_node (start, end, changed) (3349, 3349, 0) | |
init io_toposort 0.192625999451 | |
loop time 0.0125570297241 | |
callback_time 0.0 | |
0.093847s - ('gemm_optimizer', 'GemmOptimizer', 1) - 0.000s | |
GemmOptimizer | |
nb_iter 1 | |
nb_replacement 0 | |
nb_replacement_didn_t_remove 0 | |
nb_inconsistency_make 0 | |
nb_inconsistency_replace 0 | |
time_canonicalize 0.0312497615814 | |
time_factor_can 0 | |
time_factor_list 0 | |
time_toposort 0.0369219779968 | |
validate_time 0.0 | |
callback_time 0.0 | |
0.043994s - ('local_dot_to_dot22', 'TopoOptimizer', 0) - 0.000s | |
TopoOptimizer local_dot_to_dot22 | |
nb_node (start, end, changed) (3349, 3349, 3) | |
init io_toposort 0.0389251708984 | |
loop time 0.0050060749054 | |
callback_time 0.000382900238037 | |
0.043500s - ('use_scipy_ger', 'TopoOptimizer', 5) - 0.000s | |
TopoOptimizer scipy_blas | |
nb_node (start, end, changed) (3349, 3349, 0) | |
init io_toposort 0.0391268730164 | |
loop time 0.00428605079651 | |
callback_time 0.0 | |
0.040933s - ('local_dot22_to_dot22scalar', 'TopoOptimizer', 2) - 0.000s | |
TopoOptimizer local_dot22_to_dot22scalar | |
nb_node (start, end, changed) (3349, 3349, 1) | |
init io_toposort 0.0308468341827 | |
loop time 0.010027885437 | |
callback_time 0.000476837158203 | |
0.038460s - ('local_gemm_to_gemv', 'EquilibriumOptimizer', 3) - 0.000s | |
EquilibriumOptimizer local_gemm_to_gemv | |
time 0.038s for 1 passes | |
nb nodes (start, end, max) 3349 3349 3349 | |
time io_toposort 0.032s | |
time in local optimizers 0.000s | |
time in global optimizers 0.000s | |
time in final optimizers 0.000s | |
time in cleanup optimizers 0.000s | |
0 - 0.038s 0 (0.000s in global opts, 0.032s io_toposort) - 3349 nodes - | |
0.418858s - ('scan_eqopt1', 'EquilibriumOptimizer', 2) - 0.000s | |
EquilibriumOptimizer scan_eqopt1 | |
time 0.419s for 1 passes | |
nb nodes (start, end, max) 4784 4784 4784 | |
time io_toposort 0.043s | |
time in local optimizers 0.000s | |
time in global optimizers 0.368s | |
time in final optimizers 0.000s | |
time in cleanup optimizers 0.000s | |
0 - 0.419s 0 (0.368s in global opts, 0.043s io_toposort) - 4784 nodes - | |
Global, final and clean up optimizers | |
Iter 0 | |
SeqOptimizer all_pushout_opt time 0.368s for 4784/4784 nodes before/after optimization | |
0.000s for callback | |
0.000s for fgraph.validate() | |
0.179664s - ('remove_constants_and_unused_inputs_scan', 'TopoOptimizer', 0) - 0.000s | |
TopoOptimizer scanOp_remove_constants_and_unused_inputs0 | |
nb_node (start, end, changed) (4784, 4784, 0) | |
init io_toposort 0.174652814865 | |
loop time 0.00494480133057 | |
callback_time 0.0 | |
0.050260s - ('scanOp_pushout_nonseqs_ops', 'PushOutNonSeqScan', 1) - 0.000s | |
0.047774s - ('scanOp_pushout_seqs_ops', 'PushOutSeqScan', 2) - 0.000s | |
0.045382s - ('scanOp_pushout_output', 'PushOutScanOutput', 4) - 0.000s | |
0.045186s - ('scan_pushout_dot1', 'PushOutDot1', 3) - 0.000s | |
0.387094s - ('ShapeOpt', 'ShapeOptimizer', 1) - 0.000s | |
0.368549s - ('stabilize', 'EquilibriumOptimizer', 6) - 0.000s | |
EquilibriumOptimizer stabilize | |
time 0.368s for 2 passes | |
nb nodes (start, end, max) 3345 3349 3349 | |
time io_toposort 0.065s | |
time in local optimizers 0.120s | |
time in global optimizers 0.079s | |
time in final optimizers 0.083s | |
time in cleanup optimizers 0.000s | |
0 - 0.190s 4 (0.078s in global opts, 0.035s io_toposort) - 3345 nodes - ('local_fill_to_alloc', 4) | |
1 - 0.178s 0 (0.083s in global opts, 0.030s io_toposort) - 3349 nodes - | |
times - times applied - nb node created - name: | |
0.004s - 4 - 8 - local_fill_to_alloc | |
0.278s - in 39 optimization that where not used (display only those with a runtime > 0) | |
0.083s - topo_constant_folding | |
0.079s - crossentropy_to_crossentropy_with_softmax_with_bias | |
0.040s - local_greedy_distributor | |
0.027s - local_one_minus_erf2 | |
0.025s - local_sigm_times_exp | |
0.010s - local_one_minus_erf | |
0.007s - local_useless_elemwise_comparison | |
0.007s - local_expm1 | |
0.001s - local_incsubtensor_of_zeros | |
0.000s - local_exp_over_1_plus_exp | |
0.000s - local_grad_log_erfc_neg | |
0.000s - local_setsubtensor_of_constants | |
0.000s - local_0_dot_x | |
0.000s - local_subtensor_of_dot | |
0.000s - local_useless_inc_subtensor_alloc | |
0.000s - local_useless_alloc | |
0.000s - local_merge_alloc | |
0.000s - local_useless_reshape | |
0.000s - local_reshape_lift | |
Global, final and clean up optimizers | |
Iter 0 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (3349, 3349, 0) | |
init io_toposort 0.0384531021118 | |
loop time 0.00705313682556 | |
callback_time 0.0 | |
Iter 1 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (3349, 3349, 0) | |
init io_toposort 0.0329899787903 | |
loop time 0.00385785102844 | |
callback_time 0.0 | |
0.342618s - ('scanOp_make_inplace', 'ScanInplaceOptimizer', 42) - 0.000s | |
0.341554s - ('add_destroy_handler', 'AddDestroyHandler', 21) - 0.000s | |
0.206516s - ('crossentropy_to_crossentropy_with_softmax', 'FromFunctionOptimizer', 12) - 0.000s | |
0.198069s - ('local_IncSubtensor_serialize', 'TopoOptimizer', 3) - 0.000s | |
TopoOptimizer pre_local_IncSubtensor_serialize | |
nb_node (start, end, changed) (4784, 4784, 20) | |
init io_toposort 0.167459011078 | |
loop time 0.0305531024933 | |
callback_time 0.00435876846313 | |
0.143042s - ('local_inplace_setsubtensor', 'TopoOptimizer', 26) - 0.110s | |
TopoOptimizer local_inplace_setsubtensor | |
nb_node (start, end, changed) (1760, 1760, 13) | |
init io_toposort 0.0189619064331 | |
loop time 0.123964071274 | |
callback_time 0.114995002747 | |
0.107311s - ('gpu_after_fusion', 'SeqOptimizer', 20) - 0.000s | |
SeqOptimizer gpu_after_fusion time 0.107s for 1760/1760 nodes before/after optimization | |
0.001s for callback | |
0.000s for fgraph.validate() | |
0.056971s - ('gpu_local_optimizations', 'EquilibriumOptimizer', 1) - 0.000s | |
EquilibriumOptimizer gpu_local_optimizations | |
time 0.057s for 1 passes | |
nb nodes (start, end, max) 1762 1762 1762 | |
time io_toposort 0.018s | |
time in local optimizers 0.028s | |
time in global optimizers 0.000s | |
time in final optimizers 0.000s | |
time in cleanup optimizers 0.000s | |
0 - 0.057s 0 (0.000s in global opts, 0.018s io_toposort) - 1762 nodes - | |
Global, final and clean up optimizers | |
Iter 0 | |
MergeOptimizer | |
nb fail= 0 merged= 0 constant= 0 | |
time replace=0.00 validate=0.00 callback=0.00 | |
0.049493s - ('gpu_cut_transfers', 'EquilibriumOptimizer', 2) - 0.000s | |
EquilibriumOptimizer gpu_cut_transfers | |
time 0.049s for 2 passes | |
nb nodes (start, end, max) 1762 1760 1762 | |
time io_toposort 0.032s | |
time in local optimizers 0.006s | |
time in global optimizers 0.000s | |
time in final optimizers 0.000s | |
time in cleanup optimizers 0.000s | |
0 - 0.026s 2 (0.000s in global opts, 0.017s io_toposort) - 1762 nodes - ('local_cut_gpu_host_gpu', 2) | |
1 - 0.023s 0 (0.000s in global opts, 0.015s io_toposort) - 1760 nodes - | |
times - times applied - nb node created - name: | |
0.001s - 2 - 0 - local_cut_gpu_host_gpu | |
0.005s - in 1 optimization that where not used (display only those with a runtime > 0) | |
0.005s - constant_folding | |
0.000835s - ('InputToGpuOptimizer', 'InputToGpuOptimizer', 0) - 0.000s | |
0.062530s - ('inplace_elemwise_optimizer', 'FromFunctionOptimizer', 41) - 0.000s | |
0.047725s - ('local_elemwise_alloc', 'TopoOptimizer', 8) - 0.000s | |
TopoOptimizer local_elemwise_alloc | |
nb_node (start, end, changed) (3349, 3349, 0) | |
init io_toposort 0.0376000404358 | |
loop time 0.010046005249 | |
callback_time 0.0 | |
0.044432s - ('gpu_scanOp_make_inplace', 'ScanInplaceOptimizer', 39) - 0.000s | |
0.041611s - ('uncanonicalize', 'EquilibriumOptimizer', 13) - 0.000s | |
EquilibriumOptimizer uncanonicalize | |
time 0.041s for 1 passes | |
nb nodes (start, end, max) 3341 3341 3341 | |
time io_toposort 0.036s | |
time in local optimizers 0.000s | |
time in global optimizers 0.000s | |
time in final optimizers 0.000s | |
time in cleanup optimizers 0.000s | |
0 - 0.041s 0 (0.000s in global opts, 0.036s io_toposort) - 3341 nodes - | |
0.038787s - ('local_fill_to_alloc', 'TopoOptimizer', 7) - 0.000s | |
TopoOptimizer local_fill_to_alloc | |
nb_node (start, end, changed) (3349, 3349, 0) | |
init io_toposort 0.0316600799561 | |
loop time 0.00705099105835 | |
callback_time 0.0 | |
0.034755s - ('specialize_device', 'EquilibriumOptimizer', 15) - 0.000s | |
EquilibriumOptimizer specialize_device | |
time 0.035s for 1 passes | |
nb nodes (start, end, max) 2586 2586 2586 | |
time io_toposort 0.023s | |
time in local optimizers 0.006s | |
time in global optimizers 0.000s | |
time in final optimizers 0.000s | |
time in cleanup optimizers 0.000s | |
0 - 0.035s 0 (0.000s in global opts, 0.023s io_toposort) - 2586 nodes - | |
0.028404s - ('gpua_elemwise_fusion', 'FusionOptimizer', 37) - 0.000s | |
FusionOptimizer | |
nb_iter 1 | |
nb_replacement 0 | |
nb_inconsistency_replace 0 | |
validate_time 0.0 | |
callback_time 0.0 | |
time_toposort 0.0268249511719 | |
0.026123s - ('AbstractConvCheck', 'TopoOptimizer', 16) - 0.000s | |
TopoOptimizer AbstractConvCheck | |
nb_node (start, end, changed) (2586, 2586, 0) | |
init io_toposort 0.0230870246887 | |
loop time 0.00297999382019 | |
callback_time 0.0 | |
0.023765s - ('local_dnna_conv_inplace', 'TopoOptimizer', 34) - 0.000s | |
TopoOptimizer local_dnna_conv_inplace | |
nb_node (start, end, changed) (1788, 1788, 0) | |
init io_toposort 0.0204739570618 | |
loop time 0.00311684608459 | |
callback_time 0.0 | |
0.023228s - ('InplaceGpuBlasOpt', 'TopoOptimizer', 30) - 0.000s | |
TopoOptimizer InplaceGpuBlasOpt | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0176610946655 | |
loop time 0.00538897514343 | |
callback_time 0.0 | |
0.022926s - ('blas_opt_inplace', 'TopoOptimizer', 29) - 0.000s | |
TopoOptimizer InplaceBlasOpt | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0177640914917 | |
loop time 0.00498390197754 | |
callback_time 0.0 | |
0.020625s - ('local_inplace_sparse_block_outer', 'TopoOptimizer', 28) - 0.000s | |
TopoOptimizer local_inplace_sparse_block_outer | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0186450481415 | |
loop time 0.00185012817383 | |
callback_time 0.0 | |
0.020140s - ('local_gemm16_inplace', 'TopoOptimizer', 35) - 0.000s | |
TopoOptimizer local_gemm16_inplace | |
nb_node (start, end, changed) (1788, 1788, 0) | |
init io_toposort 0.0182089805603 | |
loop time 0.00174903869629 | |
callback_time 0.0 | |
0.019994s - ('cond_make_inplace', 'TopoOptimizer', 43) - 0.000s | |
TopoOptimizer cond_make_inplace | |
nb_node (start, end, changed) (1788, 1788, 0) | |
init io_toposort 0.0180940628052 | |
loop time 0.00182604789734 | |
callback_time 0.0 | |
0.019595s - ('dimshuffle_as_view', 'TopoOptimizer', 22) - 0.000s | |
TopoOptimizer dimshuffle_as_view | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0177409648895 | |
loop time 0.00179815292358 | |
callback_time 0.0 | |
0.019544s - ('local_destructive', 'TopoOptimizer', 44) - 0.000s | |
TopoOptimizer CURAND_destructive | |
nb_node (start, end, changed) (1788, 1788, 0) | |
init io_toposort 0.0177500247955 | |
loop time 0.00173497200012 | |
callback_time 0.0 | |
0.019496s - ('local_inplace_gpu_sparse_block_outer', 'TopoOptimizer', 24) - 0.000s | |
TopoOptimizer local_inplace_gpu_sparse_block_outer | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0176432132721 | |
loop time 0.00175499916077 | |
callback_time 0.0 | |
0.019217s - ('make_ger_destructive', 'TopoOptimizer', 36) - 0.000s | |
TopoOptimizer make_scipy_blas_destructive | |
nb_node (start, end, changed) (1788, 1788, 0) | |
init io_toposort 0.0170040130615 | |
loop time 0.00216507911682 | |
callback_time 0.0 | |
0.019198s - ('gpuablas_opt_inplace', 'TopoOptimizer', 31) - 0.000s | |
TopoOptimizer InplaceGpuaBlasOpt | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0162858963013 | |
loop time 0.00274205207825 | |
callback_time 0.0 | |
0.019189s - ('random_make_inplace', 'TopoOptimizer', 45) - 0.000s | |
TopoOptimizer random_make_inplace | |
nb_node (start, end, changed) (1788, 1788, 0) | |
init io_toposort 0.017333984375 | |
loop time 0.00179004669189 | |
callback_time 0.0 | |
0.018404s - ('local_inplace_sparse_block_gemv', 'TopoOptimizer', 27) - 0.000s | |
TopoOptimizer local_inplace_sparse_block_gemv | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0163660049438 | |
loop time 0.00191783905029 | |
callback_time 0.0 | |
0.018152s - ('inplace_elemwise_optimizer', 'FromFunctionOptimizer', 40) - 0.000s | |
0.017685s - ('mrg_random_make_inplace', 'TopoOptimizer', 46) - 0.000s | |
TopoOptimizer random_make_inplace_mrg | |
nb_node (start, end, changed) (1788, 1788, 0) | |
init io_toposort 0.0156719684601 | |
loop time 0.00196599960327 | |
callback_time 0.0 | |
0.017610s - ('local_inplace_gpu_sparse_block_gemv', 'TopoOptimizer', 23) - 0.000s | |
TopoOptimizer local_inplace_gpu_sparse_block_gemv | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0160031318665 | |
loop time 0.00151109695435 | |
callback_time 0.0 | |
0.017429s - ('c_blas_destructive', 'TopoOptimizer', 32) - 0.000s | |
TopoOptimizer c_blas_destructive | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0148959159851 | |
loop time 0.00249600410461 | |
callback_time 0.0 | |
0.017334s - ('local_inplace_incsubtensor1', 'TopoOptimizer', 25) - 0.000s | |
TopoOptimizer local_inplace_incsubtensor1 | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0155780315399 | |
loop time 0.00165390968323 | |
callback_time 0.0 | |
0.000752s - ('merge3', 'MergeOptimizer', 47) - 0.000s | |
MergeOptimizer | |
nb fail= 0 merged= 0 constant= 0 | |
time replace=0.00 validate=0.00 callback=0.00 | |
0.000578s - ('merge2', 'MergeOptimizer', 19) - 0.000s | |
MergeOptimizer | |
nb fail= 0 merged= 3 constant= 0 | |
time replace=0.00 validate=0.00 callback=0.00 | |
0.000054s - ('merge1.2', 'MergeOptimizer', 5) - 0.000s | |
MergeOptimizer | |
nb fail= 0 merged= 0 constant= 0 | |
time replace=0.00 validate=0.00 callback=0.00 | |
Here are tips to potentially make your code run faster
(if you think of new ones, suggest them on the mailing list).
Test them first, as they are not guaranteed to always provide a speedup.
Sorry, no tip for today.
compiling
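
The optimizer entries in the profile above follow the pattern `time - (name, class, index) - validate time`. A minimal sketch of how those lines could be pulled out and ranked by time follows; the regular expression and the `parse_entries` helper are illustrative assumptions, not part of Theano.

```python
import re

# Matches optimizer entries such as:
#   10.399793s - ('canonicalize', 'EquilibriumOptimizer', 4) - 0.066s
ENTRY = re.compile(
    r"^\s*(?P<time>\d+\.\d+)s - "
    r"\('(?P<name>[^']+)', '(?P<cls>[^']+)', (?P<index>\d+)\)"
    r" - (?P<validate>\d+\.\d+)s"
)


def parse_entries(lines):
    """Return (time, name, class, validate_time) tuples, slowest first."""
    entries = []
    for line in lines:
        match = ENTRY.match(line)
        if match:
            entries.append((
                float(match.group("time")),
                match.group("name"),
                match.group("cls"),
                float(match.group("validate")),
            ))
    return sorted(entries, reverse=True)


# A few lines copied from the profile; the second one is not an entry
# and is skipped by the pattern.
sample = [
    "10.399793s - ('canonicalize', 'EquilibriumOptimizer', 4) - 0.066s | |",
    "time 10.399s for 5 passes | |",
    "9.196867s - ('inplace_elemwise_optimizer', 'FromFunctionOptimizer', 38) - 1.416s | |",
    "3.167935s - ('local_dnn_conv_inplace', 'TopoOptimizer', 33) - 2.829s | |",
]

for seconds, name, cls, validate in parse_entries(sample):
    print("{:>10.3f}s  {:<28} {:<22} validate {:.3f}s".format(
        seconds, name, cls, validate))
```

Nested entries (for example `gpu_local_optimizations` inside `gpu_opt`) match the same pattern, so summing the parsed times over the whole log would count that time twice.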