Created January 1, 2017 06:22
Batch norm with no memory optimization on the nn.BatchNormalization modules
Loading data from 'data/small-train.t7'...
* vocabulary size: source = 50004; target = 50004
* additional features: source = 0; target = 0
* maximum sequence length: source = 50; target = 51
* number of training sentences: 100000
* maximum batch size: 64
Building model...
* using input feeding
Initializing parameters...
* number of parameters: 84834004
Preparing memory optimization...
* sharing 58% of output/gradInput tensors memory between clones
Start training...
Epoch 1 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 568 ; Perplexity 32510.26
Epoch 1 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 696 ; Perplexity 17496.41
Epoch 1 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 767 ; Perplexity 11462.83
Epoch 1 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 807 ; Perplexity 8655.52
Epoch 1 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 824 ; Perplexity 6919.21
Epoch 1 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 832 ; Perplexity 5663.48
Epoch 1 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 848 ; Perplexity 4643.32
Epoch 1 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 860 ; Perplexity 3965.22
Epoch 1 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 870 ; Perplexity 3407.11
Epoch 1 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 879 ; Perplexity 2967.13
Epoch 1 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 891 ; Perplexity 2591.89
Epoch 1 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 894 ; Perplexity 2351.04
Epoch 1 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 896 ; Perplexity 2141.00
Epoch 1 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 897 ; Perplexity 1960.14
Epoch 1 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 900 ; Perplexity 1807.57
Epoch 1 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 903 ; Perplexity 1673.53
Epoch 1 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 907 ; Perplexity 1548.82
Epoch 1 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 908 ; Perplexity 1451.75
Epoch 1 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 910 ; Perplexity 1359.06
Epoch 1 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 912 ; Perplexity 1277.61
Epoch 1 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 913 ; Perplexity 1206.43
Epoch 1 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 914 ; Perplexity 1142.65
Epoch 1 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 915 ; Perplexity 1083.88
Epoch 1 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 914 ; Perplexity 1031.65
Epoch 1 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 917 ; Perplexity 982.63
Epoch 1 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 918 ; Perplexity 940.26
Epoch 1 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 919 ; Perplexity 900.46
Epoch 1 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 920 ; Perplexity 863.49
Epoch 1 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 922 ; Perplexity 827.18
Epoch 1 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 922 ; Perplexity 796.47
Epoch 1 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 922 ; Perplexity 768.12
Validation perplexity: 180.53028700487
Saving checkpoint to 'models/batch_clones_2_epoch1_180.53.t7'...
Epoch 2 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 967 ; Perplexity 214.64
Epoch 2 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 965 ; Perplexity 213.82
Epoch 2 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 207.01
Epoch 2 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 959 ; Perplexity 202.22
Epoch 2 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 957 ; Perplexity 197.79
Epoch 2 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 950 ; Perplexity 194.84
Epoch 2 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 952 ; Perplexity 191.98
Epoch 2 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 955 ; Perplexity 188.35
Epoch 2 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 185.09
Epoch 2 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 181.83
Epoch 2 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 178.44
Epoch 2 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 950 ; Perplexity 176.28
Epoch 2 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 173.65
Epoch 2 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 170.90
Epoch 2 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 168.55
Epoch 2 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 165.95
Epoch 2 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 163.67
Epoch 2 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 161.13
Epoch 2 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 159.46
Epoch 2 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 157.83
Epoch 2 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 944 ; Perplexity 156.19
Epoch 2 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 941 ; Perplexity 154.66
Epoch 2 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 942 ; Perplexity 153.04
Epoch 2 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 943 ; Perplexity 151.18
Epoch 2 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 942 ; Perplexity 149.61
Epoch 2 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 944 ; Perplexity 148.25
Epoch 2 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 146.80
Epoch 2 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 944 ; Perplexity 145.17
Epoch 2 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 944 ; Perplexity 143.73
Epoch 2 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 142.27
Epoch 2 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 140.86
Validation perplexity: 83.079661384295
Saving checkpoint to 'models/batch_clones_2_epoch2_83.08.t7'...
Epoch 3 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 932 ; Perplexity 87.34
Epoch 3 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 940 ; Perplexity 87.64
Epoch 3 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 933 ; Perplexity 86.48
Epoch 3 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 933 ; Perplexity 86.08
Epoch 3 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 933 ; Perplexity 85.23
Epoch 3 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 931 ; Perplexity 84.49
Epoch 3 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 932 ; Perplexity 83.60
Epoch 3 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 936 ; Perplexity 83.37
Epoch 3 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 938 ; Perplexity 82.81
Epoch 3 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 944 ; Perplexity 82.62
Epoch 3 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 950 ; Perplexity 82.44
Epoch 3 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 950 ; Perplexity 82.01
Epoch 3 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 952 ; Perplexity 81.81
Epoch 3 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 951 ; Perplexity 81.63
Epoch 3 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 81.26
Epoch 3 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 81.09
Epoch 3 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 80.58
Epoch 3 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 80.23
Epoch 3 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 80.03
Epoch 3 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 79.71
Epoch 3 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 79.40
Epoch 3 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 79.09
Epoch 3 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 78.87
Epoch 3 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 78.54
Epoch 3 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 78.18
Epoch 3 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 77.78
Epoch 3 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 77.49
Epoch 3 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 77.10
Epoch 3 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 76.77
Epoch 3 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 76.44
Epoch 3 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 76.13
Validation perplexity: 56.82965026429
Saving checkpoint to 'models/batch_clones_2_epoch3_56.83.t7'...
Epoch 4 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 985 ; Perplexity 51.95
Epoch 4 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 978 ; Perplexity 53.01
Epoch 4 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 966 ; Perplexity 53.63
Epoch 4 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 965 ; Perplexity 54.28
Epoch 4 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 957 ; Perplexity 54.04
Epoch 4 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 955 ; Perplexity 54.30
Epoch 4 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 950 ; Perplexity 53.95
Epoch 4 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 53.93
Epoch 4 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 53.90
Epoch 4 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 53.65
Epoch 4 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 53.75
Epoch 4 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 951 ; Perplexity 53.78
Epoch 4 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 951 ; Perplexity 53.89
Epoch 4 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 950 ; Perplexity 53.81
Epoch 4 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 53.86
Epoch 4 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 53.95
Epoch 4 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 53.69
Epoch 4 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 53.72
Epoch 4 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 53.67
Epoch 4 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 53.49
Epoch 4 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 53.41
Epoch 4 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 53.31
Epoch 4 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 53.29
Epoch 4 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 53.26
Epoch 4 ; Iteration 1250/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 53.23
Epoch 4 ; Iteration 1300/1587 ; Learning rate 1.0000 ; Source tokens/s 945 ; Perplexity 53.06
Epoch 4 ; Iteration 1350/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 53.04
Epoch 4 ; Iteration 1400/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 53.00
Epoch 4 ; Iteration 1450/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 52.89
Epoch 4 ; Iteration 1500/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 52.87
Epoch 4 ; Iteration 1550/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 52.75
Validation perplexity: 45.867913380411
Saving checkpoint to 'models/batch_clones_2_epoch4_45.87.t7'...
Epoch 5 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 39.07
Epoch 5 ; Iteration 100/1587 ; Learning rate 1.0000 ; Source tokens/s 962 ; Perplexity 38.63
Epoch 5 ; Iteration 150/1587 ; Learning rate 1.0000 ; Source tokens/s 956 ; Perplexity 39.19
Epoch 5 ; Iteration 200/1587 ; Learning rate 1.0000 ; Source tokens/s 955 ; Perplexity 38.71
Epoch 5 ; Iteration 250/1587 ; Learning rate 1.0000 ; Source tokens/s 952 ; Perplexity 39.69
Epoch 5 ; Iteration 300/1587 ; Learning rate 1.0000 ; Source tokens/s 956 ; Perplexity 39.56
Epoch 5 ; Iteration 350/1587 ; Learning rate 1.0000 ; Source tokens/s 950 ; Perplexity 39.42
Epoch 5 ; Iteration 400/1587 ; Learning rate 1.0000 ; Source tokens/s 957 ; Perplexity 39.84
Epoch 5 ; Iteration 450/1587 ; Learning rate 1.0000 ; Source tokens/s 958 ; Perplexity 40.14
Epoch 5 ; Iteration 500/1587 ; Learning rate 1.0000 ; Source tokens/s 956 ; Perplexity 40.27
Epoch 5 ; Iteration 550/1587 ; Learning rate 1.0000 ; Source tokens/s 953 ; Perplexity 40.00
Epoch 5 ; Iteration 600/1587 ; Learning rate 1.0000 ; Source tokens/s 953 ; Perplexity 40.17
Epoch 5 ; Iteration 650/1587 ; Learning rate 1.0000 ; Source tokens/s 950 ; Perplexity 40.12
Epoch 5 ; Iteration 700/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 40.01
Epoch 5 ; Iteration 750/1587 ; Learning rate 1.0000 ; Source tokens/s 946 ; Perplexity 40.18
Epoch 5 ; Iteration 800/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 40.23
Epoch 5 ; Iteration 850/1587 ; Learning rate 1.0000 ; Source tokens/s 951 ; Perplexity 40.28
Epoch 5 ; Iteration 900/1587 ; Learning rate 1.0000 ; Source tokens/s 949 ; Perplexity 40.29
Epoch 5 ; Iteration 950/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 40.27
Epoch 5 ; Iteration 1000/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 40.27
Epoch 5 ; Iteration 1050/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 40.20
Epoch 5 ; Iteration 1100/1587 ; Learning rate 1.0000 ; Source tokens/s 948 ; Perplexity 40.20
Epoch 5 ; Iteration 1150/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 40.11
Epoch 5 ; Iteration 1200/1587 ; Learning rate 1.0000 ; Source tokens/s 947 ; Perplexity 40.12
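
To compare runs it can help to pull the numbers out of a log like the one above. The following is a minimal Python sketch, not part of the original gist: the function name `parse_log` and the dictionary keys are made up for illustration, and the regular expressions assume the exact line layout shown here. The conversion to a per-token loss further assumes that the reported perplexity is the exponential of the mean negative log-likelihood in nats, which the log itself does not state.

```python
import math
import re

# Patterns matching the progress and validation lines as they appear in this log.
PROGRESS = re.compile(
    r"Epoch (\d+) ; Iteration (\d+)/(\d+) ; Learning rate ([\d.]+) ; "
    r"Source tokens/s (\d+) ; Perplexity ([\d.]+)"
)
VALIDATION = re.compile(r"Validation perplexity: ([\d.]+)")


def parse_log(lines):
    """Return (progress records, validation perplexities) found in the log lines."""
    progress, validation = [], []
    for line in lines:
        match = PROGRESS.search(line)
        if match:
            epoch, iteration, total, lr, tokens_per_s, ppl = match.groups()
            progress.append({
                "epoch": int(epoch),
                "iteration": int(iteration),
                "total_iterations": int(total),
                "learning_rate": float(lr),
                "source_tokens_per_s": int(tokens_per_s),
                "perplexity": float(ppl),
            })
            continue
        match = VALIDATION.search(line)
        if match:
            validation.append(float(match.group(1)))
    return progress, validation


# Two lines copied from the log above.
sample = [
    "Epoch 1 ; Iteration 50/1587 ; Learning rate 1.0000 ; Source tokens/s 568 ; Perplexity 32510.26",
    "Validation perplexity: 180.53028700487",
]
progress, validation = parse_log(sample)
print(progress[0])

# Assumption: perplexity = exp(mean per-token loss in nats), so the loss is log(perplexity).
print("validation loss (nats/token):", math.log(validation[0]))
```

Reading the full file with `open(path)` and passing the file object to `parse_log` should work the same way, since the function only iterates over lines.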