Chen, Tianqi, Ian Goodfellow, and Jonathon Shlens. "Net2Net: Accelerating Learning via Knowledge Transfer." (2016). [http://arxiv.org/abs/1511.05641]
-
Knowledge-transfer scheme from one model to another: knowledge is transferred by copying weights from the teacher into the student.
-
The aim is to reduce training time and give the 'student' model a good starting point: the student begins learning from where the teacher left off. [Critical in practice when several models are trained, especially when 'searching' for the best model.]
-
Significant for 'life-long' learning, where the model must change as new data becomes available (more data for existing categories, or the addition of new categories).
-
[Interesting remark that BatchNorm removes benefits of FitNets]
===
-
Mentions the need for a function-preserving initialization so that the new model starts no worse than the teacher. Function preserving means the function computed by the teacher stays the same; only the parameterization changes.
-
Net2WiderNet
Expand a layer by adding new units, growing it from m units to n units (n > m).
- For incoming weights to this layer, copy the first m weights, and for the remaining n - m units randomly sample from the existing m.
- For weights going out of the newly expanded layer, copy the outgoing weights of the corresponding original unit. However, divide the weights of a copied unit by the number of times that unit was replicated. Since the incoming activations to the next layer are summed, this normalization is required to keep the total value the same.
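The widening steps above can be sketched as follows. This is a minimal NumPy illustration (function name, single fully-connected hidden layer, and the omission of biases are my simplifications, not the paper's reference implementation); the final assertion checks the function-preserving property:

```python
import numpy as np

def net2wider(W_in, W_out, n):
    """Widen a hidden layer from m to n units (n >= m).

    W_in:  (d, m) incoming weights to the layer being widened
    W_out: (m, k) outgoing weights from that layer
    """
    d, m = W_in.shape
    assert n >= m
    # Random mapping: the first m new units copy the originals;
    # the remaining n - m are sampled from the existing m units.
    mapping = np.concatenate([np.arange(m),
                              np.random.randint(0, m, size=n - m)])
    # Incoming weights: plain copy according to the mapping.
    W_in_new = W_in[:, mapping]
    # Outgoing weights: copy, then divide by the number of times the
    # source unit was replicated, so the summed contribution is unchanged.
    counts = np.bincount(mapping, minlength=m)
    W_out_new = W_out[mapping] / counts[mapping][:, None]
    return W_in_new, W_out_new

# Function-preservation check: output before == output after widening.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
W_in = rng.normal(size=(4, 3))
W_out = rng.normal(size=(3, 2))
y = np.maximum(x @ W_in, 0) @ W_out           # ReLU hidden layer

W_in2, W_out2 = net2wider(W_in, W_out, n=6)   # widen 3 -> 6 units
y2 = np.maximum(x @ W_in2, 0) @ W_out2
assert np.allclose(y, y2)                      # same function, more params
```

The check passes because each replicated unit's ReLU activation is identical to its source's, and dividing its outgoing weights by the replication count makes the copies sum back to the original contribution.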