Chen, Tianqi, Ian Goodfellow, and Jonathon Shlens. "Net2Net: Accelerating Learning via Knowledge Transfer." (2016). [http://arxiv.org/abs/1511.05641]

  • Knowledge transfer scheme from one model to another: knowledge is transferred by copying weights.

  • The aim is to reduce training time and give the 'student' model a good starting point for its learning process. The student model begins learning from where the teacher left off. [Critical in practice when several models are trained and stored, especially when 'searching' for the best model.]

  • Significant for 'life-long' learning, where the model changes as new data becomes available (more data for existing categories, or addition of new categories).

  • [Interesting remark that BatchNorm removes benefits of FitNets]

===

  • Mentions the need for a function-preserving initialization so that the new model starts no worse than the teacher. Function preserving means that the function computed by the teacher stays the same; only the parameterization changes.
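
    Concretely (paraphrasing the paper's formulation): if the teacher computes $y = f(x;\,\theta)$ and the student computes $y = g(x;\,\theta')$, the initialization chooses $\theta'$ such that

    $$\forall x:\quad g(x;\,\theta') = f(x;\,\theta)$$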

  • Net2WiderNet: expand a layer by adding new units (see the numpy sketch after this list).

    • For the incoming weights to this layer, sample from the existing weights: when going from m units to n units (n > m), copy the first m weight vectors, and for each of the remaining n − m units randomly choose one of the original m to replicate.
    • For the weights going out of the newly expanded layer, copy the outgoing weights of the replicated units. However, divide each copied weight by the number of times that unit was replicated. Since contributions into the next layer are summed, this normalization keeps the total value the same.
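
A minimal numpy sketch of this widening step, assuming a single fully connected hidden layer with ReLU activations (the function name, shapes, and sanity check below are illustrative, not code from the paper):

```python
import numpy as np

def net2wider(W_in, b_in, W_out, m, n, rng=None):
    """Widen a fully connected layer from m to n units (n > m) while
    preserving the function computed by the network.

    W_in  : (d, m) incoming weights of the layer being widened
    b_in  : (m,)   biases of the layer being widened
    W_out : (m, k) outgoing weights from the layer being widened
    """
    rng = np.random.default_rng() if rng is None else rng
    assert n > m

    # Mapping g: each of the n new units copies one of the original m units.
    # The first m units map to themselves; the extra n - m are sampled at random.
    g = np.concatenate([np.arange(m), rng.integers(0, m, size=n - m)])

    # Incoming weights and biases: copy those of the replicated unit.
    W_in_new = W_in[:, g]
    b_in_new = b_in[g]

    # Outgoing weights: copy the row of the replicated unit and divide by the
    # number of times that unit appears, so sums into the next layer are unchanged.
    counts = np.bincount(g, minlength=m)          # replication count per original unit
    W_out_new = W_out[g, :] / counts[g][:, None]

    return W_in_new, b_in_new, W_out_new


# Sanity check: the widened (student) network computes the same function.
rng = np.random.default_rng(0)
d, m, n, k = 5, 3, 7, 2
x = rng.normal(size=(1, d))
W1, b1, W2 = rng.normal(size=(d, m)), rng.normal(size=m), rng.normal(size=(m, k))
W1w, b1w, W2w = net2wider(W1, b1, W2, m, n, rng)

relu = lambda z: np.maximum(z, 0)
y_teacher = relu(x @ W1 + b1) @ W2
y_student = relu(x @ W1w + b1w) @ W2w
assert np.allclose(y_teacher, y_student)
```

The check at the end passes because a copied unit produces exactly the same activation as its source, so dividing each copied outgoing weight by its replication count leaves the summed inputs to the next layer unchanged.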