## Sequence to Sequence -- Video to Text
Paper: ICCV 2015 PDF
Download Model: S2VT_VGG_RGB_MODEL (333MB)
This is the S2VT (RGB) model described in the ICCV 2015 paper "Sequence to Sequence -- Video to Text". It uses video frame features extracted with the 16-layer VGG model (VGG-16) and is trained only on the YouTube (MSVD) video dataset.
Sequence to Sequence - Video to Text
S. Venugopalan, M. Rohrbach, J. Donahue, T. Darrell, R. Mooney, K. Saenko
The IEEE International Conference on Computer Vision (ICCV) 2015
Please consider citing the above paper if you use this model.
This model achieves a METEOR score of 29.2% on the YouTube (MSVD) video test set (see Table 2 of the Sequence to Sequence - Video to Text paper).
The models are currently supported by the recurrent branch of the Caffe fork by Jeff Donahue and Subhashini Venugopalan, but are not yet compatible with the master branch of Caffe.
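
As a rough illustration of the compatibility note above, the following is a minimal sketch of loading the model through the Python interface. It assumes that `caffe` is the Python module built from the recurrent branch of the fork. The helper names `missing_files` and `load_s2vt` are hypothetical, and the file paths are placeholders to be replaced with the deploy prototxt and the downloaded weights.

```python
import os


def missing_files(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not os.path.isfile(p)]


def load_s2vt(prototxt_path, weights_path):
    """Load the network in test mode on the CPU with pycaffe.

    Assumption: `caffe` is the Python module built from the recurrent
    branch; a master-branch build is not expected to load this model.
    """
    missing = missing_files([prototxt_path, weights_path])
    if missing:
        raise IOError("Missing model files: " + ", ".join(missing))
    # Imported lazily so the path check above works without Caffe installed.
    import caffe
    caffe.set_mode_cpu()
    return caffe.Net(prototxt_path, weights_path, caffe.TEST)


if __name__ == "__main__":
    # Placeholder paths (hypothetical); point these at the real files.
    net = load_s2vt("path/to/deploy.prototxt", "path/to/model.caffemodel")
```

Checking the paths first gives a readable error if a file is missing, since Caffe itself may otherwise abort the process during loading.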
More details on the code and data can be found on this Project Page.
The prototxts for the network and solver can also be found here: https://github.com/vsubhashini/caffe/tree/recurrent/examples/s2vt