VAEs and GANs are among the generative models that have attracted the most attention in recent years. A VAE is good at reconstruction but produces blurry images, a direct consequence of the pixel-wise MSE in its cost function, which imposes an implicit Gaussian likelihood on the observations. A GAN, on the other hand, replaces this pixel-wise similarity with representations learned by its discriminator, which acts as an adversary to the generator. Because of the complex features learned by the discriminator, the generated images are quite sharp, but since there is practically no way to know what the discriminator is learning, they may be far from real-world images: the generator only tries to match the representation of its output (as seen by the discriminator) with that of real images.
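The VAE objective referred to above can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation; the function name `vae_loss` and the equal weighting of the two terms are assumptions for exposition.

```python
import numpy as np

def vae_loss(x, x_recon, mu, log_var):
    # Pixel-wise MSE reconstruction term: equivalent (up to constants)
    # to the negative log-likelihood under a Gaussian observation model,
    # which is the source of the blurriness discussed above.
    recon = np.mean((x - x_recon) ** 2)
    # KL divergence between the approximate posterior q(z|x) = N(mu, exp(log_var))
    # and the standard normal prior N(0, I), averaged over the batch.
    kl = -0.5 * np.mean(1 + log_var - mu ** 2 - np.exp(log_var))
    return recon + kl
```

With a perfect reconstruction and a posterior equal to the prior (`mu = 0`, `log_var = 0`), both terms vanish and the loss is zero.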
Given the rather complementary strengths of VAEs and GANs, there have been several attempts to combine the two models and get the best of both worlds. We propose one such technique.
We start with a VAE and train it for a few epochs. We then discard the encoder and replace the generator of a GAN with the trained decoder. Next, we train this hybrid model adversarially.
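The handoff from VAE to GAN can be sketched as below. This is a toy numpy sketch of the swap, not the actual networks: the linear `Decoder` and `Discriminator` classes, their dimensions, and the weight initialization are all placeholders standing in for the real (pre-trained) architectures.

```python
import numpy as np

rng = np.random.default_rng(0)

class Decoder:
    """Stand-in for the VAE decoder, assumed already trained in stage 1."""
    def __init__(self, z_dim, x_dim):
        self.w = rng.standard_normal((z_dim, x_dim)) * 0.01

    def __call__(self, z):
        # Map latent codes to images in [-1, 1].
        return np.tanh(z @ self.w)

class Discriminator:
    """Stand-in for the GAN discriminator, trained only in stage 2."""
    def __init__(self, x_dim):
        self.w = rng.standard_normal((x_dim, 1)) * 0.01

    def __call__(self, x):
        # Probability that the input is a real image.
        return 1.0 / (1.0 + np.exp(-(x @ self.w)))

# Stage 1: train the VAE, then keep only its decoder (training omitted here).
decoder = Decoder(z_dim=16, x_dim=64)

# Stage 2: the decoder *is* the generator; no new module is created.
generator = decoder
disc = Discriminator(x_dim=64)

# One adversarial forward pass of the hybrid model.
z = rng.standard_normal((8, 16))
fake = generator(z)       # decoder acting as generator
p_real = disc(fake)       # discriminator scores the generations
```

The point of the sketch is that the swap is purely structural: the decoder's weights carry over unchanged, and only the training signal changes from reconstruction to the adversarial loss.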
We trained all our models on the CelebA dataset, with centrally cropped images of size 64x64.
The VAE was trained for 25 epochs; the hybrid model described above was then trained for a further 25 epochs. Our VAE and GAN architectures closely follow those of the following works, respectively: