We present a new method for improving the performance of variational autoencoders (VAEs). In addition to enforcing the deep feature consistency principle, which ensures that the VAE output and its corresponding input image have similar deep features, we implement a generative adversarial training mechanism that forces the VAE to output realistic, natural images. Experimental results show that a VAE trained with our method outperforms the state of the art in generating face images, producing much clearer and more natural noses, eyes, teeth, and hair textures as well as reasonable backgrounds. We also show that our method learns powerful embeddings of input face images, which can be used for facial attribute manipulation. Moreover, we propose a multi-view feature extraction strategy to extract effective image representations, which achieve state-of-the-art performance in facial attribute prediction.
Figure 1. Overview of our method.
Figure 3. Face images generated from 100-dimensional latent vectors.
Figure 4. Face images reconstructed by different models.
3.2 Facial attribute manipulation. We seek a way to control a specific attribute of face images. To this end, we conduct experiments that manipulate facial attributes in the learned latent space of VAE-WGAN. Figure 5 shows the results for six attributes: Bald, Black hair, Eyeglass, Male, Smiling, and Mustache. For example, adding a smiling vector to the latent representation of a non-smiling man produces a smooth transition from a non-smiling face to a smiling face.
Figure 5. Vector arithmetic for visual attributes.
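The latent-space arithmetic described in Section 3.2 can be illustrated with a minimal sketch. The text does not say how the attribute vector is obtained, so this sketch assumes one common choice: the mean latent vector of images with the attribute minus the mean of images without it. The function names, the synthetic latent vectors, and the strength values are all hypothetical. No encoder or decoder is included, and in practice each row of the transition would be passed through the trained decoder to produce an image.

```python
import random


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def attribute_vector(latents, has_attribute):
    """Assumed construction: mean latent WITH the attribute minus mean WITHOUT it."""
    positives = [z for z, flag in zip(latents, has_attribute) if flag]
    negatives = [z for z, flag in zip(latents, has_attribute) if not flag]
    if not positives or not negatives:
        raise ValueError("need examples both with and without the attribute")
    pos_mean = mean_vector(positives)
    neg_mean = mean_vector(negatives)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def attribute_transition(z, direction, strengths):
    """Return one latent vector per strength: z + strength * direction."""
    return [[zi + a * di for zi, di in zip(z, direction)] for a in strengths]


# Synthetic stand-in for encoder outputs (hypothetical data, not real embeddings).
random.seed(0)
LATENT_DIM = 100
NUM_IMAGES = 200
has_smile = [i % 2 == 0 for i in range(NUM_IMAGES)]
latents = [[random.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
           for _ in range(NUM_IMAGES)]
for z, flag in zip(latents, has_smile):
    if flag:
        z[0] += 2.0  # pretend dimension 0 carries the "smiling" signal

smile = attribute_vector(latents, has_smile)

# Walk a non-smiling example (index 1) toward "smiling" in five steps.
strengths = [0.0, 0.25, 0.5, 0.75, 1.0]
transition = attribute_transition(latents[1], smile, strengths)
```

Because every step lies on a straight line in latent space, the decoded images would change gradually as the strength grows, which is consistent with the smooth transitions described above.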