Real-world objects exhibit complex motion composed of multiple independent motion components. For example, while speaking, a person simultaneously changes facial expressions, head orientation, and body pose.
In this work, we propose a method for motion disentanglement using a pretrained image GAN. Our approach discovers semantically meaningful motion subspaces in the latent space of widely used style-based GAN architectures. Each discovered subspace controls a single interpretable motion component.
The proposed method requires only a small number of ground-truth video sequences to identify these motion directions. We evaluate the disentanglement properties of the learned motion subspaces on face and car datasets, both quantitatively and qualitatively.
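The idea of identifying motion subspaces from a handful of inverted video sequences can be illustrated with a minimal sketch. This is a hypothetical simplification, not the authors' exact procedure: it assumes each video frame has already been inverted into a latent code in the W space of a style-based GAN, and estimates a candidate motion subspace by PCA on frame-to-frame latent displacements (all array names and dimensions here are illustrative).

```python
import numpy as np

# Hypothetical sketch (not the paper's actual algorithm): given per-frame
# latent codes of one video inverted into a style-based GAN's W space,
# estimate a motion subspace via PCA on frame-to-frame differences.
rng = np.random.default_rng(0)
T, d, k = 200, 512, 3                       # frames, latent dim, subspace rank

# Synthetic stand-in for the inverted latent codes of a video.
W = rng.normal(size=(T, d))

deltas = np.diff(W, axis=0)                 # per-frame latent displacements
deltas -= deltas.mean(axis=0, keepdims=True)

# The top-k right singular vectors span a candidate motion subspace.
_, _, Vt = np.linalg.svd(deltas, full_matrices=False)
motion_basis = Vt[:k]                       # shape (k, d), orthonormal rows

# Editing: move a latent code along one discovered direction; decoding the
# shifted code through the GAN generator would animate that motion component.
alpha = 2.0
w_edited = W[0] + alpha * motion_basis[0]
print(motion_basis.shape, w_edited.shape)
```

In this toy version the "motion components" are just principal directions of latent change; the paper's contribution is discovering subspaces that each isolate a single interpretable motion, which plain PCA on random data does not guarantee.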
Finally, we demonstrate several downstream applications enabled by the discovered motion subspaces, including: