LipGAN: Speech to Lip Sync Generation
Proposal (PDF)
Motivation
In this course project, we explore the problem of lip-syncing a talking-face video: given a target speech segment, we modify the lip and facial movements of the person in the video to match it. The primary task is to achieve accurate audio-video synchronisation given a person's face and a target audio clip. We then aim to extend this to a speaker-independent model that produces lip-sync "in the wild", where videos feature dynamic and unconstrained faces.
State-of-the-art Model
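Our planned baseline is LipGAN: a generator fills in the (masked) lower half of a face frame conditioned on audio features, and is trained against a discriminator that scores whether a face crop and an audio window are in sync. The snippet below is a minimal, self-contained PyTorch sketch of that training recipe under our own simplifying assumptions (toy network sizes, 80-dimensional audio frames, a hinge-style margin for the sync loss); it is not the authors' implementation, which is Keras-based.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Toy stand-in: (masked face, audio frame) -> full synthesized face."""
    def __init__(self):
        super().__init__()
        self.face_enc = nn.Conv2d(3, 16, 3, padding=1)
        self.audio_enc = nn.Linear(80, 16)     # 80-dim audio frame (assumed)
        self.dec = nn.Conv2d(16, 3, 3, padding=1)

    def forward(self, face, audio):
        f = F.relu(self.face_enc(face))                 # (B, 16, H, W)
        a = self.audio_enc(audio)[:, :, None, None]     # broadcast over space
        return torch.sigmoid(self.dec(f + a))

class SyncDiscriminator(nn.Module):
    """Toy stand-in: embeds face and audio; small distance = in sync."""
    def __init__(self):
        super().__init__()
        self.face_emb = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 32))
        self.audio_emb = nn.Linear(80, 32)

    def forward(self, face, audio):
        f = F.normalize(self.face_emb(face), dim=1)
        a = F.normalize(self.audio_emb(audio), dim=1)
        return (f - a).pow(2).sum(dim=1)                # per-sample sync distance

g, d = Generator(), SyncDiscriminator()
opt_g = torch.optim.Adam(g.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(d.parameters(), lr=1e-4)

# Toy batch: ground-truth frames, their lower-half-masked copies, and audio.
face = torch.rand(4, 3, 32, 32)
masked = face.clone()
masked[:, :, 16:, :] = 0.0            # mask the mouth region, as in LipGAN
audio = torch.randn(4, 80)            # matching audio features
wrong_audio = audio.roll(1, dims=0)   # mismatched pairs for the discriminator

# Discriminator step: pull matched (face, audio) pairs together,
# push mismatched pairs at least a margin apart (hinge-style, our choice).
loss_d = d(face, audio).mean() + F.relu(1.0 - d(face, wrong_audio)).mean()
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: reconstruct the frame (L1 loss) and minimize the sync
# distance the discriminator assigns to the generated face.
fake = g(masked, audio)
loss_g = F.l1_loss(fake, face) + d(fake, audio).mean()
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
print(f"loss_d={loss_d.item():.3f}  loss_g={loss_g.item():.3f}")
```

The design point this sketch illustrates is that the discriminator never sees a real/fake label directly: it learns a joint face-audio embedding whose distance serves as a sync score, which the generator then minimizes alongside the pixel-level reconstruction loss.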
Timetable

| Time | Work |
| --- | --- |
| Until Feb 24 | Come up with a project proposal document and create the web page. Discuss possible techniques and directions to explore. |
| Feb 25 - Mar 07 | Set up the running environment for the state-of-the-art methods. Start implementing an existing approach as a baseline. |
| Mar 08 - Mar 20 | Have one working implementation of an existing approach (LipGAN). |
| Mar 21 - Mar 24 | Write the mid-term report. |
| Mar 25 - Apr 11 | Experiment with different loss formulations and a multi-task framework. Explore alternative methods in the literature. |
| Apr 12 - Apr 23 | Summarize results and ablation studies. Complete all experiments. |
| Apr 25 - May 05 | Complete the course project web page. |