Exploring Talking Head Models With Adjacent Frame Prior for Speech-Preserving Facial Expression Manipulation
Supplementary Material
Comparison results of NED [1] without and with our neighboring EAT [4] model:
Each video displays four columns: reference, source, NED, and NED with neighboring EAT, from left to right.
(Videos for each emotion: Angry, Disgusted, Fear, Happy, Neutral, Sad, Surprised)
Comparison results of NED [1] without and with our neighboring EAMM [5] model:
Each video displays four columns: reference, source, NED, and NED with neighboring EAMM, from left to right.
(Videos for each emotion: Angry, Disgusted, Fear, Happy, Neutral, Sad, Surprised)
Comparison results of DSM [3] without and with our neighboring EAT [4] model:
Each video displays four columns: reference, source, DSM, and DSM with neighboring EAT, from left to right.
(Videos for each emotion: Angry, Disgusted, Fear, Happy, Neutral, Sad, Surprised)
Comparison results of DSM [3] without and with our neighboring EAMM [5] model:
Each video displays four columns: reference, source, DSM, and DSM with neighboring EAMM, from left to right.
(Videos for each emotion: Angry, Disgusted, Fear, Happy, Neutral, Sad, Surprised)
[1] Papantoniou, Foivos Paraperas, et al. "Neural Emotion Director: Speech-Preserving Semantic Control of Facial Expressions in 'In-the-Wild' Videos." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
[2] Wang, Kaisiyuan, et al. "MEAD: A Large-Scale Audio-Visual Dataset for Emotional Talking-Face Generation." European Conference on Computer Vision, 2020.
[3] Solanki, Girish Kumar, and Anastasios Roussos. "Deep Semantic Manipulation of Facial Videos." European Conference on Computer Vision, 2022.
[4] Gan, Yuan, et al. "Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
[5] Ji, Xinya, et al. "EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model." ACM SIGGRAPH 2022 Conference Proceedings, 2022.