Yongkang Cheng

cyk1990422gmail.com

Hi! I'm currently a first-year PhD student in the MLR Group at MBZUAI. I work on humanoid interaction and motion generation conditioned on multi-modal inputs such as speech, text scripts, keypoints, and images. My work focuses primarily on avatars and humanoid robots. I received my M.E. from NWAFU in June 2024 and my B.E. from NJAU in June 2021.

Prior to my PhD, I was a research scientist at Agibot X-Lab (Project LinkCraft) and on the Astribot R&D Team (robot gesture generation), and a research intern at Tencent AILab (motion generation).

Research Interests

  • Interactive Humanoid Robots
  • Multi-Modal Generation for Motion
  • Robot Agents

Selected Publications

Aligning Foundation Model Priors and Diffusion-Based Hand Interactions for Occlusion-Resistant Two-Hand Reconstruction

Gaoge Han Yongkang Cheng Shaoli Huang† Zhe Chen Tongliang Liu

Conference on Computer Vision and Pattern Recognition, CVPR, 2026

ReBaR: Reference-Based Reasoning for Robust Pose Estimation from Monocular Images

Yongkang Cheng Mingjiang Liang Jifeng Ning Gaoge Han Wei Liu Shaoli Huang†

Pattern Recognition, 2025

HoloGest: Decoupled Diffusion and Motion Priors for Generating Holisticly Expressive Co-speech Gestures

Yongkang Cheng Shaoli Huang†

International Conference on 3D Vision, 3DV, 2025

DIDiffGes: Decoupled Semi-Implicit Diffusion Models for Real-time Gesture Generation from Speech

Yongkang Cheng Shaoli Huang† Xuelin Chen Jifeng Ning Mingming Gong

AAAI Conference on Artificial Intelligence, AAAI, 2025

Conditional GAN for Enhancing Diffusion Models in Efficient and Authentic Global Gesture Generation from Audios

Yongkang Cheng Shaoli Huang† Jifeng Ning Gaoge Han Wei Liu

Winter Conference on Applications of Computer Vision, WACV, 2025

RopeTP: Global Human Motion Recovery via Integrating Robust Pose Estimation with Diffusion Trajectory Prior

Mingjiang Liang* Yongkang Cheng* Hualin Liang Shaoli Huang† Wei Liu

Winter Conference on Applications of Computer Vision, WACV, 2024

SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark

Zhengdi Yu Shaoli Huang Yongkang Cheng Tolga Birdal

European Conference on Computer Vision, ECCV, 2024

ExpGest: Expressive Speaker Generation Using Diffusion Model and Hybrid Audio-Text Guidance

Yongkang Cheng Mingjiang Liang* Shaoli Huang† Wei Liu Jifeng Ning

International Conference on Multimedia and Expo, ICME, 2024

News and Other Works

  • [Feb. 2026] 1 CCF-A paper is accepted to CVPR 2026.
  • [Oct. 2025] 🏆🏆🏆 LinkCraft 🏆🏆🏆 is coming.
  • [May 2025] 2 JCR Q1 papers are accepted to Pattern Recognition 2025.
  • [Apr. 2025] 1 CCF-B paper is accepted to ICMR 2025. (Project Leader)
  • [Dec. 2024] 1 CCF-A paper is accepted to AAAI 2025.
  • [Nov. 2024] 1 CCF-C paper is accepted to 3DV 2025.
  • [Aug. 2024] 3 papers are accepted to WACV 2025.
  • [Jul. 2024] 1 CCF-C paper is accepted to MMAsia 2024 (🏆oral🏆).
  • [Jul. 2024] 1 CCF-B paper is accepted to ECCV 2024.
  • [Mar. 2024] 1 CCF-B paper is accepted to ICME 2024.
  • [Jan. 2024] 1 CCF-B paper is accepted to ICASSP 2024.
  • [Dec. 2023] 1 JCR Q1 paper is accepted to TCSVT 2023.
  • [Feb. 2021] 1 paper is accepted to EI 2020.

Experience

Academic Services

  • Reviewer: CVPR, ECCV, ACM MM, WACV, ICME; IJCV, PR