CG-HOI: Contact-Guided 3D Human-Object Interaction Generation (CVPR'24)

Technical University of Munich

Abstract

We propose CG-HOI, the first method to address the task of generating dynamic 3D human-object interactions (HOIs) from text.
We model the motion of both human and object in an interdependent fashion, since semantically rich human motion rarely occurs in isolation, without any interaction.
Our key insight is that explicitly modeling contact between the human body surface and object geometry can be used as strong proxy guidance, both during training and inference. Using this guidance to bridge human and object motion enables generating more realistic and physically plausible interaction sequences, where the human body and corresponding object move in a coherent manner.
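As a concrete illustration of what contact between the human body surface and object geometry can mean numerically, the sketch below scores each body vertex by its distance to the nearest sampled object point. This is a minimal sketch under stated assumptions, not CG-HOI's released code: the function name `soft_contact`, the Gaussian falloff, and the scale `sigma` are hypothetical choices made for illustration.

```python
import numpy as np


def soft_contact(body_verts: np.ndarray, obj_points: np.ndarray, sigma: float = 0.02) -> np.ndarray:
    """Hypothetical per-vertex contact score in [0, 1]; 1 means the vertex touches the object.

    body_verts: (V, 3) body surface vertices for one frame.
    obj_points: (P, 3) points sampled on the object surface.
    sigma: distance scale (assumed to be in metres); not a value taken from the paper.
    """
    # Pairwise vertex-to-point distances, shape (V, P).
    dists = np.linalg.norm(body_verts[:, None, :] - obj_points[None, :, :], axis=-1)
    # Distance from each body vertex to its nearest object point, shape (V,).
    nearest = dists.min(axis=1)
    # Gaussian falloff: exp(-d^2 / (2 sigma^2)) equals 1 at d = 0 and decays with distance.
    return np.exp(-nearest**2 / (2.0 * sigma**2))


# Usage: the first vertex lies on the object, the second is 1 m away from it.
body = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
obj = np.array([[0.0, 0.0, 0.0], [0.0, 0.1, 0.0]])
scores = soft_contact(body, obj)
```

A soft score rather than a binary label keeps the quantity differentiable with respect to vertex positions, which matters if contact is later used as a guidance signal.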
Our method first learns to model human motion, object motion, and contact in a joint diffusion process, inter-correlated through cross-attention.
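To make the idea of streams inter-correlated through cross-attention concrete, here is a minimal NumPy sketch in which each of three token streams (human motion, object motion, contact) attends to the other two and adds the result residually. The stream names, the single attention head, the residual update, and the random projection weights are all assumptions for illustration; the actual CG-HOI architecture may differ.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (Tq, D) tokens of one stream; context: (Tk, D) tokens of the other streams.
    w_q, w_k, w_v: (D, D) projection matrices.
    """
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (Tq, Tk), rows sum to 1
    return weights @ v  # (Tq, D)


def exchange(streams, params):
    """Each stream attends to the concatenation of the other streams (residual update)."""
    out = {}
    for name, tokens in streams.items():
        context = np.concatenate([s for n, s in streams.items() if n != name], axis=0)
        out[name] = tokens + cross_attention(tokens, context, *params[name])
    return out


# Usage with random features: 8 frames, 16-dimensional tokens per stream (assumed sizes).
rng = np.random.default_rng(0)
streams = {n: rng.normal(size=(8, 16)) for n in ("human", "object", "contact")}
params = {n: tuple(0.1 * rng.normal(size=(16, 16)) for _ in range(3)) for n in streams}
updated = exchange(streams, params)
```

Because every stream reads from the others, a change in the object tokens propagates into the updated human tokens, which is the kind of interdependency the joint model is meant to capture.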
We then leverage this learned contact to guide inference-time synthesis of realistic, coherent HOIs. Extensive evaluation shows that our joint contact-based human-object interaction approach generates realistic and physically plausible sequences, and we show two applications highlighting the capabilities of our method.
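One simple way to use predicted contact as guidance is to treat it as an energy and take gradient steps on the object pose. The sketch below is an assumption rather than the paper's exact procedure: it optimizes only a 3D object translation `t` and minimizes E(t) = sum_i c_i * min_j ||h_i - (o_j + t)||^2, where h_i are body vertices, c_i their predicted contact scores, and o_j object surface points. In a diffusion sampler, such steps would typically be applied to the predicted clean sample at each denoising step; the helper names are hypothetical.

```python
import numpy as np


def contact_energy_and_grad(t, human_verts, contact, obj_points):
    """E(t) = sum_i c_i * min_j ||h_i - (o_j + t)||^2 and its gradient w.r.t. t.

    t: (3,) object translation; human_verts: (V, 3); contact: (V,) scores in [0, 1];
    obj_points: (P, 3) object surface points in the object's canonical frame.
    """
    diff = human_verts[:, None, :] - (obj_points + t)[None, :, :]  # (V, P, 3)
    sq_dist = (diff**2).sum(axis=-1)                                # (V, P)
    idx = np.arange(len(human_verts))
    nearest = sq_dist.argmin(axis=1)  # nearest object point per body vertex
    energy = float((contact * sq_dist[idx, nearest]).sum())
    # d/dt ||h - o - t||^2 = -2 (h - o - t); the argmin is held fixed within a step.
    grad = -2.0 * (contact[:, None] * diff[idx, nearest]).sum(axis=0)
    return energy, grad


def guide_translation(t, human_verts, contact, obj_points, lr=0.1, steps=50):
    """Plain gradient descent on E(t); lr and steps are illustrative, not tuned values."""
    for _ in range(steps):
        _, grad = contact_energy_and_grad(t, human_verts, contact, obj_points)
        t = t - lr * grad
    return t


# Usage: vertex 0 (contact 1) is 1 m from the object; vertex 1 (contact 0) is ignored.
human = np.array([[1.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
contact = np.array([1.0, 0.0])
obj = np.array([[0.0, 0.0, 0.0]])
t_guided = guide_translation(np.zeros(3), human, contact, obj)
```

Weighting each vertex by its contact score means only body regions predicted to be in contact pull on the object, so vertices that should stay free (here vertex 1) do not affect the pose.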
Conditioned on a given object trajectory, we can generate the corresponding human motion without re-training, demonstrating that the model has learned a strong interdependency between human and object motion. Our approach is also flexible and can be applied to static real-world 3D scene scans.
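Generating human motion for a given object trajectory without re-training can be done with replacement-based inpainting, a common training-free technique for diffusion models: at every denoising step, the object channels of the sample are overwritten with the known trajectory, noised to the current level. The sketch below assumes a deterministic DDIM-style update, a joint state of shape (frames, human_dim + obj_dim), and a hypothetical `denoiser(x_t, t)` that predicts the clean sample; it illustrates the technique and is not claimed to be CG-HOI's implementation.

```python
import numpy as np


def replacement_sampling(denoiser, obj_traj, human_dim, alphas_bar, rng):
    """Sample human motion while clamping object channels to a known trajectory.

    denoiser(x_t, t) -> predicted clean sample, shape (T, human_dim + obj_dim) (hypothetical).
    obj_traj: (T, obj_dim) clean object trajectory to condition on.
    alphas_bar: (N,) cumulative signal levels, decreasing with t and strictly below 1.
    """
    frames, obj_dim = obj_traj.shape
    x = rng.normal(size=(frames, human_dim + obj_dim))
    for t in range(len(alphas_bar) - 1, -1, -1):
        ab_t = alphas_bar[t]
        # Overwrite the object channels with the given trajectory noised to level t.
        noise = rng.normal(size=obj_traj.shape)
        x[:, human_dim:] = np.sqrt(ab_t) * obj_traj + np.sqrt(1.0 - ab_t) * noise
        x0_hat = denoiser(x, t)
        if t == 0:
            x = x0_hat
        else:
            # Deterministic DDIM update (eta = 0) from level t to level t - 1.
            ab_prev = alphas_bar[t - 1]
            eps_hat = (x - np.sqrt(ab_t) * x0_hat) / np.sqrt(1.0 - ab_t)
            x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_hat
    x = x.copy()
    x[:, human_dim:] = obj_traj  # the output keeps the given trajectory exactly
    return x[:, :human_dim], x


def toy_denoiser(x_t, t):
    # Placeholder for a trained network; it only keeps values bounded.
    return np.tanh(x_t)


# Usage: a 30-frame object trajectory moving along x, with 6 assumed human channels.
rng = np.random.default_rng(0)
traj = np.stack([np.linspace(0.0, 1.0, 30), np.zeros(30), np.zeros(30)], axis=1)
human, full = replacement_sampling(toy_denoiser, traj, 6, np.linspace(0.99, 0.01, 50), rng)
```

Because the object channels are clamped at every step, the denoiser only ever sees noisy versions of the given trajectory, so whatever human-object dependency it has learned is what steers the human channels.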


Teaser

Teaser. We present an approach to generate realistic 3D human-object interactions (HOIs) from a text description and a given static object geometry to interact with (left). Our main insight is to explicitly model contact (visualized as colors on the body mesh, with closer contact in red) in tandem with the human and object sequences in a joint diffusion process. In addition to synthesizing HOIs from text, we can synthesize human motion conditioned on a given object trajectory (top right) and generate interactions in static scene scans (bottom right).

Results

Results. Given the object geometry and a short text description of the action, our method produces diverse and realistic 3D human-object interaction sequences. Because contact is modeled explicitly, the sequences depict high-quality interactions with reduced floating and penetration artifacts.

Video


Paper


Bibtex

If you find this work useful for your research, please consider citing:

@inproceedings{diller2023cghoi,
    title={CG-HOI: Contact-Guided 3D Human-Object Interaction Generation},
    author={Diller, Christian and Dai, Angela},
    booktitle={Proc. Computer Vision and Pattern Recognition (CVPR), IEEE},
    year={2024}
}