Extending text-driven 3D human motion generation with diffusion models using LLM paraphrasing

This project builds a text-driven 3D human motion generation model based on diffusion models. We extensively study how well the model generalizes across datasets whose textual descriptions differ in style, and we show that paraphrasing the text descriptions improves this generalization. We also explore the impact of different augmentations on the model’s performance. Finally, inspired by the success of diffusion models in image generation, we explore convolutional U-Nets (ConvUnets) with attention mechanisms as the backbone of the model.
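
One way the paraphrasing idea could be applied at training time is sketched below. This is only an illustration under stated assumptions, not the project's actual pipeline: it assumes the paraphrases were generated offline by an LLM and stored per caption, and the names `sample_caption`, `paraphrase_table`, and `p_paraphrase`, along with the example captions, are hypothetical.

```python
import random


def sample_caption(original, paraphrases, p_paraphrase=0.5, rng=random):
    """Return the original caption or one of its precomputed paraphrases.

    With probability `p_paraphrase` (an assumed hyperparameter), a paraphrase
    is drawn uniformly at random; otherwise the original caption is kept.
    """
    if paraphrases and rng.random() < p_paraphrase:
        return rng.choice(paraphrases)
    return original


# Hypothetical table: original caption -> paraphrases produced offline by an LLM.
paraphrase_table = {
    "a person walks forward": [
        "someone strides ahead",
        "the figure moves forward on foot",
    ],
}

rng = random.Random(0)  # seeded only to make the sketch reproducible
caption = "a person walks forward"
augmented = sample_caption(caption, paraphrase_table.get(caption, []), rng=rng)
```

Sampling per training step, rather than replacing each caption once, would let the model see the same motion paired with several wordings, which is the kind of variation in description style the paragraph above refers to.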