360-degree panoramic videos have recently attracted growing interest in both research and applications, thanks to the immersive experience they provide. Because capturing 360-degree panoramic videos is expensive, generating desirable panoramic videos from given prompts is in high demand. Recently, emerging text-to-video (T2V) diffusion methods have demonstrated notable effectiveness in standard video generation. However, due to the significant gap in content and motion patterns between panoramic and standard videos, these methods struggle to yield satisfactory 360-degree panoramic videos. In this paper, we propose a controllable panorama video generation pipeline named 360-Degree Video Diffusion model (360DVD), which generates panoramic videos conditioned on given prompts and motion conditions. Concretely, we introduce a lightweight module dubbed 360-Adapter, together with 360 Enhancement Techniques, to transform pre-trained T2V models for 360-degree video generation. We further propose a new panorama dataset named WEB360, consisting of 360-degree video-text pairs, for training 360DVD, addressing the absence of captioned panoramic video datasets. Extensive experiments demonstrate the superiority and effectiveness of 360DVD for panorama video generation.
Overview of 360DVD. 360DVD leverages a trainable 360-Adapter to extend standard T2V models to the panorama domain, generating high-quality panorama videos from given prompts and optional motion conditions. In addition, 360 Enhancement Techniques are proposed to improve quality from the panoramic perspective.
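The overview above does not spell out the 360-Adapter's internals, but adapter-style conditioning of a frozen backbone commonly follows one pattern: a small trainable encoder maps the extra condition (here, motion) to features that are added to the pretrained model's activations, with a zero-initialized output projection so that, before training, the pretrained model's behavior is exactly preserved. The sketch below illustrates only that generic pattern with NumPy; all names (`Adapter360`, `frozen_t2v_block`) and dimensions are hypothetical, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_t2v_block(x, w):
    """Stand-in for one frozen block of a pretrained T2V backbone."""
    return np.tanh(x @ w)

class Adapter360:
    """Hypothetical lightweight adapter in the spirit of 360-Adapter:
    a small encoder for a motion condition (e.g. optical-flow maps)
    whose output is added to the frozen backbone's features."""
    def __init__(self, cond_dim, feat_dim):
        self.w_in = rng.normal(0.0, 0.02, (cond_dim, feat_dim))
        # Zero-initialized output projection: before any training step,
        # the adapter contributes nothing, so the pretrained model's
        # outputs are unchanged.
        self.w_out = np.zeros((feat_dim, feat_dim))

    def __call__(self, cond):
        h = np.maximum(cond @ self.w_in, 0.0)  # ReLU condition features
        return h @ self.w_out

feat_dim, cond_dim = 8, 4
w = rng.normal(0.0, 0.1, (feat_dim, feat_dim))   # frozen backbone weights
x = rng.normal(size=(2, feat_dim))               # latent video features
cond = rng.normal(size=(2, cond_dim))            # motion condition

adapter = Adapter360(cond_dim, feat_dim)
base = frozen_t2v_block(x, w)                    # pretrained behavior
adapted = frozen_t2v_block(x, w) + adapter(cond) # adapter-conditioned
```

At initialization `adapted` equals `base`; only the adapter's few parameters are trained, which is what makes this style of extension lightweight relative to fine-tuning the whole T2V model.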
@article{wang2024360dvd,
title={360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model},
author={Qian Wang and Weiqi Li and Chong Mou and Xinhua Cheng and Jian Zhang},
journal={arXiv preprint arXiv:2401.06578},
year={2024}
}