Imagine360: Immersive 360 Video Generation from Perspective Anchor

Jing Tan1*, Shuai Yang2,4*, Tong Wu1✉, Jingwen He1, Yuwei Guo1, Ziwei Liu3, Dahua Lin1,4

1 The Chinese University of Hong Kong, 2 Shanghai Jiao Tong University, 3 S-Lab, Nanyang Technological University, 4 Shanghai AI Laboratory

We generate hyper-immersive panoramic video from a single perspective anchor

Abstract

360° videos offer a hyper-immersive experience that lets viewers explore a dynamic scene across the full 360 degrees. To make content creation in 360° video format more user-friendly and personalized, we seek to lift standard perspective videos into 360° equirectangular videos. To this end, we introduce Imagine360, the first perspective-to-360° video generation framework that creates high-quality 360° videos with rich and diverse motion patterns from video-based control. Imagine360 learns fine-grained spherical visual and motion patterns from limited 360° video data through several key designs. 1) First, we adopt a dual-branch design, comprising a perspective and a panorama video denoising branch that provide local and global constraints for 360° video generation, with the motion module and spatial LoRA layers fine-tuned on extended web 360° videos. 2) Additionally, an antipodal mask is devised to capture long-range motion dependencies, enhancing the reversed camera motion between antipodal pixels across hemispheres. 3) To handle general perspective video inputs, we propose elevation-aware designs that adapt to varying video masking due to changing elevations across frames. Extensive experiments show that Imagine360 achieves superior graphics quality and motion coherence among state-of-the-art 360° video generation methods. We believe Imagine360 holds promise for advancing personalized, immersive 360° video creation.



Gallery (click below pano videos to open VR mode)

We highly recommend opening this website on a mobile phone (preferably in the Chrome browser) so that device motion tracking can enhance the immersive quality of the interactive VR experience.

NOTE: Loading may be a little slow, but it will be worth the wait!

PANO VIDEO INPUT VIDEO

a city street covered in snow with cherry blossom trees lining the sidewalks
a group of three animals walking down a dirt road
a river with rough waters and rapids, with a rocky mountain in the background
a white SUV driving down a dirt road in a forested mountain valley
a serene pond filled with colorful koi fish
Stonehenge in Wiltshire, England, under the green Northern Lights
a rocky coastline with a lighthouse standing on a cliff
a cute golden retriever wearing sunglasses and running outside in the rain
a large glass display case filled with various electronic devices
a lighthouse on a cliff overlooking the ocean
a man standing on a hill overlooking a stunning mountain lake
a red gondola lift in a mountainous region, with a beautiful sunset in the background
a red car driving down a country road with a grassy field on the side
a robot standing in a desolate, futuristic city
a serene snowy landscape with a still lake surrounded by snow-covered trees and mountains in the background

Method Overview

Pipeline of Imagine360. Given a perspective anchor video as guidance, Imagine360 uses a dual-branch video denoising structure, with parallel panorama and perspective branches, to denoise 360° videos with plausible panoramic patterns. Additionally, we devise cross-domain spherical attention to capture long-range dependencies for reversed antipodal motion. Finally, we introduce elevation-aware designs to handle general video inputs with changing elevations.
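
To make the notion of "antipodal pixels" concrete, here is a minimal sketch of how a pixel in an equirectangular frame maps to the pixel on the opposite side of the sphere. It is an illustration only, not the Imagine360 implementation: the function names `antipodal_index` and `pixel_to_direction`, the pixel-center convention, and the axis convention are all assumptions, and the way the model uses this correspondence inside its attention layers is not shown. The sketch assumes an even frame width.

```python
import numpy as np


def antipodal_index(u, v, width, height):
    """Map equirectangular pixel (u, v) to the pixel at its antipode.

    Hypothetical helper. Assumes columns span longitude, rows span
    latitude (row 0 at the top), and an even `width`.
    """
    u_anti = (u + width // 2) % width  # shift longitude by 180 degrees
    v_anti = height - 1 - v            # negate latitude (vertical flip)
    return u_anti, v_anti


def pixel_to_direction(u, v, width, height):
    """Unit direction vector through the center of pixel (u, v)."""
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi   # longitude in [-pi, pi)
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi  # latitude, top row is north
    return np.array([
        np.cos(lat) * np.cos(lon),
        np.cos(lat) * np.sin(lon),
        np.sin(lat),
    ])


if __name__ == "__main__":
    width, height = 8, 4
    u, v = 1, 0
    ua, va = antipodal_index(u, v, width, height)
    d = pixel_to_direction(u, v, width, height)
    d_anti = pixel_to_direction(ua, va, width, height)
    # Antipodal directions point in exactly opposite directions.
    print((ua, va), np.allclose(d + d_anti, 0.0))
```

Under these conventions the two direction vectors sum to zero, which is why camera motion observed at one pixel appears reversed at its antipode.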

Additional Results (Panorama Video and Multi-view Projections)

Citation

@article{tan2024imagine360,
  title={Imagine360: Immersive 360 Video Generation from Perspective Anchor},
  author={Tan, Jing and Yang, Shuai and Wu, Tong and He, Jingwen and Guo, Yuwei and Liu, Ziwei and Lin, Dahua},
  journal={arXiv preprint arXiv:2412.03552},
  year={2024}
}