Our world is beautiful, with majestic mountains, breathtaking seascapes, and tranquil forests. Imagine gazing at this splendor as a bird would, soaring over intricately detailed three-dimensional landscapes. Can computers learn to recreate this kind of visual experience? Current techniques that synthesize new views from photos typically allow only a small amount of camera motion: most prior work can extrapolate scene content only within a narrow range of views corresponding to a subtle head movement.
In recent research, a team from Google Research, Cornell Tech, and UC Berkeley presented a technique for learning to generate unbounded flythrough videos of natural scenes starting from a single view. The ability is learned from a collection of single photographs, with no need for camera poses or even multiple views of each scene. At test time, the method can take a single image and build long camera trajectories of hundreds of new views with realistic and varied content, even though it never saw a video during training. It contrasts with recent supervised view-generation techniques, which require posed multi-view videos, yet demonstrates better performance and synthesis quality.
The fundamental concept is to generate the flythrough step by step. Starting from an initial view, such as the first frame in the figure below, they compute a depth map using single-image depth prediction techniques. They then use this depth map to render the image from a new camera viewpoint, as shown in the middle, yielding a new image and depth map from that viewpoint.
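The render step can be illustrated with a minimal depth-based warp. The sketch below is our own simplification, not the authors' renderer: it unprojects each pixel to 3D using its depth, transforms the points into a new camera (assumed pinhole intrinsics `K`, rotation `R`, translation `t`), and splats them back to pixels, leaving holes where nothing lands.

```python
import numpy as np

def render_new_view(image, depth, K, R, t):
    """Forward-warp `image` (H, W, 3) with per-pixel `depth` (H, W) into the
    camera defined by rotation R (3, 3), translation t (3,), intrinsics K.
    Returns the warped image and a validity mask; unseen pixels remain holes."""
    H, W = depth.shape
    # Pixel grid in homogeneous coordinates, shape (3, H*W).
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    # Unproject to 3D points in the source camera frame.
    pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    # Transform into the target camera frame and reproject.
    pts_t = R @ pts + t[:, None]
    proj = K @ pts_t
    z = proj[2]
    u = np.round(proj[0] / z).astype(int)
    v = np.round(proj[1] / z).astype(int)
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    out = np.zeros_like(image)
    mask = np.zeros((H, W), dtype=bool)
    src = image.reshape(-1, 3)
    # Nearest-pixel splat; a real renderer would z-buffer and blend.
    out[v[valid], u[valid]] = src[valid]
    mask[v[valid], u[valid]] = True
    return out, mask
```

With the identity pose, the warp reproduces the input exactly; moving the camera produces the holes and stretching discussed next.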
This intermediate image, however, has holes: regions that were occluded in the original image become visible from the new viewpoint. It is also blurry, because pixels from the previous frame are stretched to cover objects that are now closer to the camera.
To address these issues, they developed a neural image refinement network that takes the incomplete, low-quality intermediate image and produces a high-quality complete image with an associated depth map. This synthesized image can then serve as a new starting point, and the render-refine-repeat steps can be iterated as often as desired. As the camera moves deeper into the scene, the system learns to generate new scenery, such as mountains, islands, and oceans.
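The render-refine-repeat loop described above can be sketched as follows. The function names (`render_fn`, `refine_fn`) are our placeholders, not the authors' API: the render step warps the current image and depth into the next camera pose, and the refinement network fills holes and re-sharpens the result, producing a new image and depth to recurse on.

```python
def perpetual_view_loop(image, depth, poses, render_fn, refine_fn):
    """Generate one frame per camera pose by iterating render -> refine.

    render_fn(image, depth, pose) -> (warped_image, warped_depth, hole_mask)
    refine_fn(warped_image, warped_depth, hole_mask) -> (image, depth)
    """
    frames = []
    for pose in poses:
        # Render: warp the current frame into the next viewpoint.
        warped_img, warped_depth, mask = render_fn(image, depth, pose)
        # Refine: inpaint holes and restore detail, yielding the new state.
        image, depth = refine_fn(warped_img, warped_depth, mask)
        frames.append(image)
    return frames
```

Because each refined output becomes the next input, the loop can in principle run for arbitrarily many steps, which is what enables trajectories of hundreds of frames from one photo.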
They trained this render-refine-repeat synthesis technique on the ACID dataset. The technique is used to generate a sequence of new views that fly into the scene along the same camera trajectory as a ground-truth video, and the rendered frames are compared with the corresponding ground-truth video frames to obtain a training signal.
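A minimal stand-in for that training signal is a per-frame reconstruction error along the trajectory, sketched below. This is our simplification: the paper's actual objective is more elaborate (e.g., it may combine reconstruction with perceptual and adversarial terms), but the principle of comparing rendered frames to ground-truth frames is the same.

```python
import numpy as np

def trajectory_reconstruction_loss(rendered_frames, gt_frames):
    """Average per-pixel L1 error between rendered and ground-truth frames
    along the same camera trajectory (a simplified stand-in loss)."""
    assert len(rendered_frames) == len(gt_frames), "trajectories must align"
    per_frame = [np.abs(r - g).mean() for r, g in zip(rendered_frames, gt_frames)]
    return float(np.mean(per_frame))
```

A perfect rendering gives a loss of zero; any drift from the ground-truth video along the trajectory increases it, which is the gradient signal used to train the refinement network.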
With such a capability, new kinds of experiences for video games and virtual reality could be created, such as relaxing flights through endless natural scenery.
This article is written as a research summary by Marktechpost staff based on the research paper 'InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images'. All credit for this research goes to the researchers on this project. Check out the paper, code, and project page.
Rishabh Jain is a consulting intern at MarktechPost. He is currently pursuing a B.Tech in Computer Science at IIIT Hyderabad. He is a machine learning enthusiast with a keen interest in statistical methods for artificial intelligence and data analysis, and is passionate about developing better algorithms for AI.