Creating Immersive Virtual Reality Scenes Using a Single RGB-D Camera

Po Kong Lai and Robert Laganière
University of Ottawa
Ottawa, ON, Canada


Creating experiences for virtual reality (VR) with six degrees of freedom (6-DoF) using visual sensors typically requires a complicated capture system involving multiple sensors. Realistically, an average consumer may not have the space or the resources to set up a multi-sensor environment. Furthermore, such an environment has poor mobility, making it difficult to record content spontaneously at different locations. Our aim is to provide a framework that empowers regular consumers to record their own content to be viewed or interacted with in VR at a later date. Being able to reproduce a dynamic scene through a single sensor is therefore of great interest. For a scene captured using visual sensors to be considered “immersive” in VR, several properties must hold.

  1. The background of the scene must be displayed to the viewer. While it may be compelling to display just the dynamic objects, without the background there is no context.
  2. Dynamic objects must be consistent with the captured background. If there is no correlation between the dynamic objects being displayed and the background, then the captured scene does not reflect the viewer's expectations.
  3. The viewer should retain full rotational and positional freedom (i.e., 6-DoF) while experiencing the captured scene. Without this freedom, the viewer will experience an inconsistency between their movements and what they are seeing through the VR head-mounted display (HMD).
  4. The captured scene, background and dynamic objects, should be as complete as possible.
Failure to achieve these properties will decrease the quality of the immersive experience.

This work aims to produce immersive VR scenes using a single RGB-D camera by satisfying the above properties. An obvious limitation of using a single RGB-D camera is that the fourth property, completely capturing a scene with dynamic objects, is not always achievable. A single camera can only record dynamic objects within its field of view (FOV), so any moving objects outside its FOV will not be included.
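
To make the FOV limitation concrete, the following minimal sketch tests whether a 3D point, given in the camera's coordinate frame, would be visible to a single RGB-D camera under a standard pinhole projection model. The class name, function name, intrinsic values and depth range are illustrative assumptions, not parameters taken from this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinholeCamera:
    """Hypothetical camera intrinsics; all values are illustrative assumptions."""
    fx: float                # focal length in pixels (x)
    fy: float                # focal length in pixels (y)
    cx: float                # principal point (x), pixels
    cy: float                # principal point (y), pixels
    width: int               # image width, pixels
    height: int              # image height, pixels
    min_depth: float = 0.5   # assumed nearest measurable depth, metres
    max_depth: float = 4.5   # assumed farthest measurable depth, metres


def in_field_of_view(cam: PinholeCamera, point: tuple) -> bool:
    """Return True if a camera-frame point (x, y, z) projects inside the image
    and lies within the assumed depth range of the sensor."""
    x, y, z = point
    # Points behind the camera or outside the depth range cannot be recorded.
    if not (cam.min_depth <= z <= cam.max_depth):
        return False
    # Pinhole projection onto the image plane.
    u = cam.fx * x / z + cam.cx
    v = cam.fy * y / z + cam.cy
    return 0.0 <= u < cam.width and 0.0 <= v < cam.height


if __name__ == "__main__":
    cam = PinholeCamera(fx=525.0, fy=525.0, cx=319.5, cy=239.5,
                        width=640, height=480)
    print(in_field_of_view(cam, (0.0, 0.0, 2.0)))  # point straight ahead
    print(in_field_of_view(cam, (3.0, 0.0, 2.0)))  # point far off to the side
```

A dynamic object whose points all fail this test at a given frame would be missing from the reconstruction at that frame, which is why the fourth property cannot always be met with one camera.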

Our proposed approach aims to achieve the first three properties by utilizing a combination of static scene reconstruction, skeleton tracking and general visual object tracking to reconstruct the captured scene such that the background and dynamic objects are displayed coherently in VR.

This work was accepted to the 14th International Conference on Image Analysis and Recognition (ICIAR 2017).

Example output

Static camera results.

Moving camera results.


Here is the dataset we used in our experiments: download


Industry Partners

Thank you to BRINX Software for making this work possible!