Some Results in Augmented Reality
VIVA Research Lab

Principal investigators: Robert Laganière, Gerhard Roth


Online estimation of the trifocal tensor

The trifocal tensor has been used in many video applications such as image-based rendering, 3D modeling and augmented reality. Most existing techniques compute a chain of tensors from a video sequence, which is decomposed into sets of view triplets. Generally, the actual computation of the tensor is not part of the real-time processing loop. These methods also suffer from error accumulation during a long sequence. We propose a keyframe-based approach for online estimation of the tensor in live video. It works with a single camera that moves freely inside the scene of interest. Image features taken from an initial triplet set are tracked across a video sequence. Then, as the camera is moving, the tensor associated with each frame is estimated online.

The paper:

Li J., Laganière R., Roth G., "Online Estimation of Trifocal Tensors for Augmenting Live Video," IEEE/ACM Symposium on Mixed and Augmented Reality, Arlington, VA, pp. 182-190, Nov. 2004.

  • Our 3-view system is constructed with a single camera.
  • The camera takes three reference images from different viewpoints.
  • Features are tracked through the live video sequence, starting from one of the reference images.
  • The trifocal tensor of the evolving view triplets is computed in real time.

Application to augmented reality

We demonstrate the applicability of our approach to augmented reality. The goal is to automatically insert into live video a computer generated model of an object that is not physically present in the scene.

  • The approach is inspired by the observation that anything in the reference images can be transferred to the video by means of trifocal transfer.
  • Our tensor algorithm is integrated with the ARToolKit method, which relies on tracking a square pattern. We propose to replace this pattern on every video frame with an 'invisible' pattern transferred from the reference images using the tensor computed online.
  • No intrinsic camera parameters are required a priori, and the need to compute the camera pose or reconstruct the scene is avoided.
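The trifocal transfer mentioned above can be sketched as follows. This is a minimal illustration, not the implementation from the paper: it assumes the two fixed reference views are views 1 and 2 of the tensor and the moving view is view 3, and the function names `tensor_from_cameras` and `transfer_point` are hypothetical. A point seen in views 1 and 2 is transferred through a line passing through its position in view 2; the sketch tries a vertical and a horizontal line and keeps the better-conditioned result, since a line that coincides with the epipolar line gives a degenerate (zero) answer.

```python
import numpy as np

def tensor_from_cameras(P2, P3):
    """Trifocal tensor T[i, j, k] for cameras P1 = [I | 0], P2 and P3 (3x4 each).

    Only used here to build a ground-truth tensor for illustration.
    """
    A, a4 = P2[:, :3], P2[:, 3]
    B, b4 = P3[:, :3], P3[:, 3]
    # T_i = a_i b4^T - a4 b_i^T, with a_i, b_i the i-th columns of A and B.
    return np.stack([np.outer(A[:, i], b4) - np.outer(a4, B[:, i])
                     for i in range(3)])

def transfer_point(T, x1, x2):
    """Transfer a match (x1 in view 1, x2 in view 2) into view 3.

    x1 and x2 are (x, y) pixel positions; returns the (x, y) position in view 3.
    """
    x1h = np.array([x1[0], x1[1], 1.0])
    u, v = x2
    # Two lines through x2: a vertical one and a horizontal one.
    lines = (np.array([1.0, 0.0, -u]), np.array([0.0, 1.0, -v]))
    # Point-line-point transfer: x3^k = x1^i * l_j * T_i^{jk}.
    candidates = [np.einsum('i,j,ijk->k', x1h, line, T) for line in lines]
    # A line equal to the epipolar line yields a near-zero vector; avoid it.
    x3h = max(candidates, key=np.linalg.norm)
    return x3h[:2] / x3h[2]
```

In the augmented-reality setting described here, the four corners of the square pattern, marked in the two fixed reference views, would each be passed through such a transfer to obtain the 'invisible' pattern in the current frame.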


Outline of the approach

Our proposed approach is illustrated in the above figure. The system takes as input three camera views, denoted V1, V2 and V3. They contain a square pattern that is deliberately placed in the scene at capture time. Note that this pattern no longer needs to be present once the three keyframes have been obtained.

The initialization step consists in obtaining both an initial estimate of the tensor and a large set of matched triplets. Several alternatives can be envisaged to achieve this goal, including tensor-based guided matching and the PVT tool developed by Dr. Gerhard Roth. The feature points of the obtained triplet set that belong to one reference view constitute the initial set of points to be tracked. Matched pairs between the other two fixed views serve as a match pool that is used, during the process, to update the list of points to be tracked.

Once the initialization process is completed, the online tensor estimation and augmentation process can start. The detected points in one reference view are tracked from one frame to the next. This gives new positions for the points, for which we still have the correspondences in the two fixed views. Using this updated triplet set, robust and fast estimation of the tensor is achieved. Once a new tensor is obtained, the square pattern specified in the two fixed reference views, V1 and V3, is transferred into the moving camera view to generate a virtual image of this pattern, which the ARToolKit method then uses to embed the virtual object.
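To make the per-frame estimation step concrete, here is a minimal sketch of the standard linear algorithm for computing a trifocal tensor from point triplets. It is not the robust estimator used in the paper, whose details are not given here: it omits coordinate normalization and outlier rejection, and the name `estimate_tensor_linear` is hypothetical. Each triplet (x, x', x'') contributes the constraint [x']× (Σ_i x^i T_i) [x'']× = 0, which is linear in the 27 entries of the tensor.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def estimate_tensor_linear(pts1, pts2, pts3):
    """Least-squares tensor T[i, j, k] from N >= 7 point triplets.

    pts1, pts2, pts3 are N x 2 arrays of matching (x, y) positions in the
    three views. The result is defined up to scale (returned with unit norm).
    """
    rows = []
    for p1, p2, p3 in zip(pts1, pts2, pts3):
        x1 = np.append(p1, 1.0)
        S2 = skew(np.append(p2, 1.0))
        S3 = skew(np.append(p3, 1.0))
        # Entry (r, s) of [x']x (sum_i x^i T_i) [x'']x must vanish; the
        # coefficient of T[i, j, k] in that entry is x1[i] * S2[r, j] * S3[k, s].
        for r in range(3):
            for s in range(3):
                rows.append(np.einsum('i,j,k->ijk',
                                      x1, S2[r, :], S3[:, s]).ravel())
    # The tensor is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 3, 3)
```

In an online setting, `pts3` would hold the tracked positions in the current frame and `pts1`, `pts2` their fixed correspondences in the two reference views. A practical version would normalize the coordinates first and wrap the solver in a robust scheme to reject mistracked points.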

As points are tracked over time, more and more features are inevitably lost, and if nothing is done the tracked set will eventually vanish. To overcome this problem, the match set is updated after each tensor estimation. Using the pool of matched pairs available in the two fixed views, new points can be transferred onto the current image with the newly estimated tensor. This last step ensures the long-term viability of the estimation process. In a multi-camera implementation, points from views close to the current reference views would also be transferred, making it possible to identify the view toward which the moving camera is heading.
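The replenishment step can be sketched as below. This is an illustrative assumption about how such an update might be organized, not the procedure from the paper: the function name `replenish_tracked`, the `transfer_fn` callback (standing in for trifocal transfer with the newly estimated tensor), the image-bounds check and the `min_dist` spacing rule are all hypothetical.

```python
import numpy as np

def replenish_tracked(tracked, pool, transfer_fn, width, height, min_dist=5.0):
    """Add pool matches that are not currently tracked to the tracked set.

    tracked:     dict mapping a pool index to its (x, y) position in the
                 current frame, for points that survived tracking.
    pool:        list of (x1, x2) matched pairs in the two fixed views.
    transfer_fn: callable (x1, x2) -> (x, y) in the current frame, e.g.
                 trifocal transfer with the latest tensor (assumed here).
    """
    updated = dict(tracked)
    for idx, (x1, x2) in enumerate(pool):
        if idx in updated:
            continue  # still tracked, nothing to do
        p = np.asarray(transfer_fn(x1, x2), dtype=float)
        # Keep only finite positions that fall inside the image.
        if not np.all(np.isfinite(p)):
            continue
        if not (0 <= p[0] < width and 0 <= p[1] < height):
            continue
        # Skip points that land on top of an already tracked feature.
        if all(np.linalg.norm(p - np.asarray(q)) >= min_dist
               for q in updated.values()):
            updated[idx] = (float(p[0]), float(p[1]))
    return updated
```

Called once per frame after the tensor has been re-estimated, a routine of this kind keeps the number of tracked points from decaying as features leave the field of view or are lost by the tracker.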


In this experiment, a pattern was pasted on the wall when the three reference images were captured.

Click on the image to see a sequence in which the tracked points and the transferred pattern are shown superimposed on every frame.

The augmented sequence shows the DirectX logo placed on the wall as if it were really there.

Even a movie can be displayed on the wall.

Other examples:


Three reference images. Click on the image to play a sequence in which a teapot is embedded on a blank sheet.

Three reference images (the disk is placed tilted on the table). Click on the image to see a sequence. Tracked features and transferred disks are shown throughout the sequence.

Participant: Jia Li, M.Sc.