Video Summarization of Fixed Video Surveillance Cameras


Overview


Video summarization techniques aim to reduce the amount of data in a video by condensing its temporal information. We focus here on two different approaches:

  • Video Synopsis, where the objective is to merge non-intersecting trajectories onto the same frames
  • Keyframe Selection, where the objective is to keep only the interesting frames of a video


Video Synopsis


The objective of automatic video summarization is to provide users of video monitoring systems with concise summaries of what a surveillance camera has observed over a long period of time. When the scene is usually quiet, simple motion detection may suffice, but in most cases such a simple approach still produces a summary that is too long. In this project, we aim to obtain better summaries by combining the different moving objects and displaying them together, inside the same frames, whenever possible.
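As a minimal illustration of this idea (a hypothetical sketch, not the actual method from the paper), each moving object can be represented as a "tube" of per-frame bounding boxes, and each tube can be greedily shifted to the earliest start frame at which it collides with no previously placed tube, while preserving the original ordering:

```python
# Hypothetical sketch: greedy tube scheduling for video synopsis.
# A "tube" is one object's trajectory: a list of bounding boxes
# (x, y, w, h), one per consecutive frame.

def boxes_intersect(a, b):
    """Axis-aligned rectangle overlap test."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def schedule_tubes(tubes):
    """Assign each tube the earliest start frame at which it overlaps no
    already-scheduled tube; starts are non-decreasing, so the original
    temporal ordering of the objects is maintained."""
    starts = []
    for tube in tubes:
        start = starts[-1] if starts else 0  # never start before the previous tube
        while True:
            collision = False
            for prev_start, prev_tube in zip(starts, tubes):
                for t, box in enumerate(tube):
                    f = start + t - prev_start  # frame index inside the previous tube
                    if 0 <= f < len(prev_tube) and boxes_intersect(box, prev_tube[f]):
                        collision = True
                        break
                if collision:
                    break
            if not collision:
                break
            start += 1
        starts.append(start)
    return starts
```

To allow some collisions, as in the last result below, the overlap test could be relaxed to tolerate a bounded intersection area, trading visual clarity for further temporal compression.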

The following image gives an overview of the approach.

Model Structure

More details on the model can be found in the paper here, as well as in the associated poster here.

Some results
This is the original video
4:00 minutes.
Simple solution 1: using motion detection
Frames without motion are removed. 1:03 minutes.
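This baseline can be sketched as follows (a minimal illustration assuming frames are grayscale NumPy arrays; the threshold value is an arbitrary choice for this sketch):

```python
import numpy as np

def keep_motion_frames(frames, threshold=5.0):
    """Drop frames whose mean absolute difference from the previously
    kept frame is below `threshold`, i.e. frames without motion."""
    kept = [frames[0]]
    for frame in frames[1:]:
        # Cast to a signed type so the subtraction cannot wrap around.
        diff = np.mean(np.abs(frame.astype(np.int16) - kept[-1].astype(np.int16)))
        if diff >= threshold:
            kept.append(frame)
    return kept
```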
Simple solution 2: fast-forwarding
Only one frame in every ten is kept. 0:23 minutes.
Our solution: merging moving objects
The moving objects are combined and displayed simultaneously whenever possible. The original temporal ordering is maintained. 0:37 minutes.
Our solution: merging moving objects with collisions
Here, some collisions between moving objects are allowed in order to increase the temporal compression. 0:14 minutes.


Keyframe Selection


Keyframe selection aims at collecting the most interesting and representative frames of a video. As opposed to video synopsis, no modification is made to the frames.

Such a method can be divided into two parts:

  • Feature Extraction, which consists of creating a temporal curve from multiple features (such as saliency maps, detection maps, etc.). This step aims to create a one-dimensional curve from the video.
  • Keyframe Selection, which uses the previous one-dimensional curve to find which frames should be selected for the final result.
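The two steps above can be sketched as follows (a toy illustration: per-frame mean intensity stands in for the real saliency or detection features, and local-maxima picking stands in for the actual selection strategy, both of which are assumptions of this sketch):

```python
import numpy as np

def feature_curve(frames):
    """Step 1: collapse the video into a one-dimensional curve.
    Toy feature: per-frame mean intensity."""
    return np.array([float(f.mean()) for f in frames])

def select_keyframes(curve, min_gap=2):
    """Step 2: pick frame indices at local maxima of the curve,
    keeping selected frames at least `min_gap` apart."""
    keyframes = []
    for i in range(1, len(curve) - 1):
        if curve[i] > curve[i - 1] and curve[i] >= curve[i + 1]:
            if not keyframes or i - keyframes[-1] >= min_gap:
                keyframes.append(i)
    return keyframes
```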

Some results
This is the original video
4:00 minutes.
Solution 1: using only saliency
Saliency is computed on each frame and used as the feature for the keyframe selection.
Solution 2: using only detection labels
Object detection is run on each frame, and the labels are used as the feature for the keyframe selection.
Solution 3: using only detection positions
Object detection is run on each frame, and the positions are used as the feature for the keyframe selection.
Solution 4: using all features
Saliency, detection positions and labels are all used as features for the keyframe selection.
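One simple way to combine several feature curves into a single curve (a hedged sketch; the fusion actually used in this project is not specified here) is to min-max normalize each per-frame curve and take a weighted sum:

```python
import numpy as np

def fuse_curves(curves, weights=None):
    """Combine several per-frame feature curves (e.g. saliency,
    detection positions, detection labels) into one curve by
    min-max normalization followed by a weighted sum."""
    curves = [np.asarray(c, dtype=float) for c in curves]
    if weights is None:
        weights = [1.0 / len(curves)] * len(curves)  # equal weights by default
    fused = np.zeros_like(curves[0])
    for c, w in zip(curves, weights):
        rng = c.max() - c.min()
        normed = (c - c.min()) / rng if rng > 0 else np.zeros_like(c)
        fused += w * normed
    return fused
```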

Participants

Video Synopsis

  • Marc Decombas
  • Kelvin Moutet
  • Po Kong Lai

Keyframe Selection

  • Pierre Marighetto
  • Max Cohen