Some Results in Video Segmentation
VIVA Research Lab, University of Ottawa
School of Computer Science, Carleton University

Investigators: Prosenjit Bose, Robert Laganiere, Anthony Whitehead.


Cut Detection

Reliable shot boundary detection is the cornerstone of video segmentation, since shots are the elementary building blocks of a complete video sequence. Applications such as video abstraction, video retrieval and higher-level contextual segmentation all presuppose an accurate solution to the shot boundary detection problem. Automatic recovery of shot boundaries is therefore an essential first step, and its accuracy is critical.


Sequence from the movie Psycho

The papers:

Whitehead A., Bose J., Laganière R., "Feature based cut detection with automatic threshold selection," Int. Conf. on Image and Video Retrieval, Dublin, Ireland, pp.410-418, July 2004.

Whitehead A., "Fast Feature-based Video Segmentation and Annotations," Int. Symposium on Signal Processing and Applications, Paris, France, July 2003.

You might also want to look at these interesting references.

Approach Outline:

  • We utilize a corner-based feature tracking mechanism to indicate the characteristics of the video frames over time. As we track corner features over time, we detect production features within the video and annotate the sequence depending on the features that are successfully tracked over time versus those that are lost.
  • In the case of a cut, features should not be tracked from frame I to I+1. However, there are cases where the pixel areas in the new frame coincidentally match features that are being tracked. In order to prune these coincidental matches, we examine the minimum spanning tree of the tracked and lost feature sets.
  • Our inter-frame difference metric is the percentage of features lost from frame I to I+1. A high loss percentage corresponds to large changes in the minimum spanning tree, but the metric itself is computationally efficient.
  • To select a threshold automatically, we examine the frequency of high and low feature loss. We exploit the fact that the ratio of non-cuts to cuts is high, so frames with low feature loss greatly outnumber frames with high feature loss. Because the frame-to-frame tracking of features is independent of all other video frames, an n+1 frame video sequence gives n independent observations. We can then use the statistical foundations of density estimation to determine which threshold to select.
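The metric and threshold selection outlined above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Gaussian kernel, the bandwidth of 0.05 and the "lowest density to the right of the dominant mode" rule are all assumptions made for illustration, and the minimum spanning tree pruning step is not covered. Features are assumed to be identified by hashable IDs that persist while a feature is tracked.

```python
import math


def lost_feature_fraction(features_prev, features_next):
    """Fraction of the features tracked in frame i that are lost in frame i+1."""
    prev_ids = set(features_prev)
    if not prev_ids:
        return 0.0  # nothing to lose; treated as "no change" (an assumption)
    return len(prev_ids - set(features_next)) / len(prev_ids)


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def select_threshold(losses, bandwidth=0.05, grid_size=201):
    """Pick a threshold from the estimated density of per-frame loss values.

    Hypothetical rule: find the dominant mode (expected to be the many
    low-loss, non-cut frames), then take the lowest-density point at or to
    the right of it on a grid over [0, 1]. Requires a non-empty input.
    """
    density = gaussian_kde(losses, bandwidth)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    values = [density(x) for x in grid]
    peak = max(range(grid_size), key=values.__getitem__)
    valley = min(range(peak, grid_size), key=values.__getitem__)
    return grid[valley]


def detect_cuts(losses, threshold):
    """1-based indices i for which a cut is declared between frame i and i+1."""
    return [i for i, loss in enumerate(losses, start=1) if loss > threshold]
```

With this rule, a sequence whose loss values never form a second high-loss cluster yields a threshold at or near 1.0, so no frame pair is labelled as a cut.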

The data set:

| Label* | Characteristic of video data | Genre |
|---|---|---|
| A | Cartoon clip. Substantial object motion. | Cartoon |
| B | Substantial object motion. This clip is taken from a film where a blue filter was used to simulate low lighting conditions. | Action |
| C | Black and white movie. Substantial action and motion. Many close proximity cuts. | Horror |
| D | High quality digitisation of a television show. | Drama |
| E | Low quality digitisation of a television show. | Science-Fiction |
| F | Commercial, no cuts, quick motion, many production effects. Meant to show that dissolves are not mistakenly classified as cuts. | Commercial |
| G | Commercial sequence from the MOCA Project. | Commercial |
| H | Video abstract from the MOCA Project. | Comedy/Drama |
| I | News Sequence from the MOCA Project. | News/Documentary |
| J | Trailer for a film. This clip has many computer generated features, many close proximity cuts. | Trailer/Science-Fiction/Action |

*Click on letter label to obtain real cut positions. A "1" on the ith line means that there is a cut between frame i and frame i+1 in the corresponding video.
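A cut-position file in this format could be read with a short sketch like the following. The function name is hypothetical, and it assumes that every line other than a cut line holds something other than "1" (for example "0").

```python
def read_cut_positions(lines):
    """Return the 1-based indices i with a cut between frame i and frame i+1.

    `lines` is any iterable of text lines, such as an open file.
    """
    return [i for i, line in enumerate(lines, start=1) if line.strip() == "1"]
```

For example, passing an open text file for one of the clips returns the list of frame indices after which a real cut occurs.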

The results:

| | True Cut | True Non-Cut |
|---|---|---|
| Classified as Cut | True positive T+ | False positive F+ |
| Classified as Non-Cut | False negative F- | True negative T- |
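Precision, recall and F1 follow from these counts by their standard definitions: precision = T+ / (T+ + F+), recall = T+ / (T+ + F-), and F1 = 2 · precision · recall / (precision + recall). The sketch below computes them from lists of true and detected cut positions. The function names are illustrative, and the handling of empty denominators (all scores 1 when there are no cuts and no detections, otherwise 0) is an assumption rather than a documented convention of this study.

```python
def count_outcomes(true_cuts, detected_cuts):
    """Return (T+, F+, F-) for two collections of cut positions."""
    true_set, detected_set = set(true_cuts), set(detected_cuts)
    tp = len(true_set & detected_set)   # cuts correctly detected
    fp = len(detected_set - true_set)   # detections that are not real cuts
    fn = len(true_set - detected_set)   # real cuts that were missed
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from outcome counts."""
    if tp + fp + fn == 0:
        # No cuts exist and none were reported: scored as perfect (assumption).
        return 1.0, 1.0, 1.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

True negatives do not enter any of the three scores, which suits cut detection, where non-cut frame pairs vastly outnumber cuts.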

 

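Methods compared: (1) Proposed feature tracking method; (2) Pixel Based method with localization; (3) Histogram Method Cut Det (MOCA).

| Data Source | (1) Precision | (1) Recall | (1) F1 | (2) Precision | (2) Recall | (2) F1 | (3) Precision | (3) Recall | (3) F1 |
|---|---|---|---|---|---|---|---|---|---|
| A | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| B | 1 | 1 | 1 | .825 | .825 | .825 | 1 | .375 | .545 |
| C | .595 | .870 | .707 | .764 | .778 | .771 | .936 | .536 | .682 |
| D | 1 | 1 | 1 | 1 | 1 | 1 | 1 | .941 | .969 |
| E | .938 | 1 | .968 | .867 | .867 | .867 | .955 | .700 | .808 |
| F | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 |
| G | .810 | .944 | .872 | .708 | .994 | .809 | 1 | .667 | .800 |
| H | .895 | .895 | .895 | .927 | 1 | .962 | .971 | .895 | .932 |
| I | 1 | 1 | 1 | 1 | 1 | 1 | 1 | .500 | .667 |
| J | .497 | .897 | .637 | .623 | .540 | .591 | .850 | .395 | .540 |
| AVG | .874 | .961 | .908 | .774 | .800 | .783 | .971 | .701 | .794 |
| VAR | .034 | .003 | .018 | .090 | .101 | .093 | .002 | .060 | .036 |
| STD. DEV | .185 | .054 | .134 | .301 | .318 | .304 | .048 | .246 | .190 |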

Note: our results were obtained using automatic thresholding. Better results could have been obtained by manually selecting an optimal threshold for each sequence, as we did for the other methods.

The tool:

The VidSegPick software tool for segmenting videos can be found here