Color Image Processing
Lecture on Color Signal Processing
July 16, 2014
The purpose of the lecture is twofold.
Statistical Analysis of 3D Faces in Motion
April 24, 2014
PhD student at the Cluster of Excellence, Multimodal Computing and Interaction
Saarland University, Germany.
Accurate reconstruction of face shape is important for applications such as tele-presence and gaming. Such a reconstruction problem can be solved efficiently and in the presence of noise with the help of statistical shape models that constrain the shape of the reconstruction. In this talk, an approach to robustly compute correspondences between a large set of facial motion sequences in a fully automatic way using a multilinear model as statistical prior is proposed. This motion sequence registration gives a compact representation of each motion sequence consisting of one vector of coefficients for identity and a high dimensional curve for expression. Based on this representation, new motion sequences are synthesized for static input face scans. Furthermore, a statistical model to represent 3D human faces in varying expression is discussed, which decomposes the surface of the face using a wavelet transform, and learns many localized, decorrelated multilinear models on the resulting coefficients. The localized and multi-scale nature of this model allows for recovery of fine-scale detail while retaining robustness to severe noise and occlusion, and is computationally efficient and scalable.
Combining Flexibility and Low-power in Embedded Vision Subsystems: An application to Pedestrian Detection.
April 23, 2014
Pierre G. Paulin, Ph.D.
Director of R&D, Embedded Vision Subsystems
Synopsys Inc., Canada.
This presentation will introduce a new R&D team at Synopsys in Ottawa that is responsible for Embedded Vision Subsystems. I will give a short overview of the team and the mission, and then introduce our upcoming Embedded Vision reference platform which performs pedestrian detection. This platform combines the Synopsys ARC HS high-performance and low-power processor core with a set of four application-specific instruction-set processors (ASIP). These ASIPs are optimized for the main kernels of the HOG (Histogram of Oriented Gradients) algorithm. We present the mapping of different HOG functions on the ARC HS and ASIPs, highlighting different flexibility and power trade-offs. I will also present a two-level software stack, using OpenCV and a C API to the ASIP-based accelerators. The power consumption of the overall platform and its individual components is analyzed. Finally, I present the mapping of the platform onto the Synopsys HAPS FPGA-based rapid prototyping platform. I conclude with a summary of the main future research challenges.
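As a rough illustration of the kind of kernel such accelerators target, the sketch below computes one HOG cell histogram in plain Python. It is a simplified assumption-laden version (9 unsigned orientation bins, central differences, no bin interpolation or block normalization) and says nothing about how the Synopsys platform actually partitions the work; the function name `cell_orientation_histogram` is hypothetical.

```python
import math


def cell_orientation_histogram(cell, num_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `cell` is a 2D list of grey levels. Only interior pixels vote, because
    central differences need both neighbours. Orientations are folded into
    [0, 180) degrees, the usual "unsigned" HOG convention.
    """
    rows, cols = len(cell), len(cell[0])
    bin_width = 180.0 / num_bins
    hist = [0.0] * num_bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]  # horizontal central difference
            gy = cell[r + 1][c] - cell[r - 1][c]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Hard assignment to a single bin (a simplification of full HOG).
            hist[int(angle // bin_width) % num_bins] += magnitude
    return hist


# A horizontal intensity ramp: every gradient points along x (angle 0).
ramp = [[10 * c for c in range(4)] for _ in range(4)]
hist = cell_orientation_histogram(ramp)
```

For the ramp above, all gradient energy lands in the first orientation bin, which is what one would expect from a purely horizontal gradient.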
Digital Processing of Visual Signals: From Broadcast Television to 3D Telepresence
June 26, 2013
G.S. Glinski Award for Excellence in Research (Public Seminar)
The field of electronic visual communications has evolved in a relatively
short period from over-the-air broadcast television to a wide array of
visual services ranging from pocket and hand-held devices to giant
screens. This evolution was made possible by advances in acquisition and
display technologies (cameras and screens) and by sophisticated digital
processing of visual signals. This talk will illustrate how mathematical
models and tools have allowed advances in a number of specific digital
imaging applications, ranging from broadcast television to digital cameras
to the ability to use 3D imaging to experience distant real-world places
from your desktop computer, the comfort of your living room or right from
the palm of your hand.
Recent Advances in Sampling-based Alpha Matting
June 18, 2013
Alpha matting is the problem of estimating the opacity of the foreground in images and video. It plays an important role in film production and photo editing applications, and has attracted considerable research attention. During the past decade, the matting literature has grown remarkably to comprise a rich spectrum of techniques and frameworks, each having its own merits and drawbacks. This seminar is aimed at (i) briefly introducing some basic concepts about matting, (ii) highlighting the main differences between the two major families of matting algorithms, namely the sampling-based and the propagation-based families, (iii) shedding light on the latest advances in sampling-based matting, (iv) presenting our contributions to natural image matting, and finally (v) providing some resources for those who want to read more about the subject.
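To make the basic concept concrete, here is a minimal sketch of the standard compositing model, I = alpha * F + (1 - alpha) * B, and of the alpha estimate that sampling-based methods typically compute once a candidate foreground/background colour pair has been chosen. The function names are hypothetical, and the way samples are selected (the part where published methods differ) is deliberately left out.

```python
def composite(alpha, fg, bg):
    """Compositing equation: blend a foreground and a background colour."""
    return [alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg)]


def estimate_alpha(pixel, fg, bg):
    """Estimate alpha for one pixel from one (foreground, background) sample pair.

    The observed colour is projected onto the line joining bg to fg in colour
    space, and the result is clamped to [0, 1].
    """
    diff = [f - b for f, b in zip(fg, bg)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        return 0.0  # samples coincide, so alpha is undetermined; pick 0 by convention
    num = sum((p - b) * d for p, b, d in zip(pixel, bg, diff))
    return min(1.0, max(0.0, num / denom))


fg_sample = [200.0, 100.0, 50.0]
bg_sample = [0.0, 0.0, 0.0]
observed = composite(0.25, fg_sample, bg_sample)
alpha = estimate_alpha(observed, fg_sample, bg_sample)
```

When the observed colour really is a blend of the two samples, the projection recovers the blending factor; in practice the quality of the estimate depends on how well the sampled pair matches the true foreground and background.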
Robust Detection of Vehicles in Large Scale Aerial Images: Salient Regions Detection, Classification and Unsupervised Clustering of Image Features
May 14, 2013
Over the years, the detection of vehicles in large-scale imagery acquired by high-resolution sensors has received great attention in both the remote sensing and computer vision communities. However, automatically detecting such objects in multi-sensor and multi-temporal datasets is a difficult task for several reasons. First, the set of generic features that compose the vehicle (e.g. rooftop, hood, and windshield) can be acquired by different sensors, at different times, and from various viewpoints. Moreover, vehicles can interfere heavily with their immediate environment, producing occlusions and shadow areas within the scene. Therefore, the appearance of a vehicle's main body parts can be drastically modified. Finally, the vehicles to be detected are usually small and localized in regions much smaller than the size of the aerial image. Combined, these constraints make it difficult to develop robust and automatic methods for detecting vehicles in aerial imagery.
In order to tackle all these constraints, we first introduce a new algorithm that extracts salient regions from the images. As a pre-processing step, this algorithm allows us to reject many inconsistent regions where no vehicle appears. Contrary to traditional segmentation techniques, salient region detection is designed to answer the following question: "which object in the image most attracts the visual attention of the observer?" Once the salient regions are highlighted, a further analysis can be done to detect the vehicles in the selected regions.
We propose a set of algorithms based on geometric and radiometric features, extracted within a multi-resolution linear Gaussian scale-space. The image features, described by their local structures, are classified using a learning algorithm, the Support Vector Machine. Classified features are then clustered by an unsupervised affinity propagation clustering algorithm, within a feature-level fusion scheme. Subcomponents of vehicles' body parts are aggregated with respect to shared spatial relations and based on constraints on the orientation of detected vehicles. Experimental results using large-scale aerial imagery demonstrate the efficiency and robustness of the proposed algorithms for the detection of vehicles in an urban environment.
Samir Sahli received the BSc degree in Mathematics and the MS degree in Physics of condensed matter from the University of Nice Sophia Antipolis, France, in 2003 and 2005, respectively. He was awarded the MS degree in Physics from Université Laval, Canada, in 2008. He has completed his doctoral studies in the Center for Optics Photonics and Laser (Prof. Sheng). His main area of research is the detection and recognition of objects in aerial imagery in complex and uncontrolled environments, as well as machine learning and computational image acquisition.
Semantic Video Compression Based on Seam Carving
March 28, 2013
Traditional video codecs like H.264/AVC encode video sequences to minimize the Mean Squared Error (MSE) at a given bitrate. Seam carving is a content-aware resizing method. We present a semantic video compression scheme based on seam carving. Its principle is to suppress non-salient parts of the video by seam carving. The reduced sequence is then encoded with H.264/AVC and the seams are represented and encoded with our proposed approach. The main idea is to encode the seams by regrouping them. As the background is suppressed and salient objects can be moved, we have proposed a full-reference visual quality metric to evaluate a semantic coding system which may not exactly preserve the position and/or the shape of objects. The metric is based on Scale-Invariant Feature Transform (SIFT) points. More specifically, Structural SIMilarity (SSIM) on windows around the SIFT points measures the compression artifacts (SSIM_SIFT). Conversely, the standard deviation of the matching distance between the SIFT points measures the geometric distortion (GEOMETRIC_SIFT). We validate our metric with subjective evaluation and reach a Spearman correlation of 0.86 for SSIM_SIFT and 0.74 for GEOMETRIC_SIFT. For the video compression based on seam carving, experiments show that, compared to a traditional H.264/AVC encoding, we reach a bitrate saving between 10% and 24% with the same quality of the salient objects.
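For readers unfamiliar with seam carving itself, the following is a minimal sketch of the classic single-image step: finding and removing the minimum-energy vertical seam by dynamic programming. It assumes a precomputed energy (or inverse saliency) map and hypothetical function names; the seam grouping and encoding described in the abstract are not reproduced here.

```python
def find_vertical_seam(energy):
    """Return one column index per row for the minimum-energy 8-connected seam."""
    rows, cols = len(energy), len(energy[0])
    # cost[r][c] = cheapest cumulative energy of any seam ending at (r, c).
    cost = [list(energy[0])]
    for r in range(1, rows):
        prev = cost[-1]
        row = []
        for c in range(cols):
            lo, hi = max(0, c - 1), min(cols - 1, c + 1)
            row.append(energy[r][c] + min(prev[lo:hi + 1]))
        cost.append(row)
    # Backtrack from the cheapest cell in the last row.
    c = min(range(cols), key=lambda j: cost[-1][j])
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(0, c - 1), min(cols - 1, c + 1)
        c = min(range(lo, hi + 1), key=lambda j: cost[r - 1][j])
        seam.append(c)
    seam.reverse()
    return seam


def remove_vertical_seam(image, seam):
    """Drop one pixel per row, shrinking the image width by one."""
    return [row[:c] + row[c + 1:] for row, c in zip(image, seam)]


energy_map = [
    [5, 1, 5],
    [5, 5, 1],
    [5, 1, 5],
]
seam = find_vertical_seam(energy_map)
narrower = remove_vertical_seam(energy_map, seam)
```

Low-energy pixels are the ones removed first, which is why salient objects survive while background is suppressed.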
Marc Décombas graduated in 2010 from the French engineering school Telecom Sud-Paris. He started his PhD, Content aware video compression for very low bitrates application, with Telecom ParisTech and Thales Communications & Security, Paris, France. This PhD was supervised by B. Pesquet and F. Dufaux and led to several publications at SPIE2011, ICIP2012, and MMSP2012 on video compression based on seam carving and on a new object-based quality metric based on SIFT and SSIM. His main research area is video compression based on seam carving, but he has also worked on video segmentation, saliency maps and video summarization.
Distance Transforms for Efficient Multi-Scale Matching of High Resolution Planar and Spherical Images
July 17, 2012
In this paper we present a time- and space-efficient multi-scale method for stereo reconstruction from high-resolution planar and omnidirectional images. We first present the stereo algorithm and then extend it to omnidirectional images using a novel spherical disparity formulation. Our multi-scale method is based on a novel application of distance transforms to the disparity space images (DSI) at adjacent scales, without making hard, greedy decisions at coarser scales. We provide extensive experimental validation of our method using public benchmarks and demonstrate state-of-the-art performance for planar stereo and similar high-quality results for the spherical case. We further consider how this method can be extended to the multi-view stereo setting and be used for novel view synthesis.
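The abstract does not say which distance transform is applied to the DSI, so the sketch below is only an illustration of the general idea: a two-pass distance transform of a sampled cost function under a linear (L1) penalty, which computes `out[i] = min over j of costs[j] + slope * |i - j|` in linear time. Treating `costs` as one row of matching costs over disparity is an assumption made here for the example.

```python
def distance_transform_l1(costs, slope=1.0):
    """Lower envelope of `costs` under a linear penalty, in two passes.

    After the forward pass each entry accounts for cheaper entries to its
    left; the backward pass does the same for entries to its right.
    """
    out = list(costs)
    for i in range(1, len(out)):
        out[i] = min(out[i], out[i - 1] + slope)
    for i in range(len(out) - 2, -1, -1):
        out[i] = min(out[i], out[i + 1] + slope)
    return out


# Costs over five candidate disparities; low values spread to their neighbours.
softened = distance_transform_l1([5, 0, 9, 9, 2], slope=1.0)
```

The transform softens a coarse-scale cost curve instead of committing to its single best disparity, which is the spirit of avoiding hard, greedy decisions at coarser scales.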
3D Face Analysis, Recognition and Expression Recognition
July 11, 2012
Boulbaba Ben Amor
TELECOM Lille1 Ecole D'Ingenieurs
The first part of this presentation will describe a novel geometric framework for analyzing 3D faces, with specific goals of comparing, matching, and averaging their shapes. In this framework, we represent facial surfaces by radial curves emanating from the nose tips and use elastic shape analysis of these curves to develop a Riemannian framework for full facial surfaces. This representation, along with the elastic Riemannian metric, seems natural for measuring facial deformations and is robust to data issues such as large facial expressions (even those with open mouth), large pose variations, missing parts, and partial occlusions due to glasses, hair, etc. This framework is shown to be promising in both an empirical and a theoretical sense. In terms of empirical evaluation, our results match or improve on the state-of-the-art methods on three prominent databases: FRGCv2, GavabDB, and Bosphorus, each posing a different type of challenge. In terms of theoretical aspects, this framework allows for formal statistical inferences, such as estimation of missing facial parts using PCA on tangent spaces and computing average shapes. The second part of this presentation will describe a fully automatic approach for identity-independent facial expression recognition from 3D video sequences. Towards that goal, we propose a novel approach to extract a scalar vector field that represents the deformations between faces conveying different expressions. We extract relevant features from this deformation field using LDA and then train a dynamic model on these features using HMM. Experiments conducted on the BU-4DFE dataset following state-of-the-art settings show the effectiveness of the proposed approach.
Multidimensional digital signal processing using symmetry groups
July 5, 2012
Symmetry groups are widely used in physics and chemistry to describe the properties of molecules, crystals, etc., so the theory is very well developed. Symmetry groups can also be applied to the study of multidimensional image and signal processing for signals defined on lattices. This has not received much attention. This seminar shows how the theory of multidimensional linear, shift-invariant filtering can be extended to more general symmetry-invariant filtering and why this might be useful. In one dimension, this leads to standard methods like the discrete cosine transform (DCT) which are usually extended to two or more dimensions separably. In the talk, I will give a brief introduction to symmetry groups, and then outline the general theory for multidimensional symmetry-invariant signal processing. For periodic signals, this leads to block signal representations that extend the DFT and DCT. I will give some examples for rectangular and non-rectangular sampling lattices, as well as for color filter array (CFA) signals. Performance of the signal representations will be illustrated by rate-distortion graphs.
Image Interpolation Using Kernel Regression
June 19, 2012
Kernel regression for multi-dimensional signal processing is a technique popularized by H. Takeda and P. Milanfar for image processing in 2007. This technique was further refined for video processing in 2010. One of the most important assumptions is the existence of a continuous signal from which the discrete input signal is sampled. This is because the Taylor expansion is used in the regression modelling of the signal. Takeda's flavor of this regression estimation framework uses the Euclidean norm to formulate the task as a weighted least-squares problem. This allows common linear algebra tools to be used in the analytical solution. As long as the use of the Taylor expansion for the discrete signal is justified, this framework can perform signal estimation without the need for regularly sampled data. Example applications of this framework include signal interpolation, de-noising, and image inpainting.
This tutorial seminar will focus on the theory of Takeda's flavor of the kernel regression framework for 2D images (not videos), and challenges that arise from using the Euclidean norm in the optimization objective function.
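The weighted least-squares idea can be sketched in one dimension. The example below is a first-order (local linear) kernel regression with a fixed Gaussian kernel, solved in closed form from the 2x2 normal equations. It is a simplified stand-in chosen for brevity: the function name, the Gaussian kernel, and the fixed `bandwidth` are assumptions, and the 2D and data-adaptive (steering) aspects of the published framework are not covered.

```python
import math


def local_linear_estimate(xs, ys, x0, bandwidth=1.0):
    """Estimate the signal at x0 from irregular samples (xs, ys).

    Minimizes sum_i w_i * (y_i - b0 - b1 * (x_i - x0))**2 with Gaussian
    weights w_i, i.e. a first-order Taylor model around x0, and returns b0.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        dx = x - x0
        w = math.exp(-0.5 * (dx / bandwidth) ** 2)
        s0 += w
        s1 += w * dx
        s2 += w * dx * dx
        t0 += w * y
        t1 += w * dx * y
    det = s0 * s2 - s1 * s1
    if det <= 1e-12:
        # Degenerate geometry: fall back to the zero-order (weighted mean) estimate.
        return t0 / s0
    return (s2 * t0 - s1 * t1) / det


# Irregularly spaced samples of the line y = 2x + 1.
sample_x = [0.0, 0.7, 1.0, 2.2, 3.0]
sample_y = [2.0 * x + 1.0 for x in sample_x]
estimate = local_linear_estimate(sample_x, sample_y, x0=1.5)
```

Because the model is linear in its coefficients, no iterative optimizer is needed, and the samples do not have to lie on a regular grid.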
Quadratic Pseudo-Boolean Optimization: Theory and Applications At-a-Glance
June 12, 2012
Max-flow/Min-cut (Graph cuts) optimization has proved efficient in solving many early vision problems that can be formulated in terms of energy minimization. During the last decade, a considerable portion of the image processing and computer vision literature consisted of graph-cuts-based techniques proposed for motion estimation, segmentation, restoration, stereo/reconstruction, pose estimation and inpainting, in addition to other applications. However, it is now widely accepted that the abilities of the graph cuts algorithm are limited to minimizing a certain class of Markov random field energies, namely the sub-modular energies. Sub-modularity does not hold in general, and when this constraint is removed, constructing or designing richer energies with greater modelling abilities becomes possible. But which minimizer can work with such energies? Pseudo-Boolean Optimization (PBO) has been known for a long time, but only in the past five years has it started to attract the attention of imaging and vision researchers. Its power comes from the fact that graph cuts can be applied to minimize non-sub-modular energies within its framework. Thus, the virtues of graph cuts can now be used to handle non-sub-modular, more complex and general energy functions. Many promising results in various vision applications have been proposed to date and more interesting results are yet to come. The presentation will briefly introduce the theory and capabilities of PBO after explaining the shortcomings of graph cuts. A peek at various applications and results, in addition to some key resources, will also be given.
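The sub-modularity condition for a pairwise term over two binary variables is short enough to state in code. The sketch below checks the standard inequality theta(0,0) + theta(1,1) <= theta(0,1) + theta(1,0); the function name and the 2x2 list layout are illustrative choices, not part of any particular library.

```python
def is_submodular(theta):
    """Check the pairwise sub-modularity condition for binary labels.

    `theta[a][b]` is the cost of assigning label a to one variable and
    label b to its neighbour.
    """
    return theta[0][0] + theta[1][1] <= theta[0][1] + theta[1][0]


# A Potts-style smoothness term: disagreeing labels are penalized.
smoothness = [[0, 1],
              [1, 0]]

# A term that rewards disagreement; plain graph cuts cannot minimize this
# exactly, which is the situation PBO-based methods are meant to address.
repulsion = [[1, 0],
             [0, 1]]

smoothness_ok = is_submodular(smoothness)
repulsion_ok = is_submodular(repulsion)
```

An energy is sub-modular in this sense only if every one of its pairwise terms passes the check.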
Isometry-Invariant Shape Analysis
June 5, 2012
Cluster of Excellence: Multimodal Computing and Interaction, Saarland University
Max-Planck-Institut für Informatik
Shape analysis aims to describe either a single shape or a population of shapes in an efficient and informative way. This is a key problem in various applications such as mesh deformation and animation, object recognition, and mesh parameterization. I will first present a number of approaches to process shapes that are nearly isometric. Second, I will present algorithms to compute the correspondence information between human bodies in varying postures and human faces in varying expressions, respectively. In addition to being nearly isometric, human body shapes and faces, respectively, share the same geometric structure, and we can take advantage of this prior geometric information to find accurate correspondences. Finally, I will talk about some applications in computer aided design and conclude with some ongoing work.
Human Action Representation and Action Classification in Video Analysis
May 2, 2012
Amir H. Shabani
Vision and Image Processing Lab, University of Waterloo
Humans can easily detect and recognize the type of actions performed in a video. However, the automatic recognition of human actions is a challenge in computer vision with growing applications for automated surveillance, content-based video retrieval, video summarization, elderly home monitoring for assisted living, and human-computer interaction. The confusion lies in people performing the same action in noticeably different ways, leading to errors of omission. Also, individuals performing different actions that visually appear similar lead to errors of commission. In addition, illumination and view/scale changes create further challenges to automatically interpreting the scene. With the focus on the main components of a standard bottom-up discriminative action recognition framework, I first introduce our novel asymmetric scale-space filtering for robust salient feature extraction from a time-causal video. Experimental results are then presented on the performance of the salient features for three evaluation tests of precision, robustness, and quality for action representation on several benchmark datasets. Based on our observations, a sparse set of asymmetric salient motion features shows higher classification accuracy than the state-of-the-art methods with computationally expensive dense sampling and dense trajectories. In the second part of the talk, to address the problem of high intra-class and low inter-class variation, I will motivate the explicit multiresolution analysis of the motion patterns for both action representation and action classification using a Bayesian fusion approach. Again, the experimental results show significant improvement of our approach over the state-of-the-art methods on both choreographed and realistic benchmark datasets collected from YouTube and Hollywood movies.
Amir H. Shabani received his PhD from the Vision and Image Processing (VIP) lab at the University of Waterloo in 2011. He was a postdoctoral fellow at Stanford University (Stanford Vision lab) before moving back to Waterloo, where he is currently a research associate in the department of Systems Design Eng. Amir's primary research interests lie in the theory and application of computer vision and machine learning techniques for video analytics, including motion segmentation, object tracking, and high-level scene understanding. Amir worked on several research projects including object tracking and human action recognition in video during his PhD, 3D object reconstruction using a structured light technique during his master's, and embedded signal processing during his undergraduate studies. Since Jan. 2010, Amir has been collaborating with the Action Lab at the Centre for Cognitive Neuroscience, Wilfrid Laurier University, ON, on the role of different afferents in motion perception. Prior to his PhD, Amir worked for five years with European companies on hardware and software design for advanced control processes and robot vision solutions for factory automation.
Visual Place Categorization for Mobile Robots
April 17, 2012
Center for Vision Research, York University
We are rapidly moving towards an era when autonomous robots are taken beyond lab and industrial settings, to home and office environments where they can act as personal companions to assist people in their everyday life. A personal companion robot requires a representation of the environment that establishes a link to human concepts and terms in order to facilitate the communication and interaction of the robot with both the environment and its inhabitants (people). In this talk, I will address the problem of visual place categorization, which aims at augmenting different locations of the environment, visited by an autonomous robot, with information that relates them to human-understandable concepts. I will present a method that formulates the problem of visual place categorization in terms of energy minimization. To label visual observations with place categories, HOUP descriptors are introduced and used to build a global image representation that is invariant to common changes in dynamic environments and robust against intra-class variations. To satisfy temporal consistency, a general solution is presented that incorporates statistical cues, without being restricted by constant and small neighborhood radii, or being dependent on the actual path followed by the robot. The results of several experiments on publicly available databases will be presented and the application of the proposed approach (as a whole, or in part) to other robotic and vision problems (e.g., topological robot localization, object detection/recognition, video retrieval and face recognition) will be discussed.
Human Categorization of Colours
Much work has already been done on colour perception (e.g. the Berlin and Kay studies). We have studied the 11-colour reduction scheme. One of the motivations for this work was the need for an appropriate colour characterization of objects in surveillance applications. We seek a colour reduction scheme that is both compact and well matched to the way humans memorize and specify colours. Several versions have been built based on different experiments to gather samples based on human perception. We have compared their performance on image retrieval tasks using an eBay image database. We will also present a colour implementation of the EMD (Earth Mover's Distance) and an image search engine.
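As background on the EMD, the sketch below computes it for the simplest possible case: two histograms over the same ordered 1D bins, with unit distance between adjacent bins. In that special case the EMD reduces to the sum of absolute differences of the cumulative distributions. Colour histograms are generally multi-dimensional and need a ground distance between colours, so this is only an assumption-laden illustration and not the colour implementation mentioned in the abstract; `emd_1d` is a hypothetical name.

```python
def emd_1d(hist_a, hist_b):
    """Earth Mover's Distance between two 1D histograms on the same bins.

    Both histograms are normalized to unit mass first. `carried` is the
    amount of mass that still has to be moved past the current bin.
    """
    total_a, total_b = sum(hist_a), sum(hist_b)
    carried = 0.0
    work = 0.0
    for a, b in zip(hist_a, hist_b):
        carried += a / total_a - b / total_b
        work += abs(carried)
    return work


# All mass must travel two bins to the right.
far_apart = emd_1d([1, 0, 0], [0, 0, 1])
identical = emd_1d([1, 2, 3], [1, 2, 3])
```

Unlike a bin-by-bin difference, the EMD grows with how far mass has to move, which is why it suits perceptual colour comparison.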
Animated Movie Analysis
July 8, 2012
This talk will give a few results on animated movie analysis. Animated movies are a specialty of Annecy, which hosts a very famous festival each year. The first part will be dedicated to indexing. The characteristics used are mainly related to colour and rhythm and are extracted directly from images. Some other textual information is obtained from the synopsis. The second part will give some examples of applications: summarization, genre classification, and similarity measures.
Robust model estimation using leverage in a guided sampling framework
June 10, 2011
RANSAC is often used as the standard method for robust model estimation in many vision applications. RANSAC proceeds by randomly fitting models to subsets of data and assigning scores based on how many inliers exist with respect to these models. We offer an improvement to RANSAC where readily available information from the regression statistics is used to guide this sampling process. The result is a model estimation procedure that is significantly faster than RANSAC and has a lower error ratio.
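For reference, the baseline procedure described above can be sketched as follows for 2D line fitting. This is plain RANSAC with uniform random sampling, not the leverage-guided variant proposed in the talk; the function name, the vertical-residual inlier test, and the default `iterations` and `tolerance` values are assumptions made for the example.

```python
import random


def ransac_line(points, iterations=200, tolerance=0.5, seed=0):
    """Fit y = slope * x + intercept by plain RANSAC.

    Each iteration fits a line through two randomly chosen points and scores
    it by the number of points whose vertical residual is within `tolerance`.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, -1
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair: cannot be written as y = slope * x + intercept
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = sum(
            1 for x, y in points
            if abs(y - (slope * x + intercept)) <= tolerance
        )
        if inliers > best_inliers:
            best_model, best_inliers = (slope, intercept), inliers
    return best_model, best_inliers


# Eight points on y = 2x + 1 plus two gross outliers.
data = [(x, 2 * x + 1) for x in range(8)] + [(3, 20), (5, -10)]
model, inlier_count = ransac_line(data)
```

A guided sampler would replace the uniform `rng.sample` call with a draw that favours points more likely to be inliers, which is where the speed-up comes from.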
Wavelet Model-based Stereo for Fast, Robust Face Reconstruction
May 20, 2011
When reconstructing a specific class of object using stereo, we can leverage prior knowledge of the shape of that type of object. A popular class of object to reconstruct is the human face. In this paper we learn a statistical wavelet prior of the shape of the human face and use it to constrain stereo reconstruction within a Bayesian framework. We initialize our algorithm with a typically noisy point cloud from a standard stereo algorithm, and search our parameter space for the shape that best fits the point cloud. Due to the wavelet basis, our shape parameters can be optimized independently, thus simplifying and accelerating the search. We follow this by optimizing for a secondary prior and observation: smoothing and photoconsistency. Our method is fast, and is robust to noise and outliers. Additionally, we obtain a shape in a parameterized and corresponded shape space, making it ready for further processing such as tracking, recognition or statistical analysis.
Video Surveillance and Biometric R&D program at CBSA-S&E
April 11, 2011
Dmitry O. Gorodnichy
Innovation and Technology Branch, Canada Border Services Agency
This presentation will introduce the audience to the R&D activities of the Video Surveillance and Biometrics (VSB) section of the Science and Engineering Directorate of the Canada Border Services Agency (CBSA-S&E).
Dmitry O. Gorodnichy, Ph.D.
Senior Research Scientist, Manager
Video Surveillance and Biometrics Section
Science and Engineering Directorate
Science, Innovation and Technology Branch
Canada Border Services Agency
The Structure and Properties of Color Spaces
December 2, 2010
The numerical specification of colors as perceived by humans has been studied for hundreds of years. Colors are understood to be equivalence classes of light spectral densities that appear identical to a human observer, and they belong to a three-dimensional space. However, a detailed exposition of the algebraic structure of color spaces has been lacking, motivating me to write the book The Structure and Properties of Color Spaces and the Representation of Color Images (available on campus). This lecture presents this algebraic structure as that of the quotient space of a vector space of radiometric functions with respect to the equivalence classes corresponding to the viewer. The algebraic notion of quotient space will be covered and then applied to the description of color spaces. Related topics including further decomposition of the color space and transformations between different color spaces will be discussed. An attempt will be made to relate this formulation to the work of some previous scientists who have addressed the problem (Newton, Grassmann, Maxwell, Schrödinger).
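One common way to write the quotient construction described above is sketched here. The notation is an assumption introduced for illustration and may differ from the lecture's: $\mathcal{V}$ is the vector space of light spectral densities, $\bar{c}_1, \bar{c}_2, \bar{c}_3$ are three linearly independent response functions characterizing the viewer, and $\mathcal{N}$ is the subspace of spectra the viewer cannot distinguish from zero.

```latex
% Two spectral densities match for the viewer when all three responses agree.
f_1 \sim f_2
\iff
\int f_1(\lambda)\,\bar{c}_k(\lambda)\,d\lambda
=
\int f_2(\lambda)\,\bar{c}_k(\lambda)\,d\lambda ,
\qquad k = 1, 2, 3 .

% The spectra that produce no response form a subspace of V.
\mathcal{N}
=
\Bigl\{\, f \in \mathcal{V} \;:\; \int f(\lambda)\,\bar{c}_k(\lambda)\,d\lambda = 0,\ k = 1, 2, 3 \,\Bigr\} .

% Colors are the equivalence classes [f] = f + N, i.e. elements of the quotient space.
\mathcal{C} = \mathcal{V} / \mathcal{N},
\qquad
\dim \mathcal{C} = 3 .
```

Under these assumptions, $f_1 \sim f_2$ exactly when $f_1 - f_2 \in \mathcal{N}$, so addition and scaling of colors are inherited from $\mathcal{V}$, and the three-dimensionality follows from the linear independence of the response functions.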
Variational minimization methods for image segmentation and object tracking
November 16, 2010
Mohamed Ben Salah
Object detection is a fundamental task in computer vision which serves many important problems such as image segmentation and object tracking. Current major application areas include medical image interpretation, remote sensing, video image analysis, and surveillance. Many studies have focused on variational formulations because they result in the most effective algorithms. By minimizing a functional which can easily encode various segmentation constraints, variational formulations have offered a convenient framework to develop accurate and efficient algorithms. In this talk, we present some variational techniques we proposed using active contours and graph cuts for image segmentation and object tracking.
Perception, Environment Modeling and Motion Planning: Integration on Humanoid Robots
May 7, 2010
This talk will present my research in integrating perception, environment modeling and motion planning on humanoid robots. After a brief review of environment modeling by the occupancy grid method, it mainly presents my research work on whole-body motion planning for humanoid robots. It pertains to our recent research work in the scope of a humanoid robotics project with Toyota-Motors Europe.
Dr. Alireza Nakhaei completed a Bachelor of Science and a Master's degree in Mechanical Engineering before obtaining his Ph.D. in September 2009 from Institut National Polytechnique de Toulouse in Computer Science and Robotics while working at LAAS-CNRS in Toulouse, France, under the supervision of Dr. Jean-Paul Laumond and Dr. Florent Lamiraux. Dr. Nakhaei has acquired solid experience in kinematic and dynamic modelling of mobile robots as well as in the development of humanoid robots, with a special interest in motion planning. He has also accumulated industrial experience in various manufacturing companies. He currently collaborates with Toyota-Motors Europe while pursuing research at LAAS-CNRS on algorithms for Motion Planning for a Humanoid Robot.
Bone Graphs: Medial Abstraction for Shape Parsing and Object Recognition
June 19, 2009
University of Toronto
The recognition of 3-D objects from their silhouettes demands a shape representation which is invariant to minor changes in viewpoint and articulation. This invariance can be achieved by parsing a silhouette into parts and relationships that are stable across similar object views. Medial descriptions, such as skeletons and shock graphs, attempt to decompose a shape into parts, but suffer from instabilities that lead to similar shapes being represented by dissimilar part sets. In this talk, I will present a novel shape parsing approach based on identifying and regularizing the ligature structure of a given medial axis. The result of this process is a bone graph, a new medial shape abstraction that captures a more intuitive notion of an object's parts than a skeleton or a shock graph, and offers improved stability and within-class deformation invariance. In addition, I will present a novel DAG matching algorithm that treats the similarity of node and edge attributes as a function of the hierarchical node constraints encoded in each graph. This algorithm can be used to compare bone graphs and shock graphs in the presence of noise, occlusion, and clutter. Finally, I will demonstrate the bone graph representation and the graph matching algorithm for the task of object categorization.
Diego Macrini received an Engineering degree in software engineering from the Universidad de Belgrano (Buenos Aires) in 1998, and an M.Sc. degree in computer science from the University of Toronto in 2003, where he is currently pursuing a Ph.D. degree in the same area. His major field of interest is computer vision with an emphasis on shape representation, object recognition, and visual motion analysis.
Image diffusion under Sobolev metrics
November 7, 2008
The outline of the presentation:
3D human motion tracking for gait analysis using particle filtering
October 23, 2008
LORIA INRIA-Lorraine Laboratory, France
The work we present is part of a telemedicine project which aims to
prevent falls among the elderly at home. This can be achieved by proposing a
technology and a methodology to detect balance disorders. The main
objective of this work is to develop a passive autonomous gait analysis
system for the home. We review the approaches previously used for gait quality
evaluation and then propose a new solution. The suggested method is based on a 3D
markerless human motion capture system. Thus, the gait parameters can be
estimated using the 3D positions of some key points of the body.
The motion capture system we developed uses an articulated body model and a new particle filtering algorithm. This new algorithm, which we call Interval Particle Filtering, reorganizes the search space of the articulated model's configurations in an optimal deterministic way and has proved to be efficient in tracking natural human movement. In order to reduce the time complexity of the algorithm and to obtain more precise tracking, a new factorized version of the algorithm is also introduced. This version, called Factored Interval Particle Filtering, uses the Dynamic Bayesian Network formalism. We show 3D reconstructions of movement using this algorithm and compare it to other approaches.
Introduction to Video Indexing and Its Applications
September 22, 2008
The Image Processing and Analysis Laboratory, University "Politehnica" of Bucharest
Advances in modern multimedia technologies (e.g. better storage devices, improved compression techniques, faster wireless communication protocols, etc.) have led to huge and ever-growing archives of multimedia documents, and in particular of video footage. The real issue is therefore not a lack of information but the difficulty of accessing such large amounts of data. The existing solution is indexing systems, which are constantly being improved to make unstructured and unknown video data accessible, reusable, searchable and manageable for the common user. In this presentation we shall discuss the concept of video indexing in its two existing types, syntactic and semantic indexing. Apart from the general principles, we present some of the video processing tasks in detail. First, we address video temporal segmentation, which is the basis for almost all existing low-level and semantic-level video processing techniques. We focus on several algorithms we have developed for video transition detection, i.e. cut, fade and dissolve detection. Next, we address color-content analysis by describing an approach to global color content characterization in terms of color intensity, hue and saturation (e.g. dark color content, color contrasts, etc.). Finally, we tackle automatic content summarization by describing several methods for constructing still-image and trailer-like video abstracts. We conclude by presenting our perspectives on this project.
Lect.Dr.Ing. Bogdan-Emanuel IONESCU
LAPI - The Image Processing and Analysis Laboratory
University "Politehnica" of Bucharest
1-3 Iuliu Maniu Blvd., 061071 Bucharest, Romania
Frequency domain methods for demosaicking of Bayer sampled color images
July 28, 2006
Eric Dubois (VIVA professor) and Brian Leung (USRA)
Most digital cameras use a sensor with a color filter array (CFA) to capture the color information of a scene. At each spatial location in the image, only one component is measured. The most common type of CFA is the Bayer array, in which half the sensor elements measure the green component and one quarter of the sensors measure each of the red and blue components respectively. The problem of interpolating the remaining samples in each of the three color planes (demosaicking) has received considerable attention in recent years, especially due to the rapid proliferation of digital cameras. In this seminar, we will show that the spatial multiplexing of red, green and blue components in a Bayer CFA is equivalent to the frequency domain multiplexing of a luma (black and white) component at baseband and two chrominance components at high spatial frequency. This frequency domain representation provides insights that lead to novel algorithms that perform better than other published algorithms. In particular, a least-squares design method and a novel adaptive algorithm that provide state-of-the-art performance are described. Eric Dubois will present the basic theory and algorithms and Brian Leung will present recent results of system optimization.
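The equivalence stated in the abstract above can be checked numerically. The following is a minimal sketch, not the speakers' implementation: it assumes one particular Bayer layout (green where n1 + n2 is even, red where n1 is odd and n2 is even, blue where n1 is even and n2 is odd) and one common way of writing the luma and chrominance components for that layout. The variable names and the random test image are illustrative only. The factor (-1)^(n1+n2) is a carrier at spatial frequency (0.5, 0.5) cycles per pixel, and (-1)^n1 and (-1)^n2 are carriers at (0.5, 0) and (0, 0.5).

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 8, 8
R, G, B = rng.random((3, H, W))  # random full-color test image (illustrative)

n1, n2 = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")

# Assumed Bayer layout: green where n1 + n2 is even, red where n1 is odd
# and n2 is even, blue where n1 is even and n2 is odd.
green = (n1 + n2) % 2 == 0
red = (n1 % 2 == 1) & (n2 % 2 == 0)
cfa_spatial = np.where(green, G, np.where(red, R, B))

# Luma at baseband and two chrominance components (assumed definitions
# that match the layout above).
luma = 0.25 * R + 0.5 * G + 0.25 * B
chroma1 = -0.25 * R + 0.5 * G - 0.25 * B
chroma2 = -0.25 * R + 0.25 * B

# Carriers: (-1)^n1 and (-1)^n2 modulate the chrominance to high frequency.
s1 = (-1.0) ** n1
s2 = (-1.0) ** n2
cfa_multiplexed = luma + chroma1 * s1 * s2 + chroma2 * (s1 - s2)

max_abs_diff = float(np.max(np.abs(cfa_spatial - cfa_multiplexed)))
print(max_abs_diff)
```

The two constructions agree sample by sample up to floating-point rounding, which is the sense in which spatial multiplexing and frequency-domain multiplexing describe the same CFA signal.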
Adapted Bilateral Filtering for Restoration of Decoded Images
July 28, 2006
VIVA Lab and RWTH Aachen University, Germany
In this seminar, the Bilateral Filter is shown to represent an estimator for piecewise constant images. This estimation interpretation allows for a targeted choice of filter parameters for the restoration of decoded images. It also leads to improvements of the Bilateral Filter by considering varying noise and correlations in the decoded image. For the example of the prevalent JPEG standard, rules to derive signal-adaptive parameters for the adapted BF from the decoded data are presented. By applying the adapted filter, significant objective as well as visual improvements over the originally decoded image can be observed.
Visual local navigation using warped panoramic images
July 7, 2006
University of Wales
The method presented here uses panoramic images to perform local homing. It is different from others in that it does not extract any features from the images and only performs simple image processing operations. Furthermore, it uses a method borrowed from computer graphics to simulate the effect in the images of translations of the robot to compute local motion parameters.
Contributions to 3D interpretation of temporal sequences of monocular images by variational approaches and evolution of active curves
September 13, 2005
Institut National de la Recherche Scientifique - EMT
In this research we develop new variational methods for segmentation and dense 3D interpretation of temporal sequences of monocular images. The segmentation, based on motion, is carried out jointly with the 3D interpretation, which consists of a dense estimate of relative depth and 3D motion, i.e. of the structure and 3D motion of the moving objects in the visual field. The motion of these objects, assumed rigid, is considered relative to the camera, thus allowing the objects and the camera to move simultaneously. We propose three methods: (a) a direct method, (b) 3D interpretation with segmentation by essential parameters, and (c) 3D interpretation with segmentation by image motion. Each of these methods is formulated according to a variational principle with a functional to minimize. The methods have two points in common: the input data are the spatio-temporal variations of the image sequence, and the functionals are minimized by evolution of active curves via Euler-Lagrange equations and level sets. They differ essentially in the variables of their functionals and in the criterion on which the segmentation is based.
Multimedia adaptation and transcoding for mobile applications
Because of the fast pace of technology evolution and the segmentation of the mobile terminal market (different models addressing different user needs), the characteristics of mobile terminals are very diverse. They support different multimedia formats and have different amounts of memory, processing power, screen resolution, etc. As a result, efficient access to multimedia content and interoperability between mobile terminals are seriously compromised. A solution to this problem is to adapt or transcode the multimedia content, most often in a network element, to meet the specific characteristics of each terminal. This presentation will provide an overview of the applications requiring adaptation. It will then present the architectural elements of an adaptation system and the existing technologies available (such as UAProf and the Standard Transcoding Interface) to develop such a system. Some highly efficient image and video transcoding algorithms are then presented. These algorithms perform transcoding in the DCT domain and provide significantly improved performance compared to straightforward full decoding and re-encoding. Finally, current research areas in the field of multimedia adaptation are presented.
Facial Recognition in Video
August 23, 2003
Computational Video Group, Institute for Information Technology, National Research Council Canada
There is a physiological reason, backed up by the theory of visual attention in living organisms, why animals look into each other's eyes. This illustrates the two main properties in which recognizing faces in video differs from its static counterpart, recognizing faces in images. First, the lack of resolution in video is abundantly compensated by the information coming from the time dimension: video data is inherently dynamic. Second, video processing is a phenomenon occurring all the time around us in biological systems, and many results unraveling the intricacies of biological vision have already been obtained. At the same time, when we examine the way video-based face recognition is approached by computer scientists, we notice that until now video information has often been used only partially and therefore not very efficiently. This work aims at bridging this gap. We develop a multi-channel framework for video-based face processing that incorporates the dynamic component of video. The utility of the framework is shown on the example of detecting and recognizing faces from blinking. In doing so, we derive a canonical representation of a face. Using this representation and a pseudo-inverse associative memory, we are able to memorize and recognize on the fly the users of our lab.
Experience Builder: Learning and Solving through direct participation. Primary results on a Neurosolver Variant based on Progressive Vector Quantization and Reinforcement Learning
August 14, 2003
A Neurosolver is a special biologically inspired neural network. It is designed to solve problems in the framework of the state space paradigm. Basically, each node represents a state of the problem. A transition from one node to another corresponds to an action that, when applied to the first state, leads to the second.
The Neurosolver needs to learn the connections through training examples and can then solve problems using a special planning layer that finds solution paths from the goal state back to the current state.
Inspired by the Neurosolver, I designed a device that I call the Experience Builder. It uses the same state space paradigm: a node represents a state of the problem and a transition corresponds to an action that leads from one state to another. However, a node is implemented by a learning machine instead of a neural node. The goal of each node is to converge toward the best action.
In order to manage the number of possible states efficiently, a node may also be the representative for a region of the state space. The network of nodes is managed by an algorithm that also takes care of the learning of each machine.
The main characteristic and advantage of the Experience Builder over the Neurosolver and other General Problem Solvers is that it can learn and solve a problem directly and autonomously by trial and error. It does not need to plan in order to find a solution.
During the presentation I will describe the Experience Builder and its algorithm in more detail, and will present some of the capabilities of the device with results of various experiments in which an agent tries to solve a maze.
Sample images can be independently regenerated from face recognition templates
August 7, 2003
Biometrics promise the ability to automatically identify individuals from reasonably easy to measure and hard to falsify characteristics. This talk addresses some of the security and privacy implications of biometric storage. Biometric systems record a sample image, and calculate a template: a compact digital representation of the essential features of the image. To compare the individuals represented by two images, the corresponding templates are compared, and a match score calculated, indicating the confidence level that the images represent the same individual. Biometrics vendors have uniformly claimed that it is impossible or infeasible to recreate an image from a template, and therefore, templates are currently treated as nonidentifiable data. We describe a simple algorithm which allows recreation of a sample image from a face recognition template using only match score values. At each iteration, a candidate image is slightly modified by an eigenface image, and modifications which improve the match score are kept. The regenerated image compares with high score to the original image, and visually shows most of the essential features. This image could thus be used to fool the algorithm as the target person, or to visually identify that individual.
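The iteration described in the abstract above is a form of hill climbing on the match score. The sketch below is a toy illustration under loud assumptions, not the speaker's algorithm or any vendor's matcher: `match_score` is a hypothetical black box built from a random linear template, the "eigenfaces" are a random orthonormal basis, and the step size and iteration count are arbitrary. It only shows the keep-if-better loop driven by score values alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels, n_basis = 64, 8

# Stand-in "eigenfaces": an orthonormal basis obtained by QR factorization
# of random data (real eigenfaces would come from PCA on face images).
eigenfaces, _ = np.linalg.qr(rng.standard_normal((n_pixels, n_basis)))

# Hypothetical black-box matcher. The loop below never reads these
# internals; it only calls match_score().
_secret_projection = rng.standard_normal((16, n_pixels))
_target_image = eigenfaces @ rng.standard_normal(n_basis)
_target_template = _secret_projection @ _target_image

def match_score(image):
    """Higher is better; 0 would mean an identical template."""
    template = _secret_projection @ image
    return -float(np.linalg.norm(template - _target_template))

candidate = np.zeros(n_pixels)
initial_score = match_score(candidate)
best_score = initial_score
for _ in range(2000):
    k = rng.integers(n_basis)            # pick one eigenface at random
    step = rng.choice([-0.05, 0.05])     # small positive or negative change
    trial = candidate + step * eigenfaces[:, k]
    score = match_score(trial)
    if score > best_score:               # keep only improving modifications
        candidate, best_score = trial, score

print(initial_score, best_score)
```

The point of the sketch is that the loop needs no access to the template itself: the sequence of returned scores is enough to steer the candidate image.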
A probabilistic framework for multispectral image segmentation
July 30, 2003
Vision and Image Processing Lab, University of Waterloo
We present a probabilistic framework for the segmentation of multispectral images based on Gibbs/Markov random fields and hypothesis tests. First, a probabilistic distance derived from hypothesis tests based on image formation assumptions is introduced. Second, a Gibbs/Markov random field endowed with this probabilistic distance is applied to a multispectral image to determine the segmentation directly by energy minimization. The Gibbs/Markov random field approach enables us to build a framework of rigorous computation where local and global constraints can be optimized. Results will be presented on color images.
Object tracking in visual surveillance applications
July 24, 2003
In this seminar, I'll present a system for tracking moving objects in a sequence of color images (this system uses background modeling methods presented on May 22nd by Benjamin PUJOL).
The objective is to detect moving objects to learn their trajectories. To do this, I'll present the modeling of objects (appearance and dynamic modeling), the tracker, and the interpreter of consistent and inconsistent trajectories.
Some processed sequences of a street intersection will be shown to illustrate this presentation.
Calibration of a trinocular vision system
July 17, 2003
This presentation focuses on the computation and applications of the trifocal tensor defined between three views. Since the trifocal tensor encapsulates all the projective geometric relations between three views, its estimation, for a system of three cameras, can be completed simultaneously and without requiring 3D points. Two estimation methods for tensors will be discussed, requiring only sets of matching point triplets. Their accuracy will also be compared.
For some critical cases, calibration cannot be performed using a flat calibration pattern, such as the common chessboard pattern, because the angle between the three cameras is too wide, including the case where some cameras look at each other. For these cases, we propose to use a ball, along with a circle detection program, to produce the triplet correspondences required for the tensor's estimation.
Given points or lines in two images, their correspondences in the third image can be computed by trilinear transfer. This can be applied to feature matching, motion prediction and occlusion detection. Point and line transfer will be used to demonstrate the accuracy of our estimated tensors.
With the help of trifocal tensors, exciting progress was made in many fields of application. Novel view synthesis of a 3D scene will be introduced as an example. Given a virtual camera position inside or far away from the viewing cone of three physically existing cameras, a new tensor can be retrieved and used to synthesize a novel view from three reference images.
Vision-based detection of activity for traffic control
July 10, 2003
In this seminar we present a system for detecting moving vehicles in a sequence of color images acquired by a stationary camera. The objective is to provide effective and robust vehicle detection under various possible weather and illumination conditions. The processing combines motion segmentation, observation in HSV space and the corresponding feature space segmentation. Experimental results on outdoor scenes demonstrate the system's robustness in many difficult situations such as snow and night.
High-quality image magnification using regularized image up-sampling
July 3, 2003
In this seminar we present a new formulation of the problem of image magnification with higher perceived resolution. The problem is formulated as regularized image up-sampling that incorporates models of the image acquisition and display processes. This approach leads to a new data fidelity term that has been coupled with a bounded-total-variation regularizer to yield our objective function. This objective function is minimized using the level-set method with two types of motion that interact simultaneously. The method was implemented and has been verified to provide good results, yielding crisp edges without introducing ringing or other artifacts.
Iterative linear equation solvers as a common basis for adaptive filter algorithms
June 26, 2003
John Håkon Husøy
Having recently pointed out that the LMS adaptive filtering technique can be viewed as an iterative linear equation solver applied to a time-varying linear equation set directly related to the Wiener-Hopf equation, we address in this paper the question of whether a larger class of adaptive filtering algorithms can be formulated within the framework of the theory of iterative linear equation solvers. We show that this is indeed the case and that the Least Mean Square (LMS), Normalized Least Mean Square (NLMS), Affine Projection (APA) and Recursive Least Squares (RLS) algorithms can all be seen as special cases within this general framework. Three important consequences are: 1) The theory of adaptive filtering can be presented in a unified and greatly simplified manner to DSP students. 2) The extensive body of available theory for iterative linear equation solvers becomes directly relevant to the field of adaptive filtering. 3) Performance results can be obtained within the unified framework and be made directly applicable to most known adaptive filtering algorithms through the specification of a few parameters of the general model.
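One way to read the LMS statement in the abstract above is sketched below. It is an illustration under assumptions, not the speaker's derivation: the Richardson iteration w <- w + mu (p - R w) is used as the iterative solver for the Wiener-Hopf system R w = p, and LMS is obtained by replacing R and p with the instantaneous estimates x x^T and d x. The filter length, step sizes, and noise-free synthetic data are arbitrary choices made so that both iterations converge to the same weights.

```python
import numpy as np

rng = np.random.default_rng(2)
n_taps, n_samples = 4, 2000
w_true = np.array([0.5, -0.3, 0.2, 0.1])

X = rng.standard_normal((n_samples, n_taps))  # each row is an input vector x(n)
d = X @ w_true                                # noise-free desired signal d(n)

# Wiener-Hopf system R w = p built from sample estimates.
R = X.T @ X / n_samples
p = X.T @ d / n_samples

# Richardson iteration on the fixed system: w <- w + mu * (p - R w).
w_richardson = np.zeros(n_taps)
for _ in range(200):
    w_richardson += 0.5 * (p - R @ w_richardson)

# LMS: the same update with R ~ x x^T and p ~ d x at each time step,
# i.e. w <- w + mu * x * (d - x^T w).
w_lms = np.zeros(n_taps)
mu = 0.05
for x, d_n in zip(X, d):
    w_lms += mu * (d_n * x - x * (x @ w_lms))

print(w_richardson, w_lms)
```

With noisy data the LMS weights would fluctuate around the Wiener solution rather than converge exactly; the noise-free setup is used here only to make the correspondence easy to see.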
3D reconstruction of points with passive sensors
June 19, 2003
This presentation is a general overview of the techniques used to compute the 3D location of scene points, using information about the calibration of the stereoscopic setup, and the corresponding pixel coordinates of the points in two or more images.
3D reconstruction of points has applications in robotics, pose estimation and 3D model building. In the latter case, it can be used to register different views of a rigid object, in order to reconstruct its visual hull by intersecting a large number of silhouette cones with only two cameras. 3D reconstruction algorithms take as inputs the intrinsic and extrinsic calibration parameters of the stereoscopic setup, along with the pixel locations of the points, matched in all the images. From this information, different algorithms can be used to retrieve the 3D coordinates of the matched points. Among them, reconstruction through triangulation and direct solving with the projection matrices will be presented, with some experimental results.
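As an illustration of solving directly with the projection matrices, here is a minimal sketch of standard linear triangulation for two views. It is not necessarily the variant presented in the talk. The intrinsic matrix `K`, the 0.2-unit baseline, and the test point are made-up values, and the pixel coordinates are noise-free.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear triangulation from two 3x4 projection matrices and two pixels.

    Each view contributes two equations of the form u * P[2] - P[0] = 0 and
    v * P[2] - P[1] = 0; the homogeneous 3D point is the null vector of the
    stacked system, found by SVD.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point to pixel coordinates with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical calibrated stereo pair: shared intrinsics, pure x translation.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

X_true = np.array([0.1, -0.05, 2.0])
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_est)
```

With noisy matches the stacked system has no exact null vector, and the SVD returns the least-squares solution in the algebraic sense.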
Wide baseline obstacle detection and localization
June 12, 2003
In this presentation we study the problem of an autonomous robot that is equipped with a single camera and must locate the obstacles on a ground plane. The algorithm proposed here proceeds by first matching feature points on widely separated images of the work area using an overhead view transformation. The homography thus obtained is used to estimate the camera motion parameters. Obstacles are then located through a second matching phase.
Matching and Reconstruction from Epipolar Gradient Features
June 5, 2003
This presentation will combine the material which will be presented at two conferences, later this summer: an oral presentation that will be given at the Vision, Video and Graphics conference in Bath, England, in July, and a poster presented at ICIP in Barcelona, Spain, in September.
In applications where a model of some dynamic scene is needed and must be continuously rebuilt as the scene changes, sparse matched points are required. Several cameras would observe the scene, and feature points would be selected in each view and matched between views. Then, using the camera system's calibration, a model can be constructed.
The choice of feature points to be matched has an important impact on the quality of the resulting model. In this work, a method is presented for quickly and reliably selecting and matching points from three views. The selected points are based on epipolar gradients, and consist of stable image features that are more relevant to reconstruction than the more commonly used Harris corners.
Then, the selected points are matched using edge transfer, a measure of geometric consistency for point triplets and the edges on which they lie. This matching scheme is fast, and invariant to image deformations due to changes in viewpoint. Models drawn from matches obtained by the proposed technique will be shown to demonstrate its usefulness.
Electrical Impedance Tomography: Image Reconstruction with Electrode Measurement Errors
May 29, 2003
Electrical Impedance Tomography (EIT) is a relatively new medical imaging technique which allows imaging of the change in conductivity distribution within a body using body surface current applications and voltage measurements. We are particularly interested in its use as a monitoring technique of lung and heart activity in anesthetized and critical care patients. EIT's advantages - non-invasive, non-cumbersome, and relatively low cost - make it ideal for this kind of monitoring application.
Reconstruction of the conductivity change image involves the solution of a non-linear, ill-posed problem from noisy data. Stable solutions are typically achieved by the use of regularization. The talk will present an approach using Maximum a Posteriori (MAP) based regularization, in which the data and image priors are based on detailed modelling of image and noise priors. One of the most challenging problems in EIT - especially for long term monitoring applications - is dealing with errors in electrode measurements. Electronics drift, electrode movement and changing electrode impedance due to sweat and irritation introduce difficult-to-model errors into the data. Our work in progress to deal with some of these effects will be presented. The regularized image reconstruction model is modified to account for known data errors in terms of Bayesian prior information, allowing for the calculation of remarkably good images in the presence of severe single electrode data errors.
Background modeling for visual surveillance
May 22, 2003
For most video processes such as visual surveillance, the very first step is to detect moving objects: background and foreground objects must be distinguished. For robust applications, this segmentation must be accurate and run in real time. During this presentation, two different pixelwise methods will be compared. The first one is an outdoor method, based on adaptive background mixtures, that deals with repetitive motions from clutter and long-term scene changes. The second one is an indoor method, based on background subtraction, that is able to cope with local illumination change problems like shadows and highlights. Building on the resulting background model, a method to classify objects in the scene will be presented. This method, based on object sizing and motion, will classify immobile objects as being either deposits or withdrawals. Experimental results, which demonstrate the system's performance, are shown.
Crafting the Observation Model for Regularized Image Up-sampling
May 13, 2003
Often used as an observation model for image interpolation, is the moving average a correct and accurate model for most circumstances? Are there other options? In this seminar we present a novel theoretical analysis of the regularized image up-sampling problem focusing on the data fidelity term. We start with formulation of the physical acquisition processes the image has undergone and develop a generalized design for the correct and accurate data fidelity term for regularized image up-sampling.
Minimal Paths and Deformable Models in Image Analysis (Applied Mathematics Seminar)
We present an overview of our work on minimal paths. Minimal paths were first introduced as a way to find the minimal energy of active contours, based on Fast Marching; we have since used them for multiple contour finding in order to perform contour completion in 2D and 3D images. Many variations improve computation time, simplify initialization, or center the path in a tubular structure. Fast Marching is also an efficient way to solve the evolution of the balloon model through level sets. We show applications, mainly for medical images, in particular for vascular segmentation and Virtual Endoscopy.
Making stereoscopic pictures
July 18, 2002
This talk will discuss some of the principles and issues involved in making stereoscopic pictures. A number of examples will be shown using the anaglyph method with red/blue glasses.
Improving the Visual Comfort of Stereoscopic Images
July 11, 2002
Lew B. Stelmach, Wa James Tam, Filippo Speranza, Ron Renaud
Advanced Video Systems, Communications Research Centre Canada
It is widely recognized that viewers experience an enhanced sense of depth and realism when viewing stereoscopic images compared to regular monoscopic images. In general, the sense of depth is proportional to the amount of binocular disparity in the stereoscopic images. Cinematographers often exploit this in stereoscopic movies to produce dramatic motion towards the viewer, where objects appear to move out of the screen. However, a side-effect of producing images with large disparities is that viewers experience increased visual discomfort compared to images with relatively small disparities, or compared to monoscopic images. The exact cause of this discomfort is not fully understood, but likely factors include diplopia, mismatches between accommodation and convergence, and stressing of binocular correspondence processes by unnaturally high depth-of-field outside the fixation area.
The purpose of our research was to explore techniques for improving the visual comfort of stereoscopic images, by comparing viewers' responses to two stereoscopic camera configurations: parallel and converged. In the parallel camera configuration, the two cameras pointed straight ahead. In the converged configuration the cameras were toed-in. Parallel cameras were the default configuration, and our goal was to discover whether the converged configuration yielded improved ratings of visual comfort.
For a parallel camera configuration, depth is conveyed exclusively by crossed disparities, because the zero-disparity point is located at an infinite distance from the cameras. By comparison, depth is conveyed by both crossed and uncrossed disparities for the converged configuration, because the zero-disparity point is located at a finite distance from the cameras. Consequently, for the latter case the same range of disparities is distributed among crossed and uncrossed values, and the absolute magnitude of disparity is reduced. We hypothesized that this may produce an advantage for ratings of visual comfort over the parallel configuration.
Consistent with these expectations we found that ratings of visual comfort were higher for the converged compared to parallel configurations, for moderate degrees of convergence. Camera configuration had a minimal effect on ratings of perceived depth.
Visual Masking at Video Scene Cuts
July 4, 2002
Wa James Tam, Lew Stelmach
Advanced Video Systems, Communications Research Centre Canada
The visual world around us is smooth and continuous. However, when we watch a movie or a television program, the visual world is smooth and continuous only for brief periods before switching to a different scene. What happens during a scene change or a scene cut is very interesting for visual and cognitive psychologists. By studying perceptual effects at scene cuts one can have a better understanding of how the visual system functions and utilize this knowledge in the design and development of video coding algorithms. In this presentation, we will provide some interesting demonstrations of visual masking and change blindness as well as present some data that we have collected on the visibility of blocky artifacts at scene cuts. We will also present an application in which visual masking at scene cuts is exploited in an innovative video compression scheme for stereoscopic television (3D TV).
3D Imaging: Neptec Product Direction
June 27, 2002
Neptec is a privately owned Canadian company that has been providing vision solutions for NASA and the Canadian Space Agency (CSA) for nearly 12 years. The company is developing a number of new 3D imaging products using the data and scanning laser technology obtained during a very successful trial in the Shuttle Discovery payload bay in August 2001. A few of the potential application areas include space, medical, mining, pulp and paper and security. Neptec is currently collaborating on a number of projects with researchers at Canadian universities. Neptec is interested in discussing projects of mutual interest with researchers at the University of Ottawa.
A Low-Complexity 2-D Hidden Markov Model with Application to Face Recognition
June 20, 2002
This seminar presents a simplified second-order 2-D Hidden Markov Model (HMM) and its application to the problem of Face Recognition (FR). The proposed approach is built on an assumption of conditional independence between feature blocks in close neighborhoods. The complexity of the hidden layer of the proposed model is on the order of 2N^2T, where N is the number of states in the model and T is the total number of observation blocks in the image. The proposed system shows good performance when a sufficient number of parameters is available. As expected, the performance degrades in the small-model case, where fewer parameters are available. It will be shown how tying model parameters improves the performance of small models.
Receiver-Based Packet Loss Concealment for Pulse Code Modulation Coders
June 13, 2002
Voice-over-IP (VoIP), the transmission of packetized voice over IP networks, is gaining much attention as a possible alternative to conventional Public Switched Telephone Networks (PSTN). However, impairments present on IP networks, namely jitter, delay and channel errors, can lead to the loss of packets at the receiving end. This packet loss degrades the speech quality. Model-based coders, especially the G.729-A and G.723.1 International Telecommunication Union (ITU-T) standards, have been extensively used for speech coding over IP networks because of their inherent ability to recover from erasure. Their built-in packet loss concealment makes their quality drop slowly with increasing packet loss. However, because of their memory, the transition from the concealed state to the correct state requires a few frames, and they actually tend to corrupt a few good packets before recovery as a result of a phenomenon known as "State Error". On the other hand, Pulse Code Modulation (PCM), although having a higher score than G.729 and G.723 in periods of normal operation, does not have the ability to conceal erasure, and the quality of speech during loss periods drops dramatically. Yet it can recover from packet loss more rapidly than model-based coders, since the first speech sample in the first good packet restores speech to its original quality. The goal of this work is to develop a Packet Loss Concealment (PLC) algorithm to provide G.711 PCM coders with the required ability to conceal erasure and maintain a high score of user satisfaction. This algorithm uses a receiver-based prediction model to develop an estimate of the missing speech segments.
Robust registration of virtual objects for real-time augmented reality
June 6, 2002
Computational Video Group, National Research Council
Vision-based registration techniques for augmented reality have been the subject of much recent research due to their potential to align virtual objects with the real world. This talk describes the implementation of a fast, but accurate, vision-based corner tracker that forms the basis of a pattern-based augmented reality system. The tracker predicts corner positions by computing a homography between known corner locations on a planar pattern and potential planar regions in a video sequence. Local search windows are then placed around these predicted locations in order to find the actual subpixel corner positions. Experimental results show the robustness of the corner tracking system with respect to occlusion, scale, orientation, and lighting. Using the computed homography it is possible to perform augmentation by placing any 2D picture in place of the pattern in the image. Since the patterns are self-identifying we can perform unique augmentations for different patterns. We also use the computed homography to autocalibrate the camera in real-time, that is, to compute the intrinsic and extrinsic camera parameters. This enables us to find the full perspective projection matrix for the camera. This in turn makes it possible to perform 3D augmentation, by placing a 3D object over the pattern in the image. By performing this autocalibration on-line we do not require a prior calibration step for different cameras.
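The corner prediction step in the abstract above can be sketched as follows. This is an illustration under assumptions rather than the speaker's tracker: the homography is estimated with the standard direct linear transform from four outer corners of a hypothetical 100 x 100 pattern, the "true" homography `H_true` is a made-up matrix used to synthesize image positions, and the interior corners are arbitrary. In a real tracker the predicted positions would only seed the local search windows.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct linear transform: find H with dst ~ H @ src from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)   # null vector of the stacked constraints
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map an (n, 2) array of points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical pattern geometry and a made-up pattern-to-image homography.
pattern_outer = np.array([[0, 0], [100, 0], [100, 100], [0, 100]], dtype=float)
pattern_inner = np.array([[25, 25], [50, 75]], dtype=float)
H_true = np.array([[1.2, 0.1, 30.0], [-0.05, 1.1, 20.0], [1e-4, 2e-4, 1.0]])

image_outer = apply_homography(H_true, pattern_outer)   # detected outer corners
H_est = estimate_homography(pattern_outer, image_outer)
predicted_inner = apply_homography(H_est, pattern_inner)
print(predicted_inner)
```

Because the pattern is planar, four correspondences determine the homography, and every other known pattern corner can then be predicted in the image before the subpixel search.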
Image coding for transmission over wireless CDMA channels
May 30, 2002
An image coding scheme based on the wavelet transform and lattice vector quantization (LVQ) for transmission over CDMA channels will be presented. To provide reliable transmission, a joint source-channel coding method, rate-compatible punctured convolutional (RCPC) coding, is used on channels with Rayleigh fading to protect the coded bitstream from the noise of CDMA channels. Three different kinds of wavelets (Daubechies, biorthogonal and B-spline) were used for the source coding of images, and the statistics of the subband coefficients of the wavelet-transformed images were analysed. For the CDMA channel model, multipath Rayleigh fading effects and multiple access interference (MAI) were considered, and two different receivers were used to compare their performance in handling those different types of interference. One is the Rake receiver, used together with RCPC for channels having Rayleigh fading effects as well as MAI. The other is a recursive-least-squares (RLS) minimum-mean-square-error (MMSE) receiver, based on the theory of blind multiuser detection, for fixed CDMA wireless applications which do not have Rayleigh fading effects but still have MAI. The overall performance of the coding scheme based on Daubechies, biorthogonal and B-spline wavelets respectively is also shown.
An Empirical Study of Some Feature Matching Strategies
May 23, 2002
Epipolar geometry is a tool used to describe the relationship between views taken by a pair of cameras. In order to estimate the epipolar geometry of a pair of cameras, a set of matched feature points between images taken by the cameras is usually used. It is difficult to obtain such reliable matches automatically, so some robust estimation method must be used. However, the performance of these methods still depends on the quality of the set of candidate point matches. I will present and empirically evaluate some matching strategies and constraints that can improve candidate match sets, with the goal of estimating epipolar geometry.
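One common way to score a candidate match against a given fundamental matrix F is the Sampson distance, a first-order approximation of the geometric error of the epipolar constraint x2^T F x1 = 0. The sketch below is a generic illustration; the abstract does not state which error measure the study uses, and the function name is hypothetical.

```python
import numpy as np

def sampson_distance(f, pts1, pts2):
    """Sampson distance of matches (pts1[i], pts2[i]) with respect to F."""
    f = np.asarray(f, float)
    pts1 = np.asarray(pts1, float)
    pts2 = np.asarray(pts2, float)
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))])
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    fx1 = x1 @ f.T   # row i is F @ x1_i (epipolar line in image 2)
    ftx2 = x2 @ f    # row i is F.T @ x2_i (epipolar line in image 1)
    num = np.sum(x2 * fx1, axis=1) ** 2
    den = fx1[:, 0] ** 2 + fx1[:, 1] ** 2 + ftx2[:, 0] ** 2 + ftx2[:, 1] ** 2
    return num / den
```

Matches with a small distance are consistent with the epipolar geometry. A robust estimator would typically threshold this value when counting inliers.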
This presentation will later be given at the VI '02 conference in Calgary.
High-Level Video Content Extraction and Representation for Real-Time Video Applications
May 10, 2002
Video is becoming integrated in various personal and professional applications such as entertainment, education, tele-medicine, databases, security applications and even low-bandwidth wireless applications. Automated and effective techniques to extract video content and to represent video based on its high-level content such as objects and events are therefore problems of increasing importance. Such high-level video representation would, for example, significantly facilitate the use and reduce the costs of video summarization (e.g., selection of film material) and video surveillance (e.g., event-based alerts) by humans.
This presentation gives an overview of state-of-the-art extraction techniques and introduces a computational framework for stable content-based video representation in terms of moving objects and their related semantic features. Moving objects are represented using low-level features. Generic semantic features are motion-related features such as events (e.g., deposit or removal of objects). To achieve higher applicability, content is extracted independently of the context of the input video. Three processing levels are activated: enhancement to estimate and reduce noise, analysis to extract meaningful objects, and interpretation to extract context-independent semantic features. The system is modular and layered, with interacting low (e.g., motion-based), middle (e.g., trajectory-based), and high (e.g., event-based) layers. The reliability and real-time response of the proposed system are demonstrated by extensive experimentation on more than 10 indoor and outdoor video shots containing a total of 6371 images with multi-object occlusion, noise, and coding artifacts.
Bi-level Video: Video Communication at Very Low Bit Rates
October 5, 2001
The rapid development of wired and wireless networks tremendously facilitates communications between people. However, most current wireless networks still offer low bandwidth, and mobile devices still suffer from weak computational power, short battery lifetime and limited display capability. We developed a very low bit-rate bi-level video coding technique, which can be used in video communications almost anywhere, anytime, on any device. The spirit of this method is that, rather than giving highest priority to the basic colors of an image as in conventional DCT-based compression methods, we give preference to the outline features of scenes when bandwidth is limited. These features can be represented by bi-level image sequences that are converted from gray-scale image sequences. By analyzing the temporal correlation between successive frames and the flexibility in scene presentation offered by bi-level images, we achieve very high compression ratios with our bi-level video compression scheme. Experiments show that at low bandwidths, our method provides clearer shape, smoother motion, shorter initial latency and much lower computational cost than DCT-based methods. Our method is especially suitable for small mobile devices such as handheld PCs, palm-size PCs and mobile phones that have small display screens and limited computational power, and work in low-bandwidth wireless networks. We have built PC and Pocket PC versions of bi-level video phone systems, which typically provide QCIF-size video with a frame rate of 5-15 fps for a 9.6 Kbps bandwidth.
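To put the quoted figures in perspective, the sketch below converts a gray-scale frame to a bi-level frame and computes how much compression a raw bi-level QCIF (176x144) stream would need to fit the stated channel. The fixed threshold is an assumption for illustration only; the abstract does not say how the conversion is done.

```python
import numpy as np

QCIF_WIDTH, QCIF_HEIGHT = 176, 144  # standard QCIF frame size

def to_bilevel(gray, threshold=128):
    """Map a gray-scale frame to 0/1 pixels (fixed threshold is an assumption)."""
    return (np.asarray(gray) >= threshold).astype(np.uint8)

def required_compression_ratio(bitrate_bps, fps,
                               width=QCIF_WIDTH, height=QCIF_HEIGHT):
    """Ratio of raw bi-level bits per frame (1 bit/pixel) to the channel budget."""
    raw_bits_per_frame = width * height
    budget_bits_per_frame = bitrate_bps / fps
    return raw_bits_per_frame / budget_bits_per_frame
```

At 9600 bit/s, a raw bi-level QCIF frame of 25344 bits must shrink by a factor of about 13.2 at 5 fps and about 39.6 at 15 fps, before any protocol overhead is counted.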
Continuous Surface-Geometry Measurement by an Unconstrained and Untracked Range-Sensor Head
August 21, 2001
Mechanical Eng. Dept., University of Ottawa
Laser-camera range sensors are commonly used to measure the three-dimensional (3-D) surface geometry of objects for applications such as reverse engineering, product inspection, and environment modeling. The range-sensor head is typically mounted on a positioning device, instrumented with position sensors, to acquire numerous 3-D profiles which comprise a range image or range view. To measure the complete surface geometry of an object, the entire laser-camera sensing apparatus must be repositioned to new viewpoints to acquire several range views. This may be difficult or impossible, for complex objects and for interior surfaces which allow limited access. The presentation will discuss a method of surface-geometry measurement by a laser-camera range sensor which is permitted unconstrained and continuous motion, without requiring pose (position and orientation) measurement of the sensor head by other sensors, tracking systems, or positioning devices. The method is also applicable when the object, or both the object and sensor-head, have continuous unknown motion.
Minimization of edge artifacts in the discrete wavelet transform
July 24, 2001
The discrete wavelet transform (DWT) is a tool extensively used in image processing algorithms. It can be used to decorrelate information from the original image, which can thus help in compressing the data for storage, transmission or other post-processing purposes. However, the finite nature of such images gives rise to edge artifacts in the reconstructed data. A commonly used technique to overcome this problem is a symmetric extension of the image, which can preserve zeroth-order continuity in the data. This still produces undesirable edge artifacts in derivatives and subsampled versions of the image. In this paper we present an extension to Williams and Amaratunga's work of extrapolating the image data using a polynomial extrapolation technique before performing the forward or inverse DWT, for biorthogonal wavelets. Comparative results of reconstructed data with individual subband reconstruction, as well as using the embedded zerotree encoding (EZC) scheme, are also presented for both the aforementioned techniques.
DWT, embedded zerotree coding, biorthogonal wavelets, edge artifacts, symmetric extension, polynomial extrapolation.
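The difference between the two boundary treatments can be seen on a 1-D signal. The sketch below is a generic illustration and not the specific scheme of Williams and Amaratunga: the polynomial order, the number of boundary samples used for the fit, and the function names are all assumptions.

```python
import numpy as np

def extend_symmetric(x, n):
    """Mirror n samples at each end; continuous in value, not in slope."""
    return np.pad(np.asarray(x, float), n, mode="symmetric")

def extend_polynomial(x, n, order=2, fit_len=4):
    """Extrapolate n samples at each end from a polynomial fitted to the boundary."""
    x = np.asarray(x, float)
    k = np.arange(fit_len)
    left_coef = np.polyfit(k, x[:fit_len], order)
    right_coef = np.polyfit(k, x[-fit_len:], order)
    left = np.polyval(left_coef, np.arange(-n, 0))
    right = np.polyval(right_coef, np.arange(fit_len, fit_len + n))
    return np.concatenate([left, x, right])
```

On a ramp, symmetric extension folds the signal back on itself and so introduces a slope discontinuity at the boundary, whereas the polynomial extension continues the ramp.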
July 17, 2001
In a complicated scenario, a robot is required to intelligently navigate a path toward a determined goal location. The robot is initially positioned in a selected location and is required to navigate among random obstacles to reach a goal location. The navigation algorithm is responsible for determining the route from the initial position to the goal among the available random obstacles. On its way to the goal, the robot avoids collisions with the obstacles. Furthermore, an optimization technique is used in each path link in order to minimize the relative link angle. This optimization technique results in an overall optimized route. Results show successful routing in circumstances of occluded goal position and mazes (a demo will be available in the seminar).
Automated Correspondences of Streamed Images: Emerging Imaging Applications
July 10, 2001
Computational Video Group, National Research Council
Processing video streams is an important imaging problem. A key problem in processing video is the correspondence problem: given overlapping images in a video sequence, track the matching 2D locations which represent common 3D features in the scene. In this talk we describe the projective vision toolkit (PVT) and its application to solving the correspondence problem. We focus on how PVT is used in applications of motion annotation, model building, augmented reality and tracking.
Toward Realistic Image-Based Virtual Environment Representation
June 26, 2001
Image Based Rendering (IBR), which involves the combination of computer graphics and computer vision, is aimed at representing virtual environments and objects using real images, as opposed to geometrical modeling, the traditional method in computer graphics. In this talk, the framework of our method to represent the virtual environment will be given. Two essential topics will be introduced: image mosaic and view synthesis. Image warping, with a role similar to filtering in traditional image processing, will be emphasized in both topics.
Design and Implementation of an Entrance and Exit Monitoring System
June 19, 2001
Monitoring an entrance and exit of an enclosed structure (a room or a building) using computers connected to digital cameras is useful for keeping a database of pertinent information such as who is inside the structure, who has left the structure, time when a person entered and exited. This information can be used for a variety of purposes: security, statistics, etc. I will discuss methods of extracting this information from sequences of images as well as the Intel Computer Vision Library, which is a tool I will be using to implement these methods.
Detecting planes in an image pair
June 12, 2001
In an image pair, a plane in the scene defines a homography between the images. We propose an algorithm that detects planar homographies in uncalibrated image pairs. We then demonstrate how this plane identification method can be used as a first step in an image analysis process, when point matching between images is unreliable. The detection is performed using a RANSAC scheme based on the linear computation of the homography matrix elements using four points. Results are shown on real image pairs.
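A RANSAC loop of the kind described can be sketched as follows: draw four matches, compute the homography linearly, and keep the hypothesis with the most inliers. The iteration count, the pixel tolerance and all function names are assumptions for illustration, and the abstract gives no details of the authors' sampling or scoring.

```python
import numpy as np

def fit_homography(src, dst):
    """Linear (DLT) homography from four or more point pairs, up to scale."""
    a = []
    for (x, y), (u, v) in zip(src, dst):
        a.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        a.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(a, dtype=float))
    return vt[-1].reshape(3, 3)

def transfer_error(h, src, dst):
    """Distance in image 2 between H-mapped src points and dst points."""
    p = np.hstack([src, np.ones((len(src), 1))]) @ h.T
    with np.errstate(divide="ignore", invalid="ignore"):
        proj = p[:, :2] / p[:, 2:3]
    err = np.linalg.norm(proj - dst, axis=1)
    return np.where(np.isfinite(err), err, np.inf)

def ransac_homography(src, dst, iters=500, tol=2.0, seed=0):
    """Return the best homography found and a boolean inlier mask."""
    rng = np.random.default_rng(seed)
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    best_h, best_inliers = None, np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=4, replace=False)  # minimal sample
        h = fit_homography(src[idx], dst[idx])
        inliers = transfer_error(h, src, dst) < tol
        if inliers.sum() > best_inliers.sum():
            best_h, best_inliers = h, inliers
    return best_h, best_inliers
```

To detect several planes, one could remove the inliers of the best hypothesis and rerun the loop on the remaining matches; whether the proposed algorithm proceeds this way is not stated in the abstract.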
Reduction of bleed-through in scanned manuscript documents
June 5, 2001
Many old manuscript documents were written on both sides of the paper, and the bleed-through from one side of the document to the other increases the difficulty in reading or deciphering the information on the page. This seminar will present techniques for reducing such bleed-through distortion using digital image processing. Both sides of the document are scanned, maintaining full spatial and amplitude resolution (8 bits/sample). The bleed-through is reduced by processing both sides of the document simultaneously. First the verso side is flipped from left to right, and then the recto and flipped verso images are registered. This registration is necessary since it is impossible to perfectly align the front and back when scanning the document, and the scanner may not be perfectly uniform. We use a six-parameter affine transformation to register the two sides, determining the parameters using an optimization method. Once the two sides have been registered, areas consisting primarily of bleed-through are identified and replaced by the background color or intensity. The method has been tested on a number of documents, including documents we generated under controlled conditions and some original manuscripts; the readability of documents with heavy bleed-through has been greatly improved by this method.
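The flip-and-register step can be illustrated with a six-parameter affine model. In this sketch the parameters are fitted by least squares to hypothetical control-point pairs; the talk determines them with an optimization method whose details are not given, so the fitting procedure and function names here are assumptions.

```python
import numpy as np

def flip_verso(verso):
    """Mirror the verso scan left to right so it overlays the recto."""
    return np.fliplr(np.asarray(verso))

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix [[a11, a12, tx], [a21, a22, ty]]."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    a = np.hstack([src, np.ones((len(src), 1))])       # N x 3 design matrix
    params, *_ = np.linalg.lstsq(a, dst, rcond=None)   # 3 x 2 solution
    return params.T

def apply_affine(m, pts):
    """Map (x, y) points through the 2x3 affine matrix m."""
    pts = np.asarray(pts, float)
    return pts @ m[:, :2].T + m[:, 2]
```

Three non-collinear point pairs determine the six parameters exactly; using more pairs averages out localization error.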
A prototypal system for remote site exploration
August 16, 2000
Vincent Brun, Pascal Bonnevay
The objective of this talk is to present and describe the prototype of a system in which a viewer has the ability to visit and interactively explore an existing remote environment. Several applications can benefit from this kind of technology, such as teleoperation, telesurveillance, teleinspection and entertainment. The proposed approach is to use a 3D model of the remote site (allowing real-time unrestricted navigation) but in which the appearance of this 3D model is regularly refreshed to reflect the current state of the real site. This updating of the appearance model is done using the visual information obtained from a limited set of cameras strategically located at the remote site (thus allowing live visualization). The current prototype uses Java 3D technology allowing real-time 3D visualization and exploration of the remote site through the Internet.
Interactive Content-Based Image Retrieval (SITE Seminar)
July 25, 2000
University of Sydney
Content-based image retrieval (CBIR) is playing a key role in digital image/video library management and multimedia information processing. It is one of the focal research areas in the proposed MPEG-7 Standard for multimedia communications. Its applications range from telemedicine to distance education, the entertainment industry and many more. Although many methods have been proposed and implemented in retrieval systems, the state of the art is far from convenient for use in the commercial world. Among other issues, two longstanding problems must be solved: a) gaps between the high-level concepts and the low-level features; b) the subjectivity of human perception. This is particularly true with the current compressed-domain video/image coding standards (e.g. DCT in JPEG, MPEG-1 and 2, wavelets in JPEG2000). Due to the apparent gaps between the features in the compressed domain (DCT and WT/VQ coefficients) and human perception, and the linear comparison criteria used, performance of the current retrieval systems is far from satisfactory. In this talk, I will present our recent work on interactive CBIR. In particular, we propose a retrieval system with several novel query models and nonlinear search units to bridge the gaps between high-level concepts and low-level features, and to simulate human perception. The proposed method has been tested on images from numerous digital image libraries. Comparison with the well-known MARS system shows that our method consistently provides superior retrieval performance.
Ling Guan received his Ph.D. Degree in Electrical and Computer Engineering from University of British Columbia, Canada in 1989. In 1988-92, he was a Research Engineer of Array Systems Computing Inc, Toronto, Canada in machine vision and signal processing.
He started his academic career with the University of Sydney, Australia in October 1992, where he is the founder and director of the Signal and Multimedia Processing Lab. In 1994, he was a visiting fellow at British Telecom, and in 1999, he was a visiting professorial fellow at Tokyo Institute of Technology. He is currently on sabbatical leave and a visiting professor at Princeton University. His research interests include multimedia signal processing and systems, optimal search engine, computational intelligence and machine learning, adaptive image and signal processing. He has published more than 130 technical articles in these fields.
Stereo Algorithms and Representations for Image-Based Rendering (Screening)
July 19, 2000
In this talk, I will review a number of stereo matching algorithms and representations I have developed in the last few years. The talk focuses on techniques that are especially well suited for image-based rendering applications such as novel view generation and the mixing of live imagery with synthetic computer graphics. I will begin by reviewing some recent approaches to the classic problem of recovering a depth map from two or more images. I will then describe a number of newer representations (and their associated reconstruction algorithms), including volumetric representations, layered plane-plus-parallax representations, and multiple depth maps. Each of these techniques has its own strengths and weaknesses, which I will address. If time permits, I will also present our latest work on generating infinite video from short video clips (a kind of temporal texture synthesis).
Wavelet packet based Stereoscopic Image Coding Techniques: A subband perspective
July 12, 2000
An ideal representation of a 3D scenario can be achieved by using a computer to mimic the scenario as it would be seen by the human eye. This concept is basically what is known as Stereoscopic Imaging/Image (SI). In SI a pair of images is used to represent a typical real-world scene. Transmission of these image pairs over any communication channel presents us with interesting challenges. In an ideal case we would transmit both images. However, these images are very similar to each other, with only minor differences between them. It is this redundancy which we exploit in order to obtain a disparity estimate between the two images. Hence the task of transmitting these images reduces to coding one image (e.g., the left image) and transmitting the disparity estimate. Present state-of-the-art techniques in disparity estimation can trace their roots to similar techniques employed in motion estimation in moving images. Multiresolution analysis employing wavelet packets is a relatively new field in sub-band coding which has developed significantly in the last decade. The goal of our research is thus to employ novel schemes based on these techniques to achieve better SI coding and compression.
Present state-of-the-art techniques employ a global disparity estimation on the SI pair. Multiresolution analysis is then performed on one of the images and the disparity estimate to code and hence compress the images. Our research will focus on carrying out the scheme entirely within a multiresolution analysis. We first decompose both images into a wavelet packet tree and perform a sub-band disparity estimation on the decomposed images. This gives us a computational advantage because we are able to localize the high- and low-frequency components of the image. The problems we generally encounter in such a technique (sub-band disparity estimation) are aliasing in the subbands and edge effects. Hence the goal of our research work is to present some novel solutions and techniques for overcoming these problems.
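For reference, the motion-estimation-style disparity search mentioned above can be sketched as plain block matching on full-resolution images. This is the conventional baseline, not the sub-band scheme proposed in the talk; the block size, the search range, the sum-of-absolute-differences cost and the convention that a left-image block at column x matches the right image at column x - d are all assumptions for illustration.

```python
import numpy as np

def block_disparity(left, right, block=8, max_disp=16):
    """Per-block horizontal disparity minimizing the sum of absolute differences."""
    left = np.asarray(left, float)
    right = np.asarray(right, float)
    h, w = left.shape
    disp = np.zeros((h // block, w // block), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            ref = left[y:y + block, x:x + block]
            best_cost, best_d = np.inf, 0
            for d in range(max_disp + 1):
                if x - d < 0:          # candidate would leave the image
                    break
                cand = right[y:y + block, x - d:x - d + block]
                cost = np.abs(ref - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[by, bx] = best_d
    return disp
```

The same search could in principle be run on corresponding sub-bands of the two decomposed images, which is where the aliasing and edge effects noted above would have to be handled.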
View Planning for Automatic 3D Model Acquisition
July 5, 2000
The acquisition of 3D object models (also known as object reconstruction) using active laser scanning range sensors involves four main processes: view planning, sensing, registration and integration. Automated object reconstruction remains an open computer vision problem. This talk will present a concept for multi-stage, model-based view planning for automated object reconstruction in compliance with specified measurement criteria. The concept is also applicable to object inspection. A prototype view planning system has been implemented based on these concepts. The talk will define the problem, provide an overview of previous work, present the "3M" view planning concept, discuss computational complexity issues and present some early experimental results.
The talk will last about 20-25 minutes, followed by questions and discussion.
Introduction to Image-Based Rendering
June 28, 2000
Image-Based Rendering (IBR) involves a merging of Computer Vision, Image Processing and Computer Graphics. This technique, which can be used for remote telepresence, telemedicine, visualization, etc. has been very actively investigated over the last five years. In this seminar, the concept of IBR will be explained. Current research status concerning some IBR approaches and some coding approaches used in IBR will then be addressed. The seminar will conclude with some suggestions and possible research topics.
Several Key Issues in Image Coding for CDMA Transmission
June 21, 2000
Image coding for transmission over wireless CDMA channels seems to be a necessary option for the future mobile communication systems. For achieving that, several key problems have to be considered jointly: multiuser detection in noisy fading channels, image coding (transform or subband) and quantization, rate-distortion in CDMA channel conditions. In the seminar a review of the literature treating the above problems as well as some possible research directions will be given.
Generation of anaglyph stereoscopic images
July 14, 2000
This seminar describes a method to form an anaglyph stereo image from the left and right color images of a stereo pair. The anaglyph method was patented in 1891 by Louis Ducos du Hauron, but similar methods had been demonstrated previously by W. Rollmann in 1853 and J.C. D'Almeida in 1858. In the classic method, used for monochrome stereo images, the left view in blue (or green) is superimposed on the same image with the right view in red. When viewed through spectacles of corresponding colors but reversed, the three-dimensional effect is perceived. Methods for producing anaglyphs with some capability for color reproduction have also been described. There is very little literature on the production of anaglyph images, and what exists is very empirical. The method proposed in this seminar is adapted to the spectral absorption curves of the left and right filters of the anaglyph glasses. A projection technique is used to compute the anaglyph image such that the image pair seen through the glasses produces a 3D image that is perceptually as similar as possible to the original stereo pair.
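The classic monochrome construction described above can be sketched as follows, with the right view placed in the red channel and the left view in blue (optionally also green). This illustrates only the classic method, not the projection technique proposed in the seminar; the luminance weights are the usual Rec. 601 values and the function names are hypothetical.

```python
import numpy as np

REC601 = np.array([0.299, 0.587, 0.114])  # luminance weights for R, G, B

def to_gray(rgb):
    """Luminance of an H x W x 3 image."""
    return np.asarray(rgb, float)[..., :3] @ REC601

def classic_anaglyph(left_rgb, right_rgb, use_green=False):
    """Right view in red, left view in blue (and in green if use_green is set)."""
    left = to_gray(left_rgb)
    right = to_gray(right_rgb)
    out = np.zeros(left.shape + (3,))
    out[..., 0] = right
    out[..., 2] = left
    if use_green:
        out[..., 1] = left
    return out
```

Because each view is reduced to a single luminance value per pixel, this construction discards the color information of the original pair, which is the limitation that color-capable methods try to address.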
Estimation of Scaling and Mimic Parameters for a 3-D Face Model in Video Coding
June 7, 2000
For video coding applications, an algorithm for the automatic estimation of scaling and mimic parameters of a parametric 3-D face model is developed. In the first step, 2-D eye and mouth features are estimated in video sequences. In the second step, scaling and mimic parameters of the 3-D face model are calculated from the estimated 2-D eye and mouth features.
For estimation of the 2-D eye and mouth features a parametric 2-D eye model as well as a parametric 2-D mouth model for an open mouth and a parametric 2-D mouth model for a closed mouth are used. The parameters of the eye model represent the pupil, the eye corner points and the eye's opening height. The parameters of the mouth model represent the mouth corner points, the lip thickness and the lip's opening height. First, the pupil and the corner points are estimated using luminance templates. Then, the lip thickness and the lip's opening height as well as the eye's opening height are determined using the parametric 2-D eye or mouth models. According to an open or a closed mouth in the current frame of the video sequence an appropriate parametric 2-D mouth model will be automatically selected for estimating the lip thickness and the lip's opening height.
Afterwards, scaling and mimic parameters of the 3-D face model are calculated from the estimated values of the 2-D eye and mouth features. The used scaling parameters describe the face size, the eye size and the lip thickness of the 3-D face model. Mimic parameters are used to describe the local 3-D motion of the eyes and the mouth.
For evaluation of the described algorithms, the errors of 3-D distances between selected control points of the face model resulting from estimation errors of scaling and mimic parameters are introduced. Experimental results show that, with the estimation error of the 2-D eye and mouth features determined on typical real sequences, an averaged 3-D distance error of 2.26% resulting from the scaling parameter errors and an averaged 3-D distance error of 2.96% resulting from the mimic parameter errors, both relative to the face width, can be obtained. For further evaluation of the image quality, a real video sequence is compared with an animation based on the estimated mimic parameters. Using the estimated mimic parameters, the animated facial expressions are comparable with real facial expressions.
The Problem of Building Models from Sensor Data
February 25, 2000
Visual Information Technology Group, National Research Council of Canada
In this talk I will discuss the problem of model building, which is an important current application in the field of computer vision and computer graphics. The goal is to go from sensor data to a 3d model that can be rendered in a virtual environment. First, I will describe each step in the model building pipeline, along with some open problems that remain. I will also discuss the place of passive versus active sensors, along with ways of automating the model building process. I will show 3d models that have been built using data obtained from active sensors developed at NRC.
The second half of the talk will focus on projective vision, which was originally intended to deal with an uncalibrated image sequence. Surprisingly, using projective vision methods it is now possible to solve the correspondence problem, which will certainly help automate the model building process. I will demonstrate some experimental results obtained using a publicly available projective vision toolkit that we have written.
Compositing a top view mosaic
December 2, 1999
In this seminar we describe an algorithm for reconstruction of a view from multiple images. The development of the algorithm and experiments with it are still in progress. This algorithm is part of a larger project whose aim is to develop a system, based on the concept of sensori-motor augmented reality, which allows obstacle detection and avoidance in a more efficient manner than present-day systems. Some experimental results will be presented, and some future work will be proposed.
The Planar Perspective Matrix: Synthesized Perspectives Using Top View Mosaics
November 19, 1999
The planar perspective matrix is a 3x3 homography matrix used to relate the top view of a plane to different perspective views. Its most important property is that the third dimension is eliminated. The derivation of this matrix is based on the 3x4 perspective projection matrix. Element values are affected by the camera calibration matrix and by the translation and rotation of the camera. The simplest case occurs when the focal length is equal to unity and no translation or rotation is involved.
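One standard way to obtain such a matrix, given here as a sketch under stated assumptions rather than as the talk's own derivation, is to write the projection matrix as P = K [R | t] and restrict it to points on the world plane Z = 0. The third column of R then multiplies zero and drops out, leaving H = K [r1 r2 t]. The choice of Z = 0 as the plane and the function name are assumptions.

```python
import numpy as np

def planar_homography(k, r, t):
    """3x3 homography for the plane Z = 0: columns r1, r2 of R and t, times K."""
    k = np.asarray(k, float)
    r = np.asarray(r, float)
    t = np.asarray(t, float)
    return k @ np.column_stack([r[:, 0], r[:, 1], t])
```

For any point (X, Y, 0) on the plane, `planar_homography(k, r, t) @ [X, Y, 1]` equals the full projection `P @ [X, Y, 0, 1]`, which is the sense in which the third dimension is eliminated.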