The recent surge in machine learning, and in particular deep learning using neural networks, has revolutionized many fields, including speech processing, data mining and medicine. Arguably one of the greatest impacts of this revolution has been in computer vision. Since the success of AlexNet at the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where a deep neural network outperformed, by a significant margin, arguably much more sophisticated classical computer vision systems, deep neural networks can now be found everywhere in visual processing. This revolution has created enormous economic interest (Facebook, Google, …). The OCICS offering contains various courses that cover neural networks from a machine learning or data mining perspective, but there is no dedicated course on machine learning in computer vision. This course fills that gap. While it necessarily includes machine learning background, it specifically looks at (convolutional) neural networks and their applications to standard problems in computer vision. It also contrasts deep-learning-based approaches with classical computer vision approaches and shows how classical approaches inform the design of these deep-learning-based solutions.
Introduction to learning-based computer vision; statistical learning background; image processing and filtering primer; convolutional neural networks (CNNs), network layers, computer vision data sets and competitions; computer vision problems, in particular, image classification, detection and recognition, semantic segmentation, image generation, multi-view problems and tracking.
The course material will be covered in in-person lectures as well as synchronous and asynchronous on-line lectures, including program demonstrations. Additional resources in the form of textbooks and on-line references are listed below. The course will use group work and interactive student feedback through Virtual Campus (Brightspace) and Microsoft Teams. Students are encouraged to apply their knowledge through three programming assignments in Jupyter notebooks using Scikit-Learn, Keras and TensorFlow. Participation in the course requires appropriate access to computing resources. The active participation of students is encouraged through discussions, the group video presentation and the individual student project presentations.
(Available on-line: http://web.stanford.edu/~hastie/ElemStatLearn/)
(Pre-print including draft of 2nd ed. available on-line: http://szeliski.org/Book)
(Electronic version available for download from library http://biblio.uottawa.ca/en)
(Available from http://cs231n.github.io)
(Free online book from http://neuralnetworksanddeeplearning.com/)
Course notes will be made available through Virtual Campus.
ImageNet competition, commercial applications, brief historical overview; machine learning landscape, data handling, visualizing data, organizing the data, training, testing and validation
Linear regression review, linear least squares, regularization, logistic regression
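As a small illustration of the least-squares material in this module, ridge (L2-regularized) linear regression can be solved in closed form with numpy. All data and parameter values below are hypothetical:

```python
import numpy as np

# Toy data: fit y = 2x + 1 with a small amount of noise (hypothetical values).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 1))
y = 2.0 * X[:, 0] + 1.0 + 0.01 * rng.standard_normal(50)

# Append a bias column, then solve the regularized normal equations
# (A^T A + lam*I) w = A^T y in closed form.
A = np.hstack([X, np.ones((X.shape[0], 1))])
lam = 1e-3
w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

print(w)  # close to [2.0, 1.0]
```

With `lam = 0` this reduces to ordinary linear least squares; the regularizer shrinks the weights slightly toward zero.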
Multi-layer perceptron, feed-forward networks, activation functions, loss functions, and training by backpropagation; gradient descent and stochastic gradient descent
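The topics in this module can be sketched in a few lines of numpy: a one-hidden-layer perceptron trained by (full-batch) gradient descent and backpropagation on the XOR problem. Network size, learning rate and iteration count are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])  # XOR targets

W1 = 0.5 * rng.standard_normal((2, 8)); b1 = np.zeros(8)
W2 = 0.5 * rng.standard_normal((8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
losses = []
for _ in range(5000):
    # Forward pass: tanh hidden layer, sigmoid output
    h = np.tanh(X @ W1 + b1)
    p = np.clip(sigmoid(h @ W2 + b2), 1e-7, 1 - 1e-7)
    losses.append(float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))))
    # Backward pass (binary cross-entropy loss)
    dlogits = (p - y) / len(X)
    dW2 = h.T @ dlogits; db2 = dlogits.sum(0)
    dh = (dlogits @ W2.T) * (1 - h**2)  # backprop through tanh
    dW1 = X.T @ dh; db1 = dh.sum(0)
    # Gradient descent step
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(losses[0], losses[-1])  # loss decreases over training
```

Replacing the full-batch update with updates on randomly sampled subsets of the data gives stochastic (mini-batch) gradient descent.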
Correlation, convolution and linear filters
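The distinction between correlation and convolution can be shown on a 1-D signal with an asymmetric kernel (hypothetical values): correlation slides the kernel as-is, while convolution flips it first.

```python
import numpy as np

signal = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
kernel = np.array([1.0, 0.0, -1.0])  # asymmetric finite-difference filter

corr = np.correlate(signal, kernel, mode="valid")
conv = np.convolve(signal, kernel, mode="valid")

print(corr)  # [-2. -2. -2.]
print(conv)  # [ 2.  2.  2.]  -- same as correlating with the flipped kernel
```

For symmetric kernels (e.g. a Gaussian blur) the two operations coincide, which is why the terms are often used interchangeably in the CNN literature.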
Convolutional, pooling and fully-connected layers, visualizing CNNs
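As a sketch of one of the layer types above, 2x2 max pooling with stride 2 downsamples a feature map by keeping the largest activation in each block (feature-map values are hypothetical):

```python
import numpy as np

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 3, 2],
                 [2, 6, 0, 1]], dtype=float)

# Reshape into 2x2 blocks, then take the max within each block.
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[4. 5.]
               #  [6. 3.]]
```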
Initialization, transfer learning, data augmentation, regularization, dropout, batch normalization, data sets and competitions
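Two of these training tricks are simple enough to sketch directly in numpy: horizontal flipping as a data augmentation, and inverted dropout as applied at training time (array sizes and the keep probability are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Data augmentation: horizontal flip of an "image" (here a toy 3x4 array).
image = np.arange(12).reshape(3, 4)
flipped = image[:, ::-1]

# Inverted dropout: zero out activations with probability 1 - p_keep and
# rescale the survivors so the expected activation is unchanged.
activations = np.ones(1000)
p_keep = 0.8
mask = rng.random(activations.shape) < p_keep
dropped = activations * mask / p_keep

print(flipped[0])               # [3 2 1 0]
print(dropped.mean())           # close to 1.0
```

At test time dropout is disabled; the inverted scaling during training means no correction is needed at inference.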
ImageNet competitions, regions with CNNs (R-CNN), fully-convolutional networks (FCNs), U-Net, one-stage detectors, the You Only Look Once (YOLO) detector, the Single-Shot MultiBox Detector (SSD), instance segmentation
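Detectors such as those above are matched and scored against ground truth using intersection-over-union (IoU); a minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) form, with hypothetical box coordinates:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.1428...
```

The same measure drives non-maximum suppression in one-stage detectors and the standard mAP evaluation used in detection competitions.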
Image generation, image-to-image translation and style transfer. Variational autoencoders (VAE) and generative adversarial networks (GAN).
Hourglass networks, cascade designs, attention layers and multi-task networks. Applications such as face, person and pedestrian detection.
Problem description for stereo and optical flow, geometric constraints and brief overview of classical methods.
Network designs for pixelwise classification and regression. Supervised and unsupervised training. Loss functions, occlusion handling.
Tracking by detection, discriminative and generative models, part-based trackers, discriminative correlation filters, siamese networks, short-term and long-term tracking, multi-object tracking, on-line tracking and real-time tracking, video object segmentation.
Student evaluation will be based on assignments and a project.
The maximum is 100 marks, with the following breakdown:
Three programming assignments (using TensorFlow, Jupyter notebooks)
Groups of about five students need to make three 10-minute video presentations on a course topic. Topics will be assigned.
Project including proposal and oral presentation
The project must be done individually. A video presentation is required.
All components of the course (i.e., assignments, the project, etc.) must be fulfilled; otherwise, students may receive an INC as a final mark (equivalent to an F). This also holds for a student who is taking the course for the second time.
Any form of plagiarism or fraud, including on an assignment or the project, will be reported.