The recent surge in machine learning, and in particular deep learning with neural networks, has revolutionized many fields, including speech processing, data mining and medicine. Arguably one of the greatest impacts of this revolution is in computer vision. Since the success of AlexNet at the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where a deep neural network outperformed much more elaborate classical computer vision systems by a significant margin, deep neural networks can be found everywhere in visual processing. This revolution has also created enormous economic interest (Facebook, Google, …). The OCICS offering contains various courses that cover neural networks from a machine learning or data mining perspective, but there is no dedicated course on machine learning in computer vision. This topics course fills that gap. While it necessarily includes machine learning background, it focuses on (convolutional) neural networks and their applications to standard problems in computer vision. It also contrasts deep-learning-based approaches with classical computer vision approaches and shows how classical approaches inform the design of deep-learning-based solutions.
Introduction to deep learning in computer vision; statistical learning background, linear regression and classification; neural network basics: feed-forward networks, backpropagation and stochastic gradient descent; image processing and filtering primer; convolutional neural networks (CNNs), network layers, visualizing networks; (supervised) training, computer vision data sets and competitions; software for machine learning in computer vision; computer vision problems, in particular image classification, detection and recognition, image generation, problem-specific detectors, multi-view problems, and video object segmentation and tracking.
The course covers the fundamentals of deep neural networks in computer vision in lectures with multimedia support, including program demonstrations and videos. These fundamentals are assessed in the final exam. Active participation is encouraged through discussions and the student project presentations at the end of the course. Students apply their knowledge in three programming assignments.
(Available on-line: http://web.stanford.edu/~hastie/ElemStatLearn/)
(Pre-print available on-line: http://szeliski.org/Book)
(Electronic version available for download from the library: http://biblio.uottawa.ca/en)
(Jupyter notebooks can be found at
and at https://github.com/ageron/handson-ml2 for the forthcoming second edition.)
(Available from http://cs231n.github.io)
(Free online book from http://neuralnetworksanddeeplearning.com/)
Course notes will be made available through Virtual Campus.
ImageNet competition, commercial applications, brief historical overview
Machine learning landscape, data handling, visualizing data, organizing the data, training, testing and validation
Linear regression review, linear least squares, regularization, logistic regression
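To make the least-squares and regularization review concrete, the following NumPy sketch compares ordinary least squares with ridge (L2-regularized) regression on synthetic data; the data, seed and regularization strength are illustrative assumptions, not course material:

```python
import numpy as np

# Illustrative synthetic regression problem (assumed, not from the course)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

# Ordinary least squares: solve (X^T X) w = X^T y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression: solve (X^T X + lambda I) w = X^T y
lam = 1.0                                # illustrative regularization strength
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

print(w_ols, w_ridge)
```

Ridge shrinks every eigencomponent of the solution toward zero, which is why the regularized weight vector always has a smaller norm than the unregularized one.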
Multi-layer perceptron, feed forward networks, activation functions, loss function, and training by back propagation
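A minimal sketch of backpropagation through a one-hidden-layer network, checked against a finite-difference approximation; the shapes, sigmoid activation and squared-error loss are illustrative assumptions:

```python
import numpy as np

# Illustrative network: 3 inputs -> 5 sigmoid hidden units -> 1 linear output
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))              # 4 samples, 3 inputs (assumed data)
y = rng.normal(size=(4, 1))
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def loss(W1, W2):
    h = sigmoid(X @ W1)
    return 0.5 * np.sum((h @ W2 - y) ** 2)

# Analytic gradients via the chain rule (backpropagation)
h = sigmoid(X @ W1)
err = h @ W2 - y                          # dL/d(output)
gW2 = h.T @ err                           # gradient w.r.t. output weights
gW1 = X.T @ ((err @ W2.T) * h * (1 - h))  # propagated through the sigmoid

# Finite-difference check on one entry of W1
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
num = (loss(W1p, W2) - loss(W1, W2)) / eps
print(gW1[0, 0], num)
```

Comparing the analytic gradient against a numerical one is a standard sanity check when implementing backpropagation by hand.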
Gradient descent and stochastic gradient descent
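The difference between full-batch gradient descent and stochastic gradient descent can be sketched on a least-squares objective; the learning rates, batch size and synthetic data below are illustrative assumptions:

```python
import numpy as np

# Illustrative synthetic least-squares problem (assumed, not from the course)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
w_true = np.array([2.0, -1.0])
y = X @ w_true + 0.05 * rng.normal(size=200)

def grad(w, Xb, yb):
    # Gradient of the mean squared error 0.5 * ||Xb w - yb||^2 / len(yb)
    return Xb.T @ (Xb @ w - yb) / len(yb)

# Full-batch gradient descent: uses the entire data set at every step
w = np.zeros(2)
for _ in range(500):
    w -= 0.1 * grad(w, X, y)

# Stochastic gradient descent: one random mini-batch of 16 samples per step
w_sgd = np.zeros(2)
for _ in range(2000):
    idx = rng.integers(0, len(X), size=16)
    w_sgd -= 0.05 * grad(w_sgd, X[idx], y[idx])

print(w, w_sgd)
```

Both reach roughly the same solution here, but each SGD step touches only a mini-batch, which is what makes it practical for large vision data sets.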
Correlation, convolution and linear filters
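The relationship between cross-correlation and convolution can be shown with a small NumPy sketch; the Sobel kernel and "valid" output size are illustrative choices:

```python
import numpy as np

def correlate2d(img, k):
    # Valid-mode 2-D cross-correlation: slide the kernel without flipping it
    kh, kw = k.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def convolve2d(img, k):
    # Convolution = correlation with the kernel flipped in both axes
    return correlate2d(img, k[::-1, ::-1])

img = np.arange(25, dtype=float).reshape(5, 5)   # toy image (assumed)
sobel_x = np.array([[-1., 0., 1.],
                    [-2., 0., 2.],
                    [-1., 0., 1.]])              # horizontal-gradient filter
print(correlate2d(img, sobel_x))
```

Because this toy image increases by exactly 1 per column, the correlation response is constant, while the convolution response has the opposite sign (the flipped Sobel kernel is its own negation).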
Initialization, transfer learning, data augmentation, regularization, dropout, batch normalization, data sets and competitions
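Dropout, one of the regularizers listed above, is commonly implemented as "inverted dropout", sketched below; the drop rate and tensor shape are illustrative assumptions:

```python
import numpy as np

def dropout(x, p, rng):
    # Inverted dropout (training time): zero units with probability p and
    # rescale the survivors by 1/(1-p), so no rescaling is needed at test time.
    mask = (rng.random(x.shape) >= p).astype(x.dtype)
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
acts = np.ones((1000, 64))          # toy activations (assumed shape)
out = dropout(acts, p=0.5, rng=rng) # illustrative drop rate
print(out.mean())                   # stays close to 1.0 on average
```

The rescaling keeps the expected activation unchanged, which is what lets the same network be used without dropout at inference time.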
Recognition, detection and (semantic) segmentation, ImageNet competitions, regions with CNNs (R-CNN), fully-convolutional networks (FCNs), the You Only Look Once (YOLO) detector, the Single Shot MultiBox Detector (SSD)
Variational autoencoders (VAEs) and generative adversarial networks (GANs), convolutional networks in GANs, image generation, image-to-image translation, style transfer, DCGAN, Pix2Pix, CycleGAN, neural style transfer
Face detection (cascade design, multi-task CNN), people detector, pedestrian detector
Problem description, brief overview of classical methods, stereo and optical flow
Video object segmentation (OSVOS) vs. on-line tracking
Student evaluation will be based on assignments, a project and a final exam.
The maximum is 100 marks, with the following breakdown:
- 3 programming assignments (using TensorFlow, Jupyter notebooks)
- Project, including oral presentation(s)
The project can be done in groups of up to three students. For group projects, two presentations are required.
The final exam will be closed book.
Class attendance is mandatory. As per academic regulations, students who do not attend at least 80% of the classes may not be allowed to write the final examination.
All components of the course (i.e., assignments, the project, etc.) must be completed; otherwise, students may receive an INC as a final mark (equivalent to an F). This also holds for students taking the course for a second time.
Any form of plagiarism or fraud, including on an assignment or the project, will be reported.