Learning-based Computer Vision

Professor

Jochen Lang

Contact

Teaching Assistant

Zishen Chen

Contact
  • zchen314@uottawa.ca
  • Office hours: tbd
  • Office: tbd
  • Tasks: Help with projects and assignments; marking of projects and assignments

General and Specific Objectives of the Course

The recent surge in machine learning, and in particular deep learning using neural networks, has revolutionized many fields, including language processing, data processing and medicine. Arguably, one of the greatest impacts of this revolution is in computer vision. Since the success of AlexNet at the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where the deep neural network solution outperformed arguably much more advanced classical computer vision systems by a significant margin, deep neural networks have become ubiquitous in visual processing. This revolution has created enormous economic interest. The OCICS offering contains various courses that cover neural networks from a machine learning or data mining perspective, but there is no dedicated course on machine learning in computer vision. This course fills this gap. While it necessarily includes machine learning background, it specifically looks at neural networks and their applications to standard problems in computer vision. It will also contrast deep-learning-based approaches with classical computer vision approaches and examine how classical approaches inform the design of deep-learning-based solutions.


Calendar Description

Introduction to learning-based computer vision; statistical learning background; image processing and filtering primer; convolutional neural networks (CNNs), network layers, computer vision data sets and competitions; computer vision problems, in particular, image classification, detection and recognition, semantic segmentation, image generation, multi-view problems and tracking.

Course Prerequisites: None

Teaching Methods and Student Expectations

The course material will be delivered in person. Additional resources in the form of textbooks and on-line references are listed below. The course will use group work and interactive student feedback through Virtual Campus (Brightspace) and Microsoft Teams. Students are encouraged to apply their knowledge through three programming assignments in Jupyter notebooks using Scikit-Learn, Keras and PyTorch. Participation in the course requires appropriate access to resources. The active participation of students is encouraged through discussions, a video presentation and an in-person project presentation.


Recommended Textbooks and Additional Resources

  • General statistical learning
  • Computer vision
    • Antonio Torralba, Phillip Isola, and William T. Freeman, Foundations of Computer Vision, MIT Press, 2024.

      (Pre-print available on-line: https://visionbook.mit.edu)

    • Richard Szeliski, Computer Vision: Algorithms and Applications, Springer, 2nd ed., 2022.

      (Pre-print available on-line: http://szeliski.org/Book)

    • Reinhard Klette, Concise Computer Vision: An Introduction into Theory and Algorithms, Springer, 2014.

      (Electronic version available for download from library http://biblio.uottawa.ca/en)

  • Deep Learning

Course Topics and Readings

Course notes will be made available through Virtual Campus.

  • Introduction and course overview, hands-on machine learning

    Machine learning landscape, data handling, visualizing data, organizing the data, training, testing and validation

  • Statistical learning background

    Linear regression review, linear least squares, regularization, logistic regression

  • Neural networks basics, non-linear optimization

    Multi-layer perceptron, feed-forward networks, activation functions, loss functions, and training by backpropagation. Gradient descent and stochastic gradient descent.

  • Image processing and filtering (self-study)

    Correlation, convolution and linear filters

  • Convolutional neural networks (CNNs): “Classic layers”

    Convolutional, pooling and fully-connected layers, visualizing CNNs

  • Training CNNs

    Initialization, transfer learning, data augmentation, regularization, dropout, mini-batch normalization, data sets and competitions

  • CNN Architectures for image classification and object detection

    Metrics, ImageNet competitions, CNN architectures

  • Transformer architectures for computer vision

    Vanilla transformer: attention, positional encoding and normalization. Image transformer, DETR, ViT, Swin transformer.

  • Segmentation

    Benchmarks and metrics, architectures, use of attention.

  • Computer vision foundation models

    Overview of visual understanding and generation, unified vision models, large multimodal models, example architectures

  • Multiview problems

    Problem description for stereo and optical flow, geometric constraints and brief overview of classical methods, metrics.

  • Learning-based stereo and optical flow

    Network designs for pixelwise classification and regression. Supervised and unsupervised training. Loss functions, occlusion handling.

  • Image generation and translation (if time permits)

    Image generation, image-to-image translation and style transfer. Variational autoencoders (VAE) and generative adversarial networks (GAN), diffusion models

Student Evaluation

Student evaluation will be based on assignments, a project and a final exam.

Marking Scheme

The maximum is 100 marks, with the following breakdown:

  • 3 programming assignments (using TensorFlow, Jupyter notebooks): 30 marks
    • Linear and logistic regression
    • Image recognition
    • Transfer learning for small data sets

  • Project including oral presentation: 40 marks

    The project must be done in groups of 2. Two presentations are required. The project is marked in stages: topic, background presentation, proposal, final presentation and report.

  • Final exam: 30 marks

    Closed book.

Reminder: Academic Regulations

All components of the course (assignments, project, etc.) must be completed; otherwise, students may receive an INC as a final mark (equivalent to an F). This also holds for a student who is taking the course for the second time.


Academic Fraud and Plagiarism

For any plagiarism or fraud, the university regulation on academic integrity and misconduct applies. A website explains the rules surrounding academic integrity and the use of AI. Please familiarize yourself with them.