
CEG4136 Computer Architecture III                

 

Instructor: Dr. Miodrag Bolic 
 

News

Catalog Description

Instructor and Teaching assistants

Time and Locations

Required Texts

Additional documents

Prerequisites

Grades

Course Outline

Laboratory

Assignments

 

Last change: October 30, 2012

NEWS                                                                                                                         

 

CATALOG DESCRIPTION                                                                                                                                   

Multiprocessor systems: vector processors, array processors, SIMD, MIMD systems. Interconnection networks. Multiprocessor architecture and programming. Multiprocessing control and algorithms. The PRAM model and algorithms. Message-passing models and algorithms. Scheduling and arbitration algorithms. Parallel virtual machine. Message passing interface. Performance measures for multiprocessor systems.

 

 

INSTRUCTOR AND TEACHING ASSISTANTS                                                                                             

 

Course staff

Instructor: Miodrag Bolic, mbolic@site.uottawa.ca

Office hours (Fall 2011): Wednesdays, 12:00-13:00, CBY A-616

Teaching assistants: by e-mail

 

 

 

 

TIME and LOCATIONS                                                                                                                            

 

LEC: Monday, 10:00-11:30, Morisset Hall (MRT), Room 252

LEC: Wednesday, 08:30-10:00, Morisset Hall (MRT), Room 252

LAB: Friday, 14:30-17:30, SITE, Room 2061

TUT: Monday, 14:30-16:00

 

 

 TEXTS                                                                                                                                                         

 

NO TEXTBOOK

 

Recommended:

Advanced Computer Architecture and Parallel Processing, by Hesham El-Rewini and Mostafa Abd-El-Barr, John Wiley and Sons, 2005.

http://ca.wiley.com/WileyCDA/WileyTitle/productCd-0471467405.html

 

Advanced Computer Architecture: Parallelism, Scalability, Programmability, by K. Hwang, McGraw-Hill, 1993.

 

Computer Architecture: A Quantitative Approach, by John L. Hennessy, David A. Patterson, David Goldberg, Morgan Kaufmann, 3rd edition, 2002.

 

Multiprocessor Systems-on-Chips (The Systems on Silicon Series) by Wayne Wolf, Morgan Kaufmann, 2004.

 

Advanced Computer Architectures: A Design Space Approach, by Dezső Sima, Terence Fountain and Péter Kacsuk, Pearson, 1997.

 

 ADDITIONAL DOCUMENTS                                                                                                                                                    

 

Appendix E of Computer Architecture: A Quantitative Approach, together with the slides.

 

 

PREREQUISITES                                                                                                                                      


CEG3131.

 

GRADES                                                                                                                                                                              

 

25% Participation and Short Questions

30% Final

30% Project

15% Presentation and report

The final mark is the weighted sum of ALL of the above components. You must obtain at least 50% on the exam component, which consists of the Final and “Participation and Short Questions”.
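To make the weighting and the 50% rule concrete, here is a minimal Python sketch. It assumes each component score is a percentage from 0 to 100, and it assumes (the outline does not specify this) that the exam component is the weight-proportional average of the Final (30%) and Participation and Short Questions (25%). The function names are hypothetical, and since the outline does not say what happens when the threshold is missed, the sketch only reports whether it is met.

```python
# Weights from the GRADES section, as fractions of the final mark.
WEIGHTS = {
    "participation_short_questions": 0.25,
    "final": 0.30,
    "project": 0.30,
    "presentation_report": 0.15,
}

# Components that form the "exam component" in the 50% rule.
EXAM_COMPONENTS = ("participation_short_questions", "final")


def final_mark(scores: dict[str, float]) -> float:
    """Weighted sum of all components; each score is a percentage (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def exam_component_percent(scores: dict[str, float]) -> float:
    """Assumption: the exam component is the weight-proportional average
    of its parts, so 25/55 of it comes from participation and short
    questions and 30/55 from the final."""
    exam_weight = sum(WEIGHTS[name] for name in EXAM_COMPONENTS)
    earned = sum(WEIGHTS[name] * scores[name] for name in EXAM_COMPONENTS)
    return earned / exam_weight


def meets_exam_requirement(scores: dict[str, float], threshold: float = 50.0) -> bool:
    """True if the exam component reaches the threshold (50% by default)."""
    return exam_component_percent(scores) >= threshold


if __name__ == "__main__":
    example = {
        "participation_short_questions": 60.0,
        "final": 40.0,
        "project": 90.0,
        "presentation_report": 80.0,
    }
    print(f"Final mark: {final_mark(example):.1f}%")
    print(f"Exam component: {exam_component_percent(example):.1f}%")
    print("Exam requirement met:", meets_exam_requirement(example))
```

For example, with 60% on participation and short questions, 40% on the final, 90% on the project and 80% on the presentation and report, the weighted sum is 66%, but under the stated assumption the exam component is only 27/55 (about 49.1%), so the 50% requirement would not be met even though the overall mark is well above 50%.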

 

EXAMS (FINAL)

·         All exams are closed book.

·         Only material covered in class, tutorials and labs will be on the exam.

 

QUESTIONS ABOUT MARKS

If you have a question about a mark you have received, follow this procedure (all other questions about marks will be ignored):

·         Schedule an appointment with the T.A. to see the work (if required).

·         Fill out and sign the form (obtain it from the T.A. or download it here; thanks to Dr. Andy Adler for developing the form).

·         Submit to T.A.

·         You will receive a response within two weeks.

GRADING

Name | Quiz | Lab | Project or literature study | Exams

Miodrag Bolic | - | - | - | Final

 

 

 

 

 

COURSE OUTLINE                                                                                                                                              

 

 

Note: Lecture slides will generally be available for download before each lecture.

Lecture scribing from 2006 (Disclaimer: all of this material was written by students and did not go through a peer review process. Be careful and critical when reading it, and please send me your comments if you find typos or other problems in the material.)

 

T1. Introduction, review of cache memories. Scribing from 2006: Lectures 1, 2.

T2. Performance analysis. Scribing from 2006: Lecture 3.

T3. Parallel models. Scribing from 2006: Lecture 4.

T4. Buses. Scribing from 2006: Lecture 5.

T5. Dynamic interconnection networks. Scribing from 2006: Lectures 6, 7.

T6. Static networks. Scribing from 2006: Lecture 8.

T8. Shared memory systems. Scribing from 2006: Lecture 9.

T9. Cache coherence. Scribing from 2006: Lectures 10, 11.

T10. GPU architectures (Slides, Doc); OpenCL (Slides, Doc). Scribing from 2006: Lectures 12, 13.

T11. OpenMP, OpenACC. Scribing from 2006: Lecture 14.

OpenMP presentations:

·         OpenMP: An API for Writing Portable SMP Application Software: pdf (slides 9-13, 17-19, 21-27), http://www.openmp.org/presentations/sc99/sc99_tutorial.pdf

·         ICC’s High Performance Computing: OpenMP, http://www.llnl.gov/computing/tutorials/openMP/

OpenACC:

·         http://www.ccs.tsukuba.ac.jp/CCS/files/slides/1205_LuizDeRose.pdf

T12. Routing 1, 2. Scribing from 2006: Lecture 15.

T13. Deadlock. Scribing from 2006: Lecture 16.

T14. Message passing systems, MPI programming. Scribing from 2006: Lecture 17.

T15. Embedded multicores. Literature: Freescale Semiconductor, Embedded Multicore: An Introduction, EMBMCRM, Rev. 0, 07/2009.

T16. Network on chip. Scribing from 2006: Lecture 18.

T17. Cache coherence for multicore computers. Literature: Natalie Enright Jerger and Li-Shiuan Peh, On-Chip Networks, Chapter 2, 2009.

T18. Router microarchitecture. Literature: Natalie Enright Jerger and Li-Shiuan Peh, On-Chip Networks, Chapter 6, 2009.

T19. Cloud computing. Literature: Jonathan Parri, Introduction to Cloud Computing, report, University of Ottawa, 2011.

T20. Review.

 

 

Additional material (Disclaimer: all of this material was written by students and did not go through a peer review process. Be careful and critical when reading these documents.)

P1. Transactional memory (by Patrick Santos, 2011); Transactional memory (by Jonathan Parri). Documents: JonathanTransactionalMemoryReport.pdf. Questions and answers: questions_4465359.pdf.

P2. Network-on-chip issues and challenges and the SPIN network (by Mathieu Thibault-Marois). Documents: Report_5049388.docx. Questions and answers: Questions_5049388.docx.

P3. Directory-Based Cache Coherence and Non-Uniform Cache Architecture (NUCA) (by Marc DeMelo). Documents: NUCA.docx. Questions and answers: NUCA_Problems_list.docx.

P4. Programming of shared memory GPUs (by Jean-Philippe Bergeron). Documents: CUDA_Report.pdf. Questions and answers: CUDAQuestions.docx.

P5. Real-Time Operating Systems Implemented in Hardware (by Jake Swart). Documents: RTOS report.pdf.

P6. SIMD processor extensions (by Houffaneh Osman). Documents: SIMD_Report[Revised_2].pdf. Questions and answers: SIMD_Question.docx.

P7. Adaptive routing (by David Ouellet-Poulin). Documents: Report Adaptive routing.pdf.

P8. Crossbar switch (by Alex Ayala). Documents: Report Crossbar switches.pdf.

P9. Performance Evaluation in Parallel Systems (by Alexey Borisenko). Documents: Alexey_lec_scribe.pdf. Questions and answers: Alexey_questions.pdf.

P10. Snoop-based cache coherence (by Muge Guher). Documents: Muge_Cache_Coherence.pdf.

P11. Thread schedulers and thread priority (by Daniel Shapiro). Documents: Daniel_report.pdf.

P14. FPGA architecture and design with IP cores.

 

Presentations and reports from 2011

 

1.  Tlhakanelo Polly: Describe the architecture of multi-bank caches. Give examples where they are used. What are multiport caches? Why do we need them?

2.  Moeti Letsholo: New trends in programmable logic design. Describe, for example, solutions from http://www.tabula.com

3.  Militaru Aida: What can be analyzed with the Intel VTune Performance Analyzer? How can it help us analyze the performance of multicores?

4.  Mathews Joseph Vivian: Describe the architecture and instructions of the Streaming SIMD Extensions (SSE) of Intel processors. What is new in SSE4?

5.  Massis Paul: What is the AMBA bus? What is the function of the bridge? Show the architecture of a simple bridge.

6.  Masinjila Ruslan: Describe one state-of-the-art multicore solution based on a crossbar. One example is IBM Cyclops64.

7.  Maphane Obakeng: Architecture of mesh networks (processors, routers, connection to a shared or distributed memory, connection to peripherals). What are reconfigurable mesh networks? What is coterie topology? Give an example of a state-of-the-art chip where a mesh network is used.

8.  L'Heureux Mathieu Leon Ayoub: Programming of multicore processors and accelerators in Java: How does the Aparapi tool help with programming AMD GPUs? What other tools can facilitate Java programming for multicores?

9.  Janelle Mathew: Show an example of a snooping protocol implementation for Intel processors with multiple levels of caches. How is snooping done? At what level? Are the caches inclusive? What signals are carried between the multiple levels of caches and the bus?

10. Inambao Michael: Show examples of directory protocol implementations for multiprocessor systems-on-chip. One possibility is to analyze the Tilera processor.

11. El-Shabani Mohammad: Challenges of routing in on-chip networks. Look at the paper “Route Packets, Not Wires: On-Chip Interconnection Networks” and other related papers.

12. Croskery Andrew: How is deadlock resolved in large on-chip networks? What are the usual design choices for on-chip networks to avoid deadlock? Give examples.

13. Chaudhry Fatima Aleya: How is scheduling done for multicore processors? How does Windows do the scheduling for multicores? Is it possible for the programmer to take control of it? How is task scheduling done in Java?

14. Bouchaara Rachid: Advanced server multicore chips. What is their architecture? Show, for example, the architecture of AMD Interlagos.

15. Bastien-Beaudet Julien: What is the UPC programming language? What is Chapel? How is programming done in these languages?

16. Bariteau Cédric: What is the reason for having a parallel file system? What is Lustre? How does it work?

17. Al Buhussain Ali Hassan: What is Amazon Elastic Compute Cloud (EC2)? What is its architecture? What services does it provide?

18. AlAmoudi Bandar Motahar: What is a RAID disk controller? What is a disk array?

 

 

PROJECTS                                                                                                                                      

 

The project should be related to parallel processing. It can be a novel architecture for parallel processing on an FPGA, a program executed in parallel on an FPGA, an OpenCL/CUDA implementation on a GPU, or something similar. If it is a program, you have to show the need for parallel processing. If it is a new architecture that you are exploring or proposing for a particular problem, you need to justify why you selected this architecture.

 

You propose the project yourself.

 

•         Project proposals are due by October 1st; they will be approved or rejected during the week of October 1st.

•         Projects must be demonstrated and reports submitted by November 16th.

•         Grade: 20% project proposal, 50% demonstration, 30% final report.

 

Project report

Proposal: The purposes of the project proposal are: (i) to determine the topic, (ii) to show that a preliminary study of the subject material has been done, (iii) to assess the likelihood of success of the project, and (iv) to give a plan for carrying out the project. You should submit a one- or two-page proposal to the TA for approval. If the proposal is rejected, the team must come up with a revised proposal or a new alternative proposal before a deadline specified in the course outline. Preliminary discussions with the instructor can also be held in advance during office hours. However, the opinions expressed by the teaching staff during these preliminary discussions are only suggestions. The team members are responsible for using their best judgement to prepare the proposal for approval.

The format of the proposal is as follows:

•         Title of the project

•         Project highlight -- explain what you want to do in this project.

•         Motivation -- explain the significance of the proposed project and its relevance to this course.

•         Prior art -- list at least three previous works (papers, books, etc.) that report work most closely related to your project. Briefly review their approaches, advantages and shortcomings.

•         Define the architecture that you will implement and describe the software approaches.

•         Define how you will test the project and what the final results of the demonstration will be.

 

Report: A typewritten hardcopy project report, as well as an electronic version (including the source code and design files developed), must be submitted by the deadline. The length of the report is not restricted. However, the report must include the following sections:

•         Introduction: motivation and background.

•         Main body: depending on the type of project, this part may include the problem description, the methods used, the approaches taken, etc.

•         Conclusion and discussion: highlight your achievements in this project and what could be done in the future.

 

 

 

PAPER ANALYSIS

 

Topics

 

Two types of projects:

Literature review: the result is a report in IEEE format, at least 5 pages long.

Problems and solutions: 20-30 complex problems with solutions for a specific area.

 

 

Literature review topics

L1.    Architectures of embedded GPUs: for example, Vivante GPUs and similar, and new Nvidia platforms for embedded systems such as the CARMA DevKit.

L2.    Architectures of embedded multicore platforms: for example

L3.    Epiphany multicore accelerator from Adapteva

L4.    Comparison between embedded and desktop-based (PCI-e based) GPUs

L5.    Low level programming of embedded GPUs

L6.    OpenCL standard for embedded systems

L7.    High-level programming of embedded GPUs: Java, C, OpenCL?

L8.    OpenCL compilers for custom SIMD platforms

L9.    Comparison of directive-based languages for GPUs, such as OpenACC, and analysis of their potential implementation on embedded GPUs

L10.  Multicore/multiprocessing DSP processors: for example, the TMS320C66x from TI and similar

L11.  Energy Efficient Computing in Parallel Embedded Systems

L12.  High Performance Mobile Computing: description of multiprocessing architectures used in mobile computing

 

 

 

 

Topics for problems with solutions

P1.   Parallel Architecture Issues in Multi-Systems: Distributed Computing, Cluster Computing, Grid Computing

P2.   Parallel Architecture of Graphics Processors

P3.   Reconfigurable Computing and IP-Cores on FPGA

P4.   DSP Processors

P5.   Parallel Processing in Embedded Systems

P6.   Energy Efficient Computing in Parallel Embedded Systems

P7.   Fault Tolerant Mechanisms in Parallel Embedded Systems

P8.   High Performance Mobile Computing

 

 

Content

The submitted document must meet the following requirements.

The analyzed papers should be survey papers (IEEE or ACM journals) or general research papers. Do not copy sentences from the papers; see the paper on plagiarism: How to Handle Plagiarism: New Guidelines. The paper analysis has to explain how the papers are related to the scribed lecture. Technical reports, white papers and book chapters are not accepted.

 

Grading for the literature review

·         Formatting, style and English (references have to be in proper form): 20%. Please follow the proposed format: IEEE format

·         Relevance of the selected papers: 10%

·         Quality of the analysis and presentation: 55%

·         About 15 slides: 15%

 

Grading for the examples and problems

·         English and style: 10%

·         Quality of the selected problems and/or examples (make sure that they come from several different sources, and provide references): 40%

·         Accuracy and quality of solutions: 50%

 

 

Due dates and turn-in instructions

The analysis should be at least 5 pages long. You have to follow the IEEE format, which means using the same font, no extra line spacing and the defined margins. Reports have to be submitted on the Virtual Campus by midnight on December 10th.

 

Submission

All submissions are made through the Virtual Campus. This is individual work: you will NOT work in groups.

 

 

------------------------------

For example, the report can be organized in the following form:

•              Definition of performance metrics

•              Table that compares papers and/or industrial solutions based on the defined metrics

•              Description of the table

•              Future trends and conclusions

In this case, the proposal will need to identify the research problem and the performance metrics to be compared. If a good survey paper already exists on a particular topic, do not select that topic.

Example: Daniel Shapiro, Design Automation for an ASIP Empowered Asymmetric MPSoC report, University of Ottawa, 2009 (however, it is not in IEEE format).

In that report, existing solutions are analyzed: design criteria are selected and the comparison is presented in a table.

 

 


 

 

 

ASSIGNMENTS FROM 2004 and QUIZZES FROM 2005-2007

There will be assignments from the textbooks, which will not be collected. Some assignments might include working with simulators.

 

Assignment | Problems | Solutions

1 | Assignment 1 | Assignment 1 Solutions

2 | Assignment 2 | Assignment 2 Solutions

3 | Assignment 3 | Assignment 3 Solutions

4 | Assignment 4 | Assignment 4 Solutions

5 | Assignment 5 | Assignment 5 Solutions

 

 

 

 

Quizzes from 2007

Quiz 1 Solutions

Quiz 2 Solutions

 

 

Quizzes from 2006

Quiz 1 Quiz 1 Solutions

Quiz 2 Quiz 2 Solutions

Quiz 3 Quiz 3 Solutions

Quiz 4 Quiz 4 Solutions

 

 

Quizzes from 2005

Quiz 1 (Chapters 1, 3)  Quiz 1 Solutions

Quiz 2 (Chapter 2)  Quiz 2 Solutions

Quiz 3 (Chapter 5)   Quiz 3 Solutions

Quiz 4 (Additional literature)   Quiz 4 Solutions

Midterm solutions

Final

Final