CEG4136 Computer Architecture III
Instructor: Dr. Miodrag Bolic
Last change: October 30, 2012
Multiprocessor systems: vector processors, array processors, SIMD, MIMD systems. Interconnection networks. Multiprocessor architecture and programming. Multiprocessing control and algorithms. The PRAM model and algorithms. Message-passing models and algorithms. Scheduling and arbitration algorithms. Parallel virtual machine. Message passing interface. Performance measures for multiprocessor systems.
INSTRUCTOR AND TEACHING ASSISTANTS
Course staff | Name | E-mail address | Fall 2011 Office Hours | Location
Instructor | Miodrag Bolic | | Wednesdays, 12:00-13:00 | CBY A-616
Teaching assistants | | | By e-mail |
Activity | Time | Location | Room
LEC | Monday, 10:00-11:30 | Morisset Hall (MRT) | 252
LEC | Wednesday, 08:30-10:00 | Morisset Hall (MRT) | 252
LAB | Friday, 14:30-17:30 | SITE | 2061
TUT | Monday, 14:30-16:00 | |
NO TEXTBOOK
Recommended:
• Advanced Computer Architecture and Parallel Processing, by Hesham El-Rewini and Mostafa Abd-El-Barr, John Wiley and Sons, 2005. http://ca.wiley.com/WileyCDA/WileyTitle/productCd-0471467405.html
• Advanced Computer Architecture: Parallelism, Scalability, Programmability, by K. Hwang, McGraw-Hill, 1993.
• Computer Architecture: A Quantitative Approach, by John L. Hennessy, David A. Patterson and David Goldberg, Morgan Kaufmann, 3rd edition, 2002.
• Multiprocessor Systems-on-Chips (The Systems on Silicon Series), by Wayne Wolf, Morgan Kaufmann, 2004.
• Advanced Computer Architectures: A Design Space Approach, by Dezső Sima, Terence Fountain and Péter Kacsuk, Pearson, 1997.
• Appendix E from Computer Architecture: A Quantitative Approach, together with the slides.
Prerequisite: CEG3131.
25% Participation and Short Questions
30% Final
30% Project
15% Presentation and report
The final mark will be computed as the weighted sum of ALL of the above components. You need to obtain at least 50% on the exam component, which includes the Final and “Participation and Short Questions”.
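The grading rule above can be sketched as a short calculation. This is a minimal, unofficial illustration: it assumes every component is scored as a percentage and reads the exam requirement as "the weighted average of the Final (30%) and Participation and Short Questions (25%) must be at least 50%". The names `WEIGHTS`, `final_mark` and `exam_requirement_met` are hypothetical, and the outline does not say what happens when the requirement is not met.

```python
# Hypothetical sketch of the grading rule (not an official course tool).
# Component weights from the course outline, as fractions of the final mark.
WEIGHTS = {
    "participation_and_short_questions": 0.25,
    "final": 0.30,
    "project": 0.30,
    "presentation_and_report": 0.15,
}

# Assumption: the "exam component" is Final + Participation and Short Questions.
EXAM_COMPONENTS = ("participation_and_short_questions", "final")


def final_mark(scores: dict[str, float]) -> float:
    """Weighted sum of all components; each score is a percentage (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def exam_requirement_met(scores: dict[str, float]) -> bool:
    """Assumed reading: weighted average of the exam components must be >= 50%."""
    exam_weight = sum(WEIGHTS[name] for name in EXAM_COMPONENTS)
    exam_points = sum(WEIGHTS[name] * scores[name] for name in EXAM_COMPONENTS)
    return exam_points / exam_weight >= 50.0


if __name__ == "__main__":
    scores = {
        "participation_and_short_questions": 60,
        "final": 40,
        "project": 90,
        "presentation_and_report": 80,
    }
    print(f"Final mark: {final_mark(scores):.1f}%")                # Final mark: 66.0%
    print("Exam requirement met:", exam_requirement_met(scores))  # Exam requirement met: False
```

In this example the weighted sum is 15 + 12 + 27 + 12 = 66%, but the exam component is only (15 + 12) / 55 ≈ 49%, so under this reading the student would not meet the 50% exam requirement despite a passing overall mark.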
EXAMS (FINAL)
· All exams are closed book.
· Only material covered in class, tutorials and labs will be on the exam.
If you have a question about a mark you have received, follow this procedure (all other questions about marks will be ignored):
· Schedule an appointment with the T.A. to see the work (if required).
· Fill out and sign the form (obtained from the T.A., or download it here; thanks to Dr. Andy Adler for developing the form).
· Submit the form to the T.A.
· You will receive a response within two weeks.
GRADING
Name | Quiz | Lab | Project or literature study | Exams
Miodrag Bolic | - | - | - | Final
Note: Lecture slides will generally be available for download before the lecture.
Lecture scribing from 2006 (Disclaimer: please note that all the material was written by students and has not gone through a peer-review process. Be careful and critical when reading these documents. Please send me your comments if you find typos or other problems in the material.)
Topic number | Topic | Scribing from 2006 | Literature
T1. | | Lecture 1, 2 |
T2. | | Lecture 3 |
T3. | | |
T4. | | Lecture 5 |
T5. | | |
T6. | | Lecture 8 |
T8. | | Lecture 9 |
T9. | | |
T10. | | Lecture 12, 13 |
T11. | OpenMP, OpenACC | | · OpenMP: An API for Writing Portable SMP Application Software: pdf (Slides 9-13, 17-19, 21-27) http://www.openmp.org/presentations/sc99/sc99_tutorial.pdf · ICC’s High Performance Computing: OpenMP http://www.llnl.gov/computing/tutorials/openMP/ · OpenACC http://www.ccs.tsukuba.ac.jp/CCS/files/slides/1205_LuizDeRose.pdf
T11. | | Lecture 15 |
T13. | | |
T14. | | |
T15. | | |
T16. | | Lecture 18 |
T17. | | | Natalie Enright Jerger, Li-Shiuan Peh 2009
 | | | Natalie Enright Jerger, Li-Shiuan Peh 2009
T19. | | |
T20. | | |
Additional material (Disclaimer: please note that all the material was written by students and has not gone through a peer-review process. Be careful and critical when reading these documents.)
Topic number | Topic | Documents | Questions and answers
P1. | Transactional memory (by Patrick Santos, 2011); Transactional memory (by Jonathan Parri) | |
P2. | Network-on-chip issues and challenges and the SPIN network (by Mathieu Thibault-Marois) | |
P3. | Directory-Based Cache Coherence and Non-Uniform Cache Architecture (NUCA) (by Marc DeMelo) | |
P4. | Programming of shared memory GPUs (by Jean-Philippe Bergeron) | |
P5. | Real-Time Operating Systems Implemented in Hardware (by Jake Swart) | |
P6. | SIMD processor extensions (by Houffaneh Osman) | |
P7. | Adaptive routing (by David Ouellet-Poulin) | |
P8. | Crossbar switch (by Alex Ayala) | |
P9. | Performance Evaluation in Parallel Systems (by Alexey Borisenko) | |
P10. | Snoop-based cache coherence (by Muge Guher) | |
P11. | Thread schedulers and thread priority (by Daniel Shapiro) | |
P12. | | |
P13. | | |
P14. | | |
P15. | | |
Presentations and reports from 2011
No. | Name | Presentation topic
1. | Tlhakanelo Polly | Describe the architecture of multi-bank caches. Give examples of where they are used. What are multiport caches? Why do we need them?
2. | Moeti Letsholo | New trends in programmable logic design. Describe, for example, solutions from http://www.tabula.com
3. | Militaru Aida | What can be analyzed with the Intel VTune Performance Analyzer? How can it help us analyze the performance of multicores?
4. | Mathews Joseph Vivian | Describe the architecture and instructions of the Streaming SIMD Extensions (SSE) of Intel processors. What is novel in SSE4?
5. | Massis Paul | Describe the AMBA bus. What is the function of the bridge? Show the architecture of a simple bridge.
6. | Masinjila Ruslan | Describe one state-of-the-art multicore solution based on a crossbar. One example is IBM Cyclops64.
7. | Maphane Obakeng | Architecture of mesh networks (processors, routers, connection to a shared or distributed memory, connection to peripherals). What are reconfigurable mesh networks? What is coterie topology? Give an example of a state-of-the-art chip where a mesh network is used.
8. | L'Heureux Mathieu Leon Ayoub | Programming of multicore processors and accelerators in Java: How does the Aparapi tool help in programming AMD hardware? What other tools can facilitate Java programming for multicores?
9. | Janelle Mathew | Show an example of a snooping protocol implementation for Intel processors with multiple levels of caches. How is snooping done? At what level? Are the caches inclusive? What signals are carried between the multiple levels of caches and the bus?
10. | Inambao Michael | Show examples of directory protocol implementations for multiprocessor systems-on-chip. One possibility is to analyze the Tilera processor.
11. | El-Shabani Mohammad | Challenges of routing in on-chip networks. Look at the paper “Route Packets, Not Wires: On-Chip Interconnection Networks” and other related papers.
12. | Croskery Andrew | How is deadlock resolved in large on-chip networks? What design choices are usually made in on-chip networks to avoid deadlock? Give examples.
13. | Chaudhry Fatima Aleya | How is scheduling done for multicore processors? How does Windows schedule tasks on multicores? Is it possible for the programmer to take control of it? How is task scheduling done in Java?
14. | Bouchaara Rachid | Advanced server multicore chips. What is their architecture? Show, for example, the architecture of AMD Interlagos.
15. | Bastien-Beaudet Julien | What is the UPC programming language? What is Chapel? How is programming done in these languages?
16. | Bariteau Cédric | What is the reason for having a parallel file system? What is Lustre? How does it work?
17. | Al Buhussain Ali Hassan | What is Amazon Elastic Compute Cloud (EC2)? What is its architecture? What services does it provide?
18. | AlAmoudi Bandar Motahar | What is a RAID disk controller? What is a disk array?
PROJECTS
The project should be related to parallel processing. It can be a novel architecture for parallel processing on an FPGA, a program that executes in parallel on an FPGA, an OpenCL/CUDA implementation on a GPU, or something similar. If it is a program, you have to show the need for parallel processing. If it is a new architecture that you are exploring or proposing for a particular problem, you need to justify why you selected this architecture.
You propose the project yourself.
• Project proposals are expected by October 1st and will be approved or rejected during the week of October 1st.
• Projects have to be demonstrated and the report submitted by November 16th.
• Grade: 20% project proposal, 50% demonstration, 30% final report.
Project report
Proposal: The purposes of writing a project proposal are (i) to determine the topic, (ii) to show that a preliminary study of the subject material has been done, (iii) to assess the likelihood of success of the project, and (iv) to give a plan for carrying out the project. You should submit a one- or two-page proposal to the TA for approval of the project. If the proposal is rejected, the team must come up with a revised proposal or an alternative new proposal before a deadline specified in the course outline. Preliminary discussions with the instructor can also be held in advance during office hours. However, the opinions expressed by the teaching staff during these preliminary discussions are only suggestions. The team members are responsible for using their best judgement to prepare the proposal for approval.
The format of the proposal is as follows:
• Title of the project
• Project highlight: explain what you want to do in this project
• Motivation: explain the significance of the proposed project and its relevance to this course
• Prior art: list at least three previous works (papers, books, etc.) that reported the work most closely related to the current project. Briefly review their approaches, advantages and shortcomings.
• Define the architecture that you will implement and describe the software approaches
• Define how you will test the project and what the final results for the demonstration will be
Report: A typewritten hardcopy project report, as well as an electronic version (including the source code and design files developed), is to be submitted by the deadline. The length of the report is not restricted. However, the report must include the following sections:
• Introduction: motivation and background.
• Main body of the report. Depending on the type of project, this part may include the methods used, approaches taken, problem description, etc.
• Conclusion and discussion: highlight your achievements in this project and what may be done in the future.
PAPER ANALYSIS
Topics
Two types of projects:
• Literature review: the result is a report in IEEE format, with a minimum length of 5 pages.
• Problems and solutions: 20-30 complex problems with solutions for a specific area.
Literature review topics
L1. Architectures of embedded GPUs: for example, the Vivante GPU and similar, and new Nvidia platforms for embedded systems such as the CARMA DevKit.
L2. Architectures of embedded multicore platforms: for example
L3. Epiphany multicore accelerator from Adapteva
L4. Comparison between embedded and desktop-based (PCIe-based) GPUs
L5. Low-level programming of embedded GPUs
L6. OpenCL standard for embedded systems
L7. High-level programming of embedded GPUs: Java, C, OpenCL?
L8. OpenCL compilers for custom SIMD platforms
L9. Comparison of directive-based languages for GPUs, such as OpenACC, and analysis of their potential implementation on embedded GPUs
L10. Multicore/multiprocessing DSP processors: for example, the TMS320C66x from TI and similar
L11. Energy-efficient computing in parallel embedded systems
L12. High-performance mobile computing: description of multiprocessing architectures used in mobile computing
Topics for problems with solutions
P1. Parallel Architecture Issues in Multi-Systems: Distributed Computing, Cluster Computing, Grid Computing
P2. Parallel Architecture of Graphics Processors
P3. Reconfigurable Computing and IP Cores on FPGAs
P4. DSP Processors
P5. Parallel Processing in Embedded Systems
P6. Energy-Efficient Computing in Parallel Embedded Systems
P7. Fault-Tolerant Mechanisms in Parallel Embedded Systems
P8. High-Performance Mobile Computing
Content
The submitted document must contain the following:
Research papers should include survey papers (IEEE or ACM journals) or general research papers. Please do not copy sentences from the papers: see the paper on plagiarism, How to Handle Plagiarism: New Guidelines. The paper analysis has to explain how the papers are related to the scribed lecture. Technical reports, white papers and book chapters are not accepted.
Grading for the literature review
· Formatting, style and English (references have to be in proper form): 20%. Please take a look at the proposed format: IEEE format
· Relevance of the selected papers: 10%
· Quality of the analysis and presentation: 55%
· About 15 slides: 15%
Grading for the examples and problems
· English and style: 10%
· Quality of the selected problems and/or examples: 40%. Please make sure that they come from several different sources and provide references.
· Accuracy and quality of the solutions: 50%
Due dates and turn-in instructions
The analysis should be at least 5 pages long. You have to follow the IEEE format, which means that you need to use the same font, no extra line spacing and the defined margins. The reports have to be submitted on the Virtual Campus by midnight on December 10th.
Submission
All submissions are made through the Virtual Campus. You will NOT work in groups.
------------------------------
For example, the report can be organized as follows:
• Definition of performance metrics
• Table that compares papers and/or industrial solutions based on the defined metrics
• Description of the table
• Future trends and conclusions
In this case, the proposal will need to identify the research problem and the performance metrics that will be compared. If there is already a good survey paper on a particular topic, do not select that topic.
Example: Daniel Shapiro, Design Automation for an ASIP Empowered Asymmetric MPSoC, report, University of Ottawa, 2009 (note that it is not in IEEE format).
Here, existing solutions are analyzed: some design criteria are selected, and the comparison is presented in a table.
ASSIGNMENTS FROM 2004 and QUIZZES FROM 2005-2007
There will be assignments from the textbooks, which will not be collected. Some assignments might include working with simulators.
Assignment | Problems | Solutions
1 | |
2 | |
3 | |
4 | |
5 | |
Quizzes from 2007
Quizzes from 2006
Quiz 1 Quiz 1 Solutions
Quiz 2 Quiz 2 Solutions
Quiz 3 Quiz 3 Solutions
Quiz 4 Quiz 4 Solutions
Quizzes from 2005
Quiz 1 (Chapters 1, 3) Quiz 1 Solutions
Quiz 2 (Chapter 2) Quiz 2 Solutions
Quiz 3 (Chapter 5) Quiz 3 Solutions
Quiz 4 (Additional literature) Quiz 4 Solutions
Final