ELG7187 (EACJ5808) Topics in Computers: Multiprocessor Systems on Chip
PROFESSOR
Miodrag Bolic
School of Information Technology and Engineering (SITE), University of Ottawa
Tel: (613) 562-5800 x 6224, Fax: (613) 562-5175
Email: mbolic@site.uottawa.ca
Web: www.site.uottawa.ca/~mbolic
Office Hours: Monday 11:30-13:00, CBY A-616
NEWS and MESSAGES
March 30th, 2012: Some links have been corrected.
COURSE DESCRIPTION
Architectures of multiprocessing systems, interconnection networks, cache coherence, synchronization, systems on chip, multicore architectures, graphics processing units (GPU), parallel programming, operating systems for embedded multiprocessors. Case studies: parallel processing with Nios II and Xtensa, Cell processor, Intel Core i7.
COURSE SCHEDULE
Activity | Time | Location
LEC | Monday, 5:30-8:00 | STE J0106
SUGGESTED TEXTS
· [Culler98] David Culler, Jaswinder Pal Singh, and Anoop Gupta, Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann; 1st edition, 1998.
· [Hwang93] K. Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill, 1993.
· [Hennessy2006] John L. Hennessy, David A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann; 4th edition, 2006.
Synthesis Lectures on Computer Architecture:
· [Harris10] Transactional Memory, 2nd edition
Tim Harris, James Larus, Ravi Rajwar
June 2010
· [Eeckhout10] Computer Architecture Performance Evaluation Methods
Lieven Eeckhout
June 2010
· [Jerger10] On-Chip Networks
Natalie Enright Jerger, Li-Shiuan Peh
2009
· [Kaxiras08] Computer Architecture Techniques for Power-Efficiency
Stefanos Kaxiras, Margaret Martonosi
2008
· [Olukotun07] Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency
Kunle Olukotun, Lance Hammond, James Laudon
2007
PREREQUISITES
Carleton University: SYSC 4507
University of Ottawa: CEG3156
Or equivalent courses.
TOPICS DISCUSSED
(This is a very preliminary schedule)
Week of | Topic | Literature | Quizzes
Jan 9 | Introduction to parallel computer architectures | Freescale Semiconductor, Embedded Multicore: An Introduction, EMBMCRM, Rev. 0, 07/2009 |
Jan 16 | Mandatory reading: 1. Lei Hu and Ian Gorton, Performance Evaluation for Parallel Systems: A Survey, University of NSW, Australia, UNSW-CSE-TR-9707, October 1997; 2. B. Sprunt, The Basics of Performance Monitoring Hardware, IEEE Micro, July-August 2002, pages 64-71. Additional reading: "Reevaluating Amdahl's Law in the Multicore Era" (presentation slides are available online); [Eeckhout10]; references at the end of the slides | |
Jan 23 | Interconnection networks on chip: topologies, routing. Slides: course based on [Jerger10] | Mandatory reading: [Jerger10]. Additional reading: 1. Design and Evaluation of a Hierarchical On-Chip Interconnect for Next-Generation CMPs; 2. Design Tradeoffs for Tiled CMP On-Chip Networks; 3. D. N. Jayasimha, Bilal Zafar, Yatin Hoskote, On-Chip Interconnection Networks: Why They are Different and How to Compare Them; 4. A 5-GHz Mesh Interconnect for a Teraflops Processor; 5. Characterizing the Cell EIB On-Chip Network; 6. On-Chip Interconnection Architecture of the Tile Processor; 7. W. Dally, On Chip Interconnection networks low power interconnects, ISLPED, 2007. |
Jan 30 | Interconnection networks on chip: flow control, router design. Slides: course based on [Jerger10] | | Quiz 1
Feb 6 | Shared memory systems: cache coherence. Snooping protocols: Culler. Interconnection interfaces: [Jerger10] Chapter 2. Course based on [Jerger10] | Reading. Cache coherence in: Design of the snooping bus: R. Kumar, V. Zyuban, and D. M. Tullsen, Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling, ISCA, June 2005. |
Feb 13 | Buses: Chapter 2 from On-Chip Communication Architectures: System on Chip Interconnect, by Sudeep Pasricha and Nikil Dutt, 2008 (electronic resource, available online in the library). Shared memory systems: directory protocols: Culler Chapter 8; Chapter 2 from the thesis Efficient and Scalable Cache Coherence for Many-Core Chip Multiprocessors. Synchronization: shared memory systems - synchronization, Culler, pages from the online book: 314-324, 330-333, 367-369. Slides: Chapter 5 | Shameem Akhter and Jason Roberts, Common problems in multi-core programming, parts 1-3; Sarita V. Adve and Kourosh Gharachorloo, Shared Memory Consistency Models: A Tutorial, WRL Research Report 95/7 | Quiz 2
Feb 20 | Study Break | |
Feb 27 | | |
Mar 5 | Shared memory systems - synchronization: Culler 5.5, 6.3.5-6.3.9, 7.9 | Introduction to Cloud Computing, Jonathan Parri, Report, University of Ottawa, 2011. | Quiz 3
Mar 12 | Virtualization: Chapter 1 and Chapter 8 (pages 368-393, 397-402, 404-405, 408-414, 436-443) from the book Virtual Machines: Versatile Platforms for Systems and Processes, Elsevier Inc., 2005. Available online in the University of Ottawa library. | |
Mar 19 | Presentations | | Quiz 4
Mar 26 | Presentations | |
Apr 2 | Final | |
MARKING SCHEME
Final exam (30%)
Quizzes and/or assignments (20%)
Presentation and report based on the course topic (15%)
Presentation and report on a topic of interest (35%)
QUIZZES and QUESTIONS
Quiz 2
Quiz 3 (the soft copy was lost)
Additional problems: http://www.site.uottawa.ca/~mbolic/ceg4131/index.shtml#assig
PRESENTATION OF A TOPIC OF INTEREST
The presentation will last about 30 minutes.
It needs to be organized in the following form:
Definition of performance metrics
Table that compares papers and/or industrial solutions based on the defined metrics
Description of the table
Future trends and conclusions
The goal of this exercise is to prepare students to conduct a proper literature review and to identify weaknesses in state-of-the-art work.
What to analyze
Research papers should include survey papers (IEEE or ACM journals) or general research papers. Please do not copy sentences from the papers; see the paper on plagiarism: How to Handle Plagiarism: New Guidelines.
Style
The goal is to present several industrial and/or academic solutions, understand and present performance metrics, and then give a qualitative or quantitative comparison of those solutions. Please see this report as an example: Daniel Shapiro, Design Automation for an ASIP Empowered Asymmetric MPSoC, report, University of Ottawa, 2009 (note, however, that it is not in IEEE format). In that report, existing solutions are analyzed, some design criteria are selected, and the comparison is presented in a table.
Grading
· Formatting, style, and English (references have to be in proper form): 20%. Please take a look at the proposed format: IEEE format
· Relevance of the selected papers: 10%
· Quality of the analysis: 20%
· Quality of comparison: 20%
· Slides and presentation: 30%
Due dates and turn-in instructions
The analysis should be at least 5 pages long. You have to follow the IEEE format, which means that you need to use the same font, no extra line spacing, and the defined margins. The slides and reports need to be uploaded by March 15th.
PRESENTATION AND ANSWERING THE QUESTIONS
The report is due March 1st. It should be 2-3 pages. For each topic, show at least one commercial implementation and relate the concepts to what we studied in class.
Report
- Are the references correct? Are they up-to-date? Are they in correct format?
- Is every figure properly referenced?
- Is there any copy-paste?
Answering the questions
- Are all the questions answered properly?
- Did the student consult the appropriate literature?
- Did the student perform proper comparison?
Style and format
- Are the paper and references in IEEE format?
- Is the English correct?
The technical report will be marked based upon the advice in the document "The best method for presentation of research results in theses and papers" by Prof. Ivan Stojmenovic.
http://www.site.uottawa.ca/~dshap092/ceg4136/Stojmenovic.pdf
Topics (select one):
1. Analyze coherence problems that involve components other than caches, such as a DMA controller. How are these problems resolved in modern processors? Describe one commercial implementation.
2. How is the instruction set of modern processors modified to support parallel processing, including synchronization, cache coherence, and communication?
3. How is power consumption in interconnection networks related to the required bandwidth? Find examples of solutions that optimize power consumption for a given bandwidth.
4. Describe a commercial implementation of a mesh network topology. How are on-chip peripherals connected to the mesh network?
5. Describe the QuickPath Interconnect (QPI) 1.1 used in Sandy Bridge processors.
6. Describe the AMD Bulldozer CPU and its interconnection networks.
7. Compare two state-of-the-art Intel and AMD processor architectures with respect to their interconnections, memory hierarchy, virtual memory organization, GPU support, support for parallel programming, and more. Elaborate on the similarities and differences.
8. Describe in detail the implementation of the cache coherence mechanism of one commercial processor. Include the architectures and commands that are added to support cache coherence and memory transfer.
9. Draw a schematic of the architecture of a hardware performance counter that can measure up to four events. Describe each component, and add control registers as well. Show the implementation of some commercial performance counters.
10. What is the major difference between server chips and regular general-purpose chips? What design choices were considered in designing server chips? Describe the architecture, interconnection network, cache coherence, and other parameters of one commercial server chip.
11. Describe the characteristics of commercial and academic interconnection network simulators. Define metrics and compare several of the simulators.