ELG7187 (EACJ5808)  Topics in Computers: Multiprocessor Systems on Chip

 

PROFESSOR

Miodrag Bolic

School of Information Technology and Engineering (SITE), University of Ottawa

Tel: (613) 562-5800 x 6224, Fax: (613) 562-5175

Email: mbolic@site.uottawa.ca

Web: www.site.uottawa.ca/~mbolic

Office Hours: Monday 11:30-13:00, CBY A-616

 

NEWS and MESSAGES

March 30th, 2012: Some links have been corrected.

 

COURSE DESCRIPTION

Architectures of multiprocessing systems, Interconnection networks, Cache coherence, Synchronization, Systems on chip, Multicore architectures, Graphics processing units (GPU), Parallel programming, Operating systems for embedded multiprocessors. Case studies: parallel processing with Nios II and Xtensa, Cell processor, Intel Core i7.

 

COURSE SCHEDULE

 

Activity    Time                 Location

LEC         Monday, 5:30-8:00    STE J0106

 

  

SUGGESTED TEXTS

·         [Culler98] David Culler, Jaswinder Pal Singh, and Anoop Gupta, Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann, 1st edition, 1998.

·         [Hwang93] K. Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill, 1993.

·         [Hennessy2006] John L. Hennessy, David A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann, 4th edition, 2006.

 Synthesis Lectures on Computer Architecture:

·                     [Harris10] Tim Harris, James Larus, and Ravi Rajwar, Transactional Memory, 2nd edition, June 2010.

·                     [Eeckhout10] Lieven Eeckhout, Computer Architecture Performance Evaluation Methods, June 2010.

·                     [Jerger10] Natalie Enright Jerger and Li-Shiuan Peh, On-Chip Networks, 2009.

·                     [Kaxiras08] Stefanos Kaxiras and Margaret Martonosi, Computer Architecture Techniques for Power-Efficiency, 2008.

·                     [Olukotun07] Kunle Olukotun, Lance Hammond, and James Laudon, Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency, 2007.

 

PREREQUISITES

Carleton University: SYSC 4507

University of Ottawa: CEG3156

Or equivalent courses.

 

TOPICS DISCUSSED

(This is a very preliminary schedule)

 

Week of

Topic

Literature

 Quizzes

Jan 9

Intro

Introduction – parallel computer architectures,

Embedded Multicore: An Introduction

Freescale Semiconductor, Embedded Multicore: An Introduction, EMBMCRM, Rev. 0, 07/2009

Jan 16

Performance metrics

Performance of multiprocessing systems

Mandatory reading:

1.       Lei Hu and Ian Gorton, Performance Evaluation for Parallel Systems: A Survey, University of NSW, Australia, UNSW-CSE-TR-9707, October 1997.

2.       B. Sprunt, The Basics of Performance Monitoring Hardware, IEEE Micro, July-August 2002, pp. 64-71.

 

Additional reading:

·         "Reevaluating Amdahl's Law in the Multicore Era". Presentation slides can be found here

·         [Eeckhout10]

·         References at the end of the slides

 

 

Jan 23

Interconnection networks on chip:

Topologies

Routing

 

Slides: Course based on [Jerger10]

 

Mandatory reading:

[Jerger10]

Additional reading:

1. Design and Evaluation of a Hierarchical On-Chip Interconnect for Next-Generation CMPs

2. Design Tradeoffs for Tiled CMP On-Chip Networks

3. D. N. Jayasimha, Bilal Zafar, Yatin Hoskote, “On-Chip Interconnection Networks: Why They are Different and How to Compare Them”

4. A 5-GHz Mesh Interconnect for a Teraflops Processor

5. Characterizing the Cell EIB On-Chip Network

6. On-Chip Interconnection Architecture of the Tile Processor

7. W. Dally, On Chip Interconnection networks – low power interconnects, ISLPED, 2007.

Jan 30

Interconnection networks on chip:

Flow control

Router design

 

Slides: Course based on [Jerger10]

 

Quiz 1

Feb 6

Shared memory systems: Cache coherence

Chapter 5

 

Snooping protocols: Culler

Chapter 6 

 

Interconnection interfaces: [Jerger10] Chapter 2

Course based on [Jerger10]

 Reading

 

Cache coherence in:

Nios II processor

Pentium

Design of the snooping bus:

R. Kumar, V. Zyuban, and D. M. Tullsen, Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling, ISCA, June 2005.

 

Feb 13

Buses: Chapter 2 from

Sudeep Pasricha and Nikil Dutt, On-Chip Communication Architectures: System on Chip Interconnect, 2008.

Available online in the library.

 

Shared memory systems: 

Directory protocols: Culler Chapter 8

Chapter 2 from the thesis: 

Efficient and Scalable Cache Coherence for Many-Core Chip Multiprocessors

Slides

Synchronization

 

Shared memory systems - synchronization

Culler - Pages from the online book: 314-324,  330-333, 367-369

Slides: Chapter 5

 Chapter 6 

 

 

 

Shameem Akhter and Jason Roberts, Common problems in multi-core programming, parts 1-3

 

Sarita V. Adve, Kourosh Gharachorloo, Shared Memory Consistency Models: A Tutorial, WRL Research Report 95/7

Quiz 2

Feb 20

 

Study Break

 

 

 

Feb 27

 

Mar 5

Shared memory systems - synchronization

Culler 5.5, 6.3.5-6.3.9, 7.9

 

Cloud computing

 

 

 Introduction to Cloud Computing, Jonathan Parri, Report, University of Ottawa, 2011.

Quiz 3

Mar 12

Virtualization

Chapter 1 and Chapter 8 (pages 368-393, 397-402, 404-405, 408-414, 436-443) from the book: Virtual Machines: Versatile Platforms for Systems and Processes, Elsevier Inc., 2005

Available online in the University of Ottawa library.

 

 

 

Mar 19

 Presentations

 

 Quiz 4

Mar 26

Presentations

 

 

Apr 2

Final

 

 

 

 

 

 

 

MARKING SCHEME

•             Final exam (30%)

•             Quizzes and/or assignments (20%)

•             Presentation and report based on the course topic (15%)

•             Presentation and report on a topic of interest (35%)

 

QUIZZES and QUESTIONS

                Quiz 1

Quiz 2

Quiz 3 (the soft copy was lost)

Quiz 4

Practice problems

Additional problems: http://www.site.uottawa.ca/~mbolic/ceg4131/index.shtml#assig

 

 

PRESENTATION OF A TOPIC OF INTEREST

The presentation will last about 30 minutes.

It must be organized as follows:

•              Definition of performance metrics

•              Table that compares papers and/or industrial solutions based on the defined metrics

•              Description of the table

•              Future trends and conclusions

The goal of this exercise is to prepare students to conduct a proper literature review and identify weaknesses in state-of-the-art work.

 

What to analyze

Research papers should include survey papers (IEEE or ACM journals) or general research papers. Please do not copy sentences from the papers; see the paper on plagiarism: How to Handle Plagiarism: New Guidelines.

 

Style

The goal is to present several industrial and/or academic solutions, understand and present performance metrics, and then give a qualitative or quantitative comparison of those solutions. Please see this link as an example:

 

Daniel Shapiro, Design Automation for an ASIP Empowered Asymmetric MPSoC, report, University of Ottawa, 2009 (however, it is not in IEEE format).

 

In that report, existing solutions are analyzed, design criteria are selected, and the comparison is presented in a table.

 

Grading

·         Formatting, style, and English (references have to be in proper form): 20%. Please take a look at the proposed format: IEEE format

·         Relevance of the selected papers: 10%

·         Quality of the analysis: 20%

·         Quality of comparison: 20%

·         Slides and presentation: 30%

 

Due dates and turn-in instructions

The analysis should be at least 5 pages long. You must follow the IEEE format, which means using the same font, no extra line spacing, and the defined margins. The slides and report must be uploaded by March 15th.

 

 

PRESENTATION AND ANSWERING THE QUESTIONS

The report is due March 1st. It should be 2-3 pages long. For each topic, show at least one commercial implementation and relate the concepts to what we studied in class.

Report

-               Are the references correct? Are they up-to-date? Are they in correct format?

-               Is every figure properly referenced?

-               Is there any copy-paste?

Answering the questions

-               Are all the questions answered properly?

-               Did the student consult the appropriate literature?

-               Did the student perform proper comparison?

Style and format

-               Are the paper and references in IEEE format?

-               Is the English correct?

The technical report will be marked based upon the advice in the document "The best method for presentation of research results in theses and papers" by Prof. Ivan Stojmenovic.

http://www.site.uottawa.ca/~dshap092/ceg4136/Stojmenovic.pdf

 

 Topics (select one):

1.       Analyze coherence problems that involve components other than caches, such as a DMA controller. How are these problems resolved in modern processors? Describe one commercial implementation.

2.       How is the instruction set of modern processors modified to support parallel processing, including synchronization, cache coherence, and communication?

3.       How is power consumption in interconnection networks related to the required bandwidth? Find examples of solutions that optimize power consumption for a given bandwidth.

4.       Describe a commercial implementation of a mesh network topology. How are on-chip peripherals connected to the mesh network?

5.       Describe the QuickPath Interconnect (QPI) 1.1 used in Sandy Bridge processors.

6.       Describe the AMD Bulldozer CPU and its interconnection networks.

7.       Compare two state-of-the-art architectures of Intel and AMD processors regarding their interconnections, memory hierarchy, virtual memory organization, GPU support, support for parallel programming, and more. Elaborate on similarities and differences.

8.       Describe in detail the implementation of the cache coherence mechanism of one commercial processor. Include the architectures and commands that are added to support cache coherence and memory transfer.

9.       Draw schematically the architecture of a hardware performance counter that can measure up to four events. Describe each component. Add control registers as well. Show the implementation of some commercial performance counters.

10.    What is the major difference between “server” chips and regular general-purpose chips? What design choices were considered in designing server chips? Describe the architecture, interconnection network, cache coherence, and other parameters of one commercial server chip.

11.    Describe the characteristics of commercial and academic interconnection network simulators. Define metrics and compare several of the simulators.