CS 420 - Parallel Progrmg: Sci & Engrg

Fall 2020

Official Description

Fundamental issues in design and development of parallel programs for various types of parallel computers. Various programming models according to both machine type and application area. Cost models, debugging, and performance evaluation of parallel programs with actual application examples. Course Information: Same as CSE 402 and ECE 492. 3 undergraduate hours. 3 or 4 graduate hours. Prerequisite: CS 225.


Online lecture notes

Learning Goals

  1. Prepare non-CS science and engineering students for the use of parallel computing in support of their work. (1,2,6)
  2. Acquire basic knowledge of CPU architecture: execution pipeline, dependencies, caches; learn to tune performance by enhancing locality and leveraging compiler optimizations. (2,6)
  3. Understand vector instructions and learn to use vectorization. (2,6)
  4. Acquire basic knowledge of multicore architectures: cache coherence, true and false sharing and their relevance to parallel performance tuning (2,6)
  5. Learn to program using multithreading, parallel loops, and multitasking using a language such as OpenMP. Learn to avoid concurrency bugs. (2,6)
  6. Learn to program using message passing with a library such as MPI. (2,6)
  7. Understand simple parallel algorithms and their complexity. (1,6)
  8. Learn to program accelerators using a language such as OpenMP. (2,6)
  9. Acquire basic understanding of parallel I/O and of frameworks for data analytics, such as map-reduce. (6)
  10. Team project (1,2,3,5,6)

Topic List

  1. Introduction: Course introduction. Importance of parallel computing with the end of Moore’s Law.
  2. Basic CPU architecture and performance bottlenecks. Tuning for locality and leveraging optimizing compilers.
  3. Vector instructions and compiler vectorization.
  4. Basic multicore architecture and performance bottlenecks. False sharing.
  5. OpenMP: Multithreading model; parallel sections, parallel loops, tasks and task dependencies. Races and atomicity. Deadlock avoidance.
  6. Basic parallel algorithms: matmult, stencils, sparseMV
  7. Basic cluster architecture and performance bottlenecks
  8. MPI: Point-to-point, one-sided, collectives.
  9. Basic distributed memory algorithms: matmult, stencils, sparseMV, sorting; data distribution.
  10. Parallel programming patterns: divide-and-conquer, pipeline
  11. Basic GPU architecture; programming GPUs with OpenMP
  12. Basics of parallel I/O
  13. Basics of data analysis using map-reduce
  14. Mid-term and final exams

Assessment and Revisions

| Revisions in last 6 years | Approximately when revision was done | Reason for revision | Data or documentation available? |
|---|---|---|---|
| Updated list of topics covered | Fall 2010 | List of topics had evolved through the years | New list of topics with hours |

Required, Elective, or Selected Elective

Selected Elective

Last updated

2/10/2019 by Marc Snir