CSE-746 – Advanced Parallel and High Performance Computing
McMaster University

Instructor: Dr. Pawel Pomorski

Time: Wednesdays 2:30-5:30, first lecture Wednesday, January 4th, 2017
Place: MMC HH 207
Note: no lecture on Feb. 22 (Reading Week)

Students in the course will need a SHARCNET account for their assignments. An account can be obtained through the student's supervisor. If that is not feasible, the course instructor can provide the student with a temporary SHARCNET account.

Syllabus

The goal of this course is to equip students with the theoretical knowledge and practical skills necessary to independently develop scalable parallel and high-performance codes for various applications. The course covers a selection of advanced topics concerning both the software and hardware aspects of code development. The main focus will be on parallel programming for the GPU with CUDA, for both the single-GPU and multiple-GPU cases, utilizing MPI and OpenMP, together with the associated parallel debuggers and profilers. Students will learn optimization techniques for the GPU and will also encounter other approaches to GPU programming, such as OpenCL and OpenACC.

Final project: 40% (deadline for submitting all material: 5 pm, Wednesday, April 26, 2017)
Two assignments: 20% each
Two in-class quizzes: 10% each

Formally, last semester's CSE745 course is a prerequisite for CSE746, as it covered a lot of useful material relevant to this course. The link to that course webpage is here. However, students who have not taken CSE745 can still take CSE746 at the instructor's discretion if their programming background is good enough.

Course notes and code examples

Lecture 1 - Introduction to GPU programming

Lecture 2 - Introduction to GPU programming

Lecture 2 programming notes

Lecture 3 - Optimizing memory access on GPUs

Lecture 4 - Reduction with CUDA

Lecture 5 - Dynamic parallelism with CUDA

Lecture 6 - Profiling CUDA

Lecture 7 - Introduction to OpenCL

Lecture 8 - Introduction to OpenACC

Lecture 9 - Thrust library

Lecture 10 - CUDA on multiple GPUs

Lecture 11 - Intel Phi Programming

Lecture 12 - Intel Phi Programming continued

Lecture 13 - Charm++

Assignments for 2017 course

Assignment 1

Assignment 2

Useful documents

CUDA C programming guide