High Performance Computing with CUDA

 

Instructor: Ying Liu

Motivation

 

Data sets have grown tremendously in recent years, as faster processing and communication have greatly improved the capability to generate and collect data in areas such as scientific experimentation, business and government transactions, and the Internet. Databases on the order of gigabytes or terabytes are now common, and many of them are high-dimensional. Sequential processing of such data either cannot run in-core, because the data do not fit in main memory, or takes a prohibitive amount of time. Parallel computing is therefore essential for speeding up the computation.


Many parallel computing platforms have been developed over the past couple of decades, including shared-memory parallel machines, distributed-memory parallel machines, and clusters. They offer more computing capability, memory, and storage than single-processor systems. The challenge is to develop parallel programs that let applications meet their efficiency and performance goals.


A GPU (graphics processing unit) is a dedicated graphics rendering device for a personal computer, workstation, or game console. Modern GPUs are very efficient at manipulating and displaying computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for a range of complex algorithms. Many computationally intensive applications have been shown to run dramatically faster on GPUs than on CPUs. Their massive parallelism and low cost make GPUs a revolution in parallel processing.


NVIDIA's Compute Unified Device Architecture (CUDA) is a general-purpose, scalable parallel programming model for writing highly parallel applications on GPUs. It provides three key abstractions: a hierarchy of thread blocks, shared memory, and barrier synchronization. This model has proven quite successful for programming multi-threaded many-core GPUs, and it scales transparently to hundreds of cores. CUDA is steadily winning users in scientific and engineering fields.

 

Course Objectives

 

This course introduces an emerging paradigm: GPU computing with CUDA. Its objective is to give students knowledge of, and hands-on experience in, developing multi-threaded code for GPUs using CUDA. We cover parallel programming principles, parallelism models, communication models, synchronization mechanisms, and toolkits, as well as the resource limitations of GPUs. Existing examples and application areas are also presented.

 

 

Spring 2009


Syllabus

 

Lecture                                                         Material
Lecture 1 - Introduction                                        Slides (pdf)
Lecture 2 - Parallel Computing                                  Slides (pdf)
Lecture 3 - CUDA Programming Model                              Slides (pdf)
Lecture 4 - CUDA Memory                                         Slides (pdf)
Lecture 5 - CUDA Threads                                        Slides (pdf)
Lecture 6 - Performance Optimization                            Slides (pdf)
Lecture 7 - Case Study: Typical Examples                        Slides (pdf)
Lecture 8 - Case Study: Association Rules Mining                Slides (pdf)
Lecture 9 - Case Study: Clustering in Cosmological Simulation   Slides (pdf)
Lecture 10 - Final Project Presentation & Conclusion            Slides (pdf)

Prerequisites

• C programming
• Operating systems
• Algorithms and data structures

References

• Matt Pharr (ed.), GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, Addison-Wesley.
• http://www.nvidia.com/object/cuda_home.html
• Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar, Introduction to Parallel Computing (2nd Edition), Addison-Wesley, 2003.
• Barry Wilkinson, Michael Allen, Parallel Programming (2nd Edition), Pearson Education (Prentice Hall).
• Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill, 2004.
• http://www.mpi-forum.org/
• http://www.openmp.org/

Contact Information

Instructor:
Ying Liu. yingliu@gucas.ac.cn

Teaching Assistants:
Liheng Jian, Peng Zhang, Shenshen Liang

Lecture Hours

Lecture: Thu 7:00-9:30 PM
Room: S104

Lab and Assignments

We will use NVIDIA processors and the CUDA™ programming tools in the lab section of this course. The programming assignments will require progressively more sophisticated programming skills. The topic of the final project is open, but it must involve a computationally intensive application, for example in mathematics, image processing, or data mining, followed by some form of display of the results.

Grading Policy


Lab assignment: 60%

Final Project: 40%

 

 

 

Fall 2009


Syllabus

 

Lecture                                                         Material
Lecture 1 - Introduction                                        Slides (pdf)
Lecture 2 - Parallel Computing
Lecture 3 - CUDA Programming Model
Lecture 4 - CUDA Memory
Lecture 5 - CUDA Threads
Lecture 6 - Performance Optimization
Lecture 7 - Case Study: Typical Examples
Lecture 8 - Case Study: Association Rules Mining
Lecture 9 - Case Study: Clustering in Cosmological Simulation
Lecture 10 - Final Project Presentation & Conclusion

 

Prerequisites

 

• C programming
• Operating systems
• Algorithms and data structures

 

References

 

• Matt Pharr (ed.), GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, Addison-Wesley.
• http://www.nvidia.com/object/cuda_home.html
• Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar, Introduction to Parallel Computing (2nd Edition), Addison-Wesley, 2003.
• Barry Wilkinson, Michael Allen, Parallel Programming (2nd Edition), Pearson Education (Prentice Hall).
• Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill, 2004.
• http://www.mpi-forum.org/
• http://www.openmp.org/

 

Contact Information

 

Instructor:
Ying Liu. yingliu@gucas.ac.cn

 

Teaching Assistants:
Sheng Xiao, Shenshen Liang

 

Lecture Hours

 

Lecture: Mon 7:00-9:30 PM
Room: S304

 

Lab and Assignments

 

We will use NVIDIA processors and the CUDA™ programming tools in the lab section of this course. The programming assignments will require progressively more sophisticated programming skills. The topic of the final project is open, but it must involve a computationally intensive application, for example in mathematics, image processing, or data mining, followed by some form of display of the results.

 

Grading Policy


Lab assignment: 60%

Final Project: 40%