Hands-On GPU Programming with Python and CUDA

Dr. Brian Tuomanen

Last updated: 2021-06-10 19:26:12

Cover Page

Title Page

Dedication

About Packt

Why subscribe?

Packt.com

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Why GPU Programming?

Technical requirements

Parallelization and Amdahl's Law

Using Amdahl's Law

The Mandelbrot set

Profiling your code

Using the cProfile module

Summary

Questions

Setting Up Your GPU Programming Environment

Technical requirements

Ensuring that we have the right hardware

Checking your hardware (Linux)

Checking your hardware (Windows)

Installing the GPU drivers

Installing the GPU drivers (Linux)

Installing the GPU drivers (Windows)

Setting up a C++ programming environment

Setting up GCC, Eclipse IDE, and graphical dependencies (Linux)

Setting up Visual Studio (Windows)

Installing the CUDA Toolkit

Installing the CUDA Toolkit (Linux)

Installing the CUDA Toolkit (Windows)

Setting up our Python environment for GPU programming

Installing PyCUDA (Linux)

Creating an environment launch script (Windows)

Installing PyCUDA (Windows)

Testing PyCUDA

Summary

Questions

Getting Started with PyCUDA

Technical requirements

Querying your GPU

Querying your GPU with PyCUDA

Using PyCUDA's gpuarray class

Transferring data to and from the GPU with gpuarray

Basic pointwise arithmetic operations with gpuarray

A speed test

Using PyCUDA's ElementWiseKernel for performing pointwise computations

Mandelbrot revisited

A brief foray into functional programming

Parallel scan and reduction kernel basics

Summary

Questions

Kernels, Threads, Blocks, and Grids

Technical requirements

Kernels

The PyCUDA SourceModule function

Threads, blocks, and grids

Conway's game of life

Thread synchronization and intercommunication

Using the __syncthreads() device function

Using shared memory

The parallel prefix algorithm

The naive parallel prefix algorithm

Inclusive versus exclusive prefix

A work-efficient parallel prefix algorithm

Work-efficient parallel prefix (up-sweep phase)

Work-efficient parallel prefix (down-sweep phase)

Work-efficient parallel prefix — implementation

Summary

Questions

Streams, Events, Contexts, and Concurrency

Technical requirements

CUDA device synchronization

Using the PyCUDA stream class

Concurrent Conway's game of life using CUDA streams

Events

Events and streams

Contexts

Synchronizing the current context

Manual context creation

Host-side multiprocessing and multithreading

Multiple contexts for host-side concurrency

Summary

Questions

Debugging and Profiling Your CUDA Code

Technical requirements

Using printf from within CUDA kernels

Using printf for debugging

Filling in the gaps with CUDA-C