Last updated: 2021-06-10 19:26:12
Cover Page
Title Page
Dedication
About Packt
Why subscribe?
Packt.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Why GPU Programming?
Technical requirements
Parallelization and Amdahl's Law
Using Amdahl's Law
The Mandelbrot set
Profiling your code
Using the cProfile module
Summary
Questions
Setting Up Your GPU Programming Environment
Ensuring that we have the right hardware
Checking your hardware (Linux)
Checking your hardware (Windows)
Installing the GPU drivers
Installing the GPU drivers (Linux)
Installing the GPU drivers (Windows)
Setting up a C++ programming environment
Setting up GCC, Eclipse IDE, and graphical dependencies (Linux)
Setting up Visual Studio (Windows)
Installing the CUDA Toolkit
Installing the CUDA Toolkit (Linux)
Installing the CUDA Toolkit (Windows)
Setting up our Python environment for GPU programming
Installing PyCUDA (Linux)
Creating an environment launch script (Windows)
Installing PyCUDA (Windows)
Testing PyCUDA
Getting Started with PyCUDA
Querying your GPU
Querying your GPU with PyCUDA
Using PyCUDA's gpuarray class
Transferring data to and from the GPU with gpuarray
Basic pointwise arithmetic operations with gpuarray
A speed test
Using PyCUDA's ElementWiseKernel for performing pointwise computations
Mandelbrot revisited
A brief foray into functional programming
Parallel scan and reduction kernel basics
Kernels, Threads, Blocks, and Grids
Kernels
The PyCUDA SourceModule function
Threads, blocks, and grids
Conway's game of life
Thread synchronization and intercommunication
Using the __syncthreads() device function
Using shared memory
The parallel prefix algorithm
The naive parallel prefix algorithm
Inclusive versus exclusive prefix
A work-efficient parallel prefix algorithm
Work-efficient parallel prefix (up-sweep phase)
Work-efficient parallel prefix (down-sweep phase)
Work-efficient parallel prefix — implementation
Streams, Events, Contexts, and Concurrency
CUDA device synchronization
Using the PyCUDA stream class
Concurrent Conway's game of life using CUDA streams
Events
Events and streams
Contexts
Synchronizing the current context
Manual context creation
Host-side multiprocessing and multithreading
Multiple contexts for host-side concurrency
Debugging and Profiling Your CUDA Code
Using printf from within CUDA kernels
Using printf for debugging
Filling in the gaps with CUDA-C