Querying your GPU with PyCUDA
Now, finally, we will begin our foray into the world of GPU programming by writing our own version of deviceQuery in Python. Here, we will primarily concern ourselves with the amount of available memory on the device, the compute capability, the number of multiprocessors, and the total number of CUDA cores.
We will begin by initializing CUDA as follows:
import pycuda.driver as drv
drv.init()
We can now immediately check how many GPU devices we have on our host computer with this line:
print('Detected {} CUDA Capable device(s)'.format(drv.Device.count()))
Let's type this into IPython and see what happens:
Great! So far, I have verified that my laptop does indeed have one GPU in it. Now, let's extract some more interesting information about this GPU (and any other GPU on the system) by adding a few more lines of code to iterate over each device, which can be individually accessed with pycuda.driver.Device (indexed by number). The name of the device (for example, GeForce GTX 1050) is given by the name method. We then get the compute capability of the device with the compute_capability method, and the total amount of device memory with the total_memory method.
Here's how we will write it:
for i in range(drv.Device.count()):
    gpu_device = drv.Device(i)
    print('Device {}: {}'.format(i, gpu_device.name()))
    compute_capability = float('%d.%d' % gpu_device.compute_capability())
    print('\t Compute Capability: {}'.format(compute_capability))
    print('\t Total Memory: {} megabytes'.format(gpu_device.total_memory() // (1024**2)))
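As a quick sanity check on the two conversions in the loop above, here is a minimal pure-Python sketch that needs no GPU; the GTX 1050-style values below are illustrative stand-ins, not read from real hardware:

```python
# Illustrative values only -- roughly what a GTX 1050-class device would return.
compute_capability_tuple = (6, 1)   # shape of gpu_device.compute_capability()
total_memory_bytes = 2 * 1024**3    # shape of gpu_device.total_memory(), 2 GB

# The (major, minor) tuple is formatted as 'major.minor' and parsed back
# as a float, so it can later be used as a dictionary key.
compute_capability = float('%d.%d' % compute_capability_tuple)

# Floor division by 1024**2 converts a byte count into megabytes.
total_memory_mb = total_memory_bytes // (1024**2)

print(compute_capability, total_memory_mb)  # 6.1 2048
```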
Now, we are ready to look at some of the remaining attributes of our GPU, which PyCUDA yields to us in the form of a Python dictionary keyed by special attribute objects. We will use the following lines to convert this into a dictionary indexed by strings naming the attributes:
device_attributes_tuples = gpu_device.get_attributes().items()
device_attributes = {}

for k, v in device_attributes_tuples:
    device_attributes[str(k)] = v
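Since the loop above only converts keys to strings, it can equivalently be written as a single dictionary comprehension. Here is a hedged, GPU-free sketch; the FakeAttribute class is a hypothetical stand-in for the attribute objects PyCUDA actually returns, whose str() form is the attribute name:

```python
class FakeAttribute:
    """Stand-in for PyCUDA's device attribute objects: str() gives the name."""
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


# Stand-in for gpu_device.get_attributes(); the values are illustrative.
raw_attributes = {FakeAttribute('MULTIPROCESSOR_COUNT'): 5,
                  FakeAttribute('WARP_SIZE'): 32}

# Equivalent to the explicit loop above: convert each key to a string.
device_attributes = {str(k): v for k, v in raw_attributes.items()}

print(device_attributes)  # {'MULTIPROCESSOR_COUNT': 5, 'WARP_SIZE': 32}
```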
We can now determine the number of multiprocessors on our device with the following:
num_mp = device_attributes['MULTIPROCESSOR_COUNT']
A GPU divides its individual cores into larger units known as Streaming Multiprocessors (SMs); a GPU device will have several SMs, each with a particular number of CUDA cores that depends on the compute capability of the device. To be clear: the number of cores per multiprocessor is not reported directly by the GPU; it is implied by the compute capability. We will have to consult NVIDIA's technical documentation to determine the number of cores per multiprocessor (see http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities), and then create a lookup table keyed by compute capability. We do so as follows, using the compute_capability variable to look up the number of cores:
cuda_cores_per_mp = {5.0: 128, 5.1: 128, 5.2: 128,
                     6.0: 64, 6.1: 128, 6.2: 128}[compute_capability]
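One caveat: the bracket lookup above raises a bare KeyError for any compute capability not in the table (for example, a newer device). A slightly more defensive sketch, using the same table and values, wraps the lookup in a helper with a clearer error message; this is just one possible variant, not the book's code:

```python
# Cores per multiprocessor, as documented by NVIDIA for these architectures.
cuda_cores_per_mp_table = {5.0: 128, 5.1: 128, 5.2: 128,
                           6.0: 64, 6.1: 128, 6.2: 128}


def cores_per_mp(compute_capability):
    """Look up CUDA cores per multiprocessor, failing with a clear
    message when the compute capability is not in the table."""
    cores = cuda_cores_per_mp_table.get(compute_capability)
    if cores is None:
        raise ValueError(
            'Unknown compute capability: {}'.format(compute_capability))
    return cores


print(cores_per_mp(6.1))  # 128
```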
We can now finally determine the total number of cores on our device by multiplying these two numbers:
print('\t ({}) Multiprocessors, ({}) CUDA Cores / Multiprocessor: {} CUDA Cores'.format(num_mp, cuda_cores_per_mp, num_mp * cuda_cores_per_mp))
We now can finish up our program by iterating over the remaining keys in our dictionary and printing the corresponding values:
device_attributes.pop('MULTIPROCESSOR_COUNT')

for k in device_attributes.keys():
    print('\t {}: {}'.format(k, device_attributes[k]))
So, we have now finally completed our first true GPU program of the text! (It is also available at https://github.com/PacktPublishing/Hands-On-GPU-Programming-with-Python-and-CUDA/blob/master/3/deviceQuery.py). Now, we can run it as follows:
We can now take a little pride in the fact that we can indeed write a program to query our GPU! Now, let's actually begin to learn to use our GPU, rather than just observe it.