Mastering TensorFlow 1.x

Executing graphs across compute devices - CPU and GPGPU

A graph can be divided into multiple parts, and each part can be placed and executed on a separate device, such as a CPU or GPU. You can list all the devices available for graph execution with the following command:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

We get the following output (your output would be different, depending on the compute devices in your system):

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 12900903776306102093
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 611319808
locality {
  bus_id: 1
}
incarnation: 2202031001192109390
physical_device_desc: "device: 0, name: Quadro P5000, pci bus id: 0000:01:00.0, compute capability: 6.1"
]

The devices in TensorFlow are identified with the string /device:<device_type>:<device_idx>. In the preceding output, CPU and GPU denote the device type, and 0 denotes the device index.
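
As an illustration of how these device strings are used, the following minimal sketch pins a few operations to the first CPU device with tf.device() and enables log_device_placement so the session prints where each operation actually runs (the device string and the constants here are just illustrative):

import tensorflow as tf

# Pin the following operations to the first CPU device using its
# /device:<device_type>:<device_idx> string
with tf.device('/device:CPU:0'):
    a = tf.constant([1.0, 2.0, 3.0], name='a')
    b = tf.constant([4.0, 5.0, 6.0], name='b')
    c = a + b

# log_device_placement=True prints the device chosen for each operation
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(c))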

One thing to note about the above output is that it shows only one CPU, whereas our computer has 8 CPUs. The reason for that is that TensorFlow implicitly distributes the code across the CPU units, and thus by default CPU:0 denotes all the CPUs available to TensorFlow. When TensorFlow starts executing graphs, it runs the independent paths within each graph in separate threads, with each thread running on a separate CPU. We can restrict the number of threads used for this purpose by setting inter_op_parallelism_threads. Similarly, if within an independent path an operation is capable of running on multiple threads, TensorFlow will launch that specific operation on multiple threads. The number of threads in this pool can be changed by setting intra_op_parallelism_threads. A short sketch of both settings follows.
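
The following minimal sketch shows how both thread-pool sizes can be set through tf.ConfigProto when creating a session (the thread counts and the matrix size are arbitrary values chosen for illustration):

import tensorflow as tf

# A large matrix multiplication that a single operation can parallelize internally
x = tf.random_normal([1000, 1000])
y = tf.matmul(x, x)

# inter_op_parallelism_threads caps how many independent operations run in parallel;
# intra_op_parallelism_threads caps the threads a single operation (such as the
# matmul above) may use internally
config = tf.ConfigProto(inter_op_parallelism_threads=2,
                        intra_op_parallelism_threads=4)

with tf.Session(config=config) as sess:
    sess.run(y)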