Distributed Computing with Go
上QQ阅读APP看书,第一时间看更新

Scheduling with G, M, and P

By the time the program is ready to start executing, the runtime has machines and processors set up. The runtime would request the OS to start an ample number of Machines (M), GOMAXPROCS number of Processors to execute goroutines (G). It is important to understand that M is the actual unit of execution and G is the logical unit of execution. However, they require P to actually execute G against the M. Let's look at a possible scenario to better explain the scheduling process. First let's look at the components we shall be using for the scenario:

  • We have a set of M ready to run: M1...Mn
  • We also have two Ps: P1 and P2 with runqueues—runq1 and runq2 respectively
  • Last but not least, we also have 20 goroutines, G1...G20, which we want to execute as part of the program

Go's runtime and all of the components, M1...Mn, P1 and P2, and G1...G20, are represented as shown in the following figure:

Given that we have two Processors, the global scheduler would ideally distribute the goroutines between the two Processors equally. Assume that P1 is assigned to work on G1...G10 and and puts them into its runqueue, and similarly P2 puts G11...G20 in its runqueue. Next, P1's scheduler pops a goroutine from its runqueue to run, G1, picks a machine to run it on, M1, and similarly P2 runs G11 on M2. This can be illustrated by the following diagram:

A process's internal scheduler is also responsible for switching out the current goroutine with the next one that it wants to execute. If everything is going well, the scheduler will switch the current goroutine for one of three possible reasons:

  • Time slice for current execution is over: A process will use schedtick (it is incremented every time the scheduler is called) to keep track of how long the current goroutine has been executing and, once a certain time limit is reached, the current goroutine will be put back in the runqueue and the next goroutine is picked up for execution.
  • Done with execution: Simply put, the goroutine is done executing all of its instructions. In this case, it will not be put back in the runqueue.
  • Waiting on system call: In some cases, the goroutine might need to make a system system call, and as a result, the goroutine will be blocked. Given that we have a handful of processors, it doesn't make sense to block such an expensive resource. The good news is that in Go, the processor is not required to wait on the system call; instead it can leave the waiting M and G combo, which will be picked up by the global scheduler after the system call. In the meantime, the processor can pick another M from the available machines, pick another goroutine from its runqueue, and start executing it. This is explained with the help of the following diagram:

The previous diagram explains that the processor P1 is running goroutine G1 on machine M1. G1 will now begin making a system call. This can be illustrated in the following diagram:

The previous diagram explains that the processor P1 detaches itself from machine M1 and goroutine G1 due to a system call. P1 picks a new machine, M5, and a new goroutine, G9, to execute:

In the previous diagram, the G1-M1 system call is completed. Now G1 is put back in the P1 runqueue and M1 is added to the set of idle machines.

In the last part of this section, we are going to discuss another strategy implemented in the scheduler, called work-stealing.

Let's say that processor P1 has 10 goroutines and P2 has 10 goroutines. However, it turns out that the goroutines in P1 were quickly completed and now there are zero goroutines in P1's runqueue. It would be a tragedy if P1 were idle and waiting for the global scheduler to provide it with more work. With the help of the work-stealing strategy, P1 starts checking with other processors and, if another processor has goroutines in its runqueue, it will "steal" half of them and start executing them. This ensures that we are maximizing the CPU usage for our program. Let's ask two interesting questions:

  • What if a processor realizes that it can't steal any more tasks? The procesor will wait for a little while expecting new goroutines and, if none are created, the processor is killed.
  • Can a processor steal more than half of a runqueue? Even if we have many processors at work, work-stealing will always take half of the target processor's runqueue.

This can be illustrated with the following diagram:

The preceding diagram shows two processors, P1 and P2, executing a goroutine each from their runqueue on two machines. Let's consider that the tasks for processor P2 complete while P1 is running. This is shown in the figure here:

Processor P2 has exhausted its runqueue, and does not have any more goroutines to execute. Thanks to the work-stealing strategy, P2 has "stolen" half of the goroutines from P1's runqueue and can start executing them, as shown in the figure here: