Updating service constraints
The services we created so far are scheduled without any constraints, apart from those that tie some of the services to one of the Swarm managers.
Without constraints, Swarm distributes service replicas evenly, placing each new replica on the node that runs the fewest containers. Such a strategy can be disastrous. For example, we might end up with Prometheus, ElasticSearch, and MongoDB on the same node. Since all three of them require a fair amount of memory, their performance can deteriorate quickly. At the same time, the rest of the nodes might be running very undemanding services like go-demo. As a result, we can end up with a very uneven distribution of replicas from the resource perspective.
We cannot blame Swarm for a poor distribution of service replicas. We did not give it any information to work with. At a minimum, we should have told it how much memory to reserve for each service, as well as the memory limits.
A memory reservation gives Swarm a hint about how much memory it should set aside for a service. If, for example, we specify that each replica of a service reserves 1 GB of memory, Swarm will make sure to run it only on a node that has that amount available. Bear in mind that it does not compare the reservation with actual memory usage; instead, it compares it with the reservations made for other services and the total amount of memory each node has.
A memory limit, on the other hand, should be set to the maximum amount we expect a service to use. If the actual usage surpasses it, the container will be shut down and, consequently, Swarm will reschedule it. A memory limit is, among other things, a useful protection against memory leaks and a way of preventing a single service from hoarding all the resources on a node.
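For reference, both values can also be set directly on an existing service, without touching its stack file. What follows is a minimal sketch, assuming the go-demo_main service is already running; the 20M and 50M values are placeholders, not measured recommendations.

# Update the reservation and the limit of a running service.
# 20M / 50M are placeholder values, not recommendations.
docker service update \
    --reserve-memory 20M \
    --limit-memory 50M \
    go-demo_main

We will not use that approach here. Defining the values in stack files keeps them in version control, which is why all the updates that follow go through docker stack deploy.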
Let us revisit the services we are currently running and try to set their memory reservations and limits.
What should be the constraint values? How do we know how much memory should be reserved and what should be the limit? As it happens, there are quite a few different approaches we can take.
We could visit a fortune teller and consult a crystal ball, or we could make a lot of very inaccurate assumptions. Either of those is a bad way of defining constraints. You might be inclined to say that databases need more memory than backend services. We could assume that services written in Java require more resources than those written in Go. There is no limit to the number of guesses we could make. However, more often than not, they will be false and inaccurate. If those two were the only options, I would strongly recommend visiting a fortune teller instead of guessing. Since the result would be more or less the same, a fortune teller can, at least, provide a fun diversion from day-to-day monotony and lead to very popular photos uploaded to Instagram.
The correct approach is to let the services run for a while and consult metrics. Then let them run a while longer and revisit the metrics. Then wait some more and consult again. The point is that the constraints should be reviewed and, if needed, updated periodically. They should be redefined and adapted as a result of new data. It's a task that should be repeated every once in a while. Fortunately, we can create alerts that will tell us when to revisit constraints. However, you'll have to wait a while longer until we get there. For now, we are only concerned with the initial set of constraints.
While we should let the services run for at least a couple of hours before consulting metrics, my patience is reaching the limit. Instead, we'll imagine that enough metrics were collected and consult Prometheus.
The first step is to get a list of the stacks we are currently running:
docker stack ls
The output should list the four stacks we have deployed so far: exporter, go-demo, monitor, and proxy.
Let us consult the current memory usage of those services.
Please open Prometheus' graph screen.
open "http://$(docker-machine ip swarm-1)/monitor/graph"
Type container_memory_usage_bytes{container_label_com_docker_stack_namespace="exporter"} in the Expression field, click the Execute button, and switch to the Graph view.
If you hover over the lines in the graph, you'll see that one of the labels is container_label_com_docker_swarm_service_name. It contains the name of a service allowing you to identify how much memory it is consuming.
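If you prefer one line per service instead of one per container, you can aggregate the usage by that label. The query below is only a convenience for reading the graph; the per-container view works just as well for our purpose.

sum(container_memory_usage_bytes{container_label_com_docker_stack_namespace="exporter"})
    by (container_label_com_docker_swarm_service_name)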
While the exact numbers will differ from one case to another, exporter_cadvisor should be somewhere between 20 MB and 30 MB, while exporter_node-exporter and exporter_ha-proxy should be lower, at around 10 MB each.
With those numbers in mind, our exporter stack can be as follows (limited to relevant parts).
...
  ha-proxy:
    ...
    deploy:
      ...
      resources:
        reservations:
          memory: 20M
        limits:
          memory: 50M
...
  cadvisor:
    ...
    deploy:
      ...
      resources:
        reservations:
          memory: 30M
        limits:
          memory: 50M
  node-exporter:
    ...
    deploy:
      ...
      resources:
        reservations:
          memory: 20M
        limits:
          memory: 50M
...
We set the memory reservations close to the upper bounds of the current usage. That will help Swarm schedule the containers better, unless they are global and have to run on every node anyway. More importantly, it allows Swarm to plan future scheduling by subtracting those reservations from the total memory available on each node.
Memory limits, on the other hand, cap how much memory the containers created from those services can use. Without memory limits, a container might "go wild" and hoard all the memory on a node for itself. A good example is an in-memory database like Prometheus. If we deployed it without any limit, it could easily take over all the resources, leaving the rest of the services running on the same node struggling.
Let's deploy the updated version of the exporter stack.
docker stack deploy \
    -c stacks/exporters-mem.yml \
    exporter
Since most of the services in the stack are global, we will not see much difference in the way Swarm schedules them. No matter the reservations, a replica will run on each node when the mode is global. Later on, we'll see more benefits of memory reservations. For now, the important thing to note is that Swarm now has a better picture of the memory reserved on each node and will be able to make future scheduling decisions with more precision.
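If you want to confirm that Swarm registered the new values, you can inspect one of the services. The reservations and limits are displayed in bytes (50M, for example, shows up as 52428800).

docker service inspect exporter_cadvisor \
    --format '{{json .Spec.TaskTemplate.Resources}}'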
We'll continue with the rest of the stacks. The next in line is go-demo.
Please go back to Prometheus' Graph screen, type container_memory_usage_bytes{container_label_com_docker_stack_namespace="go-demo"} in the Expression field, and click the Execute button.
The current usage of go-demo_db should be between 30 MB and 40 MB while go-demo_main is probably below 5 MB. We'll update the stack accordingly.
The new go-demo stack is as follows (limited to relevant parts).
...
  main:
    ...
    deploy:
      ...
      resources:
        reservations:
          memory: 5M
        limits:
          memory: 10M
  db:
    ...
    deploy:
      resources:
        reservations:
          memory: 40M
        limits:
          memory: 80M
...
Now we can deploy the updated version of the go-demo stack.
docker stack deploy \
    -c stacks/go-demo-mem.yml \
    go-demo
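Once the new replicas are up, the limits themselves become useful in queries. cAdvisor exposes the configured limit through the container_spec_memory_limit_bytes metric, so dividing usage by it shows how close each go-demo container is to being killed. Treat the expression below as a sketch you can paste into Prometheus' Expression field.

container_memory_usage_bytes{container_label_com_docker_stack_namespace="go-demo"} /
    container_spec_memory_limit_bytes{container_label_com_docker_stack_namespace="go-demo"}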
Two stacks are done, and two are still left to be updated. The monitor and proxy stacks should follow the same process. I'm sure that by now you can query Prometheus by yourself. You'll notice that the monitor_monitor service (Prometheus) is the one that uses the most memory (over 100 MB). Since we can expect Prometheus' memory usage to rise over time, we should be generous with its reservation and set it to 500 MB. Similarly, a reasonable limit could be 800 MB. The rest of the services are very moderate in their memory consumption.
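As an illustration, the relevant part of the monitor stack could look as follows. It is a sketch based on those numbers; the actual stacks/docker-flow-monitor-mem.yml file we'll deploy next may differ in the parts that are elided here.

...
  monitor:
    ...
    deploy:
      ...
      resources:
        reservations:
          memory: 500M
        limits:
          memory: 800M
...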
Once you're done exploring the rest of the stacks through Prometheus, the only thing left is to deploy the updated versions.
DOMAIN=$(docker-machine ip swarm-1) \
    docker stack deploy \
    -c stacks/docker-flow-monitor-mem.yml \
    monitor

docker stack deploy \
    -c stacks/docker-flow-proxy-mem.yml \
    proxy
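Since the resource sections changed, Swarm restarts the affected replicas to apply the new values. If you want to confirm that everything settled, you can, for example, list the tasks that are supposed to be running.

docker stack ps monitor \
    --filter desired-state=running

docker stack ps proxy \
    --filter desired-state=running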
Now that our stacks are better defined thanks to metrics, we can proceed and try to improve our queries by taking memory reservations and limits into account.