Designing a more dynamic monitoring solution
How can we improve the Prometheus design to suit our purposes better? How can we make it more dynamic and more scheduler friendly?
One improvement we can make is to use environment variables. That would save us from having to create a new image every time we need to change the Prometheus configuration. At the same time, environment variables would remove the need to use a network drive (at least for configuration).
We can make a generic solution that will transform any environment variable into a Prometheus configuration entry or an initialization argument.
To enable Prometheus configuration through environment variables, we need to distinguish those that should be used as command line arguments from those that will serve to create the configuration file. We'll define a naming convention stating that every environment variable with a name that starts with ARG_ is a startup argument.
The code can be as follows:
func Run() error {
    cmdString := "prometheus"
    for _, e := range os.Environ() {
        if key, value := getArgFromEnv(e, "ARG"); len(key) > 0 {
            cmdString = fmt.Sprintf("%s -%s=%s", cmdString, key, value)
        }
    }
    cmd := exec.Command("/bin/sh", "-c", cmdString)
    return cmdRun(cmd)
}
It is a very simple function. It iterates through all the environment variables. Every variable with a name that starts with ARG_ is added as an argument of the prometheus executable. Once the iteration is done, the binary is launched with those arguments.
We made Prometheus more Docker-friendly with only a few lines of code that sit on top of it.
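The Run function relies on getArgFromEnv, which is not shown here. The following is a minimal sketch of how such a helper could work, not the project's actual implementation. The names argFromEnv and buildCommand, the lower-casing of the key, and the ARG_EXAMPLE_FLAG variable are assumptions made only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// argFromEnv is a hypothetical helper, not the project's getArgFromEnv.
// It receives one "NAME=value" entry, as returned by os.Environ. When NAME
// starts with the prefix followed by "_", it returns the rest of the name
// in lower case together with the value. Otherwise it returns empty strings.
func argFromEnv(entry, prefix string) (key, value string) {
	// Split only on the first "=" so that values may contain "=" themselves.
	parts := strings.SplitN(entry, "=", 2)
	if len(parts) != 2 || !strings.HasPrefix(parts[0], prefix+"_") {
		return "", ""
	}
	key = strings.ToLower(strings.TrimPrefix(parts[0], prefix+"_"))
	return key, parts[1]
}

// buildCommand assembles the command line from a list of environment
// entries, mirroring the loop inside Run.
func buildCommand(binary string, environ []string) string {
	cmd := binary
	for _, e := range environ {
		if key, value := argFromEnv(e, "ARG"); len(key) > 0 {
			cmd = fmt.Sprintf("%s -%s=%s", cmd, key, value)
		}
	}
	return cmd
}

func main() {
	// ARG_EXAMPLE_FLAG is a made-up variable, not a real Prometheus flag.
	environ := []string{
		"PATH=/usr/bin",
		"ARG_EXAMPLE_FLAG=1",
	}
	fmt.Println(buildCommand("prometheus", environ))
}
```

Passing the environment as a slice, instead of reading os.Environ() inside the function, keeps the sketch easy to try with any set of variables.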
The full source code can be found in the run.go file at https://github.com/vfarcic/docker-flow-monitor/blob/master/prometheus/run.go.
We should do something similar with the configuration file. Specifically, we can make the global section of the configuration use environment variables prefixed with GLOBAL_.
The logic of the code is similar to the Run function we explored. Please go through config.go for more details (https://github.com/vfarcic/docker-flow-monitor/blob/master/prometheus/config.go). The GetGlobalConfig function returns the global section of the config while the WriteConfig function writes the configuration to the file.
Please consult the Prometheus configuration documentation (https://prometheus.io/docs/operating/configuration/) for more information about the available options.
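To make the idea more concrete, here is a minimal sketch of how variables prefixed with GLOBAL_ could be turned into the global section. It is not the code from config.go. The function name globalSection, the lower-casing of the keys, and the sorting of the entries are assumptions. Writing the result to a file is left out.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// globalSection is a hypothetical stand-in for the project's GetGlobalConfig.
// It converts every "GLOBAL_NAME=value" entry into a "name: value" line
// placed under the "global:" key of the configuration.
func globalSection(environ []string) string {
	const prefix = "GLOBAL_"
	entries := []string{}
	for _, e := range environ {
		parts := strings.SplitN(e, "=", 2)
		if len(parts) != 2 || !strings.HasPrefix(parts[0], prefix) {
			continue
		}
		key := strings.ToLower(strings.TrimPrefix(parts[0], prefix))
		if key == "" {
			continue
		}
		entries = append(entries, fmt.Sprintf("  %s: %s", key, parts[1]))
	}
	if len(entries) == 0 {
		return ""
	}
	// Sort the lines so that the output does not depend on the order
	// of the environment variables.
	sort.Strings(entries)
	return "global:\n" + strings.Join(entries, "\n") + "\n"
}

func main() {
	environ := []string{
		"GLOBAL_SCRAPE_INTERVAL=10s",
		"HOME=/root",
		"GLOBAL_EVALUATION_INTERVAL=30s",
	}
	fmt.Print(globalSection(environ))
}
```

With the sample variables above, the sketch prints a global section with evaluation_interval set to 30s and scrape_interval set to 10s, and it ignores HOME.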
By using environment variables, we managed to get rid of the network drive. As far as configuration is concerned, the service will be fault tolerant. If it fails and gets rescheduled by Swarm, it will not lose its configuration since the configuration is part of the service definition. There is a downside though. Every time we want to change the configuration, we'll need to execute the docker service update command, or modify the stack file and re-execute docker stack deploy. As a result, Docker will stop the currently running replica and start a new one, thus producing a short downtime. However, since we are, at the moment, only dealing with global configuration and startup arguments, changes will be very uncommon. We'll deal with more dynamic parts of the configuration later.
I have the code compiled and available as vfarcic/docker-flow-monitor/ (https://hub.docker.com/r/vfarcic/docker-flow-monitor/). Let's give it a spin.