Would like to move to https://github.com/rug-cit-hpc/pg-playbooks, but this repository has large files...

Prometheus

Below is a diagram of the current Prometheus monitoring setup on Gearshift. Our setup consists of the following components:

(Diagram: gross simplification of the setup)

Node exporter

Each Peregrine node runs a node exporter. It was installed using the node exporter.yml playbook in the root of this repository. This playbook applies the node exporter role, which does little more than copy the binary (from promtools/results) to the node and install a systemd unit file. The node exporter listens for requests on port 9100 on each node.
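As an illustration, a minimal Ansible sketch of what such a role's tasks could look like; the binary path, destination and unit file name below are assumptions, not taken from the actual role:

```yaml
# Hypothetical sketch of the node exporter role's tasks (paths and names are assumed).
- name: Copy the node_exporter binary to the node
  copy:
    src: promtools/results/node_exporter   # assumed location of the prebuilt binary
    dest: /usr/local/bin/node_exporter
    mode: '0755'

- name: Install the systemd unit file
  template:
    src: node_exporter.service.j2           # assumed template name
    dest: /etc/systemd/system/node_exporter.service

- name: Start and enable the node exporter (listens on port 9100)
  systemd:
    name: node_exporter
    state: started
    enabled: true
    daemon_reload: true
```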

Prometheus server

The server runs in a Docker container on knyft. It was installed using the prometheus.yml playbook, which applies the prom_server role. This role also contains the server's configuration files. The server scrapes the exporters on the nodes and stores the results in the time series database that is built into Prometheus. Targets and alerts are configured via these configuration files. Prometheus also has a web frontend that listens on knyft and is accessible from the management VLAN. Via this web interface it is possible to query the data directly, see the status of the targets reporting to the server, and view the alerts.
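A minimal sketch of what the scrape and alerting configuration could look like; the hostnames, ports (other than 9100) and file paths are assumptions, not copied from the actual role:

```yaml
# Hypothetical prometheus.yml fragment; node names, file paths and the
# alertmanager address are assumed.
global:
  scrape_interval: 30s

rule_files:
  - /etc/prometheus/alerting.rules                       # assumed alert rule file

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager.example.org:9093']     # assumed Alertmanager address

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['node001:9100', 'node002:9100']        # node exporters on port 9100
```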

Grafana

Grafana runs in the Rancher environment and queries both the Prometheus server on knyft and the Prometheus server in the Rancher environment itself. It has various dashboards that present the data. (The Prometheus server in the Rancher environment monitors systems other than Peregrine.)
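For reference, a hedged sketch of how such a Prometheus data source could be provisioned in Grafana; the name and URL are assumptions, and the actual data sources may well have been configured through the Grafana web interface instead:

```yaml
# Hypothetical Grafana data source provisioning file; name and URL are assumed.
apiVersion: 1
datasources:
  - name: Prometheus-knyft
    type: prometheus
    access: proxy
    url: http://knyft.example.org:9090    # assumed address of the Prometheus server
    isDefault: true
```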

Alertmanager

Prometheus posts the alerts it raises to the Alertmanager in the Rancher cloud. The Alertmanager filters the alerts; for instance, it deduplicates alerts should one node be monitored by more than one Prometheus server. It is also possible to silence alerts here. The web interface of the Alertmanager is available here. The Alertmanager is configured to push alerts to various Slack channels.
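A minimal sketch of an Alertmanager configuration along these lines; the Slack webhook URL, channel name and grouping labels are assumptions:

```yaml
# Hypothetical alertmanager.yml fragment; webhook URL and channel are assumed.
route:
  group_by: ['alertname', 'instance']   # duplicate alerts are grouped together
  receiver: slack-notifications

receivers:
  - name: slack-notifications
    slack_configs:
      - api_url: https://hooks.slack.com/services/XXX/YYY/ZZZ   # assumed webhook
        channel: '#hpc-alerts'                                   # assumed channel
        send_resolved: true
```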