379 Commits (fix/no_alerts_var)

Author SHA1 Message Date
Egon Rijpkema a9d1f4e5bd No 72h predictions on /var. 6 months ago
B.E. Droge 3320c4d570 Merge pull request 'Increased timeout for not using the GPU to 4 hours' (#20) from feature/increased_gpu_timeout into master 6 months ago
F. Dijkstra 0e1fc73cca Increased timeout for not using the GPU to 4 hours, since 6 months ago
Egon Rijpkema 700c7fd0a6 Updated prometheus documentation a little. 6 months ago
E.M.A. Rijpkema 454061659a Merge pull request 'Add PrivateData setting to slurmdbd.conf and slurm.conf' (#18) from feature/privatedata into master 6 months ago
F. Dijkstra 63d5c01d59 Added PrivateData setting to slurm.conf as setting it only in 6 months ago
F. Dijkstra fe09b7faf5 Added users to PrivateData, as usage on itself did not have the 7 months ago
F. Dijkstra 80c0533eb6 Added the parameter PrivateData to prevent regular users from seeing 7 months ago
G.J.C. Strikwerda 32dc935e4c Merge pull request 'Added tree to the list of tools.' (#17) from feature/tree into master 7 months ago
F. Dijkstra 2b0c012502 Added tree to the list of tools. 7 months ago
Egon Rijpkema 5dc4274e96 Added new prometheus cert for knyft. 9 months ago
Egon Rijpkema 210c8a6911 Made build work again. 10 months ago
Egon Rijpkema c70e4a4af9 Lustre exporter is extremely verbose. 10 months ago
B.E. Droge 7e43402cb0 set pg-node247 and 269 to FUTURE 11 months ago
B.E. Droge 3e775df7a7 remove dh-node11 and 19 11 months ago
root 5074348f17 slurmd_restart should actually restart (not reload) slurmd 11 months ago
root 6735ac1e69 add config tag to config-related steps, do restart of slurmd 11 months ago
B.E. Droge a4cb09cd33 Merge branch 'master' of ssh://git.web.rug.nl:222/HPC/pg-playbooks 11 months ago
B.E. Droge ecc56268c4 remove xdmod scripts 11 months ago
B.E. Droge 39e4b8ad77 disable task affinity for cgroups 11 months ago
B.E. Droge df5090ca69 decrease tmpdisk values to a close power of 10 11 months ago
B.E. Droge c29670ceaf Add TmpFS=/local and TmpDisk values for nodes 11 months ago
root 41f075af42 fix syntax error 11 months ago
root 1bb6ca0329 update db password 11 months ago
root b965d07018 split single slurm logrotate setting into two separate ones 11 months ago
root 56ac7e9194 fix deprecation warning for loop in yum module 11 months ago
root a3eb7a3e72 bump slurm version 11 months ago
B.E. Droge c2fc2e779a make slurm user owner of slurmdbd.conf 11 months ago
B.E. Droge e7cf23fb7d change mode of slurmdbd.conf 11 months ago
B.E. Droge 5e730f4364 move node 267 back to generic list with esx nodes 1 year ago
B.E. Droge 6e684d9056 move nodes with broken ib to vulture, set merlin nodes to future 1 year ago
Egon Rijpkema d2d799cf56 Do not crash when no usage data for a gpu is available. 1 year ago
Egon Rijpkema e23f29f39e Added alerts for ceph health status. 1 year ago
Egon Rijpkema 3d5120363e Scrape ceph on the merlin-management001 1 year ago
Egon Rijpkema dba8e6269b Added a slurmdbd storage pass 1 year ago
E.M.A. Rijpkema 335e60087c Merge pull request 'Use NodeSets in SLURM config' (#16) from nodesets into master 1 year ago
B.E. Droge 731fe3d802 Use nodesets, and move non-ib nodes to vulture 1 year ago
B.E. Droge 746d716385 modify link to scientific papers that acknowledge peregrine 2 years ago
Egon Rijpkema c6fcf9ca27 When it is actually full, send an alert about /tmp 2 years ago
B.E. Droge d801c55ff4 fix typo in lua script 2 years ago
B.E. Droge 3e361b5cac set max_rpc_cnt=150 2 years ago
B.E. Droge 9347bd9238 set messagetimeout to 30 2 years ago
B.E. Droge 68806f782e set maxnodes=1 for gpushort 2 years ago
B.E. Droge 9731cc7b04 Merge pull request 'Added gpushort partition and removed pg-gpu06 from the list of nodes.' (#15) from gpushort into master 2 years ago
B.E. Droge c5722f5c89 Merge pull request 'Moved the location of the job private temporary directory from /local to /local/tmp.' (#14) from localdir into master 2 years ago
F. Dijkstra 9267d5ffbc Added missing plugstack.conf change. The private tmpdir is now taken 2 years ago
F. Dijkstra 90a5552b47 Changed the limit for short jobs in the gpu partition to 2 hours, 2 years ago
F. Dijkstra ae3532d3f8 Added 2nd node to gpushort partition. 2 years ago
F. Dijkstra 0bc104ad17 Fixed a typo. 2 years ago
F. Dijkstra 795577652c Added gpushort partition and corresponding qos. This to be able to 2 years ago