400 Commits (master)

Author SHA1 Message Date
B.E. Droge 5e730f4364 move node 267 back to generic list with esx nodes 1 year ago
B.E. Droge 6e684d9056 move nodes with broken ib to vulture, set merlin nodes to future 1 year ago
Egon Rijpkema d2d799cf56 Do not crash when no usage data for a gpu is available. 1 year ago
Egon Rijpkema e23f29f39e Added alerts for ceph health status. 1 year ago
Egon Rijpkema 3d5120363e Scrape ceph on the merlin-management001 1 year ago
Egon Rijpkema dba8e6269b Added a slurmdbd storage pass 1 year ago
E.M.A. Rijpkema 335e60087c Merge pull request 'Use NodeSets in SLURM config' (#16) from nodesets into master 2 years ago
B.E. Droge 731fe3d802 Use nodesets, and move non-ib nodes to vulture 2 years ago
B.E. Droge 746d716385 modify link to scientific papers that acknowledge peregrine 2 years ago
Egon Rijpkema c6fcf9ca27 When it is actually full, send an alert about /tmp 2 years ago
B.E. Droge d801c55ff4 fix typo in lua script 2 years ago
B.E. Droge 3e361b5cac set max_rpc_cnt=150 2 years ago
B.E. Droge 9347bd9238 set messagetimeout to 30 2 years ago
B.E. Droge 68806f782e set maxnodes=1 for gpushort 2 years ago
B.E. Droge 9731cc7b04 Merge pull request 'Added gpushort partition and removed pg-gpu06 from the list of nodes.' (#15) from gpushort into master 2 years ago
B.E. Droge c5722f5c89 Merge pull request 'Moved the location of the job private temporary directory from /local to /local/tmp.' (#14) from localdir into master 2 years ago
F. Dijkstra 9267d5ffbc Added missing plugstack.conf change. The private tmpdir is now taken 2 years ago
F. Dijkstra 90a5552b47 Changed the limit for short jobs in the gpu partition to 2 hours, 2 years ago
F. Dijkstra ae3532d3f8 Added 2nd node to gpushort partition. 2 years ago
F. Dijkstra 0bc104ad17 Fixed a typo. 2 years ago
F. Dijkstra 795577652c Added gpushort partition and corresponding qos. This to be able to 2 years ago
F. Dijkstra 5599223993 Moved the location of the job private temporary directory from /local to /local/tmp. 2 years ago
Egon Rijpkema 1e7d18fbde The node exporters for dh should be included as well... 2 years ago
Egon Rijpkema 0f3a52e65a removed nodes that we don't care about. (for now) 2 years ago
B.E. Droge 956197b186 fix types 2 years ago
B.E. Droge 06fa1cb16b Additional check for memory usage 2 years ago
B.E. Droge 1256535d27 Additional check for memory usage 2 years ago
root bbf79f38bc added xdmod prolog/epilog 2 years ago
B.E. Droge d241bbbb2b Move kill_invalid_depend to DependencyParameters 2 years ago
root 980cd6628c add nvme flag to gpu22 2 years ago
root 3c87f0a6cf better task names, make .d dirs first 2 years ago
root 501fd9d992 only run slurm client playbook on non-storage nodes 2 years ago
root 0134eefb03 Don't copy taskprolog to prolog.d 2 years ago
root 9b4a4ae829 split epilog and prolog 2 years ago
B.E. Droge 86af4f5ea8 Prolog/epilog fix 2 years ago
B.E. Droge 3030b5825e Move some slurm files from templates to files 2 years ago
B.E. Droge f447ad2fef Use copy module instead of template for prologs/epilogs 2 years ago
B.E. Droge 97fc1024ba Move prolog and epilog scripts to .d directories 2 years ago
B.E. Droge 4397c3f9c4 Move prolog and epilog scripts to .d directories 2 years ago
B.E. Droge 6a5e205bd8 Removed MemLimitEnforce from config, deprecated in SLURM 20 2 years ago
B.E. Droge fd59c0dc98 Removed FastSchedule from config, removed in SLURM 20 2 years ago
B.E. Droge be1b27bfd0 Modified name of Euclid CVMFS package 2 years ago
B.E. Droge 34330e91aa Install texlive on all nodes 2 years ago
Egon Rijpkema ced403a61d dh-node19 is Cobbler test node. 2 years ago
F. Dijkstra c0021bcb1b Added pack_serial_at_end to SchedulerParameters which should improve 2 years ago
B.E. Droge fba7ddf740 Changed threshold for regular qos 2 years ago
Egon Rijpkema fd326603d2 Take average usage when multiple gpus are present. 2 years ago
Egon Rijpkema b069bb58f8 really really remove fuse. 2 years ago
root 24e26c3ca8 Increased weight of job age parameter to prevent job starvation. 2 years ago
root 077218bbfb Added nvme label to GPU nodes with nvme enabled /local 2 years ago