B.E. Droge
e8428b8cae
Changed QOSes for new gpu nodes
3 years ago
B.E. Droge
1b091abd5f
New gpu nodes
3 years ago
Egon Rijpkema
0c686fd015
Updated spacewalk client, spacewalk repo
and updated type in slurmdbd.sh
3 years ago
B.E. Droge
b6396538c2
Renamed
3 years ago
B.E. Droge
cadb6bb997
fix typo
3 years ago
G.J.C. Strikwerda
90ef719f7f
Merge branch 'feature/skylake' of HPC/pg-playbooks into master
3 years ago
B.E. Droge
e96eed56c9
Added skylake_sym.yml, renamed sandybridge file
3 years ago
B.E. Droge
97ed960abe
Added skylake nodes
3 years ago
B.E. Droge
0af33826c9
Renamed file to make it similar to other files
3 years ago
B.E. Droge
8a3eb3e627
Fix typo
3 years ago
B.E. Droge
ce8c572a70
Make symlinks in /software for Skylake nodes
3 years ago
Egon Rijpkema
b808fc56b8
removed a nested if
3 years ago
Egon Rijpkema
f364fa52ee
Check for the total memory usage by user.
3 years ago
Egon Rijpkema
56c61f5912
Added more gpu nodes.
3 years ago
Egon Rijpkema
b9e9b45e87
UMCG people should always be checked.
3 years ago
Egon Rijpkema
29c53b5efe
Added checks to monitor login and interactive
3 years ago
B.E. Droge
b29a6f7924
Use default slurm exporter (they accepted our PR)
3 years ago
Egon Rijpkema
e60799f6a9
Made a script that kills user programs
on the login node taking more than 10% of memory.
It has the '--dumy' flag enabled for now as I would like to test this
first.
3 years ago
B.E. Droge
27a5241060
Use our own slurm exporter which has core metrics
3 years ago
Egon Rijpkema
fceddec4ca
Scraping less often because the exporter returned code 500.
4 years ago
Egon Rijpkema
f65fd50c15
Merge branch 'feature/ldap-and-lustre-roles'
4 years ago
Egon Rijpkema
b5ea680558
Also on metadata
4 years ago
F. Dijkstra
9c91af9d89
* Decreased PriorityDecayHalfLife to 4 days to reduce the impact of
heavy usage in the past
* Increased the priority weight of job age
4 years ago
Egon Rijpkema
e2a3d6c6e5
ldap and lustre-client roles.
This was made for the OpenOndemand machine which needs the peregrine
users and filesystems.
For other machines, it may be necessary to make this playbook more
versatile.
4 years ago
F. Dijkstra
79be8bb9ab
Added gelifesmedium QOS
4 years ago
F. Dijkstra
41ab9eec5e
Changes to the scheduling parameters in order to try to improve backfilling.
Changes are described in a Google doc in the HPC team drive.
4 years ago
Egon Rijpkema
fe7689349a
Removed systems that are phased out.
4 years ago
Egon Rijpkema
770a88a588
Updated golang version
4 years ago
Egon Rijpkema
b398c0e793
Added local changes on xcat
4 years ago
Egon Rijpkema
fbe24ccbf5
No longer using proxy
4 years ago
Egon Rijpkema
5aa4801ed8
Added euclid vms on dh-20
Moved alertmanager pass to secrets file and decrypted the
prometheus.conf.
4 years ago
B.E. Droge
dab5945032
Removed reference to acct_gather_profile_influxdb.so
4 years ago
B.E. Droge
487b9b5d4b
Removed reference to acct_gather_profile_influxdb.so
4 years ago
B.E. Droge
e45b0e401f
Removed acct_gather_profile_influxdb.so, which is now part of the SLURM installation
4 years ago
B.E. Droge
d5f9ec6f0e
Restored short partition settings
4 years ago
B.E. Droge
7c59f73279
Style fixes
4 years ago
B.E. Droge
f3d24fd476
Changed Redmine URL with GPU information
4 years ago
B.E. Droge
3a98db7c41
Fix typo: end -> else
4 years ago
B.E. Droge
1877a19a1a
Add missing curly bracket
4 years ago
B.E. Droge
ce7502082e
Rewrote the partition-to-qos functionality
4 years ago
B.E. Droge
4f629e5f62
Add check that rejects GPU jobs without a --gres specification
4 years ago
Egon Rijpkema
edafe0794f
Increased retention.
After 100 days only 10% of the volume was in use.
4 years ago
Egon Rijpkema
8f2bca08a7
No proxy client needed anymore.
4 years ago
F. Dijkstra
069edd60f4
Added two nodes to the short partition, because the other nodes are
offline.
4 years ago
Egon Rijpkema
54ba95c960
Corrected alertmanager address.
4 years ago
Egon Rijpkema
470d09cb1f
changed url of alertmanager
4 years ago
Egon Rijpkema
f3fa8ea3ee
Run nslcd in slurmdbd container.
4 years ago
root
e904071c84
Changed clustername to peregrine in lowercase, because it is
defined like that in the slurm database.
4 years ago
Egon Rijpkema
f86add1f79
add check for memcached on merlin.
4 years ago
Egon Rijpkema
8173603b56
Explained a little bit more.
4 years ago