Posts in triton

How busy is the cluster? A discussion

We occasionally get some questions: how busy is the cluster? How long do I have to wait? Is there some dashboard that can tell me?

The answer is, unfortunately, not so easy. Our cluster uses dynamic scheduling with a fairshare algorithm. All users have a fairshare priority, which decreases the more you have recently run. Jobs are ranked by priority (including fairshare plus other factors), and scheduled in that order. If there are unschedulable holes between those jobs, it can take a job with a lower priority and fill them in (“backfilling”). So that gives us:

A small-enough job with a low priority might still be scheduled soon.

A higher priority user could submit something while you are waiting, and increase your wait time.

An existing job could end early, making other wait times shorter.

An existing job could end early, allowing some other higher priority jobs to run sooner, making backfilled jobs run later.

In short: there is no way to give an estimate of the wait time, in the way people want. We’ve tried but haven’t find a way to answer the question well.

What can we know?

You can compare your fairshare factor with other users. If you run sshare you can see the fairshare (higher means higher priority). sprio shows relatively priority for all jobs (here, the raw values are multiplied by some factor and added). On Triton (the new install since 2024 may), they mean the following:

at 7 days) (zero when first submitted, increasing to 10000 at 7 days old)

The fairshare factor is “1e7 × FairShare priority from sshare”

The FairShare value is computed based on the raw usage value: at each level of the share tree, it divides it up among the users so that those who have run less have a higher priority.

The usage value decays with a two-week half life.

The others are mostly constant.

Still: this is all very abstract and what others submit has more effect than your priority. The only thing you can control is using less resources.

This is quite cluster dependent so we’d recommend asking for help for how your own cluster is setup.

This may be your real question. There are two main things:

Use less resources. Make sure you don’t over-request more than you need (CPU, memory, GPUs) - this will affect your future fairshare less. Of course, use everything you need, “saving for later” doesn’t give you more resources than you save now.

Request less resources per job. This will let you be backfilled into the scheduling holes (see below).

In this case, if there is a slot for you, you are scheduled very soon. srun --test [RESOURCE_REQUESTS] might give you some hint about when a job would be scheduled - it basically tries to schedule an empty job and reports the currently estimated start time. (It uses a JobID though so don’t run it in a loop)

In this case, nothing can be said since the queue is always being re-shuffled. In the long-run, you get a fair share of resources. If you haven’t used much lately, you have more now. Your wait time depends more on what other users submit (and their priorities) than what you submit - and this is always changing. You can tell something about how soon you’d be scheduled by looking at your priority relative to other users. Make your jobs as small and efficient as possible to fit in between the holes of other jobs and get scheduled as soon as possible. If you can break one big job into smaller pieces (less time, less CPU, less memory) that depend on each other, then you can better fit in between all of the big jobs. See the Tetris metaphor here in TTT4HPC

If your need is “run stuff quickly for testing”, make sure the jobs are as short as possible. Hopefully, your cluster staff about development or debugging partitions that may be of use, because that’s the solution for quick tests.

This description was in an old version of our docs but has since been removed. The exact values are out of date. It’s included here for detailed reference anyway:

Triton queues are not first-in first-out, but “fairshare”. This means that every person has a priority. The more you run the lower your user priority. As time passes, your user priority increases again. The longer a job waits in the queue, the higher its job priority goes. So, in the long run (if everyone is submitting an never-ending stream of jobs), everyone will get exactly their share.

Once there are priorities, then: jobs are scheduled in order of priority, then any gaps are backfilled with any smaller jobs that can fit in. So small jobs usually get scheduled fast regardless.

Warning: from this point on, we get more and more technical, if you really want to know the details. Summary at the end.

What’s a share? Currently shares are based on department and their respective funding of Triton (sshare). It used to be that departments had a share, and then each member had a share of that department. But for complex reasons we have changed it so that it’s flat: so that each person has a share, and the shares of everyone in a department corresponds to that department’s share. When you are below your share (relative to everyone else), you have higher priority, and vice versa.

Your priority goes down via the “job billing”: roughly time×power. CPUs are billed at 1/s (but older, less powerful CPUs cost less!). Memory costs .2/GB/s. But: you only get billed for the max of memory or CPU. So if you use one CPU and all the memory (so that no one else can run on it), you get billed for all memory but no CPU. Same for all CPUs and little memory. This encourages balanced use. (this also applies to GPUs).

GPUs also have a billing weight, currently tens of times higher than a CPU billing weight for the newest GPUs. (In general all of these can change, for the latest info see search BillingWeights in /etc/slurm/slurm.conf).

If you submit a long job but it ends early, you are only billed for the actual time you use (but the longer job might take longer to start at the beginning). Memory is always billed for the full reservation even if you use less, since it isn’t shared.

The “user priority” is actually just a record how much you have consumed lately (the billing numbers above). This number goes down with a half-life decay of 2 weeks. Your personal priority your share compared to that, so we get the effect described above: the more you (or your department) runs lately, the lower your priority.

If you want your stuff to run faster, the best way is to more accurately specify your time (may make that job can find a place sooner) and memory (avoids needlessly wasting your priority).

While your job is pending in the queue SLURM checks those metrics regularly and recalculates job priority constantly. If you are interested in details, take a look at multifactor priority plugin page (general info) and depth-oblivious fair-share factor for what we use specifically (warning: very in depth page). On Triton, you can always see the latest billing weights in /etc/slurm/slurm.conf

Numerically, job priorities range from 0 to 2^32-1. Higher is sooner to run, but really the number doesn’t mean much itself.

These commands can show you information about your user and job priorities:

slurm s

list of jobs per user with their current priorities

slurm full

as above but almost all of the job parameters are listed

slurm shares

displays usage (RawUsage) and current FairShare weights (FairShare, higher is better) values for all users

sshare

Raw data of the above

sprio

Raw priority of queued jobs

slurm j <jobid>

shows <jobid> detailed info including priority, requested nodes etc.

tl;dr: Just select the resources you think you need, and Slurm tries to balance things out so everyone gets their share. The best way to maintain high priority is to use resources efficiently so you don’t need to over-request.

Read more ...


Triton v3 is now default

Triton has a major update. You can read our previous info about this at Preparing for new Triton, and our “what has changed” in Triton issue #1593.

You might get SSH host key warnings.

It has the same name, and importantly the same user accounts and data, but all the software and operating system is changed. In particular:

All software modules are different

Any software which has been complied will need to be re-compiled.

Triton’s previous operating system was released in 2014. Security support runs out at the end of 2024 May, and it has to be updated. Stability is good for research, so we try to reduce the number of changes (compare)

We realize that a change is very disruptive and painful, especially since the expectation is that Triton never changes. But an old operating system makes problem for users too, and they have gotten more and more over the years.

Most of the transition for different types of software is described in Triton issue #1593.

Read more ...


Triton v3 SSH host key warnings

When updating Triton, many users will get a message like this (or similar things if you use other SSH clients like PuTTY):

SSH (Secure SHell) is made to be secure, and that means one it verifies the server you are connecting to via its ssh host key. The representation of this key is the fingerprint, like SHA256:OqCehC2lbHdl8mYGI/G9vlxTwew3H3KrvxKDkwIQy9Y. This means that the NSA or someone can’t intercept the connecting and get your password by pretending to be Triton. This is a good thing.

OpenSSH (the command line program on Linux, MacOS, Windows) saves these connection IDs (fingerprints) in $HOME/.ssh/known_hosts. Other programs may store the keys somewhere else.

The warning looks scary but the first thing to ask is “should the server I am connecting to have changed?”. If you have been directed to this blog post, then probably yes, it has. You should always think if the fingerprint should change, and if there is no reason for them to have changed, contact your administrators. You can usually verify the keys online, for example Triton ssh key fingerprints.

If you are on command line OpenSSH (Linux), it will propose a command that will remove the old host key:

For other programs, follow whatever prompts it might give to replace the host key fingerprint.

When you get a “The authenticity of host ‘triton.aalto.fi’ can’t be established”, verify the SSH key fingerprints that are presented, then click “yes” to permanently save them (until they change next, they can always be updated). The fingerprints for Triton v3 are:

Read more ...


libwebp security vulnerability and computational scientists

Recently, a major security vulnerability (CVE-2023-5129) has been found in libwebp, an image decoding library for the .webp format. This is major, since this library is embedded in many apps and web browsers and allows remote code execution just by opening a file. For computational scientists, there is still some impact - and it’s harder to compensate for. In short, just by processing an image in the .webp format, someone can take over your computer or session.

libwebp is the current issue, but the problem is general: computational scientists often create software environments and use them for a long time. These environments aren’t usually browsing the web (the most likely attack vector here), but they do involve lots of code installed from different projects. How does one manage security in this case?

This post may be updated

If you use web browsers or apps on your own desktops, laptops, phones, etc. - make sure update them!

If you don’t use images in your research, there probably isn’t much impact.

If you do, this is what could happen:

You make a Python / Anaconda environment which uses libwebp somehow - directly installed through Conda, or some other application.

You download a dataset containing images. You process them as part of your research with the old environment.

The malicious image runs an exploit. It has access to your whole user account on that computer: extract any data, add SSH keys for remote access, corrupt/delete data (which might not be backed up from the cluster…).

Many things have to happen here, but it’s very possible for it to happen. You could lose access to non-backed up data or code or other confidential or sensitive data could be compromised, since code from one project from your user account has access to all projects from your account.

One would normally fix things by updating software. But when you are dealing with a research environment that can’t easily be updated, what should you do? This is the real question here.

It’s a multi-layered problem, and the answer will depend on your work. libwebp is what we are thinking about now, but the problem is general: there are other security problems that occasionally come up that can affect more scientific code. How do you prepare for next time?

Update your environments (conda, virtualenv, etc). You could try to see if libwebp is inside of them (conda list | grep webp), but especially for Pip packages it might not be apparent.

Make your environments reproducible: If you define your dependencies in requirements.txt (Python), environment.yml (conda), or whatever is suitable for your language, you can easily re-generate environments to bring everything up to date. (delete old one, re-create).

If you pin versions of dependencies (like numpy==1.20.0), it’s possible it can pull in older versions of other dependencies.

Containerize your workflows. If code runs inside of a container, it keeps it isolated from the rest of the operating system and user account. (but containers aren’t usually designed for strict security, but it’s better than nothing).

If you use pre-built modules on the cluster, try not to use old versions. We’ll update some recent modules, but we can’t update all of the old ones. At least webp is in the default anaconda modules.

If you write or maintain software in general, keep it up to date as much as reasonable! Don’t make others get into a place where they are having to use old versions of libraries to make it work.

In general, think about your dependencies. Be at least a little bit suspicious before you install random other software, that may possibly pull in lots of other dependencies. Of course, as a researcher, you may not have much choice.

These commands seem to be able to update an environment to a newer libwebp. It seems to work on newer environments, but we don’t know for sure. Instead of mamba, conda in theory works but is to slow it may not be practical:

There is a major security vulnerability in libwebp. While the impact on computational scientists may not be that much, a bigger issue is the difficulty of keeping all of the environments up to date so that next time this happens, it’s easier to respond.

We hope to have more security recommendations for computational scientists in the future. If anyone is interested in collaborating on this, let us know.

Common apps which embed Chrome or libwebp: Chrome, Firefox, VSCode, Zulip, Slack, Discord… things that use Electron to embed a web browser are affected, and that’s many things.

Read more ...


Preparing for new Triton

Sometime in autumn of 2023 (e.g. October/November), we will do a major update of Triton: updating the basic operating system, and thus almost everything else. There are big benefits to this: newer basic operating system software, but also such a basic update affects almost every user. For a short time, this will make a lot of work for almost every user. This post gives advance warning and a chance of feedback of how to make the update most usable.

This post is just advance warning and things to prepare already. All actual instructions will come later.

We will update the basic operating system from CentOS 7 to something else (Red Hat 9). We’ve ordered all new management hardware to make the backend more reliable and manageable. Along with this comes with an update of the software build system, which should allow us to deploy software to our users even better. We’ll also update our configuration management system for more reproducibility.

We also hope to think about the usability of the new system: remove a lot of old options and add in new, simpler ways of doing what people need.

All data and storage will remain the same, so there is no big data migration needed.

The old and new clusters will be accessible at the same time (two different login nodes), with the same filesystems mounted (same data available) and some compute resources still available there, so that people can slowly migrate. But the old one won’t stay running too long, to avoid long maintenance effort or splitting of the resources.

The biggest problem with big cluster updates like this is reproducibility: does you work from a month ago still work in one month? If not, this is a big problem. It’s even worse if there is a much longer gap before you come back to it (paper revisions, anyone?).

You could say there are two things that can go wrong with a cluster upgrade or change:

Specific software/code that needs to be compiled and installed: Software needs re-compiling for new clusters or new cluster OS updates.

Whole workflows: you need to make all the pieces work together. Different paths and workflow managers may need updating.

What you can do:

Manage any messes you have earlier rather than later. It’s better if you slowly clean up over time, so you can focus on the differences once the change happens.

Know what software you are using. It’s easier for us to re-install something we have already installed when someone can tell us the exact name and version that they are using.

Tests for your software. Some way to validate that it works correctly.

Contact Aalto RSE for hands-on help supporting the transition. Come to the garage early and often.

If there are any annoyances about Triton that you’d like us to consider for the upgrade, now is the time to let us know so we can plan them. We especially value feedback on usability problems.

Discuss with us in our chat, or open a Triton issue.

This post has been updated with minor corrections, changes be found in git history.

Read more ...