Posted in 2026
What is AI? Everything, everywhere, all at once
- 2026 Apr 27
As Research Software Engineers (RSEs) in a university, we get lots of AI questions and projects. But what is “AI” exactly? It can mean almost anything, which becomes a problem when we and our customers don’t start from a shared understanding of what “AI” is. Let’s discuss.
For us research engineers, a “real AI” project might mean “writing Python code and optimizing deep learning training using a large computer cluster” or “does a Mixture of Experts-model produce better accuracy compared to a traditional feed-forward network in my use case?”. In practice, “AI” questions to us have included:
Problems with web APIs
Problems with PyTorch
Problems with HPC libraries
Problems with laws and regulations
Problems with web servers
Problems with Python installations
I think the effect of “AI” is to reduce the effort needed to use computing tools, so the amount of computing people want to do increases. The increase is spread across all the usual kinds of computing projects we get, even those that aren’t exactly deep learning training. Hence the increased need for our team, and for a general increase in computing and data literacy.
In the end, “AI” is such an overloaded term that it can mean anything. To better ask for help with “AI”, it’s good to be able to narrow it down to the actual topic. This post isn’t about scientific computing in general, so let’s look deeper below at what the core “AI” use cases are.
This blog post evolved out of a colleague’s talk at NoBSC 2026, where they pointed out the issue that “AI” can mean anything, so it isn’t good to speak of “AI” projects without further clarification. In that talk, we thought of two broad categories for the actual uses of “AI”:
“AI” can mean pattern matching and decision making. In this, you have some input data and predict some output based on that.
“AI” can mean content generation, as in generating text, images, or more. This is, in fact, also a prediction of an output based on an input prompt.
Notice that these two categories are pretty normal things? That’s because they are. It’s just that modern machine learning (deep learning) has gotten so much better at them than even a decade ago. It’s not intelligence, it’s predictions that seem human-like or better.
These categories aren’t scientific - it’s just what we use as a base for discussion with our customers.
This is a traditional use of machine learning. Here, you take input data and predict some output. This is usually done by having a lot of sample input data and corresponding “true” outputs for that data, and training a model. When you give the model new input data, it can predict an output. This is known as “supervised learning”.
This needs specific training data and the output model is usually specific to that domain or model. Deep learning can certainly find patterns that humans or traditional machine learning can’t - assuming such patterns exist in the first place.
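To make the train-then-predict pattern concrete, here is a minimal sketch in plain Python. It uses a nearest-centroid classifier, which is traditional machine learning and not deep learning, and the sensor readings and labels are made up purely for illustration.

```python
from collections import defaultdict
from math import dist

def train(samples, labels):
    """Learn one centroid (mean point) per label from labeled examples."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(model, features):
    """Return the label whose centroid is closest to the new input."""
    return min(model, key=lambda label: dist(model[label], features))

# Made-up training data: (temperature, vibration) readings with known outcomes.
samples = [(60, 0.2), (62, 0.3), (61, 0.25), (90, 1.1), (95, 1.3), (92, 1.2)]
labels = ["ok", "ok", "ok", "fault", "fault", "fault"]

model = train(samples, labels)
print(predict(model, (63, 0.3)))  # closest to the "ok" centroid
print(predict(model, (91, 1.0)))  # closest to the "fault" centroid
```

The "model" here is nothing more than two averaged points, which is the same idea as the learned parameters of a much larger model, only on a tiny scale.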
For an advanced example, an industrial plant has sensors monitoring the whole process and records of each time it broke down. By using this data, “AI” might be able to predict breakdowns more accurately than a human or non-deep machine learning tools could. These types of uses are relatively unobjectionable to society.
Other examples include things such as insurance companies using all of their data to analyze claims and preemptively deny coverage to those they think are more likely to get sick. Or using pattern matching to approve/deny claims without a human taking responsibility. These have much more societal impact and thus lead to lots of suspicion about “AI”, since it’s being used to diffuse responsibility.
Another type of pattern matching is classifying things without having any true labels. This is called “unsupervised learning”. One example would be the “Netflix Challenge” where scientists tried to use watching data to predict what would be relevant recommendations. It doesn’t matter what the detected categories are, just that things go together.
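As a rough illustration of finding groups without labels, here is a small k-means sketch in plain Python. K-means is just one common unsupervised method, chosen here for brevity, and the viewing data is invented.

```python
from math import dist

def kmeans(points, centers, rounds=10):
    """Group points around `centers` without using any labels."""
    for _ in range(rounds):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            tuple(sum(col) / len(col) for col in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return clusters, centers

# Made-up data: hours each viewer spent on (documentaries, cartoons).
viewers = [(9, 1), (8, 2), (10, 0), (1, 9), (2, 8), (0, 10)]
clusters, centers = kmeans(viewers, centers=[viewers[0], viewers[3]])
print(clusters)
```

The algorithm never learns what the two groups "mean". It only finds that some viewers go together, which is all a recommender needs.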
Just like the section title says, this is generating content. Examples could be chatbots (generating text) or image generators based on some input. This has become so widespread since 2022 that it’s easy to think that this is “AI”.
Under the hood, this is actually pattern matching, since it takes a prompt, uses all the previous input data, and generates a predicted output. There isn’t actually “intelligence” under the hood, and it’s all limited by the power of the algorithms and the source data. It can be wrong, not useful, etc. Content generation is definitely not at human level, and does not have the wide background and task knowledge of a human. It is much faster, though.
Some people use content-generating methods to make predictions. This isn’t as refined as an actual pattern-matching/decision-making method, but because of the general-purpose nature of content generation, it can work without much effort. One example I’ve heard of is using large language models (LLMs) with an input such as “is this social media post positive or negative sentiment? Answer with one word ‘positive’ or ‘negative’”. You get a sentiment analyzer with very little work, one that has some large implicit background knowledge. This is the power of these so-called “foundation models” that can do many tasks. The downside is that it uses much more computing resources and has more chance of going off the rails (hallucination, implicit biases of input data, etc.).
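The sentiment example could be wired up roughly like this. The sketch deliberately does not call any real LLM service: `ask_model` is a placeholder for whatever function sends a prompt and returns text, and `fake_model` is a hypothetical stand-in so the code runs on its own. The answer parsing shows one way to guard against the model going off the rails.

```python
def build_prompt(post):
    """Wrap a social media post in the one-word-answer instruction."""
    return (
        "Is this social media post positive or negative sentiment? "
        "Answer with one word 'positive' or 'negative'.\n\n"
        f"Post: {post}"
    )

def parse_answer(reply):
    """Accept only the two allowed words; anything else counts as a failure."""
    word = reply.strip().strip(".'\"").lower()
    return word if word in {"positive", "negative"} else None

def classify(post, ask_model):
    """`ask_model` is any function that takes a prompt string and returns text."""
    return parse_answer(ask_model(build_prompt(post)))

# Hypothetical stand-in so the sketch runs without any LLM service.
def fake_model(prompt):
    return "Positive." if "love" in prompt else "I cannot answer that"

print(classify("I love this cluster!", fake_model))  # positive
print(classify("The queue is long.", fake_model))    # None
```

Returning `None` for anything unexpected keeps a chatty or confused reply from silently being counted as a prediction.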
Content generation is powerful, but has potential for misinformation or misuse on a massive scale. One probably wants to be careful when using content generation for predictive tasks. The prevalence of generated content and its potential for misuse is behind a lot of the backlash to “AI”.
While not a category, there are also commercial platforms that do the above things. They are set up to be easy to use by a broad audience. We can help with these things, but most of the actual “AI” work is already done. ChatGPT’s early dominance in “AI” was probably caused almost as much by figuring out a useful, usable interface for the general public as by its underlying “AI” technology.
So, while commercial platforms and purchasable tools may be easy, they usually have very limited information about how they work (hidden behind “AI” to make you think it’s intelligent), and similar things can be done locally given enough time and effort. The delegation of accountability (and thinking) to third-party platforms is also behind part of the backlash to “AI”.
Examples include the chatbots that everyone uses, coding assistants, text summarizers, etc.
If you, or someone, has an “AI” project, the first step is to think deeper and figure out what the goal really is. rkdarst has an old saying, “explain it to me again without any terms invented or made popular in the last ten years”. This helps to peel back these layers and get a description of what is actually needed, and is probably quite useful when trying to figure out what an “AI” project actually is. If you can explain your project without saying “AI”, you are well on the way to solving it with “AI”.
Also, we shouldn’t separate “AI” from computing skills in general. “AI” lets people do more computing with less work, but it means everyone needs a higher base level of knowledge (even about not-exactly-“AI” topics). Think back to when cars became cheaper and more reliable. Once cars became more common, people didn’t have to know as much about their internals, but many more people had to learn how to drive and interact with them. (We’re not saying we want our cities infested with cars, nor AI). This is true even when the field of study isn’t “computing”. Don’t let “AI literacy” become “ChatGPT literacy”: it is literacy in computing, data, and problem solving - and a healthy suspicion of details hidden behind jargon.
You certainly can benefit from “AI” without knowing all the details. It’s very hard to benefit from it without knowing your actual goal.
Note: this blog post was written with zero “AI” content generation. My anonymous colleague has contributed some of the key ideas and title.
Machine learning: A field of study using statistical algorithms to learn from data and generalize to unseen data.
Deep learning: Machine learning using multilayered neural networks.
Supervised learning: Machine learning methods taking input and labeled “true” outputs as the training data.
Unsupervised learning: Compared to the above, algorithms learning from input data which is unlabeled.
Training: The process of iterating through the input data to find patterns, resulting in the trained model.
Model: The learned parameters from input data and training, which can be combined with the right code to make predictions.
Application programming interface (API): An interface that makes it easy for computer code to interact with something (as opposed to a human-optimized interface).
“Artificial intelligence”: Did you think I’m going to give a single simple definition here?
Fundamentals of secure AI systems with personal data -> What is artificial intelligence?
The dilemma of setting Slurm parameters
- 2026 Apr 16
Sometimes people come to us and complain that there are idle cluster GPUs, and they could be used if there wasn’t a per-user limit on max GPUs that any one user could use at once. Other people come to us and complain that all GPUs are in use by various people, and they can’t start jobs quickly.
Perhaps you see the dilemma. People both expect there are usually available GPUs for them, and also that GPUs can be used to the fullest extent. We’ll use GPUs as an example here, but this isn’t specific to GPUs.
We are very aware of this and try to enable as much overall research as possible. Still, there are choices to be made, and in this post we will try to describe them, so that our users can better give feedback for how we should adjust things.
We wrote this post so users can understand what’s going on in the background and let us know when something seems wrong.
An HPC cluster is fundamentally designed for batch work: for a given amount of resources, schedule them as efficiently as possible to get the maximum amount of computation out, with as high resource utilization as possible. We, and many clusters, have some resources reserved for interactive use, since interactive testing and debugging is extremely useful for getting work done.
We also have a “fairshare” system, where usage across users should even out over the long run. This means that if one user runs a lot now (because the cluster was somewhat empty), their priority will be lower later. The Triton priority decay half-life is 14 days.
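To give a feel for what a 14-day half-life means, here is a minimal sketch of exponential decay applied to past usage. It is only an illustration: the function name is made up, and the real scheduler priority calculation combines more factors than this.

```python
HALF_LIFE_DAYS = 14  # the Triton value quoted above

def decayed_usage(usage, days_ago, half_life=HALF_LIFE_DAYS):
    """Weight of past usage after `days_ago` days of exponential decay."""
    return usage * 0.5 ** (days_ago / half_life)

# 1000 GPU-hours used in one burst, as seen from later points in time.
for days in (0, 14, 28, 42):
    print(days, decayed_usage(1000, days))
```

Every two weeks, a past burst counts half as much against you, so after six weeks it weighs only an eighth of what it did at first.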
The situation: one user is dominating GPU use. Is this fair?
If one user was able to fill up the cluster, that means that at the time the jobs started, there were no other users waiting for those resources. If there were multiple users waiting, then the resources would have been split a bit more fairly (according to their priorities).
Don’t worry, once their current jobs finish, their priority will be much lower and everyone else will have a much higher priority to run next.
Situation: one user has a lot of GPUs in use. I know I can wait for them to finish, but they last many days. Do I really have to wait X days before I can get stuff started?
Sometimes, if the cluster is free, a user can submit many long jobs. This means resources aren’t being wasted right now (which is good), but the resources remain occupied for that duration (max time 3 or 5 days on Triton). This is a bit annoying. This is mainly a problem when the cluster is mostly empty, since if there are lots of things running, jobs turn over frequently enough that people can get some resources quickly (and the heavy users have lower priority at the time, so the more recent users have priority for free slots).
In this case, we usually wait and let the situation develop. Once the cluster reaches a “steady-state fullness”, jobs cycle fast enough that it’s unusual for it to get into this state: there aren’t many free GPUs opening up all at once without multiple users queuing, so one user can’t fill it up.
We don’t want to prohibit all long jobs, since long jobs are useful especially for new users. Yes, heavy users can and should adapt to checkpointing and mainly using small jobs, but we don’t want to force everyone to go straight into the hardest, purest way of using a cluster.
One option is the Slurm limit GrpTRESRunMins (“Trackable RESources Run Minutes”), which is set on a QOS or association. It is a limit not on the number of jobs or the length of jobs, but on sum(job_resources × job_length). If this was 120 GPU-hours, then one could run one 5-day single-GPU job, or thirty 4-hour single-GPU jobs at once. By tuning this, we can make it so that one can run long jobs, or use all the cluster, but not use all the cluster with long jobs.
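The GPU-hours arithmetic above can be checked with a few lines of Python. This is only a sketch of the sum(resources × time) idea with made-up helper names, not how Slurm itself is implemented, and it assumes every job in the example uses a single GPU.

```python
def gpu_run_minutes(jobs):
    """Sum of GPUs x minutes over a list of (gpus, hours) jobs."""
    return sum(gpus * hours * 60 for gpus, hours in jobs)

LIMIT = 120 * 60  # 120 GPU-hours expressed in GPU-minutes

def fits(jobs, limit=LIMIT):
    """True if all the jobs together stay within the limit."""
    return gpu_run_minutes(jobs) <= limit

one_long_job = [(1, 5 * 24)]     # one GPU for 5 days
many_short_jobs = [(1, 4)] * 30  # thirty single-GPU 4-hour jobs
both_together = one_long_job + many_short_jobs

print(fits(one_long_job), fits(many_short_jobs), fits(both_together))
```

Either workload alone lands exactly on the limit, while the two together need twice the budget, so the second would have to wait.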
Situation: There are free GPUs, but Slurm doesn’t let me use them. Isn’t this a waste?
Clearly this is the opposite of the two situations above. We’d normally like to prevent this situation, but there are some reasons it may occur. Sometimes, we do have a limit on the max number of jobs that can run. Hopefully this is temporary while we work something out. Sometimes, we have various resources reserved for short jobs, for interactive jobs, and so on. Sometimes someone has bought their own dedicated resources and we want to leave some available for them.
Situation: I have a conference deadline in a few days, and I need as many resources as possible to finish my submission.
Unfortunately, this isn’t really how a cluster works. It could work for clusters that are really bought by one group and they can decide what runs, but Triton resources are bought for general use and we don’t free up resources for deadlines. The fairshare system may also affect you here, with you getting fewer resources if you have used a lot in the past.
There are other clusters that may be usable and have more free resources (or you may have higher priority since you haven’t run as much there lately). It’s good if you can make your code portable, or ask for our help early enough in your work and we can help do that.
Situation: Each time I submit a job, I have to wait to see if it works, edit, and try again. This is slow.
Indeed. We try to save some resources in a debug partition (gpu-debug), which are in theory always available (but have a very short time limit, like 15 minutes). However, it’s only easy to allocate a whole node to a debug partition, and four or more GPUs is a lot to spend on a partition that’s mostly idling, so sometimes we don’t have the most advanced GPUs available there.
Triton’s GPU debug partition does overlap with a lot of different other nodes and has a high partition priority, so if you submit there it’ll hopefully run ASAP.
We also have interactive partitions, which you can open in OnDemand to do development work. We don’t have GPUs with huge memory there, since interactive GPUs are mostly idle and not doing computing (we mainly have older GPUs and Multi-Instance GPUs (MIGs), which split one GPU into several with smaller memory). Everything in the rest of this post, about balancing amount used and convenience, can be repeated with interactive GPUs. The more we give for interactive work, the more GPUs are idle overall. It’s a balance we are constantly trying to adjust.
Situation: I’ve tried to see what is slowing stuff down, and noticed one user has low GPU efficiency. Should they really be using GPUs in that case?
We aren’t aiming for maximum GPU calculations, we aim for getting the most work done. Some work is CPU-bound but GPUs can speed up part of it. Some work uses other third-party code and can’t be optimized. Sometimes the bottleneck is just somewhere else but the GPU still significantly speeds things up. With this, we don’t want to prevent someone from doing their work just because it’s not perfectly GPU-bound.
We do scan for low efficiency users and invite them to garage to see if we can make things faster. If someone is using expensive resources, we consider there’s an obligation to work with us to make the usage as efficient as possible. And yes, sometimes they are using the GPUs optimally for their own case. Also note that GPU occupancy doesn’t mean the GPU is doing useful work - sometimes the measures can be off.
If you see a user that you think is inefficiently using the resources, don’t contact them yourself (unless you are their friend, colleague, etc.). Let us know and we’ll investigate if we haven’t done so yet.
Situation: My group has purchased dedicated resources and they are working as part of the cluster. Someone else is using them, and it slows down our use.
We set up the way resources are shared when someone gets the resources. Normally, the deal is we want overall highest use of the cluster, since after all the university is also contributing significant sysadmin and electricity resources. We don’t necessarily guarantee that you can use it right away, but we try to make it as close to that as possible. With some dedicated resources, we have used preemptible jobs (see below) for the “common” access.
One solution to many of the things above is to make jobs in partitions preemptible, which means that if a higher priority job comes along, a currently running job can be killed. It’s killed with a short grace period to save its state (which it should be designed to do) so that it can be resumed.
Preemptible jobs are great since they allow all the otherwise-unused resources to be scheduled. However, it can be a big step up in effort to manage saving state and scripting the continuation of jobs at scale. We want new users to have some easy onboarding path, so we will always make preemptible jobs opt-in. We expect that big users will get enough benefit from adapting to preemptible jobs, which helps to improve efficiency for everyone else who can’t.
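What "designed to save its state" can look like is sketched below in Python. This is an assumption-laden illustration rather than a recipe for any particular cluster: it assumes the scheduler announces the grace period with SIGTERM, and the `StopFlag` class, the `run` function, and the JSON checkpoint format are all made up for this example.

```python
import json
import os
import signal
import tempfile

class StopFlag:
    """Callable usable as a signal handler; it only records the request."""
    def __init__(self):
        self.requested = False

    def __call__(self, signum=None, frame=None):
        self.requested = True

def run(total_steps, checkpoint_path, stop):
    """Resume from a checkpoint if present, work until done or asked to stop, then save."""
    state = {"step": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    while state["step"] < total_steps and not stop.requested:
        state["total"] += state["step"]  # stand-in for the real computation
        state["step"] += 1
    # Write to a temporary file first, then rename, so a kill mid-write
    # cannot leave a half-written checkpoint behind.
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, checkpoint_path)
    return state

if __name__ == "__main__":
    stop = StopFlag()
    # Assumption: the scheduler sends SIGTERM at the start of the grace period.
    signal.signal(signal.SIGTERM, stop)
    # Demo only: a real job would keep its checkpoint on persistent storage.
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "checkpoint.json")
        print(run(5, path, stop))   # first run stops after 5 steps
        print(run(10, path, stop))  # second run resumes at step 5
```

The signal handler only sets a flag, and the main loop decides when it is safe to stop. That keeps the saved state consistent, at the cost of needing each step to be short compared to the grace period.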
If you can adapt your work to use preemptible jobs (and you are using a cluster that has them enabled), then we encourage you to make use of that option.
Partition layout and overlaps
Maximum runtime
Maximum job size
GrpTRESRunMins
Preemptibility
Most importantly, while it may be possible to make some theoretically perfect arrangement for maximum use and minimal unexpected waiting, that can make the cluster usage much harder to explain. So we try to find a balance between those things and overall usability. So then, at the end, it becomes a trilemma: maximum resource usage, resources always standing by for you, and usability.