Posted in 2023

ASC development day, 2023 August

We had another development day (previous: ASC development day, 2023 March). It went much like the last one, and there is less big news for the world, but below is the summary anyway.

We have about 1550 people with accounts, with 202 new account requests in the last six months.

Most routine issues tend to be about software installation, which is good (this is the genuinely hard part, and it's good that people ask us).

We are still on track for about 500 garage visits per year. We don't try too hard to keep track of them all; we probably record about 75% of them.

The numbers of interactive and Jupyter users are increasing, while Open OnDemand use is decreasing. This is the opposite of the direction we'd like. We will open OOD to connections from all of Finland to make access easier.

Triton v3 is still on the way. This isn't a new cluster, but a new operating system to which individual nodes will be migrated slowly (while maintaining the same accounts and data). Most of this happens in the background, but the change of base operating system images will require most code to be recompiled, which will require attention from many users. The transition can be made slowly; both old and new OSs will run for the time being. There won't be a change in the total amount of computing power.

An upcoming blog post will discuss this more, and the effects on users. Now is the time to start preparing. We still expect the transition to happen sometime in the autumn.

We are considering merging the home and scratch directories, with a common quota for both. This would improve usability by reducing how often the home quota gets in the way. We'd welcome any other usability suggestions.

Practically, we are using the chance to automate things even more, which should make the system easier to manage in the future.

Teaching has gone well. For this academic year, we’d like to add back in a few smaller, special-purpose courses (not just to teach them, but also to get good quality video recordings for the future).

Goals:

Developing and delivering the “workflows” course with CodeRefinery

Short courses to record (e.g. rerun of debug series, once a week, record and publish).

Update the Debugging course, linking together the different debugging course repositories.

LUMI is the new EU cluster with plentiful GPU resources. A user can essentially get as many GPU resources as they need with no waiting, but since the GPUs are AMD, there is some initial barrier. Our general feeling remains: "we won't recommend our users directly go and use LUMI, but we recommend they talk with us first and we help them use it".

Next steps:

Continue encouraging users to contact us.

RSEs will ask the top GPU user each week whether they would like support in taking LUMI into use. We'll go and do all the setup for them.

Slide on infoscreens around the buildings?

Read more ...


libwebp security vulnerability and computational scientists

Recently, a major security vulnerability (CVE-2023-5129) was found in libwebp, an image decoding library for the .webp format. It is major because the library is embedded in many apps and web browsers, and the bug allows remote code execution just by opening a file. For computational scientists there is still some impact - and it's harder to compensate for. In short, just by processing an image in the .webp format, someone can take over your computer or session.

libwebp is the current issue, but the problem is general: computational scientists often create software environments and use them for a long time. These environments aren’t usually browsing the web (the most likely attack vector here), but they do involve lots of code installed from different projects. How does one manage security in this case?

This post may be updated

If you use web browsers or apps on your own desktops, laptops, phones, etc., make sure to update them!

If you don’t use images in your research, there probably isn’t much impact.

If you do, this is what could happen:

You make a Python / Anaconda environment which uses libwebp somehow - installed directly through Conda, or pulled in by some other application.

You download a dataset containing images. You process them as part of your research with the old environment.

The malicious image runs an exploit. It has access to your whole user account on that computer: extract any data, add SSH keys for remote access, corrupt/delete data (which might not be backed up from the cluster…).

Many things have to happen here, but it's very possible for them to happen. You could lose access to non-backed-up data or code, or other confidential or sensitive data could be compromised, since code from one project run under your user account has access to all projects from your account.

One would normally fix things by updating software. But when you are dealing with a research environment that can’t easily be updated, what should you do? This is the real question here.

It’s a multi-layered problem, and the answer will depend on your work. libwebp is what we are thinking about now, but the problem is general: there are other security problems that occasionally come up that can affect more scientific code. How do you prepare for next time?

Update your environments (conda, virtualenv, etc.). You could check whether libwebp is inside them (conda list | grep webp), but especially for pip packages it might not be apparent.
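For example (a sketch; the environment name is illustrative, and pip-installed packages such as Pillow often bundle libwebp invisibly inside the wheel):

```bash
# Look for a conda-installed libwebp in a given environment:
conda list -n my-env | grep -i webp
# On the pip side the bundled library may not show up at all, but you can
# at least see which image-handling packages are present:
pip list | grep -i -E 'pillow|opencv|imageio'
```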

Make your environments reproducible: if you define your dependencies in requirements.txt (Python), environment.yml (conda), or whatever is suitable for your language, you can easily re-generate environments to bring everything up to date (delete the old one, re-create it).
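For example, with conda (a sketch; the environment name and file are illustrative):

```bash
# Delete the old environment and rebuild it from its definition file,
# picking up current (patched) builds of any unpinned dependencies:
mamba env remove -n my-env
mamba env create -n my-env -f environment.yml
```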

If you pin versions of dependencies (like numpy==1.20.0), that pin can also hold back other dependencies at older versions.

Containerize your workflows. If code runs inside a container, it is kept isolated from the rest of the operating system and your user account (containers aren't usually designed for strict security, but it's better than nothing).
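For example, with Apptainer (a sketch; image and script names are illustrative):

```bash
# --containall keeps the container from mounting your whole home directory,
# so the code only sees what is explicitly bound in:
apptainer exec --containall --bind ./data:/data \
    my-environment.sif python /data/process_images.py
```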

If you use pre-built modules on the cluster, try not to use old versions. We'll update some recent modules, but we can't update all of the old ones. At least, libwebp is in the default anaconda modules.

If you write or maintain software in general, keep it up to date as much as is reasonable! Don't put others in a position where they have to use old versions of libraries to make your software work.

In general, think about your dependencies. Be at least a little bit suspicious before you install random other software that may pull in lots of other dependencies. Of course, as a researcher, you may not have much choice.

These commands seem to be able to update an environment to a newer libwebp. It seems to work on newer environments, but we don't know for sure. Instead of mamba, conda in theory works, but it is so slow it may not be practical:
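(A sketch; replace my-env with your environment's name. To our knowledge, libwebp 1.3.2 is the first fixed version.)

```bash
# Update libwebp in an existing environment to at least the fixed version:
mamba update -n my-env 'libwebp>=1.3.2'
# Or update everything, which also refreshes other stale dependencies:
mamba update -n my-env --all
```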

There is a major security vulnerability in libwebp. While the impact on computational scientists may not be that large, a bigger issue is the difficulty of keeping all of our environments up to date, so that next time this happens it's easier to respond.

We hope to have more security recommendations for computational scientists in the future. If anyone is interested in collaborating on this, let us know.

Common apps which embed libwebp (often via an embedded browser): Chrome, Firefox, VSCode, Zulip, Slack, Discord… things that use Electron to embed a web browser are affected, and that's many things.

Read more ...


Aalto public servers requiring passwords with SSH keys

From 2023-09-25, publicly accessible Aalto server login is changing and will require a password in addition to SSH keys. This will have a significant usability impact on some users. This post is made as a landing page for users who need immediate, practical help and for whom the aalto.fi page isn't findable or detailed enough. The official contact is the IT Services service desk.

The reference page SSH has been updated to include detailed reference information for every common operating system and SSH client. Secure Shell is one of the standard methods of connecting to remote servers and it is important that users of all skill levels are able to use it securely.

This change does not come from Science-IT, but since it will affect many of our users and is not being publicized or supported very much, we are preemptively doing some major user support.

What is not happening is: requiring locally encrypted SSH keys (although this is highly recommended).

What is happening: When you connect to an SSH server from outside Aalto networks, you will need to have an SSH key set up and send your Aalto password to the remote server interactively.

If you already have an SSH key set up, you’ll start to be asked to enter a password every time you connect.

You can always connect to the Aalto VPN in advance to prevent this, but there may be cases where this isn’t a practical solution.

If you do not have an SSH key set up, you should:

Follow SSH to generate an SSH key - we have heavily revised this page to cover almost every common SSH arrangement.

Place your SSH key on any common Aalto server (kosh, etc. - not Triton since that doesn’t share home directories with the public servers)

You could connect by VPN, and then use your normal password to connect and add the key.

You could use https://vdi.aalto.fi with a Linux computer to place the key.

You could place the key while on an Aalto network (as usual, this means eduroam, or the aalto network from an Aalto computer).

You could use another computer that’s already set up with an SSH key to place the key.

The key will then be available on all common Aalto shell servers (and other workstations), since they share the home directory.
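For example, from a network position where password login still works (a sketch; kosh.aalto.fi is one of the common shell servers, and the username is illustrative):

```bash
# Generate a modern key, protected by a passphrase (run on your own machine):
ssh-keygen -t ed25519
# Copy the public key to a server that shares the Aalto home directory:
ssh-copy-id myusername@kosh.aalto.fi
```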

Re-read SSH, in particular the SSH key agent, ProxyJump and Multiplexing sections, to see how to configure your SSH to minimize the number of times you need to enter passwords.
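A minimal sketch of such a configuration (host aliases and the username are illustrative; create the ~/.ssh/sockets directory first):

```
# ~/.ssh/config
Host kosh
    HostName kosh.aalto.fi
    User myusername
    # Multiplexing: later sessions reuse this authenticated connection,
    # so you type your password at most once per hour of use.
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h-%p
    ControlPersist 1h

Host triton
    HostName triton.aalto.fi
    User myusername
    # Hop through the public server, reusing its open connection.
    ProxyJump kosh
```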

This was needed for security, as evidenced by recent history. Password-only login is simply not feasible anymore (and hasn't been for some time). Removing passwords as an option is good security practice that most organizations should be adopting these days.

But why an SSH key plus remote password, instead of a properly encrypted SSH key? An encrypted SSH key already requires something you have (the key) and something you know (the passphrase), doesn't it? And it doesn't require sending a plaintext password to the remote server. This was decided by whoever is setting this up, probably partly because it is not possible to enforce passphrases on SSH keys via the server config.

In general (outside of Aalto), you should use SSH keys everywhere and be wary of ever sending plaintext passwords to remote servers (even in conjunction with an SSH key). Security is important, and by using SSH keys with local encryption of the key you are doing your part.

We apologize for the difficulty in getting work done and want to help you as much as possible (though Science-IT did not design or communicate this change).

There are, unfortunately, some trivial workarounds that involve putting your password in plain text on your computer to script things. However, please note that writing passwords down (outside of password managers) is bad security practice and against the Aalto password guidelines. It is better to contact us so we can help design a better and more secure workflow, or to contact IT Services and ask them to consider other use cases.

Read more ...


Preparing for new Triton

Sometime in autumn of 2023 (e.g. October/November), we will do a major update of Triton: updating the base operating system, and thus almost everything else. There are big benefits to this (newer base operating system and software), but such a fundamental update affects almost every user, and for a short time it will make a lot of work for almost everyone. This post gives advance warning and a chance to give feedback on how to make the update most usable.

This post is just advance warning and things to prepare already. All actual instructions will come later.

We will update the base operating system from CentOS 7 to something else (Red Hat 9). We've ordered all new management hardware to make the backend more reliable and manageable. Along with this comes an update of the software build system, which should allow us to deploy software to our users even better. We'll also update our configuration management system for more reproducibility.

We also hope to think about the usability of the new system: removing a lot of old options and adding new, simpler ways of doing what people need.

All data and storage will remain the same, so there is no big data migration needed.

The old and new clusters will be accessible at the same time (two different login nodes), with the same filesystems mounted (same data available) and some compute resources still available on the old one, so that people can slowly migrate. But the old one won't stay running for too long, to avoid extended maintenance effort or splitting of resources.

The biggest problem with big cluster updates like this is reproducibility: does your work from a month ago still work in a month? If not, this is a big problem. It's even worse if there is a much longer gap before you come back to it (paper revisions, anyone?).

You could say there are two things that can go wrong with a cluster upgrade or change:

Specific software/code that needs to be compiled and installed: Software needs re-compiling for new clusters or new cluster OS updates.

Whole workflows: you need to make all the pieces work together. Different paths and workflow managers may need updating.

What you can do:

Manage any messes you have earlier rather than later. It’s better if you slowly clean up over time, so you can focus on the differences once the change happens.

Know what software you are using. It’s easier for us to re-install something we have already installed when someone can tell us the exact name and version that they are using.
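For example, a quick way to snapshot what you currently use (a sketch; Lmod writes its list to stderr, hence the redirect):

```bash
# Record currently loaded modules and exact Python package versions:
module list 2>&1 | tee my-modules.txt
pip freeze > my-packages.txt
```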

Have tests for your software: some way to validate that it works correctly.

Contact Aalto RSE for hands-on help supporting the transition. Come to the garage early and often.

If there are any annoyances about Triton that you’d like us to consider for the upgrade, now is the time to let us know so we can plan them. We especially value feedback on usability problems.

Discuss with us in our chat, or open a Triton issue.

This post has been updated with minor corrections; the changes can be found in the git history.

Read more ...


The Aalto RSE hiring process

This post describes the hiring process of Aalto RSE. The goal is to make hiring more equitable by providing the background information so that everyone can apply successfully. For those not applying to us, it might still provide some valuable insight about how to market your skills as a PhD making a sideways career move. What’s said here may not apply to every organization, but it might give you some things to think about.

Disclaimer: This page is a rough average description of the past, not a promise to always do this in the future.

Aalto RSE has usually hired people who have postdoc experience and will transition to a more applied software/data/computing-oriented role (as opposed to being focused on writing papers). For many people, we are their first experience of job applications post-degree, and thus they have to learn how to present their skills in a new, non-academic context.

One should start by reading about us - we have lots of information publicly available about what we do and how we think. This should be understood in order to do the next steps well.

The cover letter is the most important thing we read, and the first and most important filter. It’s read before the CV.

At the level we hire at, almost everyone's CV and achievements are effectively equivalent. Does it matter who has the fanciest papers? Who has the most awards? The classes people took? When most of a person's knowledge has come from self-study, probably not. The cover letter is the chance to interpret your skills in the context of the job you are applying for.

When reading the cover letter, the first question we ask is "does this person know what they are applying to, and do they know why they think they are a good fit?" (It's always interesting to get letters which clearly don't understand the job, but on the other hand it's an easy filter.) The first paragraph should answer this question and make clear that the rest of the letter will go into detail about why. Start with the most important information; don't make it hard for us.

Beyond that, talk about interests and skills as relevant to the organization. Discuss special projects, including non-academic ones or random things that you are interested in (this is especially true for us, since we are the transition from academia to practical work). Our job advertisement gives you some specific ideas that you can talk about. Anything specifically important to the job should be pointed out here and not just left in the CV.

If you don’t exactly fit the stated job requirements: here is the chance to explain it. The job requirement has to say roughly what we need (to not waste people’s time when applying, and because our hiring decisions must be justifiable based on the requirements), but there are many cases where someone with a different experience can accomplish our actual goal (as said in the job ad or found in your background research). A person that can say this, that they are adaptable, and will have a very good chance.

We have adopted a form of anonymous recruiting. We request that cover letters are submitted without identifying information (name, signature, etc.), so that one person gives them numbers and a broader group tries to take an unbiased look at them. After this initial impression, we bring in the rest of the application. Don't make assumptions about what the reader will know about your background; just say it.

The letter should be as short as possible to get the information across. One page is usually about the shortest we get, and a bit less than two pages is typical. But if it’s engaging, we’ll read as much as you write. Remember, most important information first, don’t make us hunt for things.

Update 2024: Do you want to use AI to write your cover letter? Please think again. Since LLMs became a thing, cover letters have become harder to read, longer, and more generic-sounding. It’s better to write in your own voice and be shorter than rely on what AI gives you.

The CV serves as non-anonymous reference information, but CVs are hard to read and all look pretty similar. To be honest, we don't worry that much about the format and contents here: get us the basic factual information in the most efficient way. For our particular jobs, non-academic skills such as software/data tools are more important than scientific articles, etc. Remember, we are busy and have plenty of applications; make it easy to read.

Open Science isn’t just good for research, it’s good for you, too. If you can point to public repositories of work you have done, this is very useful. Things like Gitlab/Github profiles with activity and your own projects, links to data you have released, etc. They don’t have to be perfect - something is better than nothing. The best case would be a few projects which are well-done (and you know it and point them out to us), and plenty more stuff that may be of lower quality to show you can get simple stuff done simply. Not everyone is fortunate to have a field where they can practice open science throughout their career, but even publishing a project or two before they apply for a job with us is very useful.

Despite what the previous section said, we do try to dig through applications that seem on-topic but don't say everything we are looking for, to give them the fairest shot we can.

We always need to heavily filter the list down. Some relevant filtering includes:

Do they know what job they are applying for? Can they connect their skills to the job?

Have they touched on the main points in our job advertisement and the linked “Become a RSE” page?

Are they interested in teaching, mentoring, and real collaborative projects? Do they know what kind of teaching and mentoring we do?

Is there enough knowledge about the research process?

Any relevant skills about this call’s particular topic (if there is any)?

How do their skills and experience match what our team is currently missing, regardless of the open call?

How similar has their previous work been to “research engineering” (helping the research process) instead of only focusing on academic promotion?

The recruitment team makes several passes over the applications, and we discuss how to filter the list down. We try to get a good variety of candidates.

Sometimes there are initial recorded "video interviews", which provide some initial familiarity in both directions before the actual interviews. We know these are non-interactive and a recording isn't a conversation, so this is harder than an interview, but we consider that when watching them. One shouldn't worry too much about these, if we do them.

Our actual interviews are not designed to be stressful. We have some prepared questions and go through them in a friendly manner. You have a chance to ask your own questions at the beginning and end (and any other time, too). The questions are designed to hear about your experiences, not to trick or test you.

We don’t currently ask technical challenge questions. The number of things which you’d need to know is so broad, it’s more important that you can learn things quickly. Since we usually interview relatively advanced people, we can instead look at existing projects they have done and check references, without having to do a technical challenge. This may change depending on the type of candidates we are interviewing, but just like the main interviews we are more interested in how people think, rather than raw knowledge.

In the future, there might be more “meet the team” kind of events.

We want to respond to people as soon as possible, but there’s a simple fact: we don’t want to tell anyone “no” until we are very sure we have an acceptance (we don’t want to tell someone “no” and then hire them later), and we have very many qualified candidates. So there is often an unfortunately long delay in hearing back. We hope that everyone knows within a month, though (and ideally ~2 weeks if all goes well).

We get a relatively large number of applications, with a lot of good people. So far (before 2023), we have been hiring at a relatively high level: researchers with postdoc experience who have had some sort of RSE-like experience helping others with research (beyond only focusing on making papers for themselves) and technology. Don't let this discourage you. There are many qualified applicants, so if you don't get selected, that doesn't mean you were unqualified. We look at everyone, regardless of their level, for every position. The fit to our particular job is more important than anything else, so keep trying until you get the right fit - it's just a numbers game.

For reference, this is an older job application text, so that you can see how the things above are integrated. (to be updated with the 2023 version soon)

[ standard header removed ]

Aalto Scientific Computing is looking for a

Research Software Engineer/Supporter

for a permanent, full-time position.

Are you more of a programmer than your researcher colleagues? Are you more of a researcher than commercial developers? Do you fit in both, but have a home in neither? Be a Research Software Engineer with us and find your home. If you are looking for a career path which combines the interesting parts of both fields, this is a good choice.

Aalto Scientific Computing is an elite “special forces” unit of Research IT, providing high-performance computing hardware, management, research support, teaching, and training. Our team consists of a core of PhD staff working with top researchers throughout the university. Our services are used by every school at Aalto University and known throughout Finland and the Nordics. All our work is open-source by default and we take an active part in worldwide projects.

In this position, you will:

Provide software development and consulting as a service, depending on demand from research groups.

Provide one-on-one research support from a software, programming, Linux, data, and infrastructure perspective: short-term projects helping researchers with specific tasks, so that the researchers gain competence to work independently.

As needed and depending on interest, teaching and other research infrastructure support.

Continually learn new skills as part of our team.

Primary qualifications: There are two main tracks, and candidates of diverse backgrounds are encouraged to apply – every candidate will be evaluated according to their own unique experiences.

PhD degree with research experience in some computational field and much knowledge of practical computing strategies for research, or

Software developer or computational scientist with a strong software/open source/Linux background, scientific computing experience, and some experience in research. Master's degree or similar experience.

This particular call emphasizes the ability to work in machine learning and AI environments. The ideal candidate will be working closely with machine learning researchers, and thus a background in machine learning is highly desirable.

Important skills:

Ability to tackle any problem with a researcher’s mindset and a developer’s passion for technology.

Experience or knowledge of the principles of open source software, open science, and software development tools such as version control.

Please see https://scicomp.aalto.fi/rse/become-a-rse/ for more information on what kind of skills we value - or more precisely what you are likely to learn.

What we offer:

You will join the dynamic Aalto Scientific Computing team, where you will learn from some of the best research IT specialists in Finland.

Co-working within top-quality research groups, getting experience in a wide variety of fields and developing an extensive network of scientific contacts. This includes contacts to the Aalto startup scene and community.

A way to be close to the research process while focusing on interesting computational problems and not the publication process.

Our program will offer you a chance to improve your software skills – you are expected to engage in plenty of professional development.

Open Source is our expectation. All (or most) of your code may be open source and may be added to your public CV, depending on the needs of researchers.

Salary will be according to experience, for a recently graduated PhD similar to a postdoc salary. Work hours are flexible, but are expected to sync with the audience being served. Primary workplace is Otaniemi, Espoo (Helsinki region), Finland. Aalto University has a hybrid work policy which allows 60% remote work possibility, and our team takes good advantage of this flexibility.

To apply successfully:

Please include a separate cover letter (~1-2 pages). Please try to write your cover letter avoiding information like name, gender, nationality or other demographic information that is not directly related to why you would be the right person for this position (this includes, for example, a signature on the letter) unless you think it benefits you. This will assist in anonymous recruitment possibilities. The letter should include for example:

Why being a Research Software Engineer is for you,

past research experience, if any

past technical teaching or mentoring experience,

past software development experience (even informal self-learning),

past Linux, command line, or scripting experience,

highlight one (or a few) collaborative projects you have taken part in and your role within it, and

what you bring and what you intend to learn.

A normal professional or academic CV including

a list of your technical and programming tools and level of proficiency (e.g. basic/proficient/expert). This is the time to show the breadth of your experience.

Github link or other public sample code. If not available, whatever is possible to demonstrate past programming experience. Please highlight one or two of your outstanding research software projects.

[ standard footer removed ]

Read more ...


Whisper deployed on Triton, LLMs coming

Whisper on Triton documentation

OpenAI Whisper is a tool for speech transcription. It works well and has potential applications in many different research and non-research use cases. Using it isn't too hard - if you can install it and if you have a GPU. Often, installation becomes a big barrier, especially for "just testing".

Luckily, we have a cluster with GPUs and a way to provide software for researchers. We’ve made Whisper available on the cluster as a module, so it’s trivial to use it for any audio data you may have. All one needs to do is:
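Something along these lines - a sketch only: the module name, the resource numbers, and the $WHISPER_MODELS variable are illustrative here, not the exact documented command:

```bash
module load whisper
# Resources first, then your file, then the standard options:
srun --time=01:00:00 --mem=8G --gpus=1 \
    whisper my-interview.mp3 --model large --model_dir "$WHISPER_MODELS"
```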

It might look complicated, but all you need to do is copy and paste. The first words request the resources, the middle specifies your file, and the last are some standard options, making it do things like use our pre-downloaded model files. Yes, this still requires knowledge of how to use a cluster in general, but once you've got that knowledge, transcribing audio is trivial. We have a self-study course on cluster usage, and users can always drop by and ask us for help, for example at our daily garage.

See the Whisper on Triton documentation for more information on the use.

We are also preparing a way to do this through the cluster web interface Open OnDemand, which will remove most of the need to know how a cluster works and make the tool even more accessible to other communities.

We hope to make other tools available like this.

Whisper is just one of the latest tools, but you've probably noticed that large language models are very popular these days. There are, in fact, some that can run locally on our own cluster, and our goal is to deploy more of these so that they can be easily tested and used. The intention isn't to make a replacement for existing LLM services, but to make internal testing, research, and development use easier.

Local installs have various benefits, including lower cost (since we already own the hardware), being able to ensure reproducibility longer-term (since models are locally downloaded and preserved), and being usable without various registrations. The downside is that the most popular models aren't available for local use.

Contact us if you need other models deployed, or if you have trouble using what's already out there. We are still in an early phase, and there will probably be some difficulties in availability, accessibility, and reusability. Contact us early if you notice anything that's not right. We both help install things and, as a research engineering partner, help use them.

It’s clear that artificial intelligence and machine learning tools will become more critical tools for other research. The difficulty in deploying and using them could become a barrier, and that is where Aalto Scientific Computing comes in. It’s our goal to make sure the infrastructure that researchers need is ready and able to be used by everyone, not just those with classic HPC experience.

Here we go over some implementation details, which may help others who want to deploy similar things on their own clusters. If you just want to use things, you don’t need to read on.

We installed Whisper in a container, so that all dependencies are packaged together and things are portable. The model definitions themselves are not included in the container, but mounted in. We try to find options that allow one to specify the model and model directory, so that the user can try out different models without downloading each one. The Lmod module file prints out some help when loaded.
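Under the hood, the pattern is roughly the following (a sketch; the image name and paths are illustrative, not our exact wrapper):

```bash
# Run the containerized CLI with GPU support (--nv), mounting the shared,
# pre-downloaded models into the container read-only:
apptainer exec --nv --bind /path/to/models:/models:ro \
    whisper.sif whisper input.mp3 --model large --model_dir /models
```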

We’ve got two versions installed: normal Whisper, and Whisper-diarization (which can identify speakers in the transcript).

Whisper and diarization both have multiple different implementations. It's a bit of guesswork to see which one is the easiest to get running and works the best (not in terms of transcript quality, but ease of deployment in a container and with local models). This led to a switch to another implementation of diarization midway, since the new one is more active in development and seems overall slightly better. A lot of the work was fortunately transferable to the new implementation.

There were the common issues with getting the right dependencies in a container and getting the GPUs to work there. This is pretty standard by now.

Most implementations of Whisper want to download models at runtime. This might make sense for a general user, but doesn't really make sense on a cluster. Depending on the implementation, getting it to use local models is not always trivial. Since GPU execution of diarization uses several models at once, there doesn't seem to be a simple way to have it use local models at all without changing the code. It also required some sleuthing to find where exactly the models are downloaded. If a code uses Hugging Face, its cache environment variables can be useful.
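For example, HF_HOME and HF_HUB_OFFLINE are the standard Hugging Face cache controls (the path here is illustrative):

```bash
# Point the Hugging Face cache at pre-downloaded models, and
# refuse runtime downloads instead of silently fetching:
export HF_HOME=/path/to/shared/hf-cache
export HF_HUB_OFFLINE=1
```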

Making a module that is both easy and practical for users without losing options is usually a bit tricky: we want users to be able to do anything, for "the right thing" to happen automatically, and to not build some opaque framework to make it happen. Singularity-wrapper fortunately helps quite a bit, cleanly doing a lot of background work such as binding directories, GPU flags, etc. without users having to care about it, while still giving the option to run the container straight through Apptainer/Singularity if finer control is necessary.

Testing if the containers work is somewhat annoying. Diarization in particular saves a lot of cache files all over the place, which all need to be purged when testing GPU running. Otherwise the GPU will stay idle since everything it would do is already in cache. This also affects clean-up after users run the code.

A minor inconvenience for us (but possibly a large one for users) is that the syntax of each Whisper CLI implementation tends to differ slightly. This makes swapping between implementations slightly annoying, since you have to check the flag syntax every time.

Read more ...


SciComp Kickstart - 2023 plans and yearly strategy

It’s time for our “kickstart course” - let’s talk about what that is, why, and why you might want to attend.

The full name is "Introduction to scientific computing and HPC" (high-performance computing). It used to be called "HPC Kickstart" and was taught without the first day, thus the short name "kickstart" that we still use. Some years, day 1 had a different name but was still taught together with days 2-3 as a package.

Our goal isn’t just to teach some skills, but to form a community around scientific computing - with researchers who have a common language to work together and help each other, supported by Aalto Scientific Computing in the background.

Course page in 2023.

Day 1 is not about high-performance computing, but about the basic skills needed to do scientific computing: things like Linux usage, data management, and the types of tools available for different problems. It is for almost anyone doing any kind of programming or scientific computing work, regardless of background. These kinds of skills aren't taught in academic degree programs; we teach them on day 1 because otherwise new researchers have to learn them from each other or re-invent them.

Days 2 and 3 are about high-performance computing, more precisely basic cluster usage. They are focused on the kinds of tools our community usually uses.

The topics have been refined over many years of both teaching and supporting junior researchers. Because of the way academic careers work (much diversity of paths), these topics (even day 1) aren't just for new researchers: everyone can find something to learn or brush up on.

For the past few years, we have been trying to keep up this yearly summer schedule, which usually happens during the first full workweek:

Monday: HR introductions and other formalities for new summer workers - many departments seem to do something like this. This may happen earlier than the Monday of the kickstart week, since sometimes that comes too late.

Tuesday afternoon: Kickstart course day 1, the general scientific computing introduction. Applicable to everyone doing scientific computing.

Wednesday-Thursday afternoons: The HPC cluster usage part, which fewer people will attend compared to Tuesday.

Friday: we don’t have scheduled programs on Fridays, but sometimes there are communities who host advanced tutorials here about what their local users need. In 2023, there is at least an advanced GPU course then.

We are aware that there is a scheduling conflict with the CS summer day which is scheduled on the Tuesday of the 2023 HPC kickstart course. We did contact every department in January/February, yet this was still a surprise to us. In past years, we have adjusted our schedule to similar events, but this is not possible this year despite our best efforts.

We will still try to support researchers as much as possible. Recordings of previous years are available on YouTube, and we also release videos the same evening as the course, precisely to support everyone regardless of these conflicts. Researchers can still join us for days 2 and 3 even if they did not join day 1. However, please pay particular attention to the instructions about setting up the Triton connection in advance.

We hope that this blog post explains our goals to a larger audience, so that we can reach even more people in the future and expand to onboarding more young researchers even more systematically. You can reach us at scip@aalto.fi, and each spring we reach out to the main departments to schedule each summer's course.

Read more ...


ASC development day, 2023 March

We recently had an internal "development day", which is our new name for getting together to talk about longer-term plans. This is our second "development day". Overall, it went well, and we think that we are on a good path. There are three particular focus areas for the future:

Teaching: This was also a focus last time, and probably will still be in the future. We are overall happy with our decision last time to focus less on many small/medium courses, and instead focus on large, collaborative courses and then focused, individualized support for advanced use cases. Smaller courses happen mainly when we see specific needs that can’t be filled other ways (or we make them large, open, collaborative courses if there is a broad need).

Triton v3: The software/OS/management side of our cluster will be almost completely reworked in the next year (we aren’t getting rid of any hardware just for this). This will take a fair amount of our time, but is needed because existing systems are starting to show their age.

LUMI usage: LUMI is a flagship project of EuroHPC and provides huge resources, available to the same people that can use Triton. Triton is still needed for ease of use in everyday projects, but we should actively look for people who can benefit from LUMI and help them port their work there. Our recent evaluations led to the conclusion that our porting help is still needed.

Teaching has long been one of the pillars of ASC's support. It's still needed, but the focus seems to be changing. No longer is a room with 10-20 (or even 50) people considered a lot. People seem both more able and willing to find advanced material themselves, and more in need of basic principles (git, Python for SciComp, etc.). Perhaps this is also partly caused by the remote work period emphasizing how all this material is available online anyway. Our basic philosophy:

Focus on large courses for new researchers, for example using the CodeRefinery MOOC strategy. This reaches the most people, helps beginners the most, produces high-quality open-source material for asynchronous reference, and has good possibilities for co-teaching. Examples include CodeRefinery, our SciComp/HPC kickstart course, and Python for Scientific Computing.

Advanced, one-on-one, or small-group support via the SciComp garage and the Research Software Engineering service. This isn't just for projects; it is also a useful service for people learning from other advanced material in their work - basically, we work as mentors. One-on-one support is both more rewarding for us and probably more useful to the user (relative to the time demands on both ends). Besides, advanced courses often aren't offered right when people need them, so we are left in this position regardless.

What about small/medium-sized courses, and advanced courses?

The first two points above squeeze out medium-sized courses for the most part, in our opinion. By the time our audience is at an intermediate or advanced level, they seem able to figure things out themselves and ask for help when needed - if they can figure out what they need to do. This point deserves further study, though. Instead, we point to other existing material.

We will make sure that we have good recommendations for advanced self-study courses and generally chart out the resources so that our users don’t have to. This is mostly done by our Hands-on Scientific Computing course.

In the past, we have supported community members in giving courses on topics in which they are experts. We will continue this as appropriate (see the next point).

We will continue to offer on-demand courses taught by us if someone requests them, and other smaller courses if we see a strong need. Contact us!

Triton is our HPC cluster, and is notable for being a Ship of Theseus: it’s continually upgraded while being the same cluster. This has resulted in the software running it getting a bit out of date. This software was originally developed as broader partnerships, and as these partnerships have changed, we need to take more responsibility for it ourselves.

Users shouldn’t see any major change from this, though part of it is improving our (user) software installation tools, which should make increased responsiveness to software installation requests.

As said above, LUMI is a significant resource, yet our users have not come to us asking for help in using it. Over the past six months, we have found some Triton users who would benefit from it and helped extend their workflows to work on LUMI. We do this by first testing some applications ourselves, then looking at Triton usage for large users and reaching out directly.

Currently our focus is on GPU-intensive applications, which is made more interesting because LUMI has AMD GPUs. We’ve gotten local AMD GPUs for our own testing and in general are well prepared to support this.

While LUMI is an HPC system and has a typical HPC system interface, it serves so many different users that the software stack is very limited, and most users need to install their own software and figure out how to run it on AMD GPUs. This is why we recommend most users access LUMI through us (we're paid to save you time, after all), though of course anyone interested can use it directly.

Read more ...


Aalto SciComp stickers and patches

We have stickers (and patches!) to support Aalto Scientific Computing. (You can get them from our IT offices in CS, NBE, and Physics.) But why invest in this? Well, it's fun, but there should be a deeper reason.

While our main goal is to maintain the Aalto University Triton HPC cluster and provide courses and direct support to researchers, we cannot scale to solve all problems and make the best decisions without a community: you! Thus, our new promotional material is designed so that members of our community can show their support for scientific computing at Aalto University. We hope that by providing a way for the community to show this interest, people can find - and support - each other better.

We have the typical hexagonal stickers, which you can use on all the typical sticker things.

We also have patches for those who are interested - in Finland they are a big thing on student overalls (https://en.wikipedia.org/wiki/Student_boilersuit), but you could also sew them onto your backpack or purse. Please send us pictures to inspire us all! (Some have Velcro backing; if you want that kind of attachment, ask us for that style.)

You may notice that some of the patches have a black background and some have a white background. A black background means "Ask me anything about the tools of scientific computing; I am happy to help, or at least point you in the right direction (as much as I can)!"

Here’s our idea:

Anyone may take the white background ones

Black background is for:

Aalto Scientific Computing team staff

Volunteers at our events (for example helpers at our workshops)

Anyone who is interested in using their time to help others in scientific computing (regardless of their skills)

(Clever people will notice that the first two groups are included in the third, and actually anyone can be in the third group if they want.)

The idea is that we, and our community, can't work alone. Everyone needs to support each other in order to work at the level we want. In-group experts are an undervalued resource in this, often not getting the credit or recognition they deserve for supporting everyone. This is our small way of recognizing those supporters, and we hope that in the future we can support them ever more - both career-wise and in their support of others.

Yes, we should have gotten black-background stickers. We’ll do that next time…

Read more ...