Aalto Scientific Computing Blog

Triton v3 is now default

2024-05-06T00:00:00+00:00

Triton has a major update. You can read our previous info about this at Preparing for new Triton, and our “what has changed” in Triton issue #1593.

You might get SSH host key warnings.

What is Triton v3

It has the same name, and importantly the same user accounts and data, but all the software and the operating system have changed. In particular:

All software modules are different
Any software which has been compiled will need to be re-compiled.

Why, and why now?

Triton’s previous operating system was released in 2014. Security support runs out at the end of 2024 May, so it has to be updated. Stability is good for research, so we try to reduce the number of changes.

We realize that a change is very disruptive and painful, especially since the expectation is that Triton never changes. But an old operating system causes problems for users too, and these have grown more and more over the years.

What to do

Most of the transition for different types of software is described in Triton issue #1593.

Triton v3 SSH host key warnings

2024-05-06T00:00:00+00:00

After the Triton update, many users will get a message like this (or something similar if you use other SSH clients like PuTTY):

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.

What it means

SSH (Secure SHell) is made to be secure, and that means it verifies the server you are connecting to via its SSH host key. The representation of this key is the fingerprint, like SHA256:OqCehC2lbHdl8mYGI/G9vlxTwew3H3KrvxKDkwIQy9Y. This means that the NSA or someone can’t intercept the connection and get your password by pretending to be Triton. This is a good thing.

OpenSSH (the command line program on Linux, MacOS, Windows) saves these connection IDs (fingerprints) in $HOME/.ssh/known_hosts. Other programs may store the keys somewhere else.

What you should do when you see the warning

The warning looks scary but the first thing to ask is “should the server I am connecting to have changed?”. If you have been directed to this blog post, then probably yes, it has. You should always consider whether the fingerprint should have changed, and if there is no reason for it to have changed, contact your administrators. You can usually verify the keys online, for example Triton ssh key fingerprints.

If you are on command line OpenSSH (Linux), it will propose a command that will remove the old host key:

$ ssh-keygen -R "triton.aalto.fi"

For other programs, follow whatever prompts it might give to replace the host key fingerprint.

What are the current SSH keys?

When you get a message like “The authenticity of host ‘triton.aalto.fi’ can’t be established”, verify the SSH key fingerprints that are presented, then answer “yes” to save them permanently (they can always be updated the next time they change). The fingerprints for Triton v3 are:

SHA256:3u8iICwjmvJ/+9YGxqqK+3r7FmrDflcgpoGl5ygtAWw login4.triton.aalto.fi (RSA)
SHA256:OqCehC2lbHdl8mYGI/G9vlxTwew3H3KrvxKDkwIQy9Y login4.triton.aalto.fi (ECDSA)
SHA256:ibL4dBsdrwRjbJCBWL1J5p/Sg4PGHWxTG6HF65yPcps login4.triton.aalto.fi (ED25519)
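
Comparing long fingerprints by eye is error-prone, so a small shell helper can do the comparison for you. This is only a minimal sketch: the function name is_known_triton_fingerprint is made up for illustration, and the fingerprints are simply copied from the list above.

```shell
# Fingerprints published above for Triton v3 (one per line).
known_fingerprints='SHA256:3u8iICwjmvJ/+9YGxqqK+3r7FmrDflcgpoGl5ygtAWw
SHA256:OqCehC2lbHdl8mYGI/G9vlxTwew3H3KrvxKDkwIQy9Y
SHA256:ibL4dBsdrwRjbJCBWL1J5p/Sg4PGHWxTG6HF65yPcps'

# Hypothetical helper: succeeds only if the argument exactly matches one
# of the published fingerprints (-x: whole line, -F: fixed string).
is_known_triton_fingerprint() {
    printf '%s\n' "$known_fingerprints" | grep -qxF -- "$1"
}

# Example: paste the fingerprint that your SSH client shows you.
if is_known_triton_fingerprint 'SHA256:OqCehC2lbHdl8mYGI/G9vlxTwew3H3KrvxKDkwIQy9Y'; then
    echo "fingerprint matches the published list"
else
    echo "fingerprint is NOT in the published list, do not accept it"
fi
```

To see what the server offers before logging in, something like ssh-keyscan triton.aalto.fi | ssh-keygen -lf - should print the fingerprints of the offered keys (this needs network access and a reasonably recent OpenSSH).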

Research Software Engineer project funding: what’s been working

2024-02-21T00:00:00+00:00

The “Research Software Engineer” service provides technical collaborators for researchers to complement their scientific knowledge. Read about Aalto RSE here. The idea of this service was that it would be available to everyone, but some projects who made extensive use would fund it themselves.

If you are a group leader reading this, Aalto RSE can help you release research code, debug it, make it reusable, rescue old code from former members and make it usable again, make it run on our cluster or CSC’s clusters, manage data, prepare data for easy use, and so on. If the work is short (less than a month), it is free; if it takes more than a month, the below applies.

Early project funding

When we started, we hoped for around 50% project funding. The idea was that a lot of the funding for the service would come from the research projects themselves. This hasn’t really worked out so well, because a) we accomplish the vast majority of our projects quickly, in less than a few weeks, and b) finance would understandably not like to deal with small transactions for small amounts of time.

What actually happened was that we basically have received only a small amount of the project funding we would have wanted. On the other hand, this also means we have supported a far wider variety of projects than we would have otherwise. It also means we are better accomplishing our other goal: tactical support right where and when it’s needed most, with the least amount of administrative overhead. This actually better matches our mission of helping the researchers who need us most.

Future funding prospects

For any long projects (more than a month or so), we still follow our original plan: we can receive funding from grants (or basic funding) to do long-term projects. This is usually 40-80% of a RSE’s time, spread out over more than a month (and it can also be bursty: lots of work at some times, waiting for the next task at other times). We have done this for projects, and we know we can do it in the future.

But there’s another thing that has worked well: retainer-type funding instead of project-based funding. You have extra funding that needs to be used? You know your group needs support but you can’t name a single specific project to use all the time? Hire RSEs on long-term retainers and we’re there for you as needed. You will always get priority for all the quick questions you have (in Scicomp garage or otherwise), and you get the highest priority for your medium projects, we can attend your other group meetings, and so on. As your team wants, we’ll make high-impact improvements here and there. This could be (for example) anywhere from 10-40% time over a long period.

We have worked out how to do both one-off projects and retainers with Finance. As far as external funders are concerned, our staff count as researchers, so we can use any funding you might have.

If you think you have a project or want us on retainer, let us know: Research Software Engineers or Starting a project with us.

Why are our projects so short?

This is a valid question. Compared to many RSE groups, we seem to be focusing on many small questions for a broad audience that knows a lot about the problems they need to solve. Thus, we can come in to something already set up well, provide help, and mostly back off and be available for maintenance long-term. The units that fund us (schools, departments) have been happy with this, so we’ve kept it up. On the other hand, we are pretty fast. There have been projects where a summer worker was going to be hired, that we could end up doing (learning the framework + all the main tasks) in two weeks. The way we work together as a team also makes things quite fast. Thus, a project has to be quite deep in order to exceed a month of work.

Kickstart 2023 wrap-up and thoughts for the future

2024-02-15T00:00:00+00:00

Our kickstart course came and went with very few problems. This post summarizes our general thoughts on the course and its format.

If you want to join the course next year (as an attendee, or as an organization that will send its learners to us and maybe co-teach), follow us on Mastodon. This is the third year we’ve done the livestream format, and it’s not likely to stop anytime soon.

This was originally written in June 2023 but publication was forgotten until 2024.

History of the course

The course has run since around 2015 or so. Until mid 2020, it was always in-person only. Until (and including) 2022, it ran twice a year, January and June, but now it runs only in June (increased availability of videos + the material compensates). It runs in June so that it aligns with new summer research interns starting. Until around 2020, it was mostly about using the HPC cluster at Aalto University, but since then there has been more emphasis on day 1 covering generic skills needed for scientific computing and the big picture of things.

General feedback

Our general feedback remains quite positive. Our streaming + coteaching + collaborative notes format is still well received, and there seems to be little reason to go back for courses of smaller scale. Instead of just lectures, written material (tutorials in info on scicomp.aalto.fi) + livestream + videos is a good combination.

Not enough time

There is never enough time - not much else to say. Each year there is a different trade-off between how much we cover and how brief we are. (There are always people who say we should go more in-depth, and some who say we go too much in-depth. Such is life.)

Reduce repetition

Repetition is good, but not when it’s a sign that we can’t stop talking and keep saying the same thing over and over. The best lessons seemed to be the ones that were taught most quickly, since they have a high density of new information. We should strive to make more lessons faster, and leave details to the reading.

Integrated support and teaching

Because the teachers also do support, for anything difficult, we can easily tell learners: “Do what you can, come by our SciComp garage to ask for help with anything else.” This overall reduces the demands from teaching: a person doesn’t have to know everything, but know enough to get started and to know when they may need more help for more advanced tools. This really is good for both of us.

Linux shell and other prerequisites

As usual, we expected our learners to read our shell crash course in advance. We also had a new tutorial on using the cluster from the shell. This helped some, but it was still a problem.

Reflection: this will always be a problem in any course that has a wide enough audience. We should accept and provide positive support for those not ready, and not try to exclude them. It’s OK to see a course and then strive to get the prerequisites later.

Should the course be divided into two?

Internally, we had this thought of dividing the course in two: a basic part at the start of the summer, and an advanced part at the end of the summer - since brand new researchers may have trouble understanding everything. On the other hand, the fact we have videos means that people can come back and review the material when they are ready. So in some sense, learners can divide the course however they would like by stopping when they think it’s no longer necessary and coming back. This could be mentioned more explicitly in our introductions.

Attendance

Attendance goes down day-by-day. This is definitely OK - it doesn’t hurt anyone. It’s expected that day 1 was suitable for the most people (even those not doing HPC work), and then the course topics got continually more specific as we went further and further.

As mentioned above, this is even expected and encouraged - better to have someone attend day 1, than not.

Exercise level

Our exercises are quite basic overall, but we got few complaints about this. Basic exercises are better than something too advanced or realistic, that requires many things to come together.

This year, we tried to have a complete solution for every exercise (script and/or commands), even if it’s directly said above in the lesson. This seemed to be good, since for people very short of time, they still have some chance to copy and paste and do the exercises. For those passively following, they can at least see what would have been done.

Other feedback from the notes

Day 3 / end of course positive feedback (o is the way a person votes for/agrees with that option):

it’s great that the material is so easily accessible also after the course to go through things in my own pace again oo
Really good format with the streaming and the shared document for questions. ooooo
The cat kept me focused in the lecture
Live interaction with the instructes were very helpful and exercises were nice
I really appreaciate the instructors took the time to explain the jargons, instead of just letting them fly around. o
The fact that the instructors were really nice contributed to the good course experience. Thanks for that! o
(day 1) After studying remotely for 1,5 year and having lots of online classes, I highly appreciate the amazing audio quality here. Many thanks for that!
(day 1) The framework is better than any other workshop I’ve ever attended - in terms of interaction and audio quality. HackMD is great.
(day 1) The (twitch) vertical screen thing is genius and should be used in way more (online) lectures o

Most common negative feedback: not enough time! In fact, that’s almost the only thing to improve. Except we can’t, so I think we win pretty well. And videos/material allows follow-up.

ASC development day, 2023 August

2023-10-30T00:00:00+00:00

We had another development day (previous: ASC development day, 2023 March). It went mostly like the last one, and we have less important news for the world, but below is the summary anyway.

Stats

We have about 1550 people with accounts, with 202 new account requests in the last six months.
Most routine issues tend to be about software installation, which is good (this is the actually hard part, it’s good people ask us).
We are still on track for about 500 garage visits per year. We don’t try too hard to keep track of them all, we might get about 75% of them.
The number of interactive and Jupyter users is increasing, while the number of Open OnDemand users is decreasing. This is the wrong direction from what we’d like. We will open OOD to connections from all of Finland to make this easier.

Triton v3

Triton v3 is still on the way. This isn’t a new cluster, but a new operating system which individual nodes will be migrated to slowly (while maintaining the same accounts and data). Most of this happens in the background, but the change of base operating system images will require most code to be recompiled, which will require attention from many users. The transition can be made slowly: both old and new OSs will run for the time being. There won’t be a change in the total amount of computing power.

An upcoming blog post will discuss this more, and the effects on users. Now is the time to start preparing. We still expect the transition to happen sometime in the autumn.

We are thinking of merging home and scratch directories, to make a common quota for both. This would improve usability by reducing how often the home quota affects usage. We’d welcome any other usability suggestions.

Practically, we are using the chance to automate things even more, which should make it easier to manage in the future.

Teaching

Teaching has gone well. For this academic year, we’d like to add back in a few smaller, special-purpose courses (not just to teach them, but also to get good quality video recordings for the future).

Goals:

Developing and delivering the “workflows” course with CodeRefinery
Short courses to record (e.g. rerun of debug series, once a week, record and publish).
Update Debugging linking the different debugging course repositories.

LUMI

LUMI is the new EU cluster with plentiful GPU resources. A user can essentially get as many GPU resources as they need with no waiting, but since the GPUs are AMD, there is some initial barrier. Our general feeling remains: “we won’t recommend our users directly go and use LUMI, but we recommend they talk with us first and we help them use it”.

Next steps:

Continue encouraging users to contact us.
RSEs will ask the top GPU user each week if they would like support with taking LUMI into use. We’ll go and do all the setup for them.
Slide on infoscreens around the buildings?

libwebp security vulnerability and computational scientists

2023-09-28T00:00:00+00:00

Recently, a major security vulnerability (CVE-2023-5129) has been found in libwebp, an image decoding library for the .webp format. This is major, since this library is embedded in many apps and web browsers and allows remote code execution just by opening a file. For computational scientists, there is still some impact - and it’s harder to compensate for. In short, just by processing an image in the .webp format, someone can take over your computer or session.

libwebp is the current issue, but the problem is general: computational scientists often create software environments and use them for a long time. These environments aren’t usually browsing the web (the most likely attack vector here), but they do involve lots of code installed from different projects. How does one manage security in this case?

This post may be updated

How it affects scientists

If you use web browsers or apps on your own desktops, laptops, phones, etc. - make sure to update them!

If you don’t use images in your research, there probably isn’t much impact.

If you do, this is what could happen:

You make a Python / Anaconda environment which uses libwebp somehow - directly installed through Conda, or some other application.
You download a dataset containing images. You process them as part of your research with the old environment.
The malicious image runs an exploit. It has access to your whole user account on that computer: extract any data, add SSH keys for remote access, corrupt/delete data (which might not be backed up from the cluster…).

Many things have to happen here, but it’s very possible for it to happen. You could lose access to data or code that is not backed up, or other confidential or sensitive data could be compromised, since code from one project in your user account has access to all projects in your account.

One would normally fix things by updating software. But when you are dealing with a research environment that can’t easily be updated, what should you do? This is the real question here.

What to do

It’s a multi-layered problem, and the answer will depend on your work. libwebp is what we are thinking about now, but the problem is general: there are other security problems that occasionally come up that can affect more scientific code. How do you prepare for next time?

Update your environments (conda, virtualenv, etc). You could try to see if libwebp is inside of them (conda list | grep webp), but especially for Pip packages it might not be apparent.
Make your environments reproducible: If you define your dependencies in requirements.txt (Python), environment.yml (conda), or whatever is suitable for your language, you can easily re-generate environments to bring everything up to date. (delete old one, re-create).
If you pin versions of dependencies (like numpy==1.20.0), it’s possible it can pull in older versions of other dependencies.
Containerize your workflows. If code runs inside of a container, it is kept isolated from the rest of the operating system and user account. (Containers aren’t usually designed for strict security, but it’s better than nothing.)
If you use pre-built modules on the cluster, try not to use old versions. We’ll update some recent modules, but we can’t update all of the old ones. At least webp is in the default anaconda modules.
If you write or maintain software in general, keep it up to date as much as reasonable! Don’t make others get into a place where they are having to use old versions of libraries to make it work.
In general, think about your dependencies. Be at least a little bit suspicious before you install random other software, that may possibly pull in lots of other dependencies. Of course, as a researcher, you may not have much choice.
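
As a rough aid for the first point, a small filter can flag packages in a listing that are likely to contain libwebp. This is only a sketch: the helper name find_webp_candidates is made up, and the pattern (webp, Pillow, OpenCV) is our assumption about packages that commonly ship or bundle their own copy of the library. It is not exhaustive.

```shell
# Hypothetical helper: filter a package listing (read from stdin) for
# names that commonly ship or bundle libwebp. The name list is an
# assumption and is not exhaustive.
find_webp_candidates() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -i -E 'webp|pillow|opencv' || true
}

# Typical use inside an activated environment (uncomment what applies):
#   conda list | find_webp_candidates
#   pip list   | find_webp_candidates

# Demonstration on a made-up listing:
printf '%s\n' 'numpy 1.26.0' 'libwebp-base 1.2.4' 'Pillow 9.5.0' | find_webp_candidates
```

An empty result does not prove that an environment is safe, since a library can be embedded in a package under a name that this pattern does not catch.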

Updating your environments

These commands seem to be able to update an environment to a newer libwebp. It seems to work on newer environments, but we don’t know for sure. In theory conda works instead of mamba, but it is so slow that it may not be practical:

$ mamba env export > environment.yml
$ perl -i -pe 's/(libwebp(-base)?)=.*$/\1=1.3.2/g' environment.yml
$ mamba env update -f environment.yml
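
Before editing a real environment file in place, it can help to preview what the substitution does. The sketch below applies the same pattern with sed to a made-up sample file (the package build strings are invented for illustration) and prints the result instead of modifying anything.

```shell
# Made-up sample of what an exported environment file might contain.
sample=$(mktemp)
cat > "$sample" <<'EOF'
name: myenv
dependencies:
  - numpy=1.24.3=py311h08b1b3b_1
  - libwebp=1.2.4=h11a3e52_1
  - libwebp-base=1.2.4=h5eee18b_1
EOF

# Same substitution as the perl command above, written for sed and
# printed to the terminal instead of editing the file in place.
sed -E 's/(libwebp(-base)?)=.*$/\1=1.3.2/' "$sample"
```

Note that the pattern also drops the build string after the version, so the solver is free to pick any build of 1.3.2. Other lines, such as the numpy one, are left untouched.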

Summary

There is a major security vulnerability in libwebp. While the impact on computational scientists may not be that much, a bigger issue is the difficulty of keeping all of the environments up to date so that next time this happens, it’s easier to respond.

We hope to have more security recommendations for computational scientists in the future. If anyone is interested in collaborating on this, let us know.

Aside: What’s affected?

Common apps which embed Chrome or libwebp: Chrome, Firefox, VSCode, Zulip, Slack, Discord… things that use Electron to embed a web browser are affected, and that’s many things.

Aalto public servers requiring passwords with SSH keys

2023-09-27T00:00:00+00:00

From 2023-09-25, publicly accessible Aalto server login is changing and will now require a password in addition to SSH keys. This will have a significant usability impact on some users. This post is made as a landing page for users who need immediate, practical help and for whom the aalto.fi page isn’t findable or detailed enough. The official contact is the IT Services service desk.

The reference page SSH has been updated to include detailed reference information for every common operating system and SSH client. Secure Shell is one of the standard methods of connecting to remote servers and it is important that users of all skill levels are able to use it securely.

This change is not from Science-IT, but since it will affect many of our users and is not being publicized or supported very much, we are preemptively doing some major user support.

What’s happening

What is not happening is: requiring locally encrypted SSH keys (although this is highly recommended).

What is happening: When you connect to an SSH server from outside Aalto networks, you will need to have an SSH key set up and send your Aalto password to the remote server interactively.

What to do

If you already have an SSH key set up, you’ll start to be asked to enter a password every time you connect.

You can always connect to the Aalto VPN in advance to prevent this, but there may be cases where this isn’t a practical solution.

If you do not have an SSH key set up, you should:

Follow SSH to generate an SSH key - we have heavily revised this page to cover almost every common SSH arrangement.
Place your SSH key on any common Aalto server (kosh, etc. - not Triton since that doesn’t share home directories with the public servers)
- You could connect by VPN, and then use normal password to connect and add the key.
- You could use https://vdi.aalto.fi with a Linux computer to place the key.
- You could place the key while on an Aalto network (as usual, this means eduroam or aalto only from an Aalto computer).
- You could use another computer that’s already set up with an SSH key to place the key.
The key will then be available on all common Aalto shell servers (and other workstations), since they share the home directory.
Re-read SSH, in particular the SSH key agent, ProxyJump and Multiplexing sections, to see how to configure your SSH to minimize the number of times you need to enter passwords.
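
To illustrate the ProxyJump and Multiplexing ideas, here is a sketch of a client configuration, written to a scratch file so that you can review it before copying anything into your own ~/.ssh/config. USERNAME is a placeholder, kosh.aalto.fi is our assumption for the full name of the kosh server, and routing Triton through kosh is one possible arrangement rather than a requirement.

```shell
# Write an example client configuration to a scratch file for review.
# Nothing here touches your real ~/.ssh/config.
example_config=$(mktemp)
cat > "$example_config" <<'EOF'
# USERNAME is a placeholder; kosh.aalto.fi is assumed to be the
# full name of the "kosh" server.
Host kosh
    HostName kosh.aalto.fi
    User USERNAME
    # Multiplexing: reuse one authenticated connection for later sessions.
    ControlMaster auto
    ControlPath ~/.ssh/control-%C
    ControlPersist 1h

Host triton
    HostName triton.aalto.fi
    User USERNAME
    # ProxyJump: reach Triton through the kosh connection.
    ProxyJump kosh
EOF

cat "$example_config"
```

With multiplexing, only the first connection to a host asks for the password; later sessions within the ControlPersist window reuse it. The key itself is typically created with ssh-keygen -t ed25519 and, once you can log in with a password (for example over VPN), placed on the server with ssh-copy-id.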

Motivations

This was needed for security as evidenced by recent history. Password-only login is simply not feasible anymore (and has not been for some time). Removing passwords as an option is good security practice that most organizations should adopt these days.

But why an SSH key and remote password instead of a properly encrypted SSH key? An encrypted SSH key requires something you have (the key) and something you know (the password), doesn’t it? And it doesn’t require sending a plaintext password to the remote server. This was decided by whoever is setting this up, probably partly due to the fact that it is not possible to enforce passwords on SSH keys via the server config.

In general (outside of Aalto), you should use SSH keys everywhere and be wary of ever sending plaintext passwords to remote servers (even in conjunction with an SSH key). Security is important, and by using SSH keys with local encryption of the key you are doing your part.

This is affecting important workflows

We apologize for the difficulty in getting work done and want to help you as much as possible (though Science-IT was not the one that designed this or communicated it).

There are, unfortunately, some trivial workarounds that involve putting your password in plain text on your computer to script things. However, please note that writing passwords down (outside of password managers) is bad security practice and against the Aalto password guidelines. It is better to contact us to help design a better and more secure workflow, or to ask IT Services to consider other use cases.

Preparing for new Triton

2023-09-12T00:00:00+00:00

Sometime in autumn of 2023 (e.g. October/November), we will do a major update of Triton: updating the basic operating system, and thus almost everything else. There are big benefits to this, such as newer basic operating system software, but such a basic update also affects almost every user. For a short time, this will make a lot of work for almost every user. This post gives advance warning and a chance to give feedback on how to make the update most usable.

This post is just advance warning and things to prepare already. All actual instructions will come later.

What will happen

We will update the basic operating system from CentOS 7 to something else (Red Hat 9). We’ve ordered all new management hardware to make the backend more reliable and manageable. Along with this comes an update of the software build system, which should allow us to deploy software to our users even better. We’ll also update our configuration management system for more reproducibility.

We also hope to think about the usability of the new system: remove a lot of old options and add in new, simpler ways of doing what people need.

All data and storage will remain the same, so there is no big data migration needed.

The old and new clusters will be accessible at the same time (two different login nodes), with the same filesystems mounted (same data available) and some compute resources still available there, so that people can slowly migrate. But the old one won’t stay running too long, to avoid long maintenance effort or splitting of the resources.

Reproducibility

The biggest problem with big cluster updates like this is reproducibility: does your work from a month ago still work in one month? If not, this is a big problem. It’s even worse if there is a much longer gap before you come back to it (paper revisions, anyone?).

You could say there are two things that can go wrong with a cluster upgrade or change:

Specific software/code that needs to be compiled and installed: Software needs re-compiling for new clusters or new cluster OS updates.
Whole workflows: you need to make all the pieces work together. Different paths and workflow managers may need updating.

What you can do:

Manage any messes you have earlier rather than later. It’s better if you slowly clean up over time, so you can focus on the differences once the change happens.
Know what software you are using. It’s easier for us to re-install something we have already installed when someone can tell us the exact name and version that they are using.
Tests for your software. Some way to validate that it works correctly.
Contact Aalto RSE for hands-on help supporting the transition. Come to the garage early and often.
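
For “know what software you are using”, one low-effort habit is to save a snapshot of your loaded modules and packages next to your project before the change. The following is a minimal sketch under our own assumptions: the helper name record_environment and the output file name are made up, and it only records tools that happen to be available on the machine where it runs.

```shell
# Hypothetical helper: write a snapshot of the software in use to a file.
record_environment() {
    out="$1"
    {
        echo "# Snapshot taken $(date -u +%Y-%m-%dT%H:%M:%SZ)"
        echo "# Kernel: $(uname -sr)"
        # "module" only exists on systems with environment modules;
        # it prints to stderr, hence 2>&1.
        if command -v module >/dev/null 2>&1; then
            echo "## Loaded modules"
            module list 2>&1 || true
        fi
        # Exact conda package names, versions and builds, if conda exists.
        if command -v conda >/dev/null 2>&1; then
            echo "## Conda packages"
            conda list --export || true
        fi
        echo "# End of snapshot"
    } > "$out"
}

# Written to a temporary location here; in practice you would keep the
# file inside your project directory.
record_environment "${TMPDIR:-/tmp}/software-snapshot.txt"
```

A snapshot like this gives us exact names and versions to work from when you ask for something to be re-installed, and gives you something to compare against when a result changes after the upgrade.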

Feedback and future usability

If there are any annoyances about Triton that you’d like us to consider for the upgrade, now is the time to let us know so we can plan them. We especially value feedback on usability problems.

Discuss with us in our chat, or open a Triton issue.

This post has been updated with minor corrections, changes can be found in git history.

The Aalto RSE hiring process

2023-08-21T00:00:00+00:00

This post describes the hiring process of Aalto RSE. The goal is to make hiring more equitable by providing the background information so that everyone can apply successfully. For those not applying to us, it might still provide some valuable insight about how to market your skills as a PhD making a sideways career move. What’s said here may not apply to every organization, but it might give you some things to think about.

Disclaimer: This page is a rough average description of the past, not a promise to always do this in the future.

Background

Aalto RSE has usually hired people who have postdoc experience and will transition to a more applied software/data/computing oriented role (as opposed to being focused on writing papers). For many people, we are the first experience of job applications post-degree and thus people have to learn how to present their skills in a new, non-academic context.

One should start by reading about us - we have lots of information publicly available about what we do and how we think. This should be understood in order to do the next steps well.

The cover letter

The cover letter is the most important thing we read, and the first and most important filter. It’s read before the CV.

At the level we are at, almost everyone’s CV and achievements are effectively equivalent. Does it matter who got the most fancy papers? Who has the most awards? The classes people took? When most of a person’s knowledge has come from self-study, probably not. The cover letter is the chance to interpret your skills in the context of the job you are applying for.

When reading the cover letter, the first question we ask is “does this person know what they are applying to and know why they think they are a good fit?” (It’s always interesting to get letters which clearly don’t understand the job, but on the other hand it’s an easy filter.) The first paragraph should answer this question, and the rest of the letter should go into detail about why. Start with the most important information; don’t make it hard for us.

Beyond that, talk about interests and skills as relevant to the organization. Discuss special projects, including non-academic ones or random things that you are interested in (this is especially true for us, since we are the transition from academia to practical work). Our job advertisement gives you some specific ideas that you can talk about. Anything specifically important to the job should be pointed out here and not just left in the CV.

If you don’t exactly fit the stated job requirements: here is the chance to explain it. The job requirement has to say roughly what we need (to not waste people’s time when applying, and because our hiring decisions must be justifiable based on the requirements), but there are many cases where someone with a different experience can accomplish our actual goal (as said in the job ad or found in your background research). A person who can say this, and show that they are adaptable, will have a very good chance.

We have adopted some system of anonymous recruiting. We request that cover letters are submitted without identifying information (name, signature, etc) so that one person gives them numbers, and a broader group tries to take a non-biased look at them. After this initial impression, we bring in the rest of the application. Don’t make assumptions about what the reader will know about your background, just say it.

The letter should be as short as possible to get the information across. One page is usually about the shortest we get, and a bit less than two pages is typical. But if it’s engaging, we’ll read as much as you write. Remember, most important information first, don’t make us hunt for things.

Update 2024: Do you want to use AI to write your cover letter? Please think again. Since LLMs became a thing, cover letters have become harder to read, longer, and more generic-sounding. It’s better to write in your own voice and be shorter than rely on what AI gives you.

The rest of the job application

The CV serves as non-anonymous reference information, but CVs are hard to read and all look pretty similar. To be honest, we don’t worry that much about the format and contents here: get us basic factual information in the most efficient way. For our particular jobs, non-academic skills such as software/data tools are more important than scientific articles, etc. Remember, we are busy and have plenty of applications: make it easy to read.

Open Science isn’t just good for research, it’s good for you, too. If you can point to public repositories of work you have done, this is very useful. Things like Gitlab/Github profiles with activity and your own projects, links to data you have released, etc. They don’t have to be perfect - something is better than nothing. The best case would be a few projects which are well-done (and you know it and point them out to us), and plenty more stuff that may be of lower quality to show you can get simple stuff done simply. Not everyone is fortunate to have a field where they can practice open science throughout their career, but even publishing a project or two before they apply for a job with us is very useful.

Despite what the previous section said, we do try to dig through applications that seem on-topic but don’t say everything we are looking for, to give them the most fair shot we can.

The filtering process

We always need to heavily filter the list down. Some relevant filtering includes:

Do they know what job they are applying for? Can they connect their skills to the job?
Have they touched on the main points in our job advertisement and the linked “Become a RSE” page?
Are they interested in teaching, mentoring, and real collaborative projects? Do they know what kind of teaching and mentoring we do?
Is there enough knowledge about the research process?
Any relevant skills about this call’s particular topic (if there is any)?
How do their skills and experience match what our team is currently missing, regardless of the open call?
How similar has their previous work been to “research engineering” (helping the research process) instead of only focusing on academic promotion?

The recruitment team makes several passes over the applications, and we discuss how to filter down. We try to get a good variety of candidates.

Interviews

Sometimes there are initial recorded “video interviews”, which provide some initial familiarity in both directions before the actual interviews. We know these are non-interactive, and a recording isn’t a conversation, so this is harder than an interview, but we take that into account when watching them. One shouldn’t worry too much about these, if we do them.

Our actual interviews are not designed to be stressful. We have some prepared questions and go through them in a friendly manner. You have a chance to ask us questions at the beginning and end (and any other time too). The questions are designed to hear about your experiences and not trick or test you.

We don’t currently ask technical challenge questions. The number of things which you’d need to know is so broad, it’s more important that you can learn things quickly. Since we usually interview relatively advanced people, we can instead look at existing projects they have done and check references, without having to do a technical challenge. This may change depending on the type of candidates we are interviewing, but just like the main interviews we are more interested in how people think, rather than raw knowledge.

In the future, there might be more “meet the team” kind of events.

We want to respond to people as soon as possible, but there’s a simple fact: we don’t want to tell anyone “no” until we are very sure we have an acceptance (we don’t want to tell someone “no” and then hire them later), and we have very many qualified candidates. So there is often an unfortunately long delay in hearing back. We hope that everyone knows within a month, though (and ideally ~2 weeks if all goes well).

If you don’t make it

We get a relatively large number of applications, with a lot of good people. So far (before 2023), we have been hiring at a relatively high level - researchers with postdoc experience who have had some sort of RSE-like experience with helping others with research (beyond only focusing on making papers for themselves) and technology. Don’t let this discourage you. There are many qualified applications, so if you don’t get selected, that doesn’t mean that you were unqualified. We look at everyone, regardless of their level, for every position. The fit to our particular job is more important than anything else, so keep trying until you get the right fit - it’s just a numbers game.

Old job application text

For reference, this is an older job application text, so that you can see how the things above are integrated. (to be updated with the 2023 version soon)

RSE job advertisement, 2022

[ standard header removed ]

Aalto Scientific Computing is looking for a

Research Software Engineer/Supporter

To a permanent, full-time position.

Are you more of a programmer than your researcher colleagues? Are you more of a researcher than commercial developers? Do you fit in both, but have a home in neither? Be a Research Software Engineer with us and find your home. If you are looking for a career path which combines the interesting parts of both fields, this is a good choice.

Aalto Scientific Computing is an elite “special forces” unit of Research IT, providing high-performance computing hardware, management, research support, teaching, and training. Our team consists of a core of PhD staff working with top researchers throughout the university. Our services are used by every school at Aalto University and known throughout Finland and the Nordics. All our work is open-source by default and we take an active part in worldwide projects.

In this position, you will:

Provide software development and consulting as a service, depending on demand from research groups.
Provide one-on-one research support from a software, programming, Linux, data, and infrastructure perspective: short-term projects helping researchers with specific tasks, so that the researchers gain competence to work independently.
As needed and depending on interest, teaching and other research infrastructure support.
Continually learn new skills as part of our team.

Primary qualifications: There are two main tracks, and candidates of diverse backgrounds are encouraged to apply – every candidate will be evaluated according to their own unique experiences.

PhD degree with research experience in some computational field and much knowledge of practical computing strategies for research, or
Software developer or computational scientist with a strong software/open source/Linux background, scientific computing experience, and some experience in research. Masters degree or similar experience.

This particular call emphasizes the ability to work in machine learning and AI environments. The ideal candidate will be working closely with machine learning researchers, and thus a background in machine learning is highly desirable.

Important skills:

Ability to tackle any problem with a researcher’s mindset and a developer’s passion for technology.
Experience or knowledge of the principles of open source software, open science, and software development tools such as version control.
Please see https://scicomp.aalto.fi/rse/become-a-rse/ for more information on what kind of skills we value - or more precisely what you are likely to learn.

What we offer:

You will join the dynamic Aalto Scientific Computing team, where you will learn from some of the best research IT specialists in Finland.
Co-working within top-quality research groups, getting experience in a wide variety of fields and developing an extensive network of scientific contacts. This includes contacts to the Aalto startup scene and community.
A way to be close to the research process while focusing on interesting computational problems and not the publication process.
Our program will offer you a chance to improve your software skills – you are expected to engage in plenty of professional development.
Open Source is our expectation. All (or most) of your code may be open source and may be added to your public CV, depending on the needs of researchers.

Salary will be according to experience, for a recently graduated PhD similar to a postdoc salary. Work hours are flexible, but are expected to sync with the audience being served. Primary workplace is Otaniemi, Espoo (Helsinki region), Finland. Aalto University has a hybrid work policy which allows 60% remote work possibility, and our team takes good advantage of this flexibility.

To apply successfully:

Please include a separate cover letter (~1-2 pages). Please try to write your cover letter avoiding information like name, gender, nationality or other demographic information that is not directly related to why you would be the right person for this position (this includes, for example, a signature on the letter) unless you think it benefits you. This will assist in anonymous recruitment possibilities. The letter should include for example:
- Why being a Research Software Engineer is for you,
- past research experience, if any
- past technical teaching or mentoring experience,
- past software development experience (even informal self-learning),
- past Linux, command line, or scripting experience,
- highlight one (or a few) collaborative projects you have taken part in and your role within it, and
- what you bring and what you intend to learn.
A normal professional or academic CV including
- a list of your technical and programming tools and level of proficiency (e.g. basic/proficient/expert). This is the time to show the breadth of your experience.
- Github link or other public sample code. If not available, whatever is possible to demonstrate past programming experience. Please highlight one or two of your outstanding research software projects.

[ standard footer removed ]

Whisper deployed on Triton, LLMs coming

2023-08-08T00:00:00+00:00

Whisper now easily available for researchers

LLMs and other tools next

We hope to make other tools available like this.

Whisper is just one of the latest tools, but you’ve probably noticed that large language models are very popular these days. There are, in fact, some that can run locally on our own cluster, and our goal is to deploy more of these so that they can be easily tested and used. The intention isn’t to make a replacement for existing LLM services, but to make internal use for testing, research, and development easier.

Local installs have various benefits, including lower cost (since we already own the hardware), being able to ensure reproducibility longer-term (since models are locally downloaded and preserved), and being able to use them without various registrations. The downside is that the most popular ones aren’t available for local use.

The role of ASC

Contact us if you need other models deployed, or if you have trouble using what’s already out there. We are still in an early phase, and there will probably be some difficulties in availability, accessibility, and reusability. Contact us early if you notice anything that’s not right. We both help installing things and help using them as a research engineer partner.

It’s clear that artificial intelligence and machine learning tools will become more critical tools for other research. The difficulty in deploying and using them could become a barrier, and that is where Aalto Scientific Computing comes in. It’s our goal to make sure the infrastructure that researchers need is ready and able to be used by everyone, not just those with classic HPC experience.

Tech details: difficulties and solutions

Here we go over some implementation details, which may help others who want to deploy similar things on their own clusters. If you just want to use things, you don’t need to read on.

We installed whisper in a container, so that all dependencies are packaged together and things are portable. The model definitions themselves are not included in the container, but mounted in. We try to find options that allow one to specify the model and model directory, so that the user can try out different models without downloading each one. The Lmod module file prints out some help when loaded.

We’ve got two versions installed: normal Whisper, and Whisper-diarization (which can identify speakers in the transcript).

Whisper and diarization both have multiple different implementations. It’s a bit of guesswork to see which one is the easiest to get running and works the best (not in terms of transcript quality, but ease of deployment in a container and with local models). This led to a change to another implementation of diarization midway, since the current one is more active in development and seems overall slightly better. A lot of the work was fortunately transferable to the new implementation.

There were the common issues with getting the right dependencies in a container and getting the GPUs to work there. This is pretty standard by now.

Most implementations of Whisper want to download models when running. This might make sense for a general user, but doesn’t really make sense on a cluster. Depending on the implementation, getting it to use local models is not always trivial. Since GPU execution of diarization uses several models at once, there doesn’t seem to be a simple way to have it use local models at all without changing the code. It also required some sleuthing to find where exactly the models are downloaded. If a code uses Hugging Face, the Hugging Face environment variables can be useful.
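As a minimal sketch of the environment-variable approach, assuming a code that uses the Hugging Face Hub libraries: `HF_HOME` sets the root of the Hugging Face cache, and `HF_HUB_OFFLINE=1` tells the libraries not to attempt any downloads. The helper name and the directory path below are hypothetical, not what we have installed, and the models are assumed to have been pre-downloaded into the normal Hugging Face cache layout under that directory.

```python
import os


def offline_hf_env(model_dir, base_env=None):
    """Return a copy of the environment that points Hugging Face at local models.

    model_dir is a hypothetical shared directory holding pre-downloaded models.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HF_HOME"] = str(model_dir)  # root of the Hugging Face cache
    env["HF_HUB_OFFLINE"] = "1"      # do not try to reach the Hub
    return env


if __name__ == "__main__":
    env = offline_hf_env("/path/to/shared/models")
    print(env["HF_HOME"], env["HF_HUB_OFFLINE"])
```

The returned dictionary can be passed as `env=` to `subprocess.run`, so the setting applies only to the transcription job and not to the rest of the session.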

Making a module that is easy and practical for users without losing options is usually a bit tricky: we want users to be able to do anything, for “the right thing” to happen automatically, and not to build some opaque framework to make it happen. Singularity-wrapper fortunately helps quite a bit by doing a lot of background work such as binding directories, GPU flags, etc. cleanly without users having to care about it, while still giving the option to run the container straight through Apptainer/Singularity if finer control is necessary.
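To give an idea of what the wrapper hides, here is a rough sketch of building a direct Apptainer command line by hand. This is not our actual module or wrapper code: the function name, image name, and paths are made up for illustration. The `--nv` and `--bind` options are standard Apptainer flags, and `--model` and `--model_dir` are the flags of the reference openai-whisper CLI; other implementations may use different ones.

```python
import shlex


def apptainer_whisper_cmd(image, audio, model_dir, model="large", gpu=True):
    """Build an `apptainer exec` command line for a containerized Whisper."""
    cmd = ["apptainer", "exec"]
    if gpu:
        cmd.append("--nv")  # expose the host's NVIDIA GPUs to the container
    # Bind-mount the model directory at the same path inside the container
    cmd += ["--bind", f"{model_dir}:{model_dir}"]
    # Run the whisper CLI from the image, pointing it at the mounted models
    cmd += [image, "whisper", audio, "--model", model, "--model_dir", model_dir]
    return cmd


if __name__ == "__main__":
    # All names here are placeholders
    print(shlex.join(apptainer_whisper_cmd("whisper.sif", "talk.wav", "/models/whisper")))
```

Building the command as a list rather than a string avoids quoting problems when file names contain spaces.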

Testing if the containers work is somewhat annoying. Diarization in particular saves a lot of cache files all over the place, which all need to be purged when testing GPU running. Otherwise the GPU will stay idle since everything it would do is already in cache. This also affects clean-up after users run the code.
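A small helper along these lines can make the purge step repeatable between test runs. It is only a sketch: which directories actually need purging depends on the implementation, so the list in the example is an assumption (the default Hugging Face and PyTorch cache locations), not a complete inventory of what diarization writes.

```python
import shutil
from pathlib import Path


def purge_caches(cache_dirs):
    """Delete each cache directory that exists; return the ones removed."""
    removed = []
    for d in map(Path, cache_dirs):
        if d.is_dir():
            shutil.rmtree(d)
            removed.append(d)
    return removed


if __name__ == "__main__":
    # Assumed cache locations; adjust to what the code under test really uses
    candidates = [
        Path.home() / ".cache" / "huggingface",
        Path.home() / ".cache" / "torch",
    ]
    for d in purge_caches(candidates):
        print("removed", d)
```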

A minor inconvenience for us (but possibly large for users) is that the syntax for each Whisper CLI implementation tends to differ slightly. This makes swapping between implementations slightly annoying since you have to check every time what was the syntax for flags.

SciComp Kickstart - 2023 plans and yearly strategy

2023-04-26T00:00:00+00:00

It’s time for our “kickstart course” - let’s talk about what that is, why, and why you might want to attend.

The full name is “Introduction to scientific computing and HPC” (high-performance computing), and it used to be called “HPC Kickstart” and was taught without the first day, thus the short name “kickstart” we still use. Some years day 1 had a different name, but was still taught together with days 2-3 as a package.

Our goal isn’t just to teach some skills, but to form a community around scientific computing - with researchers who have a common language to work together and help each other, supported by Aalto Scientific Computing in the background.

Course page in 2023.

Topics of SciComp Kickstart

Day 1 is not about high-performance computing things, but the basic skills needed to do scientific computing: things like Linux usage, data management, the types of tools available for different problems. It is for almost anyone doing any kind of programming or scientific computing work, regardless of background. These kinds of skills aren’t taught in academic degree programs. We teach these on day 1 because otherwise, new researchers have to learn from each other or re-invent them.

Days 2 and 3 are about high-performance computing, more precisely basic cluster usage (with a focus on the basics). This is focused on the kinds of tools our community usually uses.

The topics are refined after many years of both teaching and support of junior researchers. Because of the way academic careers work (much diversity of paths), these topics (even day 1) aren’t just for new researchers but everyone can find something to learn or brush up on.

Yearly schedule

For the past years, we have been trying to keep up this yearly summer schedule. This usually happens the first full workweek:

Monday: HR introductions, other formalities for new summer workers - many departments seem to do something like this. This may happen earlier than Monday of the kickstart week, since sometimes that comes too late.
Tuesday afternoon: Kickstart course day 1, the general scientific computing introduction. Applicable to everyone doing scientific computing.
Wednesday-Thursday afternoons: The HPC cluster usage part, which fewer people will attend compared to Tuesday.
Friday: we don’t have scheduled programs on Fridays, but sometimes there are communities who host advanced tutorials here about what their local users need. In 2023, there is at least an advanced GPU course then.

This year’s scheduling conflict

We are aware that there is a scheduling conflict with the CS summer day which is scheduled on the Tuesday of the 2023 HPC kickstart course. We did contact every department in January/February, yet this was still a surprise to us. In past years, we have adjusted our schedule to similar events, but this is not possible this year despite our best efforts.

We will still try to support researchers as much as possible. Recordings of previous years are available on YouTube, and we also release videos the same evening as the course precisely to support everyone regardless of these conflicts. You can still join us for days 2 and 3 even if you did not join day 1. However, please pay particular attention to the instructions about setting up the Triton connection in advance.

Future

We hope that this blog post can explain our goals to a larger audience so that we can reach even more people in the future, so that we can expand to onboarding even more young researchers even more systematically. You can reach us at scip@aalto.fi, and each spring we reach out to the main departments to schedule each summer’s course.

ASC development day, 2023 March

2023-03-07T00:00:00+00:00

We recently had an internal “development day”, which is our new name for getting together to talk about longer term plans. This is our second “development day”. Overall, it went well, and we think that we are on an overall good path. There are three particular focus areas for the future:

Teaching: This was also a focus last time, and probably will still be in the future. We are overall happy with our decision last time to focus less on many small/medium courses, and instead focus on large, collaborative courses and then focused, individualized support for advanced use cases. Smaller courses happen mainly when we see specific needs that can’t be filled other ways (or we make them large, open, collaborative courses if there is a broad need).
Triton v3: The software/OS/management side of our cluster will be almost completely reworked in the next year (we aren’t getting rid of any hardware just for this). This will take a fair amount of our time, but is needed because existing systems are starting to show their age.
LUMI usage: LUMI is a flagship project of EuroHPC and provides huge resources available to the same people that can use Triton. Triton is still needed for ease of use of everyday projects, but we should actively look for people who can benefit from LUMI and help them port their work there. Our recent evaluations lead to the conclusion that our porting help is still needed there.

Teaching

Teaching has long been one of the pillars of ASC’s support. It’s still needed, but the focus seems to be changing. No longer is a room with 10-20 (or even 50) people considered a lot. People seem both more able and willing to find advanced material themselves, and more in need of basic principles (git, Python for SciComp, etc). Perhaps this is also partly caused by the remote work period emphasizing how all this material is available online anyway. Our basic philosophy:

Focus on large courses for new researchers, for example using the CodeRefinery MOOC strategy. This reaches the most people, helps the beginners the most, produces high-quality open source material for asynchronous reference, and has good possibilities for co-teaching. Examples include CodeRefinery, our SciComp/HPC kickstart course, and Python for Scientific Computing.
Advanced, one-on-one, or small-group support via SciComp garage and the Research Software Engineering service. This isn’t just for projects, but is also a useful service for people learning from other advanced material in their work - basically, we work as mentors. One-on-one support is both more rewarding for us and probably more useful to the user (relative to time demands on both ends). Anyway, advanced courses often aren’t offered right when people need them, so we are left in this position anyway.
What about small/medium-sized courses, and advanced courses?
- The first two points above squeeze out medium-sized courses for the most part, in our opinion. By the time our audience is an intermediate or advanced level, they seem to be able to figure things out themselves + ask for help when needed - if they can figure out what they need to do. This point deserves further study, though. Instead, we point to other existing material.
- We will make sure that we have good recommendations for advanced self-study courses and generally chart out the resources so that our users don’t have to. This is mostly done by our Hands-on Scientific Computing course.
- In the past, we have supported community members to give courses on topics of which they are experts. Continue this as appropriate (see the next point).
- Continue the possibility of on-demand courses taught by us if someone requests them, and other smaller courses if we see a strong need. Contact us!

Triton v3

Triton is our HPC cluster, and is notable for being a Ship of Theseus: it’s continually upgraded while being the same cluster. This has resulted in the software running it getting a bit out of date. This software was originally developed as part of broader partnerships, and as these partnerships have changed, we need to take more responsibility for it ourselves.

Users shouldn’t see any major change from this, though part of it is improving our (user) software installation tools, which should make us more responsive to software installation requests.

LUMI

As said above, LUMI is a significant resource, yet our users have not come to us asking for our help in using it. Over the past six months, we have found some Triton users who would benefit from it and helped extend their workflows to work on LUMI. We do this by first testing some applications ourselves, then looking at Triton usage for large users and reaching out directly.

Currently our focus is on GPU-intensive applications, which is made more interesting because LUMI has AMD GPUs. We’ve gotten local AMD GPUs for our own testing and in general are well prepared to support this.

While LUMI is an HPC system and has a typical HPC system interface, it serves so many different users that the software stack is very limited, so most users need to install their own software and figure out how to run it on AMD GPUs. This is why we recommend most users access LUMI through us (we’re paid to save you time, after all), though of course anyone interested can use it directly.

Aalto SciComp stickers and patches

2023-02-20T00:00:00+00:00

We have stickers (and patches!) to support Aalto Scientific Computing. (You can get them from our IT offices in CS, NBE, and Physics) But why invest in this? Well, it’s fun, but there should be a deeper reason.

Stickers and patches, pick up from either the Physics, Neuroscience and Biomedical Engineering, or Computer Science departments.

While our main goal is to maintain Aalto University Triton HPC cluster, provide courses and direct support to researchers, we cannot scale to solve all problems and make the best decisions without a community: you! Thus, our new promotional material is designed so that the members of our community can show their support for scientific computing at Aalto University. We hope that by providing a way for the community to show this interest, people can find - and support - each other better.

We have the typical hexagonal stickers, which you can use on all the typical sticker things.

We also have patches, for those who are interested - in Finland they are a big thing on [student overalls](https://en.wikipedia.org/wiki/Student_boilersuit), but you could also sew them on your backpack or purse. Please send us pictures to inspire us all! (Some have Velcro backing for that kind of attachment; ask us for that style.)

Black background vs white background?

You may notice that some of the patches have a black background and some have a white background. Black-background means “Ask me anything about the tools of scientific computing, I am happy to help or at least point you in the right direction (as much as I can)!”

Here’s our idea:

Anyone may take the white background ones
Black background is for:
- Aalto Scientific Computing team staff
- Volunteers at our events (for example helpers at our workshops)
- Anyone who is interested in using their time to help others in scientific computing (regardless of their skills)

(clever people will notice that the first two are included in the third, and actually anyone can be the third if they want).

The idea is that we, and our community, can’t work alone. Everyone needs to support each other in order to work at the level we want. The in-group experts are an undervalued resource in this, often not getting the credit or recognition they deserve in supporting everyone. This is our small method of recognizing those supporters, and we hope that in the future we support them ever more - both career-wise and supporting them in supporting others.

Yes, we should have gotten black-background stickers. We’ll do that next time…

What code has to teach us #1: the impact of implicit behavior

2021-04-14T00:00:00+00:00

“The master has failed more times than the beginner has even tried”

– Stephen McCranie

As Research Software Engineers (RSEs), we read and write a lot of code. In this series of blog posts, we are going to share some snippets that taught us important lessons, and thereby impart that wisdom unto you. These snippets are taken from actual research code, responsible for producing results that end up in peer-reviewed scientific articles. That is to say, results whose correctness we should have some confidence in. However, problems have a way of cropping up in the most unexpected places and when they do, there is a chance to learn from them.

The impact of implicit behavior

I was in the metro zooming through Lauttasaari when I received an email from my professor that made my heart skip a beat. We had just submitted a paper to Nature Communications and were all still a little giddy about finally sending off the project we had been working on for 3 years. She and the first author had been chatting about the cool methods we had been using for the project and a question arose: were we 100% certain that we “removed copies of the selected stimuli from the train set”? If we hadn’t, we would have to quickly pull back our submission, but surely we had, right? I thought we did. At least, I distinctly remember writing the code to do it. Just to be on the safe side, I decided to double check the code.

Below is the analysis script in question. It reads some data, performs some preprocessing, feeds it into a machine learning algorithm called zero_shot_decoding, and stores the output. I present it here to you in full, because there are many subtleties working together that make this situation so scary. The question I pose to you, dear reader, is this: were the highlighted lines (118–120) executed, or did we have to pull our submission?

 import numpy as np
 from scipy.io import loadmat, savemat
 from scipy.stats import zscore
 from zero_shot_decoding import zero_shot_decoding
 #print('Code version:'+ subprocess.check_output(['git', 'rev-parse', 'HEAD']))

 # Default location of the norm data (see also the --norms command line parameter)
 norm_file = '../data/corpusvectors_ginter_lemma.mat'

 # Handle command line arguments
 parser = argparse.ArgumentParser(description='Run zero-shot learning on a single subject.')
 parser.add_argument('input_file', type=str,
                     help='The file that contains the subject data; should be a .mat file.')
 parser.add_argument('-s', '--subject-id', metavar='Subject ID', type=str, required=True,
                     help='The subject-id (as string). This number is recorded in the output .mat file.')
 parser.add_argument('--norms', metavar='filename', type=str, default=norm_file,
                     help='The file that contains the norm data. Defaults to %s.' % norm_file)
 parser.add_argument('-o', '--output', metavar='filename', type=str, default='results.mat',
                     help='The file to write the results to; should end in .mat. Defaults to results.mat')
 parser.add_argument('-v', '--verbose', action='store_true',
                     help='Whether to show a progress bar')
 parser.add_argument('-b', '--break-after', metavar='N', type=int, default=-1,
                     help='Break after N iterations (useful for testing)')
 parser.add_argument('-n', '--n_voxels', metavar='N voxels', type=int, default=500,
                     help='Number of voxels. Used only for results file name.')
 parser.add_argument('-d', '--distance-metric', type=str, default='cosine',
                     help=("The distance metric to use. Any distance implemented in SciPy's "
                           "spatial.distance module is supported. See the docstring of "
                           "scipy.spatial.distance.pdict for the exhaustive list of possitble "
                           "metrics. Here are some of the more useful ones: "
                           "'euclidean' - Euclidean distance "
                           "'sqeuclidean' - Squared euclidean distance "
                           "'correlation' - Pearson correlation "
                           "'cosine' - Cosine similarity (the default)"))
 args = parser.parse_args()

 verbose = args.verbose
 if args.break_after > 0:
     break_after = args.break_after
 else:
     break_after = None

 print('Subject:', args.subject_id)
 print('Input:', args.input_file)
 print('Output:', args.output)
 print('Norms:', args.norms)
 print('Distance metric:', args.distance_metric)


 m = loadmat(args.input_file)
 if 'brainVecsReps' in m:
     # File without stability selection enabled
     print('Stability selection DISABLED')
     X = np.array([m['brainVecsReps'][0][i] for i in range(m['brainVecsReps'][0].shape[0])])
     n_repetitions, n_stimuli, n_voxels = X.shape
     voxel_ids = []

     # Drop all voxels that contain NaN's for any items
     non_nan_mask = ~np.any(np.any(np.isnan(X), axis=1), axis=0)
     non_nan_indices = np.flatnonzero(non_nan_mask)
     X = X[:, :, non_nan_mask]

     # Normalize betas across items
     X = zscore(X, axis=1, ddof=1)

     # Average over the repetitions
     X = X.mean(axis=0)

     X_perm = None
     splits = None

 elif 'mask_voxels' in m:
     # File without stability selection enabled
     print('Stability selection DISABLED')
     X = m['mask_voxels']
     voxel_ids = m['voxel_ids']
     n_stimuli, n_voxels = X.shape
     X_perm = None
     splits = None

 elif 'top_voxels_perm' in m:
     # File with stability selection enabled
     print('Stability selection ENABLED')
     X_perm = m['top_voxels_perm']
     X = m['top_voxels_all']
     voxel_ids = m['top_voxel_ids']
     n_stimuli, n_voxels, _ = X_perm.shape

     assert os.path.isfile('leave2out_index.npy')
     splits = np.load('leave2out_index.npy')

 elif 'brainVecs' in m:
     # File with single-trial data
     print('Stability selection DISABLED, single-trial data')
     X = m['brainVecs']
     voxel_ids = m['voxindex']
     n_stimuli, n_voxels = X.shape
     X_perm = None

     def generate_splits(n_stimuli, block_size=60):
         """Generate train-set, test-set splits.

         To save computation time, we don't do the full 360*359/2 iterations.
         Instead we will do the leave-2-out scheme block-wise and use the rest
         of the data for training.
         """
         assert n_stimuli % block_size == 0
         n_blocks = n_stimuli // block_size
         for x in range(n_stimuli):
             for y in range(x + 1, n_stimuli):
                 # Don't make the model distinguish between duplicate stimuli
                 if x % block_size == y % block_size:
                     continue

                 test_set = [x, y]
                 train_set = np.setdiff1d(np.arange(n_stimuli), test_set)

                 # Remove copies of the selected stimuli from the train set
                 train_set = np.setdiff1d(train_set, [i * block_size + (x % block_size) for i in range(n_blocks)])
                 train_set = np.setdiff1d(train_set, [i * block_size + (y % block_size) for i in range(n_blocks)])

                 yield train_set, test_set

     splits = generate_splits(n_stimuli)

 else:
     raise RuntimeError('Could not find any suitable data in the supplied input file.')

 # Load the norm data
 m = loadmat(args.norms)
 y = m['newVectors']

 if not np.isfinite(y).all():
     raise RuntimeError('The norm data contains NaNs or Infs.')
 if not np.isfinite(X).all():
     raise RuntimeError('The brain data contains NaNs or Infs.')

 pairwise_accuracies, model, target_scores, predicted_y, patterns = zero_shot_decoding(
     X, y, X_perm, verbose=verbose, break_after=break_after, metric=args.distance_metric, cv_splits=splits
 )

 savemat(args.output, {
     'pairwise_accuracies': pairwise_accuracies,
     'weights': model.coef_,
     'feat_scores': target_scores,
     'subject': args.subject_id,
     'inputfile': args.input_file,
     'alphas': model.alpha_,
     'voxel_ids': voxel_ids,
     'predicted_y': predicted_y,
     'patterns': patterns,
 })

Lessons this code has to teach us

The first thing that went through my head, as it probably went through yours, was: this code is so long and complicated, answering this seemingly simple question is going to take some time to figure out. And I won’t blame you for giving up right then and there. Hunched over my laptop while the metro passed through Ruoholahti, I tried to trace the logic of the script.

First problem: much of the behavior of the script is dictated by the command line arguments. Luckily, their values are saved in the output file, so I could check that they were correct.

Note

Lesson: always err on the side of caution when deciding whether it is worth storing something in the result file.
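
One low-effort way to follow this lesson is to store every parsed argument rather than a hand-picked few. The following is a minimal sketch, assuming the arguments come from argparse. The helper `run_metadata` and the example values are hypothetical and not part of the original script; the returned dictionary could be merged into the one passed to `savemat`.

```python
import argparse
import sys
from datetime import datetime, timezone


def run_metadata(args):
    """Collect all parsed arguments plus some context about the run."""
    # None values are dropped here, on the assumption that the result
    # file format cannot represent them.
    meta = {key: value for key, value in vars(args).items() if value is not None}
    meta['command_line'] = ' '.join(sys.argv)
    meta['timestamp'] = datetime.now(timezone.utc).isoformat()
    return meta


# Hypothetical arguments, only here to make the sketch runnable
parser = argparse.ArgumentParser()
parser.add_argument('--subject-id', default='sub-01')
parser.add_argument('--break-after', type=int, default=None)
args = parser.parse_args([])

meta = run_metadata(args)
print(meta)
```

Because the dictionary is built from `vars(args)`, an argument added to the parser later ends up in the result file without anyone having to remember to list it.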

That brings us to the big if-statement. Did the correct branch execute? Well, that depends on what was in the m dictionary, which translates to what variables were defined in the MATLAB file used as input to the script. If we had used the wrong variable name, e.g. brainVecsReps instead of brainVecs, when creating the input file, the wrong branch would have executed and the script would have been happily computing the wrong thing. And we would never have known. If we had used the wrong input file, or the wrong version of the input file, the wrong branch would have executed without any indication that something was wrong. So many opportunities for small mistakes to lead to a big error.

Note

Lesson: have the user be explicit in what they want to do, so the script can check the user’s intent against the inputs and raise a nice big error if they screwed up. In this case, there should really have been either a command line parameter determining which branch to execute, or even better, this should have been four separate scripts.
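
The command line parameter variant could look roughly like the sketch below. The `--input-type` option, the `EXPECTED_VARIABLE` table and the `check_input_type` helper are hypothetical names that do not exist in the original script; the MATLAB variable names are the ones the if-statement looks for. A plain dictionary stands in for the result of `loadmat`.

```python
import argparse

# Hypothetical mapping from an explicit input type to the MATLAB variable
# that must be present in the input file.
EXPECTED_VARIABLE = {
    'repetitions': 'brainVecsReps',
    'mask': 'mask_voxels',
    'stability': 'top_voxels_perm',
    'single-trial': 'brainVecs',
}


def check_input_type(m, input_type):
    """Raise an error when the file contents do not match the stated intent."""
    expected = EXPECTED_VARIABLE[input_type]
    if expected not in m:
        # Skip the bookkeeping entries such as '__header__'
        found = sorted(key for key in m if not key.startswith('__'))
        raise RuntimeError(
            f"Input type '{input_type}' requires variable '{expected}', "
            f"but the input file only contains: {found}"
        )
    return expected


parser = argparse.ArgumentParser()
parser.add_argument('--input-type', required=True, choices=sorted(EXPECTED_VARIABLE))
args = parser.parse_args(['--input-type', 'single-trial'])

# Stand-in for the dictionary returned by loadmat()
m = {'__header__': b'', 'brainVecs': [[0.1, 0.2]], 'voxindex': [1, 2]}
print('Using variable:', check_input_type(m, args.input_type))
```

With this check in place, passing a file that was created with the wrong variable name stops the script immediately instead of silently selecting a different branch.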

In the end, I searched the logfile for the line Stability selection DISABLED, single-trial data, which, thankfully, was there, so the correct branch did execute.

Note

Lesson: be liberal with print-statements (or other logging directives) in your scripts; cherish the resulting logfiles.
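
If print-statements are the only logging, the logfile exists only when someone remembers to redirect the output. A minimal sketch using Python's standard logging module, which writes every message both to the terminal and to a file, could look like this. The logger name and the log file location are hypothetical choices for illustration.

```python
import logging
import os
import tempfile

# Hypothetical location; a real script would probably log next to its output file
log_path = os.path.join(tempfile.gettempdir(), 'decoding_example.log')

logger = logging.getLogger('decoding_example')
logger.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')

# One handler for the terminal, one for the logfile
for handler in (logging.StreamHandler(), logging.FileHandler(log_path, mode='w')):
    handler.setFormatter(formatter)
    logger.addHandler(handler)

logger.info('Stability selection DISABLED, single-trial data')
```

The timestamps come for free and make it easier to match a logfile to the result file it belongs to.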

I breathed a sigh of relief as the metro pulled into the central railway station.

This if-statement is a work of insanity. What was I thinking, letting the script decide what to do based on a mostly random naming scheme of some variables in a MATLAB file? I got lucky that time. But from that moment on, I would heed this lesson:

Note

Explicit is better than implicit.

– The Zen of Python, by Tim Peters