211-3606/24-2M System administrator with strong Linux skills for the SCIENCE HPC Center

University of Copenhagen (DIKU)
Published: February 12, 2024
Location: Copenhagen, Denmark
Application URL: https://employment.ku.dk/staff/?show=161024

Description

System administrator with strong Linux skills for the SCIENCE HPC Center at the Department of Computer Science, University of Copenhagen (DIKU)

The SCIENCE HPC Center at DIKU is looking for a system administrator to manage systems for scientific computing. You will become part of a team that operates large high-performance computing (HPC) systems as well as other systems for scientific computing.

The job and your role
The position centers on the installation and operation of Linux-based systems with many different hardware configurations (CPU, GPU, network, storage, etc.).

We run a standard HPC setup, with Slurm as the cluster manager and the Lustre[1] parallel file system on ZFS for storage. The compute nodes are either purely CPU-based or GPU-accelerated, and we use various interconnects (InfiniBand, Ethernet, RoCE).

We also manage a large storage system, ERDA (Electronic Research Data Archive)[2], an open-source project developed in-house. It builds on many open-source projects (Python, OpenSSH, Jupyter, etc.) and uses Lustre on ZFS as its storage backend.

To glue everything together, we also operate more traditional infrastructure (DNS, LDAP, mail, NFS, etc.). Where it makes sense, much of this runs on KVM virtual machines.

You will be part of a team of dedicated employees in the SCIENCE HPC Center. The team consists of five system administrators and system developers, two support employees, and one administrative employee.

Your job will mainly involve keeping everything up and running and implementing new hardware and services. This can include longer periods of developing and integrating new technologies.

The SCIENCE HPC Center aims to use open-source solutions, for example:

  • OS: Mainly Rocky Linux, but also Red Hat.
  • Storage: Lustre, ZFS, and NFS, plus some Ceph.
  • Virtualization: KVM.
  • Containers: Podman and Singularity.
  • Cluster manager: Slurm.

There are many technologies to embrace. You are not expected to be an expert in all of them, but experience with some of them is an advantage. Extensive experience with Linux, servers, storage, and networking is required. In addition, you must be able to independently acquire knowledge and familiarize yourself with new technologies.

The tasks include:

  • Operation of the cluster manager (Slurm).
  • Operation of storage systems (Lustre, ZFS, NFS, Ceph).
  • Network operations (switches, etc.).
  • User support, including HPC user support.
  • Development of new systems.
  • Tape-based storage.

Who are you?

  • You are passionate about helping users as well as you possibly can.
  • You speak Danish or English.
  • You can express yourself in writing in English.
  • You can collaborate with others.
  • You can independently identify and solve issues.
  • You can independently acquire new knowledge.
  • You have experience with Linux system operations.
  • Experience with HPC operations is a big advantage.
  • A technical or scientific higher education is an advantage.
  • You have worked with IT security at the technical level.

Who are we?
SCIENCE HPC Center is part of the Department of Computer Science (DIKU) at the University of Copenhagen's Faculty of Science. DIKU is Denmark's first computer science department. Our researchers and graduates have contributed to society's accelerating digital transformation, and our research environment and results have made us one of Europe's leading computer science departments, with international influence. DIKU is located in Østerbro at Universitetsparken 1, where we share the campus with several other science programs and departments. We have approx. 300 employees (including PhD students and instructors), of whom approx. 40 are permanent administrative staff.

SCIENCE HPC Center serves as the Faculty of SCIENCE's central data management and e-infrastructure facility. The center is responsible for developing and operating facilities for storing, sharing, computing on, analyzing, and publishing research data. The aim is to ensure that the faculty's researchers and students can easily archive, share, and process large amounts of data across the university and with external partners. The SCIENCE HPC Center currently operates systems with a total of approx. 25,000 CPU cores and more than 25 PB of storage.

Currently, we have 9 employees, 6 of whom work full-time at the center. Because of geographical distance, some team members primarily work from home, so most of our daily communication takes place via chat and email. To make up for the limited face-to-face interaction, we hold a weekly online status meeting, and when work requires a physical presence at the office, we take the opportunity for some extra socializing. We value a distraction-free work environment with flexible hours and very few meetings and phone calls. We have a hands-on approach from procurement to decommissioning of hardware, including deployment, maintenance, and upgrades.

We offer:

  • A multifaceted position providing significant independence and a diverse range of tasks.
  • Inclusion in a dedicated and specialized team.
  • Collaborative colleagues who highly value professional cooperation and maintain an informal, open, and respectful communication style.
  • An academic environment close to researchers and students.

Terms of employment
Employment and salary will be in accordance with either the Circular on the Collective Agreement for academics in the state sector, concluded between the Ministry of Finance and AC (the Danish Confederation of Professional Associations), or the multi-union collective agreement for the Confederation of Teachers Unions and the Danish Confederation of Public Employees of 2010 (LC/CO10) and the trade union agreement for public service IT employees (PROSA).

Depending on qualifications, the appointment will be as IT officer, academic employee, or special consultant. The salary is based on seniority, and negotiation for a salary supplement is possible. Working time averages 37 hours per week, and working hours are flexible.

The expected starting date is 1 May 2024 or as soon as possible thereafter. This is a two-year position.

UCPH is in the process of designing a new shared administrative organisation, which is expected to lead to changes within the next 12 to 18 months. The purpose of the administrative reform is, among other things, to strengthen cross-organisational cooperation, increase efficiency, and ensure good career paths for administrative staff members at UCPH. The reform will lead to cutbacks in administration. You are encouraged to contact Anders Pall Skött, +45 23 81 08 86, to learn more about the connection between this position and the administrative reform.

Application deadline: 3 March 2024, 23:59 CET. Applications submitted after the deadline will not be accepted.

Working place: Department of Computer Science, Faculty of Science, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen Ø.

Further information
For further information, contact Chief Consultant and Technical Manager Hans Henrik Happe (happe@science.ku.dk, +45 35 32 54 19) or Head of Section Anders Pall Skött (anders.pall@di.ku.dk, +45 23 81 08 86).

Your application
Please submit your application, with attachments, electronically via Jobportalen (click "Apply Now" at the bottom of the post).

The University wishes its staff to reflect the diversity of society and therefore welcomes applications from all qualified candidates regardless of personal background.
