HPC Systems Administrator

Milwaukee School of Engineering Published: February 17, 2019
Location
Milwaukee, Wisconsin
Job Type

Description

Milwaukee School of Engineering (MSOE) invites applications for a full-time HPC Systems Administrator to join our Electrical Engineering and Computer Science department. The HPC Systems Administrator will lead efforts related to the daily operation of a small- to medium-sized GPU-based high-performance computing (HPC) cluster. The individual in this position will provide engineering and administration support for HPC hardware and software. They will be responsible for day-to-day operations, including assisting faculty and students with troubleshooting HPC workloads, developing and delivering training, and planning for growth of the system. Successful candidates will identify and engage with the needs of faculty and students in order to support a variety of academic and research workloads.

Essential Duties and Responsibilities

  • Day-to-day operations of the systems including systems administration and monitoring
  • Installing, testing, and maintaining software applications on the cluster
  • Configuration and monitoring of the cluster scheduling and queuing system
  • Documenting system administration procedures for routine and complex tasks
  • Developing and delivering training materials for users on running and monitoring workloads
  • Creating, administering, archiving, and deleting user network accounts (~500), user groups, and file systems
  • Clearly communicating and enforcing policies and procedures
  • Leveraging automation tools where valuable and appropriate
  • Providing reactive support and excellent customer service to all students, faculty, and staff
  • Working closely with the campus Information Technology department

Qualifications

  • Bachelor's degree in a related field with two years of professional experience.
  • Experience installing, configuring, and maintaining enterprise Linux systems.
  • Experience configuring and managing network attached storage systems and networks.
  • Experience managing and troubleshooting high performance network equipment.
  • Experience with application and systems programming languages such as Python, Java, or JavaScript.
  • Strong understanding of open source and commercial database systems including MySQL, Oracle, and Microsoft SQL Server.
  • Experience with identity management technology such as LDAP, Kerberos, etc.
  • Demonstrated ability to manage the full stack (datacenter rack equipment, server hardware, OS, network, and security) of multi-tenant Linux-based systems both individually and within a team environment.

Preferred

  • Experience working in an academic environment.
  • Familiarity with container solutions such as Docker and Singularity.
  • Experience with cluster management tools such as Slurm.
  • Two years' experience in providing support for a GPU-based cluster.
  • Familiarity with NVIDIA NGC containers and/or CUDA.
  • Ability to write technical documentation in a clear and concise manner.
  • Self-motivated and works independently and as part of a team. Demonstrates problem-solving skills. Able to learn effectively and meet deadlines.
  • Understanding of system performance monitoring and actions that can be taken to improve or correct performance.
  • Demonstrated skills associated with adapting equipment and technology to serve user needs. Demonstrated comprehensive understanding of how system management actions affect other systems, system users and dependent/related functions.
  • Demonstrated experience writing and editing complex scripts used to perform system maintenance and administration.
  • Applied experience with these or similar technologies: Amazon Web Services; .NET Framework; application load balancers; distributed version control and continuous integration/continuous deployment platforms; systems configuration management.
  • General knowledge of other areas of IT. Thorough understanding of and experience with systems-related issues and actions that can be taken to improve or correct performance.
  • Advanced knowledge of computer security best practices and policies including demonstrated experience securing server-based software.

To apply, please visit https://www.milwaukeejobs.com/apply/add/35926478

It is the policy of MSOE to provide equal employment opportunity to all individuals regardless of their race, ethnicity, color, creed, religion, sex, age, national origin, physical or mental disability, military and veteran status, sexual orientation, gender identity, genetic characteristics, marital status or any other characteristic protected by local, state or federal law. This policy applies to all jobs at the University and to all the terms, benefits, and conditions of employment/enrollment.

PI107771777
