KAUST is seeking an HPC Systems Administrator in our Job of the Week.
The HPC senior systems administrator will work with other Advanced Computing Infrastructure team members to administer the 850-node Ibex+ supercomputer, storage systems, other support computers, networking fabrics, and other related systems and services. The required competencies of the senior administrator are: a high degree of systems design skill with respect to service dependencies and overall systems availability; the ability to assess and assign priorities to tasks based on the operational requirements of the systems; and the ability to mentor junior staff in systems technology and in interacting with end users. The required competencies also include those of the “Systems Administrator”, which are: a high level of Linux administration experience (RHEL6, RHEL7 or equivalent); experience managing high performance data storage systems; the ability to use the Slurm workload manager effectively; automation experience (including appropriate scripting languages); experience monitoring large scale systems; interaction with end users as required; and the ability to adapt to changing task priorities as specified by the team lead.
Major Responsibilities:
- Provide a high level of technical competency and mentor junior staff in all aspects of systems infrastructure administration.
- Work with other team members, as necessary, to maintain the Advanced Computing Infrastructure. This includes: supporting end users; operational maintenance of existing systems; architectural and design components of upgrades (hardware and software); planned decommissioning of obsolete systems; ensuring the systems are accurately monitored; participating in design and evaluation exercises to maximise utilisation of the infrastructure; and implementing and assessing test bed systems to evaluate architectural ideas and concepts.
- Work closely with end users and provide educational support, as necessary, for them to make more efficient use of the resources.
- Maintain a broad knowledge of the current best practices of HPC systems.
- Ensure systems are configured and maintained in compliance with university and laboratory policies.
- Maintain and participate in continuous development of IT skills as related to HPC systems.
- As required, develop solutions to requirements that meet or exceed the expectations of university research staff.
- Provide individual or group training on a variety of topics related to HPC infrastructure.
- Be mindful of university and laboratory safety policies at all times, and ensure any issues that arise are dealt with in compliance with all relevant policies.
- Interact with other university IT groups professionally.
- Work closely with the HPC team lead and other team members in the development of high level plans.
- Continually maintain and monitor the HPC data storage fabric (including design, testing and analysis as required).
- Develop infrastructure and mechanisms to reliably report on systems utilisation as required.
Looking for a new gig? Our Jobs Board helps companies of all sizes hire the best talent and offers the best opportunity for job seekers to get hired.
Are you paying too much for your job ads? Priced at just $99.99 for 90 days, ads on our insideHPC Jobs Board are a great way to reach the top supercomputing professionals.