Lead HPC Systems Operations Engineer

Penn State University

University Park Campus
Vice President for Research
The Institute for CyberScience
The Institute for CyberScience (ICS) seeks a Lead HPC Systems Operations Engineer to join our team. You will be an integral contributor to a dynamic and growing team of specialists supported by a $60 million Penn State investment in advanced research cyberinfrastructure (ICS-ACI). Penn State's researchers use our high-performance research cloud to solve real-world problems by conducting simulations, data mining, and other high-performance computing operations. You will be an important member of our community of specialists supporting this ground-breaking work.

As our Lead HPC Systems Operations Engineer, you will work as part of a team to: lead the operations of researcher-focused HPC systems, overseeing a team of systems administrators and engineers; define enhancements and provide engineered system solutions for HPC operations; identify requirements and lead implementation projects to improve HPC operations and researcher utilization; deliver HPC systems documentation and compliance to support research and education; and investigate complex HPC system problems and/or user issues and lead innovative solution approaches.

This job will be filled as a level 3 or level 4, depending upon the successful candidate's competencies, education, and experience. A level 3 typically requires a Bachelor's degree or higher in an Engineering or Science discipline (Master's degree preferred) plus five years of related experience, or an equivalent combination of education and experience. Additional experience and/or education and competencies are required for higher level jobs.

Desired skills include knowledge and operation of cyberinfrastructure, including: large-scale, multi-user compute clusters; high-speed networks (e.g. InfiniBand); parallel file systems (e.g. GPFS); and cluster resource managers and schedulers (e.g. Moab, PBS, SLURM). Experience with the following: Linux operating systems (e.g. RHEL); monitoring tools (e.g. Nagios, SolarWinds); automated configuration management (e.g. Puppet); and scripting languages (e.g. Python, bash). Application of accepted engineering practices that enable the design, development, implementation, and analysis of engineered systems, software, and interconnects.

Experience with the following is a plus: cloud computing platforms, such as OpenStack; distributed Windows computing infrastructure; network-based services (e.g. DNS, LDAP, NFS); virtualization technologies and concepts (e.g. VMware); and software installation and maintenance in a multi-user Linux environment. Ability to explain concepts to users with varied HPC experience; strong interpersonal skills and the ability to work well in a team environment.

Training, education, and professional development opportunities are available and encouraged. To learn more about working for ICS, please visit http://ics.psu.edu/careers.

