DUG runs a 24/7/365 production service that requires maximum uptime and optimal system performance. The global team are responsible for operating four compute centres around the world, including the world’s largest privately owned compute cluster.
The successful candidate will play a major role in both day-to-day problem-solving and project-based work, including:
Linux compute clusters.
Networking and cabling.
NFS and Lustre storage.
Server cabinets and tanks.
Power supplies and distribution units.
SLURM.
Day-to-day problem-solving for the DUG production and cluster environment:
Leading the worldwide IT monitoring team.
Proactive and reactive maintenance and troubleshooting.
Supporting end-users with any production difficulties.
Monitoring and tuning performance.
Participating in major system deployments worldwide, as needed.
General IT assistance such as:
Linux (95%), macOS (4%), and Windows (1%) desktops and laptops.
Telephony.
Audio/visual systems.
Keeping management informed and recommending corrective actions.
Identifying potential improvements in hardware, software, and procedures.
Providing rostered on-call support.
Complying with the Company’s HSE regulations and policy.
Cooperating with supervisory and management personnel to ensure the Company’s safety responsibilities are fulfilled.
Additional duties as required.