Cloud Network Engineer II
Microsoft
United States, Texas, Irving
7000 State Highway 161
Oct 16, 2025
Overview

The High Performance Computing and Artificial Intelligence (HPC and AI) team is focused on building the next-generation distributed artificial intelligence supercomputer. Our goal is to enable breakthroughs in artificial intelligence by delivering unmatched computational power, scalability, and reliability. We design and develop advanced infrastructure that supports high-performance model training at scale, laying the groundwork for innovations that expand the boundaries of what artificial intelligence can achieve.

We are seeking a Cloud Network Engineer II who is passionate about designing and developing the infrastructure that powers large-scale artificial intelligence and high-performance computing systems. In this role, you will contribute to the design, deployment, and operation of network infrastructure, automation workflows, observability frameworks, and performance optimization systems. These components are essential for achieving ultra-low latency, high throughput, and efficient data movement at petabyte scale in distributed workloads.

As a Cloud Network Engineer II on the HPC and AI Infrastructure team, you will work at the intersection of artificial intelligence supercomputing and large-scale networking. Your contributions will directly impact the reliability and performance of distributed clusters, leveraging high-speed fabrics such as Ethernet and InfiniBand, and accelerated compute platforms including NVIDIA and AMD graphics processing units. This is a unique opportunity to help build the network infrastructure that ensures speed, reliability, and high availability at exascale levels, while collaborating across hardware, infrastructure, and platform teams.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals.
Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

Network Design & Implementation: Architect and deploy high-throughput, low-latency physical network topologies (e.g., Clos, fat-tree) using technologies such as InfiniBand and Ethernet to support AI model training and HPC workloads.
Infrastructure Automation: Develop and maintain automation frameworks for provisioning, validating, and monitoring physical network infrastructure at scale, ensuring consistency and reliability across data centres.
Operational Readiness: Serve as a Designated Responsible Individual (DRI) for physical network systems: monitoring health, responding to incidents, performing root-cause analysis, and driving improvements in availability and observability.
Tooling & Instrumentation: Build and integrate tooling for telemetry, diagnostics, and performance tuning of physical network components, enabling real-time visibility into link health, congestion, and jitter.
Cross-Functional Collaboration: Partner with hardware engineering, data centre operations, and software-defined networking teams to ensure seamless integration of physical and logical network layers.
Documentation & Standards: Own the documentation of physical network designs, cabling standards, and deployment procedures. Lead design reviews and ensure alignment with compliance and safety standards.
Innovation & Research: Stay current with advancements in optical networking, high-speed interconnects, and AI/HPC fabric technologies. Evaluate and integrate emerging solutions to improve scalability, efficiency, and performance.