Byron Hsu @hsu_byron
At xAI, we are building the world’s most advanced inference system on tens of thousands of GPUs. It has been a fun journey supporting the Grok 4 Fast long-context model end-to-end, from autoscaling and disaggregated serving to model parallelism.
Please DM me or apply through the links below if you are passionate about the following goals:
- P2P communication over any transport (NVL, MNNVL, RDMA) and any parallelism (DP, TP, EP, PP, etc.).
- Fully automated and resilient deployment system with an advanced Kubernetes operator and scheduler.
- Near-zero engine startup time by leveraging RDMA and caching.
- Near-zero weight sync time in RL systems through overlapping and RDMA.
Sep 20, 2025, 08:34