Shengyang Sun @ssydasheng
We built 200k-GPU clusters;
We scaled up & curated higher-quality data;
We scaled compute by 100x;
We developed training & test-time recipes;
We made everything RL-native;
We stabilized the infrastructure and sped it up;
That's how you take RL to pre-training scale.
Yet I am still amazed by this figure every time I see it.
Try Grok-4 and Grok-4-Heavy.
July 10, 2025, 14:23 https://pbs.twimg.com/media/GveMF7jX0AAoCGF.jpg