Host:
"It's important to realize there are two types of training compute. One is the pre-training compute, that's from Grok-2 to Grok-3. Um, but from Grok-3 to Grok-4, we actually putting a lot of compute in reasoning, in RL."
Another speaker:
"Yeah, and just like you said, this is literally the fastest moving field, and Grok-2 is like the high school student by today's standard. (...) By training Grok-2, that was the first time we scaled up like the pre-training. We realized that if you actually do the data ablation really carefully, and the infra, and also the algorithm, we can actually push the pre-training quite a lot by amount of 10x to make the model the best pre-trained based model. And that's why we built Colossus, the world's supercomputer with 100,000 H100s. And then with the best pre-trained model, and we realized if you can collect these verifiable outcome reward, you can actually train this model to start thinking from the first principle, start to reason, correct its own mistakes, and that's where the Grok-3 reasoning comes from. And today we ask the question, what happens if you take expansion of Colossus with all 200,000 GPUs, put all these into RL, 10x more compute than any of the models out there on reinforcement learning, unprecedented scale, what's gonna happen?"