by Haytham ElFadeel - [email protected]

2024

Recent discussions suggest that scaling-driven progress in AI might be slowing. But does this mean innovation is hitting a ceiling?

My take: the underlying scaling relationships haven’t “broken”; what’s changing is where the best ROI is. Some popular benchmarks are saturating and pretraining gains can look incremental, so the field is reallocating effort toward dimensions that return more capability per unit of compute. Right now, the lowest-hanging fruit looks a lot like scaling reinforcement learning and post-training optimization (especially for reasoning and tool use), not just “more pretraining FLOPs”.

This article explores: (1) what “slowing” actually means, (2) the general pattern of technological progress, and (3) what’s next for language models.

Part 1 — Is it really slowing, and if so why?

Advancement in math, reasoning, and language has felt slower in 2024 compared to the rapid progress of 2022–2023. At the same time, we’ve also seen major progress on new frontiers (e.g., multimodality and improved real-time interaction).

A key nuance: “progress” is not a single scalar.

If you track a saturated benchmark, year-over-year gains will naturally flatten even if the underlying model quality is improving. Engineers also shift attention to new capabilities (e.g., vision, audio, planning, tool use, long-horizon tasks), so the headline benchmark deltas can understate real innovation.
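As a toy illustration (the numbers here are mine, purely hypothetical): suppose underlying model quality improves at a steady rate, say a 20% relative reduction in error per model generation. The headline delta on a nearly saturated benchmark still shrinks toward zero, even though the rate of improvement never changed.

```python
# Toy example: the same relative error reduction (steady underlying
# progress) yields ever-smaller headline gains as a benchmark saturates.

for accuracy in (0.60, 0.90, 0.98):
    error = 1.0 - accuracy
    new_accuracy = 1.0 - error * 0.80  # 20% relative error reduction
    print(f"{accuracy:.0%} -> {new_accuracy:.1%} "
          f"(+{(new_accuracy - accuracy) * 100:.1f} pts)")

# 60% -> 68.0% (+8.0 pts)
# 90% -> 92.0% (+2.0 pts)
# 98% -> 98.4% (+0.4 pts)
```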

Human nature and the scaling laws…

Some argue that scaling laws are slowing or even breaking. This is not true—at least, not yet.

Interestingly, this narrative echoes similar "doom and gloom" predictions in other fields. Take Moore’s Law, which predicts that the number of transistors on integrated circuits doubles roughly every two years. Critics have predicted its demise for over 20 years, yet Moore's Law remains alive, supported by robust roadmaps from industry leaders like IMEC, TSMC, and Intel [ref]. As Peter Lee, Microsoft Research VP, once joked: "The number of people predicting the death of Moore’s Law doubles every two years."

This phenomenon isn’t unique to technology. The U.S. Bureau of Mines in 1919 and M. King Hubbert in 1956 predicted that we would soon run out of oil. Thomas Malthus in 1798 and Paul Ehrlich in 1968 predicted that we would run out of food; Ehrlich wrote: "The battle to feed all of humanity is over. In the 1970s hundreds of millions of people will starve to death in spite of any crash programs embarked upon now."

What all of those predictions missed was human ingenuity: our ability to think creatively, to innovate, and to find solutions.

What the scaling laws tell us…


The scaling law [ref] predicts roughly a 20% reduction in loss for every order-of-magnitude increase in compute (where “compute” bundles model size, amount of training data, and training time). It’s important to remember that the scaling law doesn’t guarantee uniform improvements across all downstream tasks. To put the number in perspective: under the simplistic assumption that the loss reduction translates one-for-one into a relative reduction in benchmark error, a benchmark at 80% accuracy (20% error) improves to 84% accuracy (16% error) after a 10x increase in compute.
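As a back-of-envelope sketch (my own illustration, not from the cited paper): a 20% loss reduction per 10x compute corresponds to a power law L(C) ∝ C^(−α) with a rather shallow exponent, and treating benchmark error as a one-for-one proxy for loss reproduces the 80% → 84% figure above.

```python
import math

# "20% less loss per 10x compute" pins down the power-law exponent:
# L(10C) / L(C) = 10**(-alpha) = 0.80
alpha = -math.log10(0.80)
print(f"alpha ~= {alpha:.3f}")  # ~0.097: a shallow power law

# Simplistic translation to a benchmark (error as a proxy for loss):
accuracy = 0.80
error = (1.0 - accuracy) * 0.80  # 20% relative reduction after 10x compute
print(f"projected accuracy: {1.0 - error:.0%}")  # 84%
```

The shallow exponent is the point: each additional order of magnitude of compute buys a smaller absolute gain, which is exactly why the ROI question matters.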

Scaling also faces practical challenges: software moves much faster than hardware. While Nvidia's latest GPUs (Blackwell) offer 3x–6x performance improvements over their predecessors (Hopper), training larger models remains hard, constrained by hardware, power, and economics. This has spurred innovation in areas like parallelism (to maximize cluster utilization and efficiency), reward optimization (RLHF, DPO), synthetic training data (to maximize training signal per unit of compute), and inference-time techniques such as Chain of Thought (CoT), Tree of Thought (ToT), and Reflection.
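To make one of those inference-time techniques concrete, here is a minimal, hypothetical Chain-of-Thought sketch: the same question asked directly versus with a step-by-step cue, trading extra inference compute for (often) better answers. The `complete` function below is a stand-in for whatever LLM client you use, not a real API.

```python
# Hypothetical sketch of chain-of-thought (CoT) prompting.
# `complete` is a placeholder, not a real library call.

def complete(prompt: str) -> str:
    """Stand-in for an actual model call; plug in your LLM client here."""
    raise NotImplementedError

question = "A train covers 60 km in 45 minutes. What is its speed in km/h?"

direct_prompt = f"Q: {question}\nA:"                         # one-shot answer
cot_prompt = f"Q: {question}\nA: Let's think step by step."  # elicit reasoning

# CoT spends more inference compute (longer outputs) in exchange for
# better accuracy on reasoning-heavy questions:
# answer = complete(cot_prompt)
print(direct_prompt)
print(cot_prompt)
```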

“Slowing” as an ROI story: we’re optimizing the best axis