Microsoft CTO Kevin Scott on the Future of Large Language Models and the 'Scaling Law'

TapTechNews July 16th news: In an interview last week on a podcast run by Sequoia Capital, Microsoft Chief Technology Officer (CTO) Kevin Scott reaffirmed his firm belief that the scaling laws of large language models (LLMs) will continue to drive progress in artificial intelligence, despite suspicions in parts of the field that progress has stalled. Scott played a key role in brokering Microsoft's $13 billion technology-sharing deal with OpenAI.


Scott said: "Others may have different views, but I don't think we've reached the point of diminishing marginal returns on scaling. I want people to understand that there's an exponential improvement process here. Unfortunately, you can only see it once every few years, because it takes that long to build supercomputers and train models with them."

In 2020, OpenAI researchers formalized the scaling laws of LLMs, which indicate that as a model grows larger (more parameters) and is trained on more data with more compute, its performance tends to improve predictably. These laws imply that simply increasing model size, training data, and compute can significantly enhance AI capabilities without fundamental algorithmic breakthroughs.
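The predictable relationship described above takes the form of a power law. As a rough sketch, the snippet below uses the loss-versus-parameter-count fit reported in the 2020 OpenAI paper (constants `N_C` and `ALPHA_N` are the published fitted values; treat this as an illustration of the functional form, not a production model):

```python
# Illustrative sketch of an LLM scaling law in its power-law form,
# using the loss-vs-parameters fit reported by OpenAI researchers in 2020.
# The constants are the paper's fitted values; real losses depend heavily
# on data, architecture, and tokenization, so this is only a toy example.

N_C = 8.8e13      # fitted "critical" parameter count from the 2020 paper
ALPHA_N = 0.076   # fitted power-law exponent for the parameter-count term

def loss_from_params(n_params: float) -> float:
    """Predicted cross-entropy loss as a function of model size alone."""
    return (N_C / n_params) ** ALPHA_N

# Growing the model 10x yields a small but predictable loss reduction:
small = loss_from_params(1e9)    # ~1B parameters
large = loss_from_params(1e10)   # ~10B parameters
print(f"1B-param loss:  {small:.3f}")
print(f"10B-param loss: {large:.3f}")
```

The key property is that each order-of-magnitude increase in scale buys a roughly constant fractional improvement in loss, which is why, as Scott notes, the gains only become visible every few years when a much larger model is actually trained.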

However, other researchers have since questioned the long-term validity of the scaling laws. Nevertheless, the concept remains a cornerstone of OpenAI's research and development philosophy. Scott's optimism contrasts sharply with the views of some critics in the field, who believe that progress in large language models has plateaued around the level of GPT-4-class models. That view rests mainly on informal observations and benchmark results for recent models such as Google's Gemini 1.5 Pro, Anthropic's Claude Opus, and OpenAI's GPT-4o. Critics argue that these models have not made the leaps of earlier generations, and that large language model development may be approaching a stage of diminishing marginal returns.

TapTechNews noted that Gary Marcus, a prominent critic in the field of artificial intelligence, wrote in April this year: "GPT-3 was significantly better than GPT-2, and GPT-4 (released 13 months ago) was significantly stronger than GPT-3. But then what?"

Scott's position suggests that tech giants like Microsoft still consider it reasonable to invest in large AI models, betting on continued breakthroughs. Given Microsoft's investment in OpenAI and its heavy marketing of its own AI assistant, Microsoft Copilot, the company has a strong interest in maintaining the public perception of continuous progress in artificial intelligence, even if the technology itself hits bottlenecks.

Ed Zitron, another well-known critic in the field of artificial intelligence, recently wrote on his blog that one reason some people support continued investment in generative AI is the belief that OpenAI has some technology we don't know about, a powerful and mysterious technology that can completely crush all skeptics' doubts. "But that's not the case," he wrote.

The public perception that improvements in large language models have slowed, along with the benchmark results, may stem partly from the fact that AI has only recently entered the public eye, even though large language models had already been in development for years. OpenAI worked continuously on large language models for the three years between the release of GPT-3 in 2020 and that of GPT-4 in 2023. Many people only became aware of the power of GPT-3-class models when ChatGPT, built on GPT-3.5, launched at the end of 2022, which is why GPT-4's release in 2023 felt like a huge leap.

In the interview, Scott pushed back on the view that AI progress has stalled, but he acknowledged that data points in this field arrive slowly because new models often take years to develop. Even so, Scott remains confident that future versions will improve, especially in areas where current models perform poorly.

"The next breakthrough is coming. I can't predict exactly when it will happen, nor how big it will be, but it will almost certainly improve the aspects that aren't perfect today, such as models being too costly or too fragile to use with confidence," Scott said in the interview. "All of these things will get better: costs will come down, and models will become more stable. Then we'll be able to do much more complex things. That is exactly what every generation of large language models has achieved through scaling."
