Frank Long is a vice president at the Goldman Sachs Global Institute, where he focuses on AI and the intersection of emerging technology and geopolitics.
Introduction: A generational infrastructure buildout
Two opposing trends are raising fundamental questions about the future of AI. While tech giants are channeling hundreds of billions of dollars of capital expenditure into AI infrastructure, the cost of running AI models is plummeting. Is the market channeling the appropriate supply of capital, given potential demand for AI compute? Is it repeating the excesses of the telecom boom and overbuilding? Or is it in fact still underestimating just how much computational demand is coming?
These questions may shape the global economy in the years to come. For the last year, the eight largest technology companies in the world have made up >30%¹ of the total market value of the S&P 500. In 2025 alone, Google², Microsoft³, Amazon⁴, and Meta⁵ are expected to invest a combined $315 billion in capex, much of it on AI, while global semiconductor revenue has grown to a projected $705 billion per Gartner forecasts⁶. Outside of the tech sector, commercial real estate is booming with new data centers, the power sector is adjusting to surging electricity demands⁷, and CEOs across industries are planning to boost AI spending⁸. Given the scale of this phenomenon, how should we understand the major themes in AI today?
Consistent breakthroughs drive down the cost to run AI
Breakthroughs in algorithms have repeatedly driven down the cost to run equivalent AI models. These cost gains are most clearly illustrated in benchmarked model performance vs. dollars of compute spend. Recently, DeepSeek’s R1 model reminded the world just how significant cost improvements from algorithmic optimizations can be.
AI chat experiences (e.g., ChatGPT) alone will not justify the amount of AI infrastructure investment
The rapid cost improvements in AI strongly suggest that chat experiences alone will not justify the extensive infrastructure currently being built. Given the steady drumbeat of algorithmic optimizations, even mass adoption of chat simply won't generate enough computational demand.
New AI use cases are necessary to justify the infrastructure spend
To fully utilize all of the infrastructure currently being built, new compute-intensive AI use cases would need to reach mass adoption. Reasoning models, agentic systems, and physical-world applications like robotics are prime examples. Without use cases like these taking off, we risk a significant mismatch between our built infrastructure and actual market demand.
There will be more "DeepSeek moments"
Projections of future AI infrastructure capacity requirements must incorporate the assumption of dramatic declines in the cost to run AI models. The rate of improvement has been astounding. From 2012 to 2021, NVIDIA GPUs improved single-chip inference cost by 1,000×. Such remarkable gains would be the equivalent of turning a car that gets 20 miles per gallon into one that could drive you across the United States seven times on a single gallon.
These cost gains were driven largely by software and design optimizations, rather than the shrinking of transistors described in Moore's Law. Shrinking transistors accounted for only 2.5× of this gain, while the rest — about 400× — resulted from tight integration between hardware and software design⁹. These optimizations range from building dedicated circuitry¹⁰ for deep learning to ideas seemingly as simple as rounding numbers¹¹ when many decimal points of precision are unnecessary. Optimizations like these underpin modern deep learning and have enabled the development of more complex AI models than previously thought feasible.
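The "rounding numbers" idea (quantization) can be sketched in a few lines. This is a minimal, illustrative example with made-up weight values, not production inference code:

```python
import numpy as np

# Toy illustration of quantization: store model weights at low precision,
# trading a little accuracy for large savings in memory and compute.
# The weight values below are invented for illustration.
weights = np.array([0.42, -1.37, 0.05, 0.98, -0.61], dtype=np.float32)

# Symmetric quantization: map the float range onto signed 8-bit integers.
scale = np.abs(weights).max() / 127.0
quantized = np.round(weights / scale).astype(np.int8)   # 1 byte per weight
dequantized = quantized.astype(np.float32) * scale      # approximate originals

print(quantized)                              # small integers, not 32-bit floats
print(np.abs(weights - dequantized).max())    # rounding error stays under scale/2
```

Each weight shrinks from 4 bytes to 1, and the error introduced by rounding is bounded by half a quantization step, which is often negligible for inference.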
In today’s AI boom, cost advances are best reflected in the declining price of running models with comparable capabilities. AI models are priced in tokens (i.e., pieces of text that are slightly smaller than the average word). Companies using generative AI generally pay for every token processed as input and produced as output. The “price per token” of AI has declined steeply since the public debut of ChatGPT in late 2022. In November 2022, OpenAI priced its original GPT-3 “Davinci” model at $20 per million tokens¹². Four months later, in March 2023, GPT-3.5 Turbo was released with comparable capabilities and a tenth the cost.
As AI advances, a consistent pattern emerges: Companies introduce powerful models at premium prices, then rapidly optimize for cost. GPT-4 initially cost $60 per million tokens; within a year, the comparable GPT-4o model cost just $2.50 — a 24× reduction¹³. DeepSeek's R1 matched OpenAI's o1 benchmarked capabilities at a cost¹⁴ of $2.19 versus $60 per million tokens. Not to be outdone, OpenAI subsequently released o3 Mini at just $1.10 per million tokens. The combined forces of market competition, hardware breakthroughs, and software innovations are delivering cost improvements far beyond what Moore’s Law predicted.
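As a quick check, the reduction factors follow directly from the per-million-token list prices quoted above:

```python
# Price reductions implied by the list prices cited in the text
# (USD per million tokens, as quoted above).
prices = {
    "GPT-4 -> GPT-4o": (60.00, 2.50),
    "o1 -> DeepSeek R1": (60.00, 2.19),
    "o1 -> o3 Mini": (60.00, 1.10),
}

for transition, (old, new) in prices.items():
    print(f"{transition}: {old / new:.0f}x cheaper")
# GPT-4 -> GPT-4o: 24x cheaper
```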
Chat experiences alone cannot justify infrastructure investments
Even if experiences like ChatGPT scaled to match global search (5 trillion annual queries¹⁵), we are facing a potential overbuilding scenario without additional compute-intensive use cases emerging. While it's difficult to determine the exact cost of processing an AI chat query, perhaps the best proxy is the revenue AI model providers earn to process the tokens in each request. Assuming token revenue is gross margin positive¹⁶, this number covers the model provider's profits as well as all their costs. Therefore, we can think of token revenue as inclusive of the cost to operate cloud services — everything from data centers and power to the semiconductors that underpin running model inference to sell tokens.
For an average ChatGPT query on leading AI models today, the model providers would charge about half a cent via their Application Programming Interfaces (APIs)¹⁷. At 5 trillion queries, that would be around $27 billion annually of revenue for the model providers, some portion of which would be spent on compute cost. However, given that continuous efficiency gains could conceivably deliver another 1,000× compute cost reduction for the model providers over 10 years, the compute cost for model inference in this scenario could fall well below a billion dollars a year.
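The arithmetic behind this scenario can be sketched directly. The paragraph-level token counts below are assumptions chosen to reproduce the roughly half-cent-per-query figure, not measured usage data:

```python
# Back-of-envelope for the chat-at-global-search-scale scenario.
# Assumptions (illustrative): ~10 paragraphs of input per query (including
# thread history), ~3 paragraphs of output, ~98 tokens per paragraph.
TOKENS_PER_PARAGRAPH = 98
input_tokens = 10 * TOKENS_PER_PARAGRAPH
output_tokens = 3 * TOKENS_PER_PARAGRAPH
price_in, price_out = 2.50, 10.00        # USD per 1M tokens (GPT-4o list prices)

cost_per_query = (input_tokens * price_in + output_tokens * price_out) / 1e6
print(f"{cost_per_query * 100:.2f} cents per query")       # ~0.54 cents

queries_per_year = 5e12                  # global-search-scale query volume
token_revenue = queries_per_year * cost_per_query
print(f"${token_revenue / 1e9:.0f}B annual token revenue")  # ~$27B

# A hypothetical further 1,000x efficiency gain over a decade would push
# the implied inference compute cost far below $1B per year.
print(f"${token_revenue / 1_000 / 1e6:.0f}M implied compute cost")
```

Even under generous adoption assumptions, the efficiency trend shrinks the inference bill faster than chat volume can plausibly grow.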
This implies a world where AI applications enjoy the high margins of today's software services, but also an infrastructure oversupply on the other side of the equation, where cloud providers are leading a massive compute build-out while the cost to run AI models is plummeting faster than chat demand can grow. We will need to see new use cases emerge that demand more compute.
More compute, more capabilities, new agentic experiences
If it is unlikely that today’s AI chat experiences will justify current and projected AI infrastructure investments, then those investments are necessarily predicated on next-generation use cases. We are seeing the emergence of AI agents: systems that can initiate and execute complex, multi-step tasks on their own. Imagine having a tireless research agent at your disposal.
Financial analysts could ask, “What do I need to know about Company X this quarter?” Lawyers could ask, "What are the relevant precedents and legal arguments related to case A, considering jurisdiction B and legal framework C?" Scientific researchers could ask, “Can you find and summarize all the related work to help me prepare for my experiment?” Current deep research tools from Google and OpenAI can scan hundreds of webpages to compile comprehensive reports for such queries.
Likewise, coding agents can execute multiple edits across numerous files from a single instruction. This offers potential uses that go well beyond software engineering because so many applications that consumers use are structured as code under the hood, including financial models and presentations.
Today’s AI agents can run for only a few minutes and perform only a handful of steps before human intervention is needed to keep the model on task. Further research breakthroughs could get agents closer to operating continuously, executing for hours or even days and autonomously charting their next steps like a capable and tireless direct report.
Advanced reasoning models, which have garnered significant excitement, are one path to unlocking autonomous agents. Late last year, OpenAI’s o1 and DeepSeek’s R1 recaptured the market’s attention, much like ChatGPT did in 2022. Unlike traditional chat LLMs that produce a single response, reasoning models take deliberate, multi-step approaches to generate answers, an approach that consumes significantly more tokens but is closer to how a human might solve a tough problem.
With new approaches to reasoning, we’ve seen new leaps in benchmark performance. OpenAI’s o3 system scored 75.7% on the challenging ARC-AGI benchmark¹⁸ in cost-restricted mode (compared to previous GPT models that scored only in the single digits). It was able to score up to 87.5% when given more time to compute “reasoning tokens”. This breakthrough shows that using more compute to “think more” can lead to entirely new levels of problem-solving from AI, opening up a wider array of potential use cases.
These use cases rely on ever-greater computational resources. Picture a system processing 1,000 times more prompts, analyzing contexts 1,000 times larger, and refining outputs with 1,000 times more detailed intermediate steps than today’s chatbots. Imagine how each interaction then becomes a multi-step computation that scans vast data sets, dynamically generates ideas, and continuously refines responses over extended periods. This could require millions of times more compute provided by more powerful chips, more “DeepSeek moments” of software efficiency gains, and massive data centers running AI workloads.
The buildout, and the bitter lesson
For those who believe in the emergence of these capabilities and use cases, today’s investments in increased compute would therefore be reasonable, not only due to the history of innovation and technological trends, but also based on academic research. According to the Bitter Lesson¹⁹, posited by the Turing Award-winning computer scientist Richard Sutton, approaches that scale with increased computation—like search and learning—consistently outperform methods that rely solely on embedded human knowledge. While human-centric techniques deliver short-term wins, they usually plateau. Embracing systems that self-discover and adapt unlocks new frontiers. On this view, more compute translates directly into more breakthrough capabilities and use cases. The infrastructure buildout may be predicated on this core idea.
AI agents are at the cutting edge of technological development. But they are not just for technologists. Every individual, across all fields, can be empowered by thousands of AI agents in their daily life and work. These agents could navigate vast data troves and autonomously iterate on complex tasks, transforming and improving everything from basic business processes to frontier research.
AI agents may not just amplify human ingenuity. They may spark industrial-revolution–level GDP growth. That’s a future that would justify the generational AI infrastructure buildout.
1MorningStar, “SPY Composition,” https://www.morningstar.com/etfs/arcx/spy/quote
2CNBC Reporting on Alphabet Announcements, https://www.cnbc.com/2025/02/04/alphabet-expects-to-invest-about-75-billion-in-capex-in-2025.html
3CNBC Reporting on Microsoft Announcements, https://www.cnbc.com/2025/01/03/microsoft-expects-to-spend-80-billion-on-ai-data-centers-in-fy-2025.html
4CNBC Reporting on Amazon Announcements, https://www.cnbc.com/2025/02/06/amazon-expects-to-spend-100-billion-on-capital-expenditures-in-2025.html
5DCD Reporting on Meta Announcement, https://www.datacenterdynamics.com/en/news/zuckerberg-says-meta-will-spend-hundreds-of-billions-of-dollars-on-ai-infrastructure-over-the-long-term/
6Gartner Forecasts Worldwide Semiconductor Revenue to Grow 14% in 2025, https://www.gartner.com/en/newsroom/press-releases/2024-10-28-gartner-forecasts-worldwide-semiconductor-revenue-to-grow-14-percent-in-2025
7McKinsey, “AI power: Expanding data center capacity to meet growing demand”, https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/ai-power-expanding-data-center-capacity-to-meet-growing-demand
8McKinsey, “Superagency in the workplace: Empowering people to unlock AI’s full potential”, https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/superagency-in-the-workplace-empowering-people-to-unlock-ais-full-potential-at-work
9Keynote at Hot Chips 2023, Bill Dally, Chief Scientist of NVIDIA, https://www.youtube.com/watch?v=rsxCZAE8QNA
10H.T. Kung, “Why Systolic Architectures” https://www.eecs.harvard.edu/~htk/publication/1982-kung-why-systolic-architecture.pdf
11HuggingFace, “Quantization Overview”, https://huggingface.co/docs/optimum/en/concept_guides/quantization
12OpenAI “Deprecations” webpage, https://platform.openai.com/docs/deprecations
13OpenAI Pricing Webpage, https://openai.com/api/pricing/
14Pricing from various inference providers from OpenRouter, https://openrouter.ai/deepseek/deepseek-r1
15SearchEngineLand, “Google now sees more than 5 trillion searches per year” https://searchengineland.com/google-5-trillion-searches-per-year-452928
16Reporting from The Information estimated Anthropic’s gross margin at around 50%. DeepSeek’s open source technical reports claim an even larger margin.
17We are assuming that the average prompt takes roughly 10 paragraphs (to account for previous messages in the thread) and 3 paragraphs of output generated by the model in response. We’re assuming each of these paragraphs is about 487 characters (roughly 98 tokens) in length. The pricing we are using is for OpenAI’s latest chat model GPT-4o that is available to premium users. The pricing is “Input: $2.50 / 1M tokens” and “Output: $10.00 / 1M tokens”. These assumptions give an estimated 0.54 cents per query.
18Arc Prize, “OpenAI o3 Breakthrough High Score on ARC-AGI-Pub”, https://arcprize.org/blog/oai-o3-pub-breakthrough
19Richard Sutton, “The Bitter Lesson”, http://www.incompleteideas.net/IncIdeas/BitterLesson.html
This article has been prepared by Goldman Sachs Global Institute and is not a product of Goldman Sachs Global Investment Research. This article is for your information only and should not be copied, distributed, published, or reproduced, in whole or in part. This article does not purport to contain a comprehensive overview of Goldman Sachs’ products and offerings. The views and opinions expressed here are those of the author and may differ from the views and opinions of other departments or divisions of Goldman Sachs and its affiliates. This article should not be used as a basis for trading in the securities or loans of any companies named herein or for any other investment decision and does not constitute an offer to sell the securities or loans of the companies named herein or a solicitation of proxies or votes. Goldman Sachs is not providing any financial, economic, legal, investment, accounting, or tax advice through this article or to its recipient. Certain information contained here may constitute “forward-looking statements” and there is no guarantee that these results will be achieved. Goldman Sachs has no obligation to provide any updates or changes to the information herein. Neither Goldman Sachs nor any of its affiliates makes any representation or warranty, express or implied, as to the accuracy or completeness of the statements or any information contained in this article and any liability therefore (including in respect of direct, indirect, or consequential loss or damage) is expressly disclaimed.