Artificial Intelligence

AI Agents Forecast to Boost Tech Cash Flow as Usage Soars

Agentic AI is expected to drive a 24-fold increase in token consumption by 2030 as consumers and enterprises adopt the technology, according to Goldman Sachs Research.
AI chipmakers’ token unit costs are falling, setting the stage for gross margin improvements at hyperscalers as demand for agent applications rises.
For the next 12 to 18 months, there is likely to be a shortage of chips as semiconductor makers build new plants to catch up with demand.
Business adoption of agentic AI will leapfrog consumer use, but it is expected to take time as companies manage organizational challenges.

The growth of agentic artificial intelligence (AI) is poised to dramatically increase the volume of computing performed by hyperscalers running large language models, or LLMs, according to Goldman Sachs Research. The shift will also boost cash flow at these big tech companies.

This explosion in the use of tokens—units of text processed by LLMs—is expected to occur at a pivotal moment in the AI growth story. Investors are wary about the immense sums hyperscalers are spending on chips and data centers to process AI applications.

With consumers and enterprises adopting AI agents, token consumption is expected to multiply 24 times, to 120 quadrillion tokens per month, between 2026 and 2030, says Jim Schneider, the senior equity analyst covering US semiconductor and IT services at Goldman Sachs Research.

With the cost of computing falling at the same time, AI players are positioned for a period of “margin inflection,” says Schneider.

“The concern in the generalist investor community is the sustainability of capex because the free cash flows of hyperscalers have been compressed,” Schneider says. “What fixes that? The answer lies in the underlying economics of the problem. If you raise gross margins, you raise operating cash flow, and that gives you more headroom to spend.”

We spoke with Schneider about the importance of token consumption, how agentic AI for enterprises is expected to leapfrog consumer uses, and why a number with 16 zeroes is a key metric for investors.

There’s a lot of buzz around agentic AI but is it a measurable business?

There is a lot of noise, but let’s start with the basics: With agentic AI you have autonomous agents that do not simply respond to a query you have (“tell me about this, tell me about that”) but also perform a sequence of tasks (“go do this and go do that”). Conceptually, those two things are quite different. The problem is that there have been very few numbers to quantify this trend in terms of potential upside to business outcomes.

So what we have done is model some common use cases, such as online travel and a customer calling into a call center for help. We used tools to simulate real-world implementations of agentic AI in the consumer and enterprise spheres, played them out, and then calculated a token count.

What is a token count?

Tokens are units of compute. Think of them as units of information to be processed. Agentic AI requires a lot of tokens because many queries are repeated in sequence. It’s like taking a simple chatbot request and blowing it up 10-fold, 20-fold, 50-fold.

What we found is that by 2030 agentic AI will multiply token consumption 12 times on the consumer side, in things like online shopping, smartphone takeover agents, and similar functions. Combined with adoption by enterprises, that is 120 quadrillion tokens processed per month.

That’s a big number.

I don’t know what’s beyond quadrillion, but it’s a lot.
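For a sense of scale, the headline forecast can be unpacked with simple arithmetic. The sketch below uses only the figures quoted in the article (a 24-fold increase to 120 quadrillion tokens per month between 2026 and 2030). The constant names and the assumption of smooth compound growth over those four years are illustrative, not part of the Goldman Sachs model.

```python
# Figures quoted in the article (Goldman Sachs Research forecast).
TOKENS_2030_PER_MONTH = 120 * 10**15  # 120 quadrillion
GROWTH_MULTIPLE = 24                  # 24-fold increase
START_YEAR, END_YEAR = 2026, 2030


def implied_baseline(final_level, multiple):
    """Starting level implied by a final level and a growth multiple."""
    return final_level / multiple


def implied_annual_growth(multiple, years):
    """Constant annual growth rate that compounds to `multiple` over `years`.

    Assumes smooth compounding, which the article does not claim.
    """
    return multiple ** (1 / years) - 1


baseline = implied_baseline(TOKENS_2030_PER_MONTH, GROWTH_MULTIPLE)
rate = implied_annual_growth(GROWTH_MULTIPLE, END_YEAR - START_YEAR)
zero_count = str(TOKENS_2030_PER_MONTH).count("0")

print(f"Implied 2026 baseline: {baseline:.1e} tokens per month")
print(f"Implied annual growth: {rate:.0%}")
print(f"120 quadrillion written out: {TOKENS_2030_PER_MONTH:,} ({zero_count} zeroes)")
```

Under these assumptions the implied 2026 baseline is about 5 quadrillion tokens per month, and consumption would need to slightly more than double every year to reach the 2030 figure.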

So demand means more revenue for AI infrastructure providers, and this is rising as the costs are falling?

Correct. We have been following the falling unit cost of computing tokens for some time now. Semiconductor providers are cutting the cost per token by 60%-70% per year for inference, which is the process of using trained LLMs to get results. That is a very, very rapid rate of decline. This is happening because of improving chip efficiencies and because of new efficiencies in AI data center architecture.

We believe these improving economics are likely to spur positive gross margin inflection over the next 3-12 months. So we’re at an interesting inflection point.
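To see how quickly a 60%-70% annual decline compounds, here is a minimal sketch. The function name and the three-year horizon are illustrative choices, and it assumes the decline rate stays constant from year to year, which the article does not state.

```python
def remaining_cost_fraction(annual_decline, years):
    """Fraction of the original cost per token left after `years`
    of a constant annual decline (e.g. 0.60 for 60% per year)."""
    return (1 - annual_decline) ** years


# 60% and 70% are the ends of the range quoted in the article.
for decline in (0.60, 0.70):
    for years in (1, 2, 3):
        fraction = remaining_cost_fraction(decline, years)
        print(f"{decline:.0%} per year, after {years} year(s): "
              f"{fraction:.1%} of the original cost per token")
```

At the low end of the range, the cost per token would fall to 16% of its starting level after two years; at the high end, to under 3% after three years.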

Can the AI chipmakers keep up with demand?

I think that they can over the long term. Building a new chip factory might take three years. Obviously, things are moving faster than that. If we rewound six months, when we were just talking about chatbots, we would have had enough capacity to handle that easily.

The issue is that the use cases are moving quite quickly. We weren’t talking about agents a year ago; now we are. What is happening is that the industry is reacting with the capacity that was needed six months ago. But the goalposts are shifting and the chipmaking production system can’t react that fast. Over the next 12 months or longer, we will be in shortage. I think in two years we could be caught up.

Does this set the stage for improved cash flow performance by hyperscalers?

Yes. Right now semiconductor makers in the space are doing 70%-plus gross margins, so there is no problem for those companies. The problem is with hyperscalers and how most of their free cash flow is being consumed by capex. This impacts gross margins. But there is a crossover coming. We see an inflection of those gross margins because the costs are falling faster than the prices.
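The "costs falling faster than prices" mechanism can be illustrated with a toy model. Everything numeric below is a hypothetical assumption except the 60% annual cost decline, which is the low end of the range quoted in the article. The starting price, starting cost, and 40% annual price decline are invented for illustration and are not Goldman Sachs estimates.

```python
def gross_margin_path(price, cost, price_decline, cost_decline, years):
    """Return gross margin (1 - cost / price) for year 0 through `years`,
    with price and cost each shrinking at a constant annual rate."""
    margins = []
    for _ in range(years + 1):
        margins.append(1 - cost / price)
        price *= 1 - price_decline
        cost *= 1 - cost_decline
    return margins


# Hypothetical starting point: price 1.00 and cost 0.80 per unit of tokens.
# The 40% annual price decline is purely an illustrative assumption.
margins = gross_margin_path(
    price=1.00, cost=0.80, price_decline=0.40, cost_decline=0.60, years=3
)
for year, margin in enumerate(margins):
    print(f"year {year}: gross margin {margin:.1%}")
```

In this sketch the margin widens every year because the cost-to-price ratio shrinks. If both fell at the same rate, the margin would stay flat, which is why the gap between the two rates matters more than either rate alone.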

What is driving the demand for agentic AI in the consumer market?

If you think about it, a lot of the consumer activity is about online queries. Many today are traditional search. We see traditional search falling as a percentage of queries by 2030, and that is going to get replaced by things like large language model queries. But next come agentic use cases.

There are already smartphone takeover agents in China that perform a bunch of tasks in the background for you: “Book me a flight to Singapore” or “Clean up my main inbox and filter all the junk mail and organize all the email into business priorities.”

These things are becoming more autonomous in nature. We are entering a phase of “always-on” background agents that perform tasks when they’re needed.

And so you can imagine the mix of all these queries changing pretty dramatically over the next five years. We have modelling that shows daily queries to LLMs growing at a 40% compound annual growth rate to 11 billion by 2030.

Why is the growth of agentic AI in the enterprise sphere taking longer?

The reason is that applying agentic AI in business is more complex. Writing code or a piece of software is far more involved than booking a flight to Singapore. Even handling a customer service call is more complicated.

It not only has to work, it has to be tested, retested, and integrated with other pieces of code, tested again, documented. And it must also work in the context of compliance, rules, budgetary parameters, and other requirements of the enterprise.

The important point is that the adoption rates are still relatively low today, especially in small to medium-sized businesses. In 2030, we forecast that 12% of knowledge workers will be using agentic AI, yet by 2040 that figure will be 37%. You have this very long-tail adoption.
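The pace implied by those two data points can be worked out directly. The sketch below uses only the 12% (2030) and 37% (2040) figures from the forecast. The interpolation between them assumes a constant annual growth rate, which is an illustrative simplification rather than the shape of the actual forecast.

```python
ADOPTION_2030 = 0.12  # share of knowledge workers, per the forecast
ADOPTION_2040 = 0.37
START_YEAR, END_YEAR = 2030, 2040
SPAN = END_YEAR - START_YEAR

# Average gain in percentage points per year.
points_per_year = (ADOPTION_2040 - ADOPTION_2030) * 100 / SPAN

# Constant annual growth rate that links the two figures.
annual_growth = (ADOPTION_2040 / ADOPTION_2030) ** (1 / SPAN) - 1


def adoption_in(year):
    """Adoption share in `year`, assuming constant compound growth
    between 2030 and 2040 (an assumption, not a forecast)."""
    return ADOPTION_2030 * (1 + annual_growth) ** (year - START_YEAR)


print(f"Average gain: {points_per_year:.1f} percentage points per year")
print(f"Implied annual growth in adoption: {annual_growth:.1%}")
print(f"Interpolated 2035 adoption: {adoption_in(2035):.1%}")
```

That works out to roughly 2.5 percentage points of additional adoption per year, far slower than the token-consumption curve, which is consistent with the "long tail" described above.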

Is there a risk the benefits of rising demand and falling costs will not materialize for all AI providers?

There are risks that the improvements to margins are not generated across all AI workloads. In other words, we expect the adoption of agentic AI in the enterprise space to be uneven. Things like coding, for example, are very efficient because the agent can go off and do things autonomously and independently come back very quickly.

With things like customer service agents, text-based chatbots are already quite efficient. But there are other jobs with technical factors in play that make them less attractive for agentic AI. We found the case of a real-time voice agent where the human cost was actually less than the LLM cost today due to what we call “time dependency” and “latency characteristics” in the software. So the economics are not nearly as favorable for that.

Stepping back, the big takeaway is that soaring demand for agentic AI may reset assumptions about what comes next for this industry.

Yes. I think the margin inflection for hyperscalers and model providers is very different from the prevailing market narrative that AI usage will simply drive an increasing and unsustainable cost burden.

The evolution is going to be uneven and a bit non-linear; not all the players are at the same level. You are going to start to see differentiation between the hyperscalers, especially when it comes to their operating cash flows. Every player will get dragged along to the upside but at different rates.

FAQ

How will rising demand for AI agents change the ‘token economics’ for tech companies?

Semiconductor providers are delivering cost reductions of 60%-70% per year per token for inference, which is the process of using trained models to generate results, according to Goldman Sachs Research. At the same time, demand for AI agents is surging. This combination positions hyperscalers and model providers for a “gross margin inflection” as revenues exceed costs. This is significant because investors have been concerned about the sustainability of massive capital expenditures on data centers. Higher gross margins translate to higher operating cash flow.

This article is being provided for educational purposes only. The information contained in this article does not constitute a recommendation from any Goldman Sachs entity to the recipient, and Goldman Sachs is not providing any financial, economic, legal, investment, accounting, or tax advice through this article or to its recipient. Neither Goldman Sachs nor any of its affiliates makes any representation or warranty, express or implied, as to the accuracy or completeness of the statements or any information contained in this article and any liability therefore (including in respect of direct, indirect, or consequential loss or damage) is expressly disclaimed.
