AI Cheat Sheet: Large Language Foundation Model Training Costs


Foundation models are large artificial intelligence (AI) models trained on massive datasets, enabling them to perform a wide range of tasks across industries.


    They are called foundation models because they form the backbone for other models that businesses and individuals can customize to their own needs.

    Foundation models are also sometimes called frontier models if they represent the most advanced AI systems.

    There are different types of foundation models: text-generation models such as earlier GPT models and Claude, image-generation models like Stable Diffusion and DALL-E, video-generation models like Sora and Veo, and code-generation models like Code Llama, among many others.

    This list will focus on major large language foundation models, with natively multimodal Gemini as the one exception:

    • OpenAI’s GPT and o1 series
    • Anthropic’s Claude family
    • Google’s Gemini
    • Meta’s Llama series
    • Mistral’s flagship model

    Once foundation models are initially trained (or pre-trained in industry lingo), many organizations choose to further train the models to give them specific capabilities. This further training is called fine-tuning.
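As a minimal illustration (my own sketch, using a toy one-parameter model rather than an LLM, with made-up task data), fine-tuning continues gradient updates from already-trained weights on a smaller, task-specific dataset instead of starting from scratch:

```python
# Toy illustration of fine-tuning: start from a "pretrained" weight and
# run a few gradient steps on new task data, rather than from a random
# initialization. The model y = w * x and the data are illustrative.

def fine_tune(w_pretrained, data, lr=0.01, steps=100):
    w = w_pretrained
    for _ in range(steps):
        # mean-squared-error gradient over the task dataset
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

pretrained_w = 1.0                    # weight from "pretraining"
task_data = [(1.0, 2.0), (2.0, 4.0)]  # new task: y = 2x
w = fine_tune(pretrained_w, task_data)
print(round(w, 2))  # converges toward 2.0
```

The same idea scales up: an LLM's billions of weights are nudged on a comparatively small, domain-specific dataset, which is why fine-tuning costs far less than pretraining.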



    Breaking Down Model Costs

    Training a foundation model has become more expensive as AI models scale up and grow more sophisticated.

    Anthropic CEO Dario Amodei has said it could cost $1 billion or more to train a highly sophisticated model. Recently, Chinese AI startup DeepSeek caused a stir by disclosing that it trained a competitive model for $5.6 million, though that figure excludes many of the costs involved, and the claim is disputed.

    The cost of training can be broken down into the following parts:

    • Computing infrastructure (AI chips, data centers, cloud computing)
    • Model training time
    • Energy consumption and cooling costs
    • Data acquisition and processing
    • Fine-tuning and evaluation
    • Storage and networking
    • Engineering staff and research costs
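For the compute portion of these costs, a common back-of-envelope heuristic (my illustration, not from the article) is that pretraining takes roughly 6 × parameters × training tokens floating-point operations; dividing by sustained hardware throughput and multiplying by an hourly GPU price yields a rough compute bill. Every number below is an illustrative assumption, not a disclosed figure, and it covers only GPU time, not the other cost categories above:

```python
# Rough compute-cost estimate for pretraining, using the common
# heuristic: total FLOPs ~= 6 * N (parameters) * D (training tokens).
# All inputs are illustrative assumptions, not disclosed figures.

def estimate_compute_cost(params, tokens, flops_per_gpu_sec,
                          cost_per_gpu_hour, utilization=0.4):
    total_flops = 6 * params * tokens
    effective = flops_per_gpu_sec * utilization  # sustained, not peak
    gpu_hours = total_flops / effective / 3600
    return gpu_hours * cost_per_gpu_hour

# Example: a 70B-parameter model on 2T tokens, hypothetical GPUs at
# 1e15 FLOP/s peak, rented at $2 per GPU-hour.
cost = estimate_compute_cost(70e9, 2e12, 1e15, 2.0)
print(f"~${cost / 1e6:.1f}M in GPU time")  # ~$1.2M in GPU time
```

This explains why costs climb so steeply: the 6·N·D term grows with both model size and dataset size, which tend to increase together from one model generation to the next.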

    Here’s a list of major foundation models used by U.S. companies and available through the U.S. cloud computing giants AWS, Microsoft Azure and Google Cloud, along with their estimated training costs.

    Also included is each model’s parameter count. Parameters are the internal numerical variables an AI model adjusts during training to generate better responses. Generally, the higher the parameter count, the more capable the model.
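For intuition about what a parameter count measures (my illustrative sketch, not from the article), the figure is simply the total number of trainable weights; even a toy fully connected network makes the bookkeeping concrete:

```python
# Counting parameters in a toy fully connected network: each layer
# contributes (inputs * outputs) weights plus `outputs` bias terms.
# The layer sizes are illustrative assumptions.

def count_parameters(layer_sizes):
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total

# A 784 -> 512 -> 10 network (e.g., a small image classifier):
print(count_parameters([784, 512, 10]))  # 407050
```

An LLM's count is tallied the same way across its attention and feed-forward layers, which is how the totals reach into the billions and trillions.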

    OpenAI

    OpenAI is the creator of ChatGPT, the AI chatbot that ushered in a watershed moment in AI. ChatGPT became the fastest-growing consumer app in history, reaching 100 million monthly active users within two months of its debut in late November 2022. Microsoft is its largest investor to date, having put in at least $13 billion, but SoftBank is reportedly preparing a bigger investment.

    AI model: OpenAI o1

    Released: 2024

    Parameters: Unknown

    Estimated training cost: Unknown

    AI model: GPT-4, 4o, 4o-mini, 4-turbo

    Released: 2023 and 2024

    Parameters: 1.7 trillion to 4 trillion

    Estimated training cost: $78 million just for GPT-4


    AI model: GPT-3, 3.5

    Released: 2020

    Parameters: 175 billion for GPT-3

    Estimated training cost: Estimates range from $4.6 million to $15 million


    AI model: GPT-2

    Released: 2019

    Parameters: 1.5 billion

    Estimated training cost: Around $40,000


    AI model: GPT-1

    Released: 2018

    Parameters: 117 million

    Estimated training cost: Less than $50,000


    Google

    Google is one of the most influential players in AI development and has an unmatched bench of AI researchers. ChatGPT would not have existed without Google’s research: The company introduced the Transformer architecture in its seminal paper, “Attention Is All You Need,” which became the basis for OpenAI’s GPT large language model series. GPT is an acronym for “generative pre-trained transformer.”

    AI model: Gemini 2 Flash

    Released: 2024

    Parameters: Not disclosed

    Estimated training cost: Not disclosed


    AI model: Gemini 1 (Ultra, Pro, Nano), 1.5

    Released: 2023

    Parameters: From 1.8 billion to 1.5 trillion

    Estimated training cost: $191 million just for Ultra


    Anthropic

    Widely seen as the closest startup rival to OpenAI, Anthropic was founded by former OpenAI employees who were key contributors to OpenAI’s early research on its large language models. Amazon and Google are its main investors. What makes Anthropic different is its commitment to developing safer AI by imbuing its LLMs with “constitutional AI,” a method it invented to train AI models in a way that aligns with human ethical values.

    AI model: Claude, 2 and 3 (Haiku, Sonnet, Opus)

    Released: 2023 and 2024

    Parameters: Not disclosed

    Estimated training cost: Tens of millions for Sonnet 3.5


    Meta

    Meta is a big player in open-source large language models with its Llama family of models. After pivoting to the metaverse in 2021, Meta CEO Mark Zuckerberg has increasingly focused on AI following the success of ChatGPT. He took the open-source route to encourage other developers to use and improve his models, improvements Meta can then adopt for its own social media apps.

    AI model: Llama

    Released: 2023

    Parameters: 7 billion, 13 billion, 33 billion and 65 billion

    Estimated training cost: $30 million


    AI model: Llama 2

    Released: 2023

    Parameters: 7 billion, 13 billion and 70 billion

    Estimated training cost: More than $20 million


    AI model: Llama 3, 3.1, 3.2, 3.3

    Released: 2024

    Parameters: 1 billion to 405 billion

    Estimated training cost: At least $500 million (a large jump from Llama 2, reflecting the models’ much greater size and complexity)


    Amazon

    As a pioneer in cloud services through AWS, Amazon has taken a more pragmatic approach to generative AI. Rather than compete in foundation model development, it backed Anthropic and offers other companies’ models, such as Claude and Llama, on its platform for customers to use. It has since released its own Nova LLM family.

    AI model: Nova (Micro, Lite, Pro, Premier)

    Released: 2024

    Parameters: Not disclosed

    Estimated training cost: Not disclosed


    Microsoft

    Although it has released some AI models, Microsoft is not a big player in AI foundation model development, preferring instead to offer OpenAI’s models. Microsoft was an early backer of OpenAI, investing $1 billion in 2019 — three years before ChatGPT brought AI to the masses. It has since increased its investment to more than $13 billion and is now the exclusive provider of OpenAI’s models to enterprise customers.

    AI model: Phi-1, 1.5, 2, 3-Mini, 3-Small, 3-Medium

    Released: 2023 and 2024

    Parameters: 1.3 billion to 14 billion

    Estimated training cost: Not disclosed


    AI model: Phi-3.5-Mini, 3.5-MoE

    Released: 2024

    Parameters: 3.8 billion for Mini, 42 billion for MoE

    Estimated training cost: Not disclosed

    Mistral

    Mistral is often described as the French equivalent of OpenAI. Founded by former researchers from Meta and Google DeepMind, it has become a key open-source player in LLMs, offering efficient, open models that challenge the closed systems of OpenAI and Google.

    AI model: Mistral 7B

    Released: 2023

    Parameters: 7.3 billion

    Estimated training cost: Not disclosed


    AI model: Mixtral 8x7B

    Released: 2023

    Parameters: 46.7 billion

    Estimated training cost: Not disclosed


    AI model: Mistral Large 2

    Released: 2024

    Parameters: 123 billion

    Estimated training cost: Not disclosed


    DeepSeek

    DeepSeek is a Chinese AI startup whose inexpensive foundation model development costs took Silicon Valley by surprise, especially since it used slower Nvidia chips and still performed on par with the top models from OpenAI, Anthropic and others. While DeepSeek’s claims are now being challenged, the tech giants admit they could learn from the Chinese startup’s innovative techniques. DeepSeek is accessible on the cloud platforms of AWS, Microsoft Azure and Google Cloud. 

    AI model: V3

    Released: 2024

    Parameters: 671 billion

    Estimated training cost: $5.6 million (being challenged)


    AI model: R1

    Released: 2025

    Parameters: 671 billion

    Estimated training cost: Not disclosed