AI Cheat Sheet: Large Language Foundation Model Training Costs

AI, LLMs, Generative AI

Foundation models are large artificial intelligence (AI) models trained on massive datasets, which enables them to perform a wide range of tasks across industries.

They are called foundation models because they form the backbone for other models, which businesses and individuals can customize to meet their own needs.

Foundation models are also sometimes called frontier models if they represent the most advanced AI systems.

There are different types of foundation models: text-generation models such as earlier GPT models and Claude, image-generation models like Stable Diffusion and DALL-E, video-generation models like Sora and Veo, and code-generation models like Code Llama, among many others.

This list will focus on major large language foundation models, with natively multimodal Gemini as the one exception:

  • OpenAI’s GPT and o1 series
  • Anthropic’s Claude family
  • Google’s Gemini
  • Meta’s Llama series
  • Mistral’s flagship model

Once foundation models are initially trained (or pre-trained in industry lingo), many organizations choose to further train the models to give them specific capabilities. This further training is called fine-tuning.

Examples of fine-tuned, industry-specific models include BloombergGPT for finance and Med-PaLM for medicine.

Breaking Down Model Costs

Training a foundation model has become more expensive as AI models scale up and become more sophisticated.

Anthropic CEO Dario Amodei has said it could cost $1 billion or more to train a highly sophisticated model. Recently, Chinese AI startup DeepSeek caused a stir after disclosing it had trained its V3 model for just $5.6 million, though that figure excludes many costs and its claims are being disputed.

The cost of training can be broken down into the following parts:

  • Computing infrastructure (AI chips, data centers, cloud computing)
  • Model training time
  • Energy consumption and cooling costs
  • Data acquisition and processing
  • Fine-tuning and evaluation
  • Storage and networking
  • Engineering staff and research costs
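The biggest line item, compute, can be roughed out with a widely used rule of thumb: training takes about 6 floating-point operations per parameter per training token. Here is a minimal sketch using GPT-3's published figures (175 billion parameters, roughly 300 billion training tokens); the hardware throughput, utilization and rental rate are assumed V100-era numbers, not disclosed values:

```python
def training_flops(params, tokens):
    # Rule of thumb: ~6 FLOPs per parameter per training token
    return 6 * params * tokens

def compute_cost_usd(flops, peak_flops_per_gpu=125e12, utilization=0.3,
                     usd_per_gpu_hour=2.0):
    # Assumed figures: V100-class peak throughput, 30% utilization,
    # and a $2/GPU-hour cloud rental rate -- all rough estimates
    gpu_hours = flops / (peak_flops_per_gpu * utilization) / 3600
    return gpu_hours * usd_per_gpu_hour

# GPT-3: 175 billion parameters, ~300 billion training tokens
flops = training_flops(175e9, 300e9)       # ~3.15e23 FLOPs
print(f"${compute_cost_usd(flops):,.0f}")  # roughly $4.7 million
```

That lands near the low end of the $4.6 million to $15 million range cited for GPT-3 below, and it covers raw compute only, not data, staff or failed runs.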

Here’s a list of major foundation models used by U.S. companies and available through the U.S. cloud computing giants AWS, Microsoft Azure and Google Cloud, along with their estimated training costs:

Also included is each model’s parameter count. Parameters are the internal numerical values an AI model adjusts during training to generate better responses. Generally, the higher the parameter count, the more capable the model.
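As a rough illustration of how parameters add up, this sketch (with hypothetical layer sizes, chosen only for illustration) counts the weights in a single transformer-style feed-forward block:

```python
def linear_params(n_in, n_out):
    # A dense layer holds an n_in x n_out weight matrix plus n_out biases
    return n_in * n_out + n_out

# Hypothetical sizes, loosely in the range used by large models
d_model, d_ff = 4096, 16384

# One feed-forward block: project up to d_ff, then back down to d_model
ffn_params = linear_params(d_model, d_ff) + linear_params(d_ff, d_model)
print(f"{ffn_params:,}")  # 134,238,208 -- over 134 million in one block
```

Stacking dozens of such blocks, plus attention and embedding weights, is how counts climb into the billions.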

OpenAI

OpenAI is the creator of ChatGPT, the AI chatbot that ushered in a watershed moment in AI. ChatGPT became the fastest-growing consumer app in history, reaching 100 million monthly active users within two months of its debut in late November 2022. Microsoft is its largest investor thus far, having put in at least $13 billion, but SoftBank is reportedly preparing a bigger investment.

AI model: OpenAI o1

Released: 2024

Parameters: Unknown

Estimated training cost: Unknown

AI model: GPT-4, 4o, 4o-mini, 4-turbo

Released: 2023 and 2024

Parameters: 1.7 trillion to 4 trillion (estimates; not officially disclosed)

Estimated training cost: $78 million just for GPT-4


AI model: GPT-3, 3.5

Released: 2020

Parameters: 175 billion for GPT-3

Estimated training cost: Estimates range from $4.6 million to $15 million


AI model: GPT-2

Released: 2019

Parameters: 1.5 billion

Estimated training cost: Around $40,000


AI model: GPT-1

Released: 2018

Parameters: 117 million

Estimated training cost: Less than $50,000


Google

Google is one of the most influential players in AI development and has an unmatched bench of AI researchers. ChatGPT would not have existed without Google’s research. The company introduced the Transformer architecture in its seminal 2017 paper, “Attention Is All You Need,” which became the basis for OpenAI’s GPT series of large language models. GPT stands for “Generative Pre-trained Transformer.”

AI model: Gemini 2 Flash

Released: 2024

Parameters: Not disclosed

Estimated training cost: Not disclosed


AI model: Gemini 1 (Ultra, Pro, Nano), 1.5

Released: 2023

Parameters: From 1.8 billion to 1.5 trillion

Estimated training cost: $191 million just for Ultra


Anthropic

Widely seen as the closest startup rival to OpenAI, Anthropic was founded by former OpenAI employees who were key contributors to OpenAI’s early research on its large language models. Amazon and Google are its main investors. What makes Anthropic different is its commitment to developing safer AI by imbuing its LLMs with “constitutional AI,” a method it invented to train AI models in a way that aligns with human ethical values.

AI model: Claude 1, 2 and 3 (Haiku, Sonnet, Opus)

Released: 2023 and 2024

Parameters: Not disclosed

Estimated training cost: Tens of millions of dollars for Claude 3.5 Sonnet


Meta

Meta is a big player in open-source large language models with its Llama family. After pivoting the company to the metaverse in 2021, Meta CEO Mark Zuckerberg has increasingly shifted his focus to AI following the success of ChatGPT. He took the open-source route to encourage outside developers to use and improve Meta’s models, which the company can then adopt for its social media apps.

AI model: Llama

Released: 2023

Parameters: 7 billion, 13 billion, 33 billion and 65 billion

Estimated training cost: $30 million


AI model: Llama 2

Released: 2023

Parameters: 7 billion, 13 billion and 70 billion

Estimated training cost: More than $20 million


AI model: Llama 3, 3.1, 3.2, 3.3

Released: 2024

Parameters: 1 billion to 405 billion

Estimated training cost: At least $500 million (big jump from Llama 2 due to big jump in size and complexity)


Amazon

As a pioneer in cloud services through AWS, Amazon has taken a more pragmatic approach to generative AI. Rather than compete in foundation model development, it backed Anthropic and offered other companies’ models, such as Anthropic’s Claude and Meta’s Llama, on its platform for customers to use. It has since released its own Nova LLM family.

AI model: Nova (Micro, Lite, Pro, Premier)

Released: 2024

Parameters: Not disclosed

Estimated training cost: Not disclosed


Microsoft

Although it has released some AI models, Microsoft is not a big player in AI foundation model development. Instead, it prefers to offer OpenAI’s models. Microsoft was an early backer of OpenAI, investing $1 billion in 2019 — three years before ChatGPT brought AI to the masses. It has since substantially increased its investment to over $13 billion, and now is the exclusive provider of OpenAI’s models to enterprise customers.

AI model: Phi-1, 1.5, 2, 3-Mini, 3-Small, 3-Medium

Released: 2023 and 2024

Parameters: 1.3 billion to 14 billion

Estimated training cost: Not disclosed


AI model: Phi-3.5-Mini, 3.5-MoE

Released: 2024

Parameters: 3.8 billion for Mini, 42 billion for MoE

Estimated training cost: Not disclosed

Mistral

Mistral is the French equivalent of OpenAI. Founded by former researchers from Meta and Google DeepMind, Mistral has made its mark as an open-source foundation model developer, offering efficient, open models that challenge the closed systems of OpenAI and Google.

AI model: Mistral 7B

Released: 2023

Parameters: 7.3 billion

Estimated training cost: Not disclosed


AI model: Mixtral 8x7B

Released: 2023

Parameters: 46.7 billion

Estimated training cost: Not disclosed


AI model: Mistral Large 2

Released: 2024

Parameters: 123 billion

Estimated training cost: Not disclosed


DeepSeek

DeepSeek is a Chinese AI startup whose inexpensively developed foundation models took Silicon Valley by surprise, especially since they were trained on slower Nvidia chips yet performed on par with top models from OpenAI, Anthropic and others. While DeepSeek’s claims are now being challenged, the tech giants admit they could learn from the Chinese startup’s innovative techniques. DeepSeek is accessible on the cloud platforms of AWS, Microsoft Azure and Google Cloud.

AI model: V3

Released: 2024

Parameters: 671 billion

Estimated training cost: $5.6 million (being challenged)
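DeepSeek’s headline number comes straight from GPU-hour arithmetic: its V3 technical report cites about 2.788 million H800 GPU-hours at an assumed $2 per GPU-hour rental rate. A quick check (the rate is DeepSeek’s own assumption, and the total excludes research, data and staff costs):

```python
gpu_hours = 2.788e6      # H800 GPU-hours reported by DeepSeek for V3
usd_per_gpu_hour = 2.0   # rental rate assumed in DeepSeek's report
total = gpu_hours * usd_per_gpu_hour
print(f"${total:,.0f}")  # $5,576,000 -- the widely cited ~$5.6 million
```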


AI model: R1

Released: 2025

Parameters: 671 billion

Estimated training cost: Not disclosed