Foundation models are large artificial intelligence (AI) models trained on massive datasets, which enables them to perform a wide range of tasks across industries.
They are called foundation models because they form the backbone for other models, which businesses and individuals can customize to meet their own needs.
Foundation models are also sometimes called frontier models if they represent the most advanced AI systems.
There are different types of foundation models: text-generation models such as the earlier GPT models and Claude, image-generation models like Stable Diffusion and DALL-E, video-generation models like Sora and Veo, and code-generation models like Code Llama, among many others.
This list will focus on major large language foundation models, with natively multimodal Gemini as the one exception:
Once foundation models are initially trained (or pre-trained in industry lingo), many organizations choose to further train the models to give them specific capabilities. This further training is called fine-tuning.
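The relationship between pre-training and fine-tuning can be sketched in miniature. The one-weight model, data and learning rate below are all invented for illustration; the point is only that fine-tuning resumes training from the pre-trained weights rather than starting from scratch.

```python
# A minimal sketch of pre-training then fine-tuning, using a one-weight
# linear model and plain gradient descent. Real fine-tuning works the same
# way in spirit: training continues from the pre-trained weights instead of
# starting over. All data and rates here are invented for illustration.

def train(w, data, lr=0.01, steps=200):
    """Fit y ~ w*x by gradient descent on squared error, starting from w."""
    for _ in range(steps):
        for x, y in data:
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w

general_data = [(1.0, 2.0), (2.0, 4.0)]   # broad "pre-training" task: y = 2x
domain_data = [(1.0, 2.2), (2.0, 4.4)]    # narrower "fine-tuning" task: y = 2.2x

w_pretrained = train(0.0, general_data)         # pre-train from scratch
w_finetuned = train(w_pretrained, domain_data)  # fine-tune from the pre-trained weight
print(round(w_pretrained, 2), round(w_finetuned, 2))
```

Because fine-tuning starts from weights that already encode the general task, it typically needs far less data and compute than pre-training did.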
Examples of fine-tuned, industry-specific models include the following:
Training a foundation model has become more expensive as AI models scale up and grow more sophisticated.
Anthropic CEO Dario Amodei has said it could cost a billion dollars or more to train a highly sophisticated model. Recently, Chinese AI startup DeepSeek caused a stir by disclosing that it trained its V3 model for just $5.6 million, though that figure excludes many costs and the claim is disputed.
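For a rough sense of where such figures come from, a common back-of-envelope approximation puts training compute at about 6 × parameters × training tokens (in FLOPs), which can be converted to dollars using assumed GPU throughput and rental prices. The throughput, utilization and price numbers below are illustrative assumptions, not any lab's actual figures.

```python
# Back-of-envelope training-cost estimate (a sketch, not any lab's real
# methodology). Uses the common approximation that training compute is
# roughly 6 * parameters * training tokens (FLOPs). The GPU throughput
# (1 PFLOP/s peak at 40% utilization) and $2/GPU-hour price are assumptions.

def training_cost_usd(params, tokens,
                      flops_per_gpu_hour=1e15 * 3600 * 0.4,
                      usd_per_gpu_hour=2.0):
    """Estimate cost as total FLOPs / effective FLOPs per GPU-hour * hourly price."""
    total_flops = 6 * params * tokens
    gpu_hours = total_flops / flops_per_gpu_hour
    return gpu_hours * usd_per_gpu_hour

# Example: a hypothetical 70-billion-parameter model trained on 2 trillion tokens
print(f"${training_cost_usd(70e9, 2e12):,.0f}")
```

Real budgets run far higher than this compute-only estimate, since it ignores staff, data, failed runs and infrastructure overhead.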
The cost of training can be broken down into the following parts:
Here’s a list of major foundation models used by U.S. companies and available through the U.S. cloud computing giants AWS, Microsoft Azure and Google Cloud, along with their estimated training costs:
Also included is each model’s parameter count. Parameters are the internal numerical values an AI model adjusts during training to generate better responses; in general, the higher the parameter count, the more capable the model.
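To make "parameter count" concrete, the sketch below counts the adjustable weights and biases in a toy fully connected network. The layer sizes are made up for the example; real LLMs stack similar building blocks at vastly larger scale.

```python
# Illustration of what a "parameter count" measures: every weight and bias
# a model can adjust during training is one parameter. The layer sizes here
# are invented for the example.

def dense_layer_params(inputs, outputs):
    """A fully connected layer has inputs*outputs weights plus one bias per output."""
    return inputs * outputs + outputs

# A toy three-layer network: 512 -> 2048 -> 2048 -> 512
layers = [(512, 2048), (2048, 2048), (2048, 512)]
total = sum(dense_layer_params(i, o) for i, o in layers)
print(f"{total:,} parameters")
```

Even this toy network has millions of parameters; models like GPT-3, at 175 billion, repeat the same kind of accounting across dozens of much wider layers.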
OpenAI is the creator of ChatGPT, the AI chatbot that ushered in a watershed moment in AI. ChatGPT became the fastest-growing consumer app in history, reaching 100 million monthly active users in two months after debuting in late November 2022. Microsoft is its largest investor thus far, having put in at least $13 billion, but SoftBank is reportedly preparing a bigger investment.
AI model: OpenAI o1
Released: 2024
Parameters: Unknown
Estimated training cost: Unknown
AI model: GPT-4, 4o, 4o-mini, 4-turbo
Released: 2023 and 2024
Parameters: 1.7 trillion to 4 trillion
Estimated training cost: $78 million just for GPT-4
AI model: GPT-3, 3.5
Released: 2020
Parameters: 175 billion for GPT-3
Estimated training cost: Estimates range from $4.6 million to $15 million
AI model: GPT-2
Released: 2019
Parameters: 1.5 billion
Estimated training cost: Around $40,000
AI model: GPT-1
Released: 2018
Parameters: 117 million
Estimated training cost: Less than $50,000
Google is one of the most influential players in AI development and has an unmatched bench of AI researchers. ChatGPT would not have existed without Google’s research: the company introduced the Transformer architecture in its seminal paper, “Attention Is All You Need,” which became the basis for OpenAI’s GPT series of large language models. GPT stands for “Generative Pre-trained Transformer.”
AI model: Gemini 2 Flash
Released: 2024
Parameters: Not disclosed
Estimated training cost: Not disclosed
AI model: Gemini 1 (Ultra, Pro, Nano), 1.5
Released: 2023
Parameters: From 1.8 billion to 1.5 trillion
Estimated training cost: $191 million just for Ultra
Widely seen as the closest startup rival to OpenAI, Anthropic was founded by former OpenAI employees who were key contributors to OpenAI’s early research on its large language models. Amazon and Google are its main investors. What makes Anthropic different is its commitment to developing safer AI by imbuing its LLMs with “constitutional AI,” a method it invented to train AI models in a way that aligns with human ethical values.
AI model: Claude, 2 and 3 (Haiku, Sonnet, Opus)
Released: 2023 and 2024
Parameters: Not disclosed
Estimated training cost: Tens of millions for Sonnet 3.5
Meta is a big player in open-source large language models with its Llama family. After pivoting the company toward the metaverse in 2021, CEO Mark Zuckerberg has increasingly shifted his focus to AI following the success of ChatGPT. He took the open-source route to encourage other developers to use and improve Meta’s models, which the company can then adopt for its own social media apps.
AI model: Llama
Released: 2023
Parameters: 7 billion, 13 billion, 33 billion and 65 billion
Estimated training cost: $30 million
AI model: Llama 2
Released: 2023
Parameters: 7 billion, 13 billion and 70 billion
Estimated training cost: More than $20 million
AI model: Llama 3, 3.1, 3.2, 3.3
Released: 2024
Parameters: 1 billion to 405 billion
Estimated training cost: At least $500 million (a big jump from Llama 2, reflecting the jump in model size and complexity)
As a pioneer in cloud services through AWS, Amazon has taken a more pragmatic approach to generative AI. Rather than compete in foundation model development, it backed Anthropic and offered other companies’ models, such as Claude and Llama, on its platform for customers to use. It has since come out with its own Nova LLM family.
AI model: Nova (Micro, Lite, Pro, Premier)
Released: 2024
Parameters: Not disclosed
Estimated training cost: Not disclosed
Although it has released some AI models, Microsoft is not a big player in AI foundation model development. Instead, it prefers to offer OpenAI’s models. Microsoft was an early backer of OpenAI, investing $1 billion in 2019 — three years before ChatGPT brought AI to the masses. It has since substantially increased its investment to over $13 billion, and now is the exclusive provider of OpenAI’s models to enterprise customers.
AI model: Phi-1, 1.5, 2, 3-Mini, 3-Small, 3-Medium
Released: 2023 and 2024
Parameters: 1.3 billion to 14 billion
Estimated training cost: Not disclosed
AI model: Phi-3.5-Mini, 3.5-MoE
Released: 2024
Parameters: 3.8 billion for Mini, 42 billion for MoE
Estimated training cost: Not disclosed
Mistral is often described as the French equivalent of OpenAI. Founded by former researchers from Meta and Google DeepMind, it has made its mark as a key open-source player in LLMs, offering efficient, open models that challenge the closed systems of OpenAI and Google.
AI model: Mistral 7B
Released: 2023
Parameters: 7.3 billion
Estimated training cost: Not disclosed
AI model: Mixtral 8x7B
Released: 2023
Parameters: 46.7 billion
Estimated training cost: Not disclosed
AI model: Mistral Large 2
Released: 2024
Parameters: 123 billion
Estimated training cost: Not disclosed
DeepSeek is a Chinese AI startup whose low model-development costs took Silicon Valley by surprise, especially since its models were trained on slower Nvidia chips yet perform on par with top models from OpenAI, Anthropic and others. While DeepSeek’s claims are being challenged, the tech giants admit they could learn from the startup’s innovative techniques. DeepSeek is accessible on the cloud platforms of AWS, Microsoft Azure and Google Cloud.
AI model: V3
Released: 2024
Parameters: 671 billion
Estimated training cost: $5.6 million (being challenged)
AI model: R1
Released: 2025
Parameters: 671 billion
Estimated training cost: Not disclosed