AI Cheat Sheet: Large Language Foundation Model Training Costs


Foundation models are large artificial intelligence (AI) models trained on massive datasets, enabling them to perform a wide range of tasks across industries.


    They are called foundation models because they form the backbone for other models that businesses and individuals can customize to their own needs.

    Foundation models are also sometimes called frontier models if they represent the most advanced AI systems.

    There are different types of foundation models: text-generation models such as earlier GPT models and Claude, image-generation models like Stable Diffusion and DALL-E, video-generation models like Sora and Veo, and code-generation models like Code Llama, among many others.

    This list will focus on major large language foundation models, with natively multimodal Gemini as the one exception:

    • OpenAI’s GPT and o1 series
    • Anthropic’s Claude family
    • Google’s Gemini
    • Meta’s Llama series
    • Mistral’s flagship model

    Once foundation models are initially trained (or pre-trained in industry lingo), many organizations choose to further train the models to give them specific capabilities. This further training is called fine-tuning.
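As a minimal illustration (my own sketch, using a toy one-parameter model rather than an LLM, with made-up task data), fine-tuning continues gradient updates from already-trained weights on a smaller, task-specific dataset instead of starting from scratch:

```python
# Toy illustration of fine-tuning: start from a "pretrained" weight and
# run a few gradient steps on new task data, rather than from a random
# initialization. The model y = w * x and the data are illustrative.

def fine_tune(w_pretrained, data, lr=0.01, steps=100):
    w = w_pretrained
    for _ in range(steps):
        # mean-squared-error gradient over the task dataset
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

pretrained_w = 1.0                    # weight from "pretraining"
task_data = [(1.0, 2.0), (2.0, 4.0)]  # new task: y = 2x
w = fine_tune(pretrained_w, task_data)
print(round(w, 2))  # converges toward 2.0
```

The same idea scales up: an LLM's billions of weights are nudged on a comparatively small, domain-specific dataset, which is why fine-tuning costs far less than pretraining.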



    Breaking Down Model Costs

    Training a foundation model has become more expensive as AI models scale up and grow more sophisticated.

    Anthropic CEO Dario Amodei has said it could cost $1 billion or more to train a highly sophisticated model. Recently, Chinese AI startup DeepSeek caused a stir by disclosing that it trained a competitive model for $5.6 million, though that figure excludes many of the costs involved, and the claim is disputed.

    The cost of training can be broken down into the following parts:

    • Computing infrastructure (AI chips, data centers, cloud computing)
    • Model training time
    • Energy consumption and cooling costs
    • Data acquisition and processing
    • Fine-tuning and evaluation
    • Storage and networking
    • Engineering staff and research costs
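For the compute portion of these costs, a common back-of-envelope heuristic (my illustration, not from the article) is that pretraining takes roughly 6 × parameters × training tokens floating-point operations; dividing by sustained hardware throughput and multiplying by an hourly GPU price yields a rough compute bill. Every number below is an illustrative assumption, not a disclosed figure, and it covers only GPU time, not the other cost categories above:

```python
# Rough compute-cost estimate for pretraining, using the common
# heuristic: total FLOPs ~= 6 * N (parameters) * D (training tokens).
# All inputs are illustrative assumptions, not disclosed figures.

def estimate_compute_cost(params, tokens, flops_per_gpu_sec,
                          cost_per_gpu_hour, utilization=0.4):
    total_flops = 6 * params * tokens
    effective = flops_per_gpu_sec * utilization  # sustained, not peak
    gpu_hours = total_flops / effective / 3600
    return gpu_hours * cost_per_gpu_hour

# Example: a 70B-parameter model on 2T tokens, hypothetical GPUs at
# 1e15 FLOP/s peak, rented at $2 per GPU-hour.
cost = estimate_compute_cost(70e9, 2e12, 1e15, 2.0)
print(f"~${cost / 1e6:.1f}M in GPU time")  # ~$1.2M in GPU time
```

This explains why costs climb so steeply: the 6·N·D term grows with both model size and dataset size, which tend to increase together from one model generation to the next.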

    Here’s a list of major foundation models used by U.S. companies and available through the U.S. cloud computing giants AWS, Microsoft Azure and Google Cloud, along with their estimated training costs.

    Also included is each model’s parameter count. Parameters are the internal numerical variables an AI model adjusts during training to generate better responses. Generally, the higher the parameter count, the more capable the model.
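For intuition about what a parameter count measures (my illustrative sketch, not from the article), the figure is simply the total number of trainable weights; even a toy fully connected network makes the bookkeeping concrete:

```python
# Counting parameters in a toy fully connected network: each layer
# contributes (inputs * outputs) weights plus `outputs` bias terms.
# The layer sizes are illustrative assumptions.

def count_parameters(layer_sizes):
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total

# A 784 -> 512 -> 10 network (e.g., a small image classifier):
print(count_parameters([784, 512, 10]))  # 407050
```

An LLM's count is tallied the same way across its attention and feed-forward layers, which is how the totals reach into the billions and trillions.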

    OpenAI

    OpenAI is the creator of ChatGPT, the AI chatbot that ushered in a watershed moment in AI. ChatGPT became the fastest-growing consumer app in history, reaching 100 million monthly active users within two months of its debut in late November 2022. Microsoft is its largest investor to date, having put in at least $13 billion, but SoftBank is reportedly preparing a bigger investment.

    AI model: OpenAI o1

    Released: 2024

    Parameters: Unknown

    Estimated training cost: Unknown

    AI model: GPT-4, 4o, 4o-mini, 4-turbo

    Released: 2023 and 2024

    Parameters: 1.7 trillion to 4 trillion

    Estimated training cost: $78 million just for GPT-4


    AI model: GPT-3, 3.5

    Released: 2020

    Parameters: 175 billion for GPT-3

    Estimated training cost: Estimates range from $4.6 million to $15 million


    AI model: GPT-2

    Released: 2019

    Parameters: 1.5 billion

    Estimated training cost: Around $40,000


    AI model: GPT-1

    Released: 2018

    Parameters: 117 million

    Estimated training cost: Less than $50,000


    Google

    Google is one of the most influential players in AI development and has an unmatched bench of AI researchers. ChatGPT would not have existed without Google’s research: The company introduced the Transformer architecture in its seminal paper, “Attention Is All You Need,” which became the basis for OpenAI’s GPT large language model series. GPT is an acronym for “generative pre-trained transformer.”

    AI model: Gemini 2 Flash

    Released: 2024

    Parameters: Not disclosed

    Estimated training cost: Not disclosed


    AI model: Gemini 1 (Ultra, Pro, Nano), 1.5

    Released: 2023

    Parameters: From 1.8 billion to 1.5 trillion

    Estimated training cost: $191 million just for Ultra


    Anthropic

    Widely seen as the closest startup rival to OpenAI, Anthropic was founded by former OpenAI employees who were key contributors to OpenAI’s early research on its large language models. Amazon and Google are its main investors. What makes Anthropic different is its commitment to developing safer AI by imbuing its LLMs with “constitutional AI,” a method it invented to train AI models in a way that aligns with human ethical values.

    AI model: Claude, 2 and 3 (Haiku, Sonnet, Opus)

    Released: 2023 and 2024

    Parameters: Not disclosed

    Estimated training cost: Tens of millions for Sonnet 3.5


    Meta

    Meta is a big player in open-source large language models with its Llama family of models. After pivoting to the metaverse in 2021, Meta CEO Mark Zuckerberg has increasingly focused on AI following the success of ChatGPT. He took the open-source route to encourage other developers to use and improve his models, improvements Meta can then adopt for its own social media apps.

    AI model: Llama

    Released: 2023

    Parameters: 7 billion, 13 billion, 33 billion and 65 billion

    Estimated training cost: $30 million


    AI model: Llama 2

    Released: 2023

    Parameters: 7 billion, 13 billion and 70 billion

    Estimated training cost: More than $20 million


    AI model: Llama 3, 3.1, 3.2, 3.3

    Released: 2024

    Parameters: 1 billion to 405 billion

    Estimated training cost: At least $500 million (a large jump from Llama 2, reflecting the models’ much greater size and complexity)


    Amazon

    As a pioneer in cloud services through AWS, Amazon has taken a more pragmatic approach to generative AI. Rather than compete in foundation model development, it backed Anthropic and offers other companies’ models, such as Claude and Llama, on its platform for customers to use. It has since released its own Nova LLM family.

    AI model: Nova (Micro, Lite, Pro, Premier)

    Released: 2024

    Parameters: Not disclosed

    Estimated training cost: Not disclosed


    Microsoft

    Although it has released some AI models, Microsoft is not a big player in AI foundation model development, preferring instead to offer OpenAI’s models. Microsoft was an early backer of OpenAI, investing $1 billion in 2019 — three years before ChatGPT brought AI to the masses. It has since increased its investment to more than $13 billion and is now the exclusive provider of OpenAI’s models to enterprise customers.

    AI model: Phi-1, 1.5, 2, 3-Mini, 3-Small, 3-Medium

    Released: 2023 and 2024

    Parameters: 1.3 billion to 14 billion

    Estimated training cost: Not disclosed


    AI model: Phi-3.5-Mini, 3.5-MoE

    Released: 2024

    Parameters: 3.8 billion for Mini, 42 billion for MoE

    Estimated training cost: Not disclosed

    Mistral

    Mistral is often described as the French equivalent of OpenAI. Founded by former researchers from Meta and Google DeepMind, it has become a key open-source player in LLMs, offering efficient, open models that challenge the closed systems of OpenAI and Google.

    AI model: Mistral 7B

    Released: 2023

    Parameters: 7.3 billion

    Estimated training cost: Not disclosed


    AI model: Mixtral 8x7B

    Released: 2023

    Parameters: 46.7 billion

    Estimated training cost: Not disclosed


    AI model: Mistral Large 2

    Released: 2024

    Parameters: 123 billion

    Estimated training cost: Not disclosed


    DeepSeek

    DeepSeek is a Chinese AI startup whose inexpensive foundation model development costs took Silicon Valley by surprise, especially since it used slower Nvidia chips and still performed on par with the top models from OpenAI, Anthropic and others. While DeepSeek’s claims are now being challenged, the tech giants admit they could learn from the Chinese startup’s innovative techniques. DeepSeek is accessible on the cloud platforms of AWS, Microsoft Azure and Google Cloud. 

    AI model: V3

    Released: 2024

    Parameters: 671 billion

    Estimated training cost: $5.6 million (being challenged)


    AI model: R1

    Released: 2025

    Parameters: 671 billion

    Estimated training cost: Not disclosed