Understanding Open Source AI: What Actually Counts as Open?

Here's something that might surprise you: OpenAI doesn't build open AI models. Their GPT and DALL·E offerings? Completely proprietary and closed-source. So what about Meta's Llama? Despite what Mark Zuckerberg says, it's not truly open source either, though it's more open than OpenAI's models, which isn't saying much.

When we talk about AI models, there are really three main categories:

Proprietary
Open source
Open

These distinctions apply to both large language models (LLMs) and text-to-image generators. The landscape is still forming, and the Open Source Initiative is currently hammering out a strict definition of what qualifies as truly open source AI. Let's break down where things stand right now.

What Does Open Source Actually Mean?

Before diving into open source AI, let's step back and define open source itself. It's not just trendy jargon—the Open Source Initiative maintains a formal definition that outlines the philosophy and core requirements. It's published under the Creative Commons Attribution 4.0 International License, but here's the essence.

Open source doesn't simply mean you can download or access code freely. True open source must be available to anyone who wants to use it, modify it however they see fit, and use it for any purpose. Open source licenses cannot restrict any "field of use"—and this is exactly where many supposedly open source AI models fall short.

The OSI maintains an approved license list, with major ones including Apache 2.0, the MIT License, and the GNU General Public License (GPL).
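
To make the "approved list" idea concrete, here is a minimal sketch of checking a license identifier against a list. It assumes licenses are written as SPDX identifiers, and the set below is only a tiny illustrative subset covering the three licenses named above, not the full OSI list. The names `OSI_APPROVED_SUBSET` and `is_osi_approved` are hypothetical.

```python
# Illustrative subset only: the three licenses named above, as SPDX identifiers.
# Assumption: this is NOT the complete OSI-approved list.
OSI_APPROVED_SUBSET = {"apache-2.0", "mit", "gpl-3.0-only"}


def is_osi_approved(spdx_id: str) -> bool:
    """Return True if the identifier appears in our small illustrative subset."""
    # Compare case-insensitively and ignore surrounding whitespace.
    return spdx_id.strip().casefold() in OSI_APPROVED_SUBSET


if __name__ == "__main__":
    for license_id in ("MIT", "Apache-2.0", "Custom"):
        print(license_id, is_osi_approved(license_id))
```

A custom license, like the ones many AI models ship with, simply never appears on the list, which is why a check like this is only a first filter.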

What Are Proprietary AI Models?

Proprietary AI models represent some of the most powerful and widely used systems available today. Private companies develop and operate them, keeping the source code, training methods, model weights, and even parameter counts under lock and key. You can only access proprietary models through official channels: chatbots, APIs, or applications built on top of these APIs.

Take OpenAI's GPT-4o. We don't know what training data went into it or how many parameters it has. The only way to use it is through ChatGPT, OpenAI's API, or third-party apps like Perplexity or Zapier Chatbots.

And yes, OpenAI charges for access. Want to use GPT-4o—arguably one of the best AI models out there? You're looking at $20 monthly for ChatGPT Plus, API fees, or integrating a paid service. You can't just download GPT-4o and run it on your own servers.

The same applies to virtually every other proprietary AI model:

GPT-4o mini and DALL·E 3 from OpenAI
Claude 3 and Claude 3.5 from Anthropic
Gemini and Imagen 3 from Google
Command R and R+ from Cohere
Midjourney

What Is Open Source AI?

Open source AI consists of models released under open source licenses. But here's where it gets complicated: researchers have discovered that many models claiming to be open source actually aren't. This deceptive practice is called "open-washing," and it's created serious confusion—even among AI writers.

Chart showing the "openness" spectrum of various AI models

Currently, the Open Source Initiative is working to define truly open source AI because existing licenses don't fully address the technical realities of modern AI models. Meeting genuine open source standards requires more than just sharing code—you need to provide training data, training code, model parameters, and much more. Code should be shared under open source licenses, while training data and documentation should use Creative Commons or similar open licenses.

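Here's the thing about open source licenses: the strictest ones (so-called copyleft licenses, such as the GPL) require you to release any derivative work you distribute under the same license and to credit the original developers. More permissive licenses, like MIT and Apache 2.0, mainly require that you keep the original copyright and license notices.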

What Are Open AI Models?

Open AI models occupy the middle ground between locked-down proprietary systems and the idealized vision of truly open source AI. (Until OSI releases their formal definition, OLMo 7B comes closest to that ideal.)

Simply put, open AI models are made available for free under certain conditions. Usually you can download them from Hugging Face and similar platforms, then run them locally after accepting whatever license terms come with them. You typically can fine-tune them on your own data to build custom versions, create chatbots, and develop applications. In most cases, you can examine model weights and system architecture to understand how they work (to a reasonable degree).

Open licenses still permit broad usage, but they include restrictions you won't find in true open source licenses. For example, Llama 3's license allows commercial use only for companies with up to 700 million monthly active users and blocks certain use cases. You or I could build something with it, but Apple and Google couldn't without requesting a separate license from Meta. Similarly, Gemma 2's acceptable use policy prohibits "facilitating or encouraging users to engage in any form of criminal activity." That makes sense: Google doesn't want unethical Gemma-powered chatbots flooding social media.

These restrictions, while understandable, contradict open source philosophy. That's why this whole space has become contentious. Many researchers are developing classification systems to clarify exactly how open different models actually are. If any of these gain traction, we'll definitely keep you posted.
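
The three categories can be sketched as a simple decision rule. This is a deliberately rough illustration, not an official classification: it looks only at whether weights can be downloaded, whether the license is OSI-approved, and whether the license restricts use. It ignores training data and training code, which a stricter open source AI definition would also require. All names here (`Openness`, `ModelRelease`, `classify`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Openness(Enum):
    PROPRIETARY = "proprietary"
    OPEN = "open"
    OPEN_SOURCE = "open source (by license only)"


@dataclass
class ModelRelease:
    name: str
    weights_downloadable: bool   # can you download and run the model yourself?
    osi_approved_license: bool   # e.g. Apache 2.0 or MIT
    has_use_restrictions: bool   # e.g. user caps or restricted use cases


def classify(release: ModelRelease) -> Openness:
    """Rough, license-based classification; a simplifying assumption."""
    if not release.weights_downloadable:
        return Openness.PROPRIETARY
    if release.osi_approved_license and not release.has_use_restrictions:
        return Openness.OPEN_SOURCE
    return Openness.OPEN


if __name__ == "__main__":
    examples = [
        ModelRelease("GPT-4o", False, False, True),
        ModelRelease("Llama 3", True, False, True),
        ModelRelease("Mistral 7B", True, True, False),
    ]
    for release in examples:
        print(release.name, "->", classify(release).value)
```

The point of the sketch is the middle branch: a model can be freely downloadable and still land in "open" rather than "open source" because of a single restriction in its license.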

The Best Open and Open Source AI Models Today

Here's a rundown of the most noteworthy open and open source models available now. Exactly where each one falls on the open source-to-open spectrum will stay up for debate until we get a solid definition.

AI Model	Developer	Model Type	License	Parameters	Notes
Llama 3.1	Meta	LLM	Custom	8B, 70B, 405B	Usage restrictions and user cap limits
Gemma 2	Google	LLM	Custom	2B, 9B, 27B	Restricted user categories
Phi-3	Microsoft	LLM	MIT	3.8B, 7B, 14B
Mixtral 8x7B	Mistral	LLM	Apache 2.0	8x7B
Mistral 7B	Mistral	LLM	Apache 2.0	7B
DBRX	Databricks/Mosaic	LLM	Custom	Equivalent to 36B	Mixture of Experts—parameter counting is complex
OLMo 7B	Allen Institute for AI	LLM	Apache 2.0	7B	Closest you'll get to truly open source AI right now
FLUX.1 [schnell]	Black Forest Labs	Image Generator	Apache 2.0	N/A
FLUX.1 [dev]	Black Forest Labs	Image Generator	Custom	N/A	Non-commercial use only
Stable Diffusion	Stability AI	Image Generator	Custom	N/A	Earlier versions including 1.5, 2.1, and SDXL available under open licenses
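
If you want to sort through a list like this programmatically, a small sketch follows. It copies only the LLM rows from the table above and groups them by license. The `Row` type and `group_by_license` helper are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    developer: str
    license: str


# LLM rows copied from the table above (image generators left out).
LLM_ROWS = [
    Row("Llama 3.1", "Meta", "Custom"),
    Row("Gemma 2", "Google", "Custom"),
    Row("Phi-3", "Microsoft", "MIT"),
    Row("Mixtral 8x7B", "Mistral", "Apache 2.0"),
    Row("Mistral 7B", "Mistral", "Apache 2.0"),
    Row("DBRX", "Databricks/Mosaic", "Custom"),
    Row("OLMo 7B", "Allen Institute for AI", "Apache 2.0"),
]


def group_by_license(rows: list[Row]) -> dict[str, list[str]]:
    """Map each license name to the models released under it, in table order."""
    groups: dict[str, list[str]] = {}
    for row in rows:
        groups.setdefault(row.license, []).append(row.model)
    return groups


if __name__ == "__main__":
    for license_name, models in group_by_license(LLM_ROWS).items():
        print(f"{license_name}: {', '.join(models)}")
```

Grouping this way makes the split visible at a glance: the "Custom" bucket is where you need to read the license terms before building anything.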

Should You Use Open or Open Source AI?

While truly open source AI models aren't as plentiful as we'd like, the best open models are surprisingly competitive with proprietary alternatives. Llama 3.1 405B and FLUX.1 can go toe-to-toe with GPT-4o and DALL·E 3. The interesting part: if you have the technical chops to work with open source models, you can achieve similar results for significantly less money and with substantially more freedom.
