What is open-source AI?
Open-source AI is AI that is made available under terms that let people use, study, modify, and share it, with enough access to the important pieces to make those freedoms meaningful. In AI, that usually means looking beyond code alone to model weights, training or inference code, documentation, and information about the training data. The term is still debated, and many models marketed as "open" are more accurately described as open-weight than as fully open-source.
What this means
With normal software, "open-source" usually means the source code is available under a licence approved by the Open Source Initiative, one that allows broad reuse and modification. AI is messier. A useful AI model is not just code. It also depends on learned parameters called weights, on the code used to train and run it, on model architecture, on documentation, and on at least some account of the data used to create it.
That is why open-source AI is not just "open-source software plus AI branding". The hard question is what has to be open for someone else to meaningfully study and modify the system. If you only release the weights, people can run the model and sometimes fine-tune it, but they may not be able to understand how it was built or reproduce something substantially equivalent. If you only offer an API, the system is not open-source in the usual sense, because users cannot inspect or modify the model itself.
This is where the term open-weight becomes useful. Open-weight models make the trained parameters available for download. That can be extremely valuable. It allows self-hosting, inspection, benchmarking, and adaptation. But open-weight is not automatically the same thing as open-source AI. The licence may still impose restrictions, and key ingredients such as the training recipe or data information may still be missing.
So when someone says a model is "open", the practical question is: open which parts, under what terms, and to what extent?
Why it matters
This matters because "open" affects real buying and operating decisions. A genuinely open model can reduce dependence on a single vendor, give more freedom over deployment, support local or on-premises use, and make deeper customisation possible. It can also improve scrutiny, because more people can inspect behaviour, benchmark performance, and identify weaknesses.
For many organisations, the appeal is straightforward. They want more control over where models run, how data is handled, how failures are investigated, and how much switching cost they accept later. Open models can help with all of that.
But openness also shifts responsibility. If you download and run a model yourself, you inherit more of the operational burden. You need infrastructure, patching, monitoring, access control, content safeguards, and a plan for updates. The commercial convenience of a hosted model does not disappear simply because an open option exists.
There is also a legal and governance angle. In AI, labels such as "open-source", "open model", "open weights", and "source-available" are not interchangeable. If a leadership team treats them as interchangeable, it can make poor assumptions about usage rights, support expectations, risk ownership, and procurement fit.
How it works
The most useful way to understand open-source AI is to break it into parts.
One part is code. This includes the code used to train the model, process the data, run inference, and sometimes fine-tune or evaluate the model. Without that code, you can still use some models, but your ability to inspect or modify them is limited.
Another part is weights. These are the learned parameters that make the model behave the way it does. Publishing weights lets others run the model directly, often on their own infrastructure. This is why open-weight releases matter so much in practice. They lower the barrier to experimentation, adaptation, and self-hosting.
A third part is data information. This is one of the most contested areas. Raw training datasets cannot always be redistributed because of privacy, copyright, contract, or safety concerns. As a result, the current open-source AI debate often focuses on whether detailed information about the training data can be enough to make the system meaningfully open, even when not every training example is downloadable.
The Open Source Initiative's Open Source AI Definition is an important attempt to answer this. It frames open-source AI around freedoms to use, study, modify, and share, and says those freedoms require access to the preferred form for making modifications. For machine learning systems, that includes code, parameters, and sufficiently detailed information about the data used to train the system. That is a more demanding standard than simply releasing weights.
This is also why open-source AI is different from open-source software in a narrow sense. For ordinary software, source code is usually the central artefact. For AI, behaviour emerges from the interaction of code, weights, data, and training choices. A team may share one or two of those pieces while keeping the rest closed. That can still be useful, but it does not settle the openness question.
Licensing then adds another layer. Some model families are released under familiar open-source licences such as Apache 2.0. That generally signals broad rights to use, modify, and redistribute. Other model families use custom community licences. Those may permit commercial use in many cases, but still include restrictions that fall short of open-source norms. If the licence says you cannot use the model to improve another large language model, or if it imposes extra conditions on redistribution, many open-source advocates will say it is not truly open-source.
This is why the phrase open-washing has appeared in the conversation. It refers to marketing something as open when the practical freedoms are more limited than that label suggests. A model may be easy to download and still not qualify as open-source under a stricter definition.
At the same time, leaders should not fall into the opposite mistake of dismissing open-weight models as unimportant because they are not fully open-source. In operational terms, open-weight models can still be very valuable. They may be self-hostable, tunable, fast to deploy, and commercially permissive enough for many real uses. For some organisations, that is the key requirement. For others, especially those focused on deep auditability, reproducibility, or research transparency, the fuller standard matters more.
Today's market therefore has several layers. There are API-only proprietary models. There are open-weight models with permissive licences. There are open-weight models with custom restrictions. And there are a smaller number of projects that aim for something closer to full openness across weights, code, training recipe, and data information.
The operational implications flow from those layers. If you want to run a model behind your firewall, an API-only vendor will not meet that need. If you want to retrain a model family and publish a derivative without custom restrictions, a permissive licence matters. If you need to understand provenance and reconstruct the build process, weights alone are not enough.
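The three cases above can be written down as a simple lookup from a need to the things a release must provide. This is a minimal sketch, not a formal test of openness: the Release fields, the need names, and the mapping between them are all illustrative assumptions that a team would adjust to its own requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """Hypothetical summary of what a model release makes available."""
    weights_downloadable: bool
    permissive_licence: bool
    training_code: bool
    training_recipe: bool
    data_information: bool


# Each need is mapped to the Release fields it depends on.
# Both the need names and the mapping are assumptions for illustration.
NEEDS = {
    "run_behind_firewall": ("weights_downloadable",),
    "publish_unrestricted_derivative": ("weights_downloadable", "permissive_licence"),
    "reconstruct_build": (
        "weights_downloadable",
        "training_code",
        "training_recipe",
        "data_information",
    ),
}


def unmet_requirements(release: Release, need: str) -> list[str]:
    """Return the fields a release lacks for a need (an empty list means it fits)."""
    return [name for name in NEEDS[need] if not getattr(release, name)]


# Two hypothetical releases: an API-only service and a weights-only download.
api_only = Release(False, False, False, False, False)
weights_only = Release(True, False, False, False, False)

print(unmet_requirements(api_only, "run_behind_firewall"))
print(unmet_requirements(weights_only, "run_behind_firewall"))
print(unmet_requirements(weights_only, "reconstruct_build"))
```

In this sketch the weights-only release is enough for running behind a firewall, but it still lacks the training code, recipe, and data information needed to reconstruct the build.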
A further wrinkle is that openness does not guarantee quality, safety, or maintainability. An open model can be excellent or poor. It can be carefully documented or barely documented. It can have a thriving community or effectively no support. Openness changes access and control, not the laws of engineering.
So the practical definition is not abstract. Open-source AI is about whether another skilled team can meaningfully take the system, study it, adapt it, and share changes without needing further permission. That is a higher bar than public access alone.
Examples
A regulated business may choose an open-weight model for an internal document assistant because it wants to run inference inside its own environment and avoid sending prompts to an external hosted service. In that case, the value is deployment control, not ideological commitment to open-source.
A product team may adopt a permissively licensed model as the base for a domain-specific assistant, then fine-tune it on proprietary support material. Here the attraction is customisation and the ability to own the serving stack.
A research team may prefer a more fully open project because it needs to inspect training methods, compare checkpoints, or reproduce results. That is where fuller openness, including code and data information, matters more than mere access to weights.
A procurement team may also use openness as a resilience criterion. If a vendor relationship changes, a downloadable model plus a workable licence can create a clearer exit path than a deeply embedded API-only dependency.
Common misunderstandings
One misunderstanding is that open-source AI simply means "free". It does not. You may still pay for hosting, support, fine-tuning, security work, or commercial tooling around the model.
Another is that open-source AI is automatically safer because more people can inspect it. Extra scrutiny can help, but open access also makes misuse easier in some contexts. Openness changes who can inspect and adapt the model. It does not remove the need for governance.
A third mistake is to treat open-weight as identical to open-source. Open weights are important, but they are only one piece. If the licence is restrictive or the build process is opaque, calling the model open-source may overstate what users can actually do.
Teams also assume that a permissive licence settles every legal question. It does not. Training data provenance, downstream use, sector rules, privacy duties, and contractual terms still matter.
Finally, some leaders hear "open-source AI" and think "no vendor support". In reality, many commercial stacks are built around open models. Open and commercial are not opposites.
Risks and boundaries
The main risk is false certainty. A model labelled open may not be open in the way your organisation needs. You need to inspect the actual artefacts, the actual licence, and the actual freedoms granted.
There are governance risks too. If you self-host, patching, updates, model registry control, access management, and abuse prevention become your responsibility. If you fine-tune, you also inherit the need for evaluation and rollback discipline.
Copyright, privacy, and provenance remain active areas of risk. Even with a permissive release, questions about the data used in training may still matter to your legal and risk teams. That is especially important for customer-facing or high-scrutiny uses.
There are performance boundaries as well. Some open models are excellent. Some are not. Many lag leading proprietary systems on particular tasks, while beating them on flexibility or cost. It depends on the job.
This article is not legal advice. If licensing rights, derivative model rights, or regulated data use are material to your decision, review the specific model terms and your intended use with appropriate counsel.
What to do next
Begin by defining what you actually need from "open". Do you need downloadable weights, a permissive licence, local deployment, reproducibility, transparent training inputs, or freedom to publish derivatives? Different use cases need different degrees of openness.
Then classify candidate models using plain categories: API-only, open-weight with a permissive licence, open-weight with custom restrictions, or more fully open across weights, code, and data information. This alone clears up many confused conversations.
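One way to make this classification repeatable is to record a few yes-or-no facts about each candidate and derive the category from them. The sketch below is only an illustration. The function name, its parameters, and the decision order are assumptions, and it simplifies in two ways: anything without downloadable weights is treated as API-only, and the "more fully open" category is assumed to require a permissive licence as well.

```python
from enum import Enum


class Openness(Enum):
    API_ONLY = "API-only"
    OPEN_WEIGHT_PERMISSIVE = "Open-weight with permissive licence"
    OPEN_WEIGHT_RESTRICTED = "Open-weight with custom restrictions"
    MORE_FULLY_OPEN = "More fully open across weights, code, and data information"


def classify(*, weights: bool, permissive_licence: bool,
             training_code: bool, data_information: bool) -> Openness:
    """Place a candidate model in one of the four plain categories.

    The decision order is an assumption for illustration, not a standard.
    """
    if not weights:
        # Simplification: no downloadable weights is treated as API-only.
        return Openness.API_ONLY
    if not permissive_licence:
        return Openness.OPEN_WEIGHT_RESTRICTED
    if training_code and data_information:
        return Openness.MORE_FULLY_OPEN
    return Openness.OPEN_WEIGHT_PERMISSIVE


# Hypothetical candidate: weights under a custom licence, nothing else released.
category = classify(weights=True, permissive_licence=False,
                    training_code=False, data_information=False)
print(category.value)
```

Keeping the inputs as explicit facts rather than a single "open" flag forces the conversation back to which parts are open and under what terms.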
Next, review the licence before the pilot, not after it. Check redistribution rights, derivative rights, naming requirements, prohibited use clauses, and any restrictions on using the model or its outputs to improve other models.
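The licence points listed above can be tracked as a short checklist so that nothing is skipped before a pilot starts. This is a minimal sketch under stated assumptions: the item wording follows the paragraph above, while the open_items helper and the sample notes are hypothetical and do not describe any real model's licence.

```python
# Checklist items taken from the licence review step described above.
LICENCE_CHECKS = (
    "redistribution rights",
    "derivative rights",
    "naming requirements",
    "prohibited use clauses",
    "restrictions on improving other models",
)


def open_items(review: dict[str, str]) -> list[str]:
    """Return checklist items that have no recorded finding yet."""
    return [item for item in LICENCE_CHECKS if not review.get(item, "").strip()]


# Hypothetical, partly completed review notes for one candidate model.
review_notes = {
    "redistribution rights": "Allowed with attribution (placeholder note).",
    "derivative rights": "Allowed (placeholder note).",
    "naming requirements": "",
}

print(open_items(review_notes))
```

A pilot would only go ahead once open_items returns an empty list, which keeps the licence review ahead of the engineering work rather than behind it.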
After that, test the operational side. Can your team actually deploy it, monitor it, update it, and secure it? An open model that your team cannot run responsibly is not a strategic asset.
Finally, decide where openness is strategic and where it is optional. For some workloads, a proprietary API may still be the better fit. For others, the control and flexibility of an open model may be worth the extra operating burden.
FAQs
Is open-source AI the same as open-weight AI?
No. Open-weight means the trained parameters are available. Open-source AI asks a broader question about code, data information, terms, and meaningful freedom to modify and share.
Can open-source AI be used commercially?
Often yes, but you must read the actual licence. Some releases are under permissive licences such as Apache 2.0, while others use custom terms with important restrictions.
Is Meta Llama open-source AI?
It is widely available and very influential, but many open-source advocates do not treat it as truly open-source because of its custom licence restrictions.
Does open-source AI mean the full training dataset is public?
Not always. This is one of the central debates. Some definitions focus on detailed data information rather than requiring every training example to be released.
Is self-hosting an open model always cheaper than using an API?
No. It depends on usage volume, engineering labour, hardware, reliability needs, and support expectations.
Are open models good enough for serious business use?
Many are. The right question is not open versus closed in the abstract. It is whether a specific model meets your performance, risk, and operating requirements.
Sources
The Open Source AI Definition 1.0 (Open Source Initiative). Primary source for the current OSI definition of open-source AI, including freedoms, preferred form for modification, and the required elements of data information, code, and parameters.
OSAID FAQ (Open Source Initiative). Primary source for why AI needs a distinct openness definition, why training data is not treated as software source code, and why detailed data information is still required.
Meta's Llama license is still not Open Source (Open Source Initiative). Primary source for the view that widely available models with licence restrictions should not automatically be described as open-source.
OLMo (Ai2). Primary source for a concrete example of a project presented as fully open across training data, architecture, and evaluation access.
OLMo 2 technical overview (Ai2). Primary source for OLMo 2 as a fully released family with model weights, full training data, code, recipes, logs, and checkpoints.
AI openness: A primer for policymakers (OECD). Secondary source for the broader policy framing that "open source" in AI is a contested and evolving term rather than a settled software analogue.
