
An “AI Model Component” Exception for Open Source Licenses
Key Takeaways
- The weights of an AI model integrated into free software are functionally equivalent to object code: they determine the program’s behavior but cannot be modified without the information used to produce them.
- Current free and open source licenses do not capture this reality: copyleft stops at the code, leaving AI components in a blind spot.
- The proposed “AI Model Component Exception” extends the existing source-sharing obligations of copyleft licenses to these AI model components.
Introduction
In our article devoted to the definitions of open source AI, part of the series “Open Source and AI,” we showed that Open Source licenses currently struggle to address the challenges of openness in AI. Open Data licenses such as the ODbL represent a “primary” avenue that can cover both the training database and potentially the AI model itself. As a complement, and to bring the approach closer to the free and open source software movement, we also developed the idea of an exception to Open Source licenses that, like an addendum, would extend the obligations relating to AI models integrated into the code.
This article materializes that idea by proposing a concrete clause: the AI Model Component Exception. It is an additional exception, drafted in English to be directly usable with any existing Open Source license.
Setting the Context: Free Software with an Opaque Component
The initial premise, which justifies the approach, is that free/open source software integrating AI is not truly free, in the sense that the holders of a copy can only partially enjoy the expected freedoms.
The code is under GPL-3.0 or Apache-2.0, the dependencies are documented, the SBOM is clean. But the AI model used does not share its “recipe” — the weights, embeddings, calibration files — even though only this recipe makes it possible to understand the program’s behavior.
This situation is strictly analogous to the ones that motivated each major extension of copyleft in the history of free software licenses:
| Historical Problem | Response Mechanism | Analogy with AI |
|---|---|---|
| Tivoization (verifying the signature of embedded software so that only the manufacturer’s version runs on the hardware) | GPLv3, Section 6 (obligation to provide signing keys and installation information) | Distributing weights without the data or training scripts is distributing a binary without the source |
| ASP Loophole (using free software as SaaS without ever “distributing” the corresponding source) | AGPL-3.0, Section 13 (obligation to provide code access for network users) | Using a model via an API without ever providing the information needed for modification |
| Library Locking (integrating a free library into a proprietary application without providing the interface information needed to run a modified library) | LGPLv3, Section 4 (obligation to provide the “Corresponding Application Code” and interface information) | Integrating a free model into software without providing what is needed to modify or replace it |
The common denominator is the same: a part of the software is effectively rendered non-free due to the lack of information needed to fully modify it, even though the formal license is free.
The anti-tivoization clause, the Affero clause, and the LGPL mechanism each clarified that existing obligations extended to these situations by resolving the ambiguities that technological evolution had created.
The weights of an AI model integrated into software are functionally equivalent to object code. They determine the program’s behavior but cannot be modified without the information used to produce them (architecture, training scripts, data, hyperparameters). This analogy, proposed notably by Andrew Marble, is shared by a growing part of the community: the weights are the “object code” of the model, and the data and training scripts are its “source code.”
Today, no court has yet ruled on whether the weights of a model trained on copyleft-licensed data constitute a “derivative work” within the meaning of copyright law. The question is complex and is the subject of ongoing academic work and litigation. The 16 million GPL/AGPL-licensed files found in training datasets make the problem acute.
By adding to copyleft licenses an exception that interprets and removes all ambiguity regarding the necessity of sharing model information, the obligation arises from the license contract, not from a legal qualification that is still debated. This approach occupies a precise space:
- Less radical than the FSF’s position, which holds that a free machine learning application implies that the training code and training data must be free (see their official position). The FSF and the Software Freedom Conservancy maintain an ambitious stance (“FOSS in, FOSS out, FOSS throughout”) that requires complete openness of the entire chain. In practice, that position comes closest to combining an ODbL license on the training databases and models with a copyleft Open Source license carrying the exception.
- More conservative than the “Contextual Copyleft” proposed by Shanklin, Hine, Novelli et al. (July 2025), which extends copyleft from training data to the resulting models — but which potentially requires covering more than the software that is the subject of the license, which could be considered contrary to the OSD (which notably sets as a criterion that the license cannot “contaminate” other independent works).
- Complementary to the revival of copyleft-next by Bradley Kuhn and Richard Fontana (July 2025), which aims to design a next-generation copyleft license. The proposal of an exception that removes the ambiguity seems simpler to implement than waiting for such a license to be finalized (however interesting the project may be).
The Two Key Concepts to Be Specified
The “AI Model Component” refers to any part of the software whose behavior is determined, in whole or in part, by a machine learning model: trained parameters (weights, biases, embeddings), model configuration files, calibration data, quantization tables, or any other artifact whose modification is necessary to adapt the behavior of that part of the program.
It is part of the program. The fact that its behavior results from learned parameters rather than from explicitly programmed instructions does not alter its character as software or diminish the rights granted to recipients under the applicable license. It is therefore fully covered by the Open Source license associated with the software.
The second concept is that of “Model Corresponding Information,” which encompasses the complete set of information, data, and instructions reasonably necessary for a skilled person to understand, reproduce, modify, and reintegrate the component into the program — which directly refers to the preferred form for making modifications, extended to the specificities of AI.
This includes:
- (a) The complete specification of the model architecture (topology, layer configuration, activation functions) in a documented, machine-readable format;
- (b) The trained parameters (weights, biases) in a standard or documented format;
- (c) The complete source code of the training, fine-tuning, and evaluation procedures, including preprocessing pipelines;
- (d) The training configuration (hyperparameters, optimizer, random seeds where applicable);
- (e) A sufficiently detailed description of the training data — provenance, collection method, processing steps — to enable the assembly of a functionally equivalent dataset, or the data itself where it can be lawfully redistributed;
- (f) The instructions necessary to retrain, fine-tune, or replace the component and reintegrate it into the program.
Point (e) is intentionally flexible: when data cannot be redistributed (privacy, copyright, volume), a description sufficient to reconstitute an equivalent is enough. This is a pragmatic approach, comparable to the way the GPL requires “scripts used to control compilation and installation” rather than the compiler itself.
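The six items above can be read as a distribution checklist. As a purely illustrative sketch (the manifest keys and file paths below are our own invention, not part of the exception text), a distributor could record each item in a manifest and verify completeness before release:

```python
# Illustrative only: the manifest keys mirror items (a)-(f) of the
# Model Corresponding Information list; the format itself is hypothetical.
REQUIRED_ITEMS = {
    "architecture_spec",   # (a) documented, machine-readable architecture
    "trained_parameters",  # (b) weights/biases in a standard format
    "training_code",       # (c) training/fine-tuning/evaluation source
    "training_config",     # (d) hyperparameters, optimizer, seeds
    "data_description",    # (e) the data itself, or a description
                           #     sufficient to assemble an equivalent set
    "integration_notes",   # (f) how to retrain/replace and reintegrate
}

def missing_information(manifest: dict) -> set[str]:
    """Return the checklist items absent from a distribution manifest."""
    return {item for item in REQUIRED_ITEMS if not manifest.get(item)}

# Hypothetical release manifest with two items still missing:
manifest = {
    "architecture_spec": "model/config.json",
    "trained_parameters": "model/weights.safetensors",
    "training_code": "train/",
    "training_config": "train/hparams.yaml",
}
print(sorted(missing_information(manifest)))
# → ['data_description', 'integration_notes']
```

Under point (e), the `data_description` entry could point either to the dataset itself or, where redistribution is not lawful, to the documented description the exception accepts instead.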
The Exception Text
Published under a CC0 license, this proposed exception is drafted in English as a standalone additional permission, graftable onto any free license through a reference mechanism to the “Applicable License.” It does not depend on any specific license.
AI Model Component Exception
Additional permission for software incorporating AI model components
The following additional permission applies to the software designated below (the “Program”) and is granted by the copyright holder(s) of the Program. This permission supplements — and shall be read together with — the license under which the Program is distributed (the “Applicable License”). To the extent that this permission conflicts with the Applicable License, the more protective provision prevails.
0. Definitions.
An “AI Model Component” means any part of the Program whose behavior is determined, in whole or in part, by a machine learning model — including but not limited to trained parameters (weights, biases, embeddings), model configuration files, calibration data, quantization tables, or any other artifact whose modification is necessary to adapt, correct, or improve the behavior of that part of the Program.
For the avoidance of doubt, an AI Model Component is part of the Program; the fact that its behavior results from learned parameters rather than from explicitly programmed instructions does not alter its character as software or diminish the rights granted to recipients under the Applicable License.
The “Model Corresponding Information” for an AI Model Component means the complete set of information, data, and instructions reasonably necessary for a skilled person to understand, reproduce, modify, and reintegrate that Component into the Program — that is, the preferred form of the AI Model Component for making modifications to it. This includes, without limitation:
(a) the complete specification of the model architecture (topology, layer configuration, activation functions, normalization schemes) in a documented, machine-readable format;
(b) the trained parameters (weights, biases) in a standard or well-documented format;
(c) the complete source code of the training, fine-tuning, and evaluation procedures used to produce the distributed parameters, including data preprocessing and augmentation pipelines;
(d) the training configuration (hyperparameters, optimization settings, random seeds where deterministic reproduction is feasible);
(e) a sufficiently detailed description of the training data — including its provenance, collection method, and processing steps — to enable a reasonably skilled person to assemble a functionally equivalent dataset, or the training data itself where it can be lawfully redistributed; and
(f) the instructions necessary to retrain, fine-tune, or replace the AI Model Component and to reintegrate the modified version into the Program, such that the Program functions with the modified Component in the same manner as it functioned with the original.
1. Obligation to provide Model Corresponding Information.
If the Program includes one or more AI Model Components, any person who distributes or otherwise makes available the Program (or any work based on it) must, in addition to fulfilling all obligations under the Applicable License, provide the Model Corresponding Information for each such Component. This information must be made available under the same conditions, through the same means, and at the same time as the source code (or equivalent material) required by the Applicable License.
This obligation applies regardless of the form in which the AI Model Component is distributed — whether as source code, binary files, serialized parameters, compiled or optimized intermediate representations, or any other form.
2. Scope and limitations.
This additional permission applies solely to AI Model Components that form an integral part of the Program — that is, components without which the Program would not function as intended, or whose removal would materially alter the behavior of the Program. It does not extend to:
(a) independent AI models merely aggregated with the Program on a volume of a storage or distribution medium; or
(b) AI models invoked by the Program as separate and independent works through standard, publicly documented interfaces (including but not limited to APIs and network calls).
This permission creates no obligation with respect to outputs produced by the Program, nor with respect to AI models generated through the use of the Program as a tool. It pertains exclusively to components of the Program itself.
Where the provision of any element of the Model Corresponding Information is prevented by a legal impediment independent of the distributor’s will — in particular, obligations arising under data protection law or the sui generis database right — the distributor must provide all elements not so affected, together with a documented description of the missing elements and the precise legal grounds for their omission, so as to enable the recipient to reconstitute or independently obtain such elements.
3. No further restrictions.
The Model Corresponding Information must be provided under terms no more restrictive than the Applicable License. You may not impose any further restrictions on the exercise of the rights granted under the Applicable License with respect to the Model Corresponding Information — including by way of additional contractual obligations, technical protection measures, access controls, or terms of use that would prevent recipients from using, studying, modifying, or redistributing such information.
Any such restriction shall be deemed a further restriction within the meaning of the Applicable License, and shall be void and of no effect to the maximum extent permitted by applicable law.
Notes on the Exception
This proposed exception adopts the classic distinction in free software license law between “derivative work” and “mere aggregation.” A model integrated into the program (without which the program would not function as intended) is a component of the program and triggers the obligation. A model aggregated on the same distribution medium, or called via a public API, remains an independent work not covered by the exception. This is the transposition, for AI, of the boundary between linking and aggregation in software copyleft.
The legal-impediment carve-out (Section 2, final paragraph) recognizes that, in practice, certain elements of the corresponding information (personal data, databases protected by the sui generis right) cannot always be redistributed. It therefore imposes a documentary obligation: provide everything that can be provided, and precisely document what is missing and why. This approach is inspired by the way the GDPR and Directive 2019/790 (TDM) coexist in practice: obligations overlap, and it is transparency about the limits that enables compliance.
The “no further restrictions” clause (Section 3) is directly inspired by GPL-3.0, Section 10 (“You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License”). It prohibits contractual, technical (DRM, access controls), or conditional restrictions on the model’s corresponding information. This is a safeguard against open-washing: publishing weights under a free license while making training information inaccessible through restrictive terms of use. It may be redundant with certain Open Source licenses that already contain this type of clause.
Conclusion — Clarifying, Not Creating
This proposed “interpretive” exception is thus both an open source governance tool and a regulatory compliance tool. The two logics converge: ensuring that users have the information necessary to understand, modify, and audit the software they use.
It naturally complements the requirements of the three European regulations analyzed in our article on the CRA/AI Act/NPLD articulation:
- The AI Act (Art. 53.1) requires providers of GPAI models to make available “sufficiently detailed technical documentation” and information about training data. The “Model Corresponding Information” goes beyond this requirement;
- The CRA requires SBOMs and traceability. The exception complements the software SBOM with an inventory of model components;
- The NPLD extends product liability to software. Documenting the AI components of the software strengthens traceability in the event of a defect (and protects the maintainer who has fulfilled their transparency obligations).
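On the SBOM point, the CycloneDX specification (since version 1.5) already defines a “machine-learning-model” component type, which could carry the inventory the exception calls for. The fragment below is a minimal, hypothetical sketch, not a complete or validated SBOM; the model name, version, and URL are invented for illustration:

```python
import json

# Illustrative CycloneDX-style entry for an AI model component.
# Field values are hypothetical; a real SBOM must follow the full
# CycloneDX 1.5+ schema, which defines the "machine-learning-model" type.
bom_fragment = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {
            "type": "machine-learning-model",
            "name": "example-classifier",   # hypothetical model name
            "version": "2.1.0",
            "licenses": [{"expression": "GPL-3.0-only"}],
            # A pointer to where the Model Corresponding Information
            # of the exception is published:
            "externalReferences": [
                {
                    "type": "build-meta",
                    "url": "https://example.org/model-corresponding-info",
                }
            ],
        }
    ],
}

print(json.dumps(bom_fragment, indent=2))
```

Listing the model as a first-class component alongside software dependencies is what lets CRA-style traceability extend to the AI parts of the program.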
This proposal for an AI Model Component Exception does not create new obligations. It clarifies that the existing obligations of any copyleft Open Source license (providing the information necessary to exercise the freedoms over the software) naturally extend to AI model components, in the same way that the anti-tivoization clause clarified the extension to locked hardware, and the Affero clause clarified the extension to network services.
Thus, if an AI model component determines the program’s behavior, then the information necessary to modify that component is part of the source code in the broad sense (and must be provided under the same conditions).
The exception is published under a CC0 license. It is designed to be adopted, adapted, and discussed by the community. We invite maintainers, foundations, and legal experts to evaluate it, critique it, and help it evolve. It can be used in the same manner as other Open Source exceptions (LicenseName WITH ExceptionName, e.g., GPL-3.0 WITH AI Model Component Exception).
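Concretely, a project adopting the exception could reference it in its license notices alongside the main license. The following is a hypothetical example (the project name is invented, and the SPDX-style identifier is illustrative only, since the exception is not in the SPDX exception list):

```text
MyProject is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License, version 3, together
with the AI Model Component Exception (see LICENSE and EXCEPTION files).

SPDX-License-Identifier: GPL-3.0-only WITH AI-Model-Component-Exception
```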
Illustration credit: Remix of © 2006, Senor Codo, blotter_explosion. This file is licensed under the Creative Commons Attribution-Share Alike 2.0 Generic license.