
An “AI Model Component” Note for Open Source Licenses
Key Takeaways
- The weights of an AI model integrated into free software are functionally equivalent to object code: they determine the program’s behavior but cannot be modified without the information used to produce them.
- Compliance practice stops at conventional source files, leaving AI components in a practical blind spot – even though the licenses’ own definitions (“preferred form for making modifications”) already reach them.
- The proposal sits in a long tradition of adapting free software licensing to a moving technical environment (GPLv3 anti-tivoization, AGPL §13, LGPL interface information). It departs from those precedents in kind: it does not modify any license. It is an interpretive clarification by the copyright holders, in line with well-known examples such as the Linux kernel COPYING note on system calls or the FSF’s GPL FAQ.
Update (June 2026). Following feedback from the FSFE Legal Network and the Free Software Foundation, the proposal evolved into version 1.1-draft: renamed “Note”, scope of Section 2 arbitrated (functional test anchored in the copyright holders’ bundling decisions), Section 3 reframed as a delivery standard, a regime for third-party pre-trained models, and an expanded FAQ. This article keeps its original narrative; the canonical text lives in the repository.
Introduction
In our article devoted to the definitions of open source AI, part of the series “Open Source and AI,” we showed that Open Source licenses currently struggle to address the challenges of openness in AI. Open Data licenses such as the ODbL represent a “primary” avenue that can cover both the training database and potentially the AI model itself. As a complement, and to bring the approach closer to the free and open source software movement, we also developed the idea of an exception to Open Source licenses that, like an addendum, would extend the obligations relating to AI models integrated into the code.
This article materializes that idea by proposing a concrete clause: the AI Model Component Note. It is an interpretive note – filed, for SPDX tooling purposes, under the “exception” category – drafted in English to be directly usable with any existing copyleft license.
Setting the Context: Free Software with an Opaque Component
The initial premise, which justifies the approach, is that free/open source software integrating AI is not truly free, in the sense that holders of a copy can only partially enjoy the expected freedoms.
The code is under GPL-3.0 or Apache-2.0, the dependencies are documented, the SBOM is clean. But the AI model used does not share its “recipe” – the weights, embeddings, calibration files – even though only this recipe makes it possible to understand the program’s behavior.
This situation is strictly analogous to those that motivated each major extension of copyleft in the history of free software licenses:
| Historical Problem | Response Mechanism | Analogy with AI |
|---|---|---|
| Tivoization (verifying the signature of embedded software so that only the manufacturer’s version runs on the hardware) | GPLv3, Section 6 (obligation to provide signing keys and installation information) | Distributing weights without the data or training scripts is distributing a binary without the source |
| ASP Loophole (using free software as SaaS without ever “distributing” the corresponding source) | AGPL-3.0, Section 13 (obligation to provide code access for network users) | Using a model via an API without ever providing the information needed for modification |
| Library Locking (integrating a free library into a proprietary application without providing the interface information needed to run a modified library) | LGPLv3, Section 4 (obligation to provide the “Corresponding Application Code” and interface information) | Integrating a free model into software without providing what is needed to modify or replace it |
The common denominator is the same: a part of the software is effectively rendered non-free due to the lack of information needed to fully modify it, even though the formal license is free.
The anti-tivoization clause, the Affero clause and the LGPL mechanism each extended existing obligations to those situations by resolving the ambiguities that technological evolution had created.
It is worth being precise about what these three precedents teach us – and what they do not. They are license modifications: the GPL text itself was rewritten to integrate the anti-tivoization section, AGPL is a separate license from GPL, and LGPL’s interface mechanism evolved through successive license rewrites. Our approach is deliberately different. It does not modify any license, adds no obligation, and removes no right. It belongs to another, equally established family of responses – what may be called interpretive clarification: the copyright holder publishes its reading of the existing text, without rewriting it.
Several well-known examples illustrate this mechanism:
- the Linux kernel COPYING syscall note (Torvalds, 1992 onwards – SPDX-registered as Linux-syscall-note): the kernel’s copyright holders state that user-space programs invoking the kernel through normal system calls are not considered derivative works. A clarification of GPL-2.0’s scope that has structured industry behavior for thirty years without ever being adjudicated;
- Larry Wall’s interpretive note in the Perl distribution README (verifiable from Perl 4.0, 1991): “my interpretation of the GNU General Public License is that no Perl script falls under the terms of the GPL unless you explicitly put said script under the terms of the GPL yourself”. Like the Linux note, never adjudicated; treated as authoritative by maintainers and counsel for thirty-five years; widely credited with making Perl’s third-party module ecosystem possible;
- the steward-level interpretive layer: the FSF GPL FAQ (the practical reference that counsel and compliance teams work with), Apache Software Foundation legal-discuss, Eclipse Foundation legal opinions, Mozilla MPL stewardship statements, and OSI position papers, which clarify contested boundaries of their respective licences without re-issuing them.
On the doctrinal distinction between interpretation (clarifying an imprecise or equivocal clause to fix the licence’s reach) and exception (derogation modifying the licence text), see B. Jean, L’évolution des licences libres et open source, HAL halshs-02077882.
The effect of such an interpretation is not certain in court – a judge could disagree with the licensor’s reading. But it has two strong practical effects: (1) it forecloses the most opportunistic readings a downstream distributor might invoke, since they cannot plausibly claim to have understood the license in a way the licensor expressly excluded; and (2) it shifts the argumentation burden – and the litigation risk – onto whoever wishes to argue the contrary, while voluntarily relying on the same license. In other words: departing from the licensor’s published interpretation remains legally defensible, but materially more costly than exploiting a silence.
The weights of an AI model integrated into software are functionally equivalent to object code. They determine the program’s behavior but cannot be modified without the information used to produce them (architecture, training scripts, data, hyperparameters). This analogy, discussed notably by Andrew Marble (“Copyright for AI Model Weights”, 2023), is shared by a growing part of the community: the weights are the “object code” of the model, and the data and training scripts are its “source code.”
To date, no court has ruled on whether the weights of a model trained on copyleft-licensed data constitute a “derivative work” within the meaning of copyright law. The question is complex and is the subject of ongoing academic work and litigation. Katzy et al. found about 16 million strong-copyleft licence declarations among the 171 million file-leading comments they examined across training datasets (arXiv:2403.15230), which shows how acutely the problem is posed.
When copyleft licenses are accompanied by an exception that interprets them and removes all ambiguity regarding the necessity of sharing model information, the obligation arises from the license contract, not from a legal qualification that is still debated. This approach occupies a precise space:
- Less radical than the FSF’s position, which holds that a free machine learning application implies that the training code and training data must be free (see their official position). The FSF and the Software Freedom Conservancy maintain an ambitious stance (“FOSS in, FOSS out, FOSS throughout”) that requires complete openness of the entire chain. In practice, that stance is closer to a setup combining an ODbL license on the training databases and models with a copyleft Open Source license carrying the exception.
- More conservative than the “Contextual Copyleft” proposed by Shanklin, Hine, Novelli et al. (July 2025), which extends copyleft from training data to the resulting models – but which potentially requires covering more than the software that is the subject of the license, which could be considered contrary to the OSD (which notably sets as a criterion that the license cannot “contaminate” other independent works).
- Complementary to the revival of copyleft-next by Bradley Kuhn and Richard Fontana (July 2025), which aims to design a next-generation copyleft license. The proposal of an exception that removes the ambiguity seems simpler to implement than waiting for such a license to be finalized (however interesting the project may be).
The Two Key Concepts to Be Specified
The “AI Model Component” refers to any part of the software whose behavior is determined, in whole or in part, by a machine learning model: trained parameters (weights, biases, embeddings), model configuration files, calibration data, quantization tables, or any other artifact whose modification is necessary to adapt the behavior of that part of the program.
It is part of the program. The fact that its behavior results from learned parameters rather than from explicitly programmed instructions does not alter its character as software or diminish the rights granted to recipients under the applicable license. It is therefore fully covered by the Open Source license associated with the software.
The second concept is that of “Model Corresponding Information,” which encompasses the complete set of information, data, and instructions reasonably necessary for a skilled person to understand, reproduce, modify, and reintegrate the component into the program – which directly refers to the preferred form for making modifications, extended to the specificities of AI.
This includes:
- (a) The complete specification of the model architecture (topology, layer configuration, activation functions) in a documented, machine-readable format;
- (b) The trained parameters (weights, biases) in a standard or documented format;
- (c) The complete source code of the training, fine-tuning, and evaluation procedures, including preprocessing pipelines;
- (d) The training configuration (hyperparameters, optimizer, random seeds where applicable);
- (e) A sufficiently detailed description of the training data – provenance, collection method, processing steps – to enable the assembly of a functionally equivalent dataset, or the data itself where it can be lawfully redistributed;
- (f) The instructions necessary to retrain, fine-tune, or replace the component and reintegrate it into the program.
Point (e) is intentionally flexible: when data cannot be redistributed (privacy, copyright, volume), a description sufficient to reconstitute an equivalent is enough. This is a pragmatic approach, comparable to the way the GPL requires “scripts used to control compilation and installation” rather than the compiler itself.
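As an illustration only, the six elements can be tracked as a simple checklist. The sketch below is a hypothetical helper, not part of the Note: the field names (`architecture_spec`, `training_data_description`, and so on), the file paths, and the `missing_points` function are assumptions introduced here. It only reports which points have no entry yet; it says nothing about whether an entry is legally sufficient.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelCorrespondingInfo:
    """Hypothetical checklist mirroring points (a) to (f); values are paths or URLs."""
    architecture_spec: Optional[str] = None          # (a)
    trained_parameters: Optional[str] = None         # (b)
    training_source: Optional[str] = None            # (c)
    training_config: Optional[str] = None            # (d)
    training_data: Optional[str] = None              # (e) the data itself, if redistributable
    training_data_description: Optional[str] = None  # (e) otherwise, a description
    rebuild_instructions: Optional[str] = None       # (f)


# Each point is covered when at least one of its fields is filled in.
# Point (e) is flexible: either the data or a sufficient description.
POINTS = {
    "a": ("architecture_spec",),
    "b": ("trained_parameters",),
    "c": ("training_source",),
    "d": ("training_config",),
    "e": ("training_data", "training_data_description"),
    "f": ("rebuild_instructions",),
}


def missing_points(info: ModelCorrespondingInfo) -> List[str]:
    """Return the letters of the points that have no entry yet."""
    return [
        letter
        for letter, names in POINTS.items()
        if not any(getattr(info, name) for name in names)
    ]


info = ModelCorrespondingInfo(
    architecture_spec="model/architecture.json",
    trained_parameters="model/weights.safetensors",
    training_data_description="docs/DATASET.md",
)
print(missing_points(info))  # ['c', 'd', 'f']
```

Modelling point (e) as two alternative fields reflects the flexibility described above: a description of the data satisfies the checklist when the data itself is absent.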
The Text of the Note
Published under a CC 0 license, this proposal is drafted in English as an interpretive clarification, graftable onto any copyleft license through a reference mechanism to the “Applicable License.” It does not depend on any specific license.
The canonical text (version 1.1-draft, June 2026) is maintained in the project repository; the key definitions are reproduced below.
AI Model Component Note
Interpretive clarification for software incorporating AI model components
The following interpretive clarification applies to the work to which it is attached (the “Program”) and is provided by the copyright holder(s) of the Program. This clarification supplements – and shall be read together with – the license under which the Program is distributed (the “Applicable License”). It does not add to or remove from the rights and obligations established by the Applicable License; rather, it makes explicit how certain existing obligations apply when the Program incorporates AI model components. To the extent that any provision of this clarification conflicts with the Applicable License, the Applicable License prevails.
0. Definitions
The “Program” means the work on which the Applicable License imposes the obligation to provide source code (or its preferred form for making modifications) – that is, the covered or licensed work as the Applicable License itself defines it. This clarification does not extend that obligation to any other work; it only makes explicit how the existing obligation applies to AI Model Components within that work.
An “AI Model Component” means any part of the Program whose behavior is determined, in whole or in part, by a machine learning model – including but not limited to trained parameters (weights, biases, embeddings), model configuration files, calibration data, quantization tables, or any other artifact whose modification is necessary to adapt, correct, or improve the behavior of that part of the Program. Whether an AI Model Component falls within this definition is determined by the role it plays in the Program’s behavior, and not by the technical form of its packaging or by its conceptual independence from the surrounding code. For the avoidance of doubt, an AI Model Component is part of the Program; the fact that its behavior results from learned parameters rather than from explicitly programmed instructions does not, by itself, take that Component outside the scope of the Applicable License when it is distributed as part of the Program. Sections 2(a), 2(b) and 2(c) below identify the configurations in which a model is not considered part of the Program.
The “Model Corresponding Information” for an AI Model Component means the set of information, data, and instructions sufficient for a skilled person to understand, retrain, fine-tune, replace, and reintegrate that Component into the Program – that is, the preferred form of the AI Model Component for making modifications to it. This includes, without limitation:
(a) the specification of the model architecture (topology, layer configuration, activation functions, normalization schemes) sufficient for a skilled person to understand and modify the Component, in a documented, machine-readable format;
(b) the trained parameters (weights, biases) in a standard or well-documented format;
(c) the complete source code of the training, fine-tuning, and evaluation procedures used to produce the distributed parameters, including data preprocessing and augmentation pipelines;
(d) the training configuration (hyperparameters, optimization settings, random seeds where deterministic reproduction is feasible);
(e) a sufficiently detailed description of the training data – including its provenance, collection method, and processing steps – enabling a skilled person to assemble a functionally equivalent dataset; the training data itself, where it can be lawfully redistributed; or, where the training data is publicly available or obtainable from third parties, a listing of that data and of where to obtain it, including, where applicable, for a fee; and
(f) the instructions necessary to retrain, fine-tune, or replace the AI Model Component and to reintegrate the modified version into the Program, such that the Program can be built and run with the modified Component in the same manner as with the original.
The Model Corresponding Information is sufficient when it enables a skilled person to retrain, fine-tune, or replace the AI Model Component and to reintegrate it into the Program. As for the preferred form of a work for making modifications, the reference is the form actually used: the Model Corresponding Information comprises the elements of points (a) to (f) as the licensor or distributor itself uses or holds them in order to modify the Component. It does not require acquiring elements that have never been in the distributor’s possession (Section 2 governs that case); it does require providing, without degradation, the forms actually used. The source code and the trained parameters of the Component, as distributed, are in all cases provided.
The Applicable License’s own definition of Corresponding Source continues to govern what is excluded from it (such as System Libraries or general-purpose development tools); this clarification adds inclusions, and does not remove those exclusions. The software and any training dataset are distinct creations subject to distinct rights; this clarification does not extend the Applicable License to a dataset or other work that is not itself part of the licensed Program. Whatever the form in which the information is provided, it must remain possible for a skilled person to produce a functionally equivalent model.
Sections 1 to 3 (scope declaration, carve-outs including third-party pre-trained models, delivery standard) are best read in the canonical text, together with the FAQ.
Observations on the Note
This proposed exception adopts the classic distinction in free software license law between “derivative work” and “mere aggregation.” A model integrated into the program (without which the program would not function as intended) is a component of the program and triggers the obligation. A model aggregated on the same distribution medium, or called via a public API, remains an independent work not covered by the exception. This is the transposition, for AI, of the boundary between linking and aggregation in software copyleft.
The legal exception (Section 2, final paragraph) recognizes that, in practice, certain elements of the corresponding information (personal data, databases protected by the sui generis right) cannot always be redistributed. It imposes a documentary obligation: provide everything that can be provided, and precisely document what is missing and why. This approach is inspired by the way the GDPR and Directive 2019/790 (TDM) coexist in practice: obligations overlap, and it is transparency about the limits that enables compliance. The same documentation regime now covers third-party pre-trained models (precise identification, documentation of the steps the distributor itself performed).
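The documentary logic (provide what can be provided, document what is missing and why) could be captured in a plain-text gap notice shipped alongside the source. The following is a minimal sketch under stated assumptions: the `WithheldElement` structure, the `gap_notice` function, the example component name, and the output layout are all hypothetical and are not prescribed by the Note.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WithheldElement:
    """Hypothetical record of one element that is not redistributed."""
    point: str       # which of points (a) to (f) is affected
    what: str        # the element that is withheld
    reason: str      # why it cannot be provided
    substitute: str  # what is provided instead


def gap_notice(component: str, gaps: List[WithheldElement]) -> str:
    """Render a notice listing each withheld element with its reason."""
    if not gaps:
        return f"{component}: all Model Corresponding Information is provided."
    lines = [f"{component}: elements not redistributed"]
    for gap in gaps:
        # A gap without a stated reason defeats the purpose of the notice.
        if not gap.reason.strip():
            raise ValueError(f"point ({gap.point}) is withheld without a stated reason")
        lines.append(
            f"- ({gap.point}) {gap.what}: {gap.reason}; "
            f"provided instead: {gap.substitute}"
        )
    return "\n".join(lines)


notice = gap_notice(
    "spam-classifier",
    [
        WithheldElement(
            point="e",
            what="raw training emails",
            reason="contain personal data",
            substitute="docs/DATASET.md describing provenance and processing",
        )
    ],
)
print(notice)
```

Rejecting an empty reason is a design choice of this sketch: it makes the "why" mandatory at the point where the notice is generated, rather than leaving it to later review.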
Section 3 sets a delivery standard: the Model Corresponding Information follows the same customary means and timing as the source code required by the host license; where that license contains a no-further-restrictions clause (GPL-3.0 §10, AGPL-3.0 §13, GPL-2.0 §6), contractual or technical barriers preventing recipients from using, studying, modifying, or redistributing that information are treated as further restrictions within the meaning of that clause. This preserves the anti-open-washing intent without exceeding what the host license tolerates for ordinary source delivery.
Next steps and project roadmap
This proposal is deliberately published as an open draft (currently v1.1-draft, integrating the first round of community feedback) – not because the work is unfinished, but because the conversation it opens is part of the method. The project follows three phases:
- Phase A – Seed and gather (current). Articulate a first credible proposal in the open, and use it to collect (1) substantive feedback – in particular from the FSFE Legal Network, SFLC/SFC, OSI and equivalent legal fora (the first feedback, from the FSFE Legal Network and the Free Software Foundation, has been received and integrated) – and (2) potential contributors and reviewers, both legal and maintainer-side.
- Phase B – Test on real projects. Accompany adoption on concrete projects, starting with Hermine-foss – the open source license-compliance tool we co-maintain, where the question of integrating an AI component is actively being discussed within the partner community (no model integrated at this stage). Other projects willing to pilot the clarification are welcome via the issue tracker.
- Phase C – Standardisation and broader uptake. Submit AI-Model-Component-Note to the SPDX License List, articulate the clarification with OSAID 1.0 and EU regulation (AI Act, CRA, NPLD), and broaden communication and outreach (FOSDEM, EOLE, OSS Summit, Free Software Legal & Licensing Workshop, foundations, academic publications).
A more detailed task-level roadmap is maintained in the project repository (ROADMAP.md).
Conclusion – Clarifying, Not Creating
This proposed “interpretive” exception is thus both an open source governance tool and a regulatory compliance tool. The two logics converge: ensuring that users have the information necessary to understand, modify, and audit the software they use.
It naturally complements the requirements of the three European regulations analyzed in our article on the CRA/AI Act/NPLD articulation:
- The AI Act (Art. 53(1)) requires providers of GPAI models to make available “sufficiently detailed technical documentation” and information about training data. The “Model Corresponding Information” goes beyond this requirement. Art. 53(2) exempts open-source GPAI models from points (a) and (b) precisely where the weights, the architecture and the usage information are made publicly available: the Model Corresponding Information documents exactly what this exemption presupposes;
- The CRA requires SBOMs and traceability. The exception complements the software SBOM with an inventory of model components;
- The NPLD extends product liability to software. Documenting the AI components of the software strengthens traceability in the event of a defect (and protects the maintainer who has fulfilled their transparency obligations).
This proposal for an AI Model Component Note does not create new obligations. It clarifies that the existing obligations of any copyleft Open Source license (providing the information necessary to exercise the freedoms over the software) naturally extend to AI model components. Where anti-tivoization and the Affero clause responded to technical evolution by rewriting the license, we propose here a response of the other family discussed above: a scope declaration by the copyright holders, which leaves the license text untouched and merely states what they consider “Corresponding Source” to mean when the program incorporates an AI model component.
Thus, if an AI model component determines the program’s behavior, then the information necessary to modify that component is part of the source code in the broad sense (and must be provided under the same conditions).
The exception is published under a CC 0 license. It is designed to be adopted, adapted, and discussed by the community. We invite maintainers, foundations, and legal experts to evaluate it, critique it, and help it evolve. It can be used in the same manner as other Open Source exceptions (LicenseName WITH Exception-Identifier; once the identifier is accepted on the SPDX License List: GPL-3.0-or-later WITH AI-Model-Component-Note; pending acceptance, see the interim SPDX expressions in the project’s HOW_TO_APPLY).
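To make the `WITH` form concrete, here is a small sketch that builds the per-file header line. The `SPDX-License-Identifier:` tag and the `WITH` operator come from the SPDX specification; the `spdx_header` helper and its simplified identifier check are assumptions made for illustration. Since `AI-Model-Component-Note` is not yet on the SPDX License List, tools that validate identifiers against that list may reject the resulting expression.

```python
import re

# Simplified check: SPDX short identifiers use letters, digits, "-" and ".".
_IDSTRING = re.compile(r"[A-Za-z0-9.\-]+")


def spdx_header(license_id: str, exception_id: str, comment_prefix: str = "#") -> str:
    """Build a one-line 'SPDX-License-Identifier' comment using the WITH operator."""
    for ident in (license_id, exception_id):
        if not _IDSTRING.fullmatch(ident):
            raise ValueError(f"not a valid SPDX identifier: {ident!r}")
    return f"{comment_prefix} SPDX-License-Identifier: {license_id} WITH {exception_id}"


print(spdx_header("GPL-3.0-or-later", "AI-Model-Component-Note"))
# prints: # SPDX-License-Identifier: GPL-3.0-or-later WITH AI-Model-Component-Note
```

The `comment_prefix` parameter lets the same helper produce headers for languages with other comment markers, for example `//`.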
Illustration credit: Remix of @2006, Senor Codo, blotter_explosion. This file is licensed under the Creative Commons Attribution-Share Alike 2.0 Generic license.