Definition validation: Seeking volunteers
The process has entered a new phase: we are now seeking volunteers to validate the Open Source AI Definition by using it to review existing AI systems. The objective of this phase is to confirm that the Definition works as intended and to understand where it falls short.
Volunteers use a spreadsheet to locate and link to the license, research paper, or other document that grants rights or provides information for each required component.
Systems to be reviewed include, but are not limited to:
Arctic
BLOOM
Falcon
Grok
Llama 2
Mistral
OLMo
OpenCV
Phi-2
Pythia
T5
To volunteer, please contact Mer on the forum by May 20th.
Summary of comments received on the Definition draft
Grammatical and wording corrections
Some minor grammatical suggestions were made. They would rearrange the wording and layout slightly, but the overall message stays the same.
One user suggested explaining what Open Source is under the “preamble” and “Why we need open source AI” sections. Rather than arguing why Open Source is important, the section should introduce what it is and why it matters for AI.
Under “Preferred form to make modifications to machine-learning systems” and “data information”, the phrase “the training data set used” needs clarification. It is not clear whether it means that all training data must be Open Source for the model as a whole to qualify as Open Source.
Stefano Maffulli replied that the intention is to know which dataset was used, not necessarily to have it made available, and agreed that the wording does need clarification.
Technical points
Under “Preferred form to make modifications to machine-learning systems”, the “model parameters” component mentions the release of checkpoints as an example of a required component. One user objected that this imposes an unnecessary burden: it would be like requiring software to include all past versions of the program in order to be Open Source.
Maffulli replied that checkpoints were merely an example, but that the point might need to be addressed on the FAQ page.
Under “Preferred form to make modifications to machine-learning systems” and “data information”, a “skilled person” is mentioned in the context of requiring sufficient information about the training data used to create a model. One user asked what skill has to do with acquiring data.
Maffulli clarified that the phrase refers to providing enough information about the data that a “skilled person” can use, study, share and modify the AI system.
A user suggested resolving the confusion by rewording the phrase “a skilled person can recreate”, changing “using the same or similar data” to “if able to gain access to the same or similar data”.
Another user pointed out that “skilled person”, a legal term used in patent law, might not be appropriate, as it carries different legal connotations and precedent in different countries.
Discussion on why we focus specifically on machine learning (ML) systems
One user asked why ML systems are explicitly mentioned under “preferred form to make modification to an ML system” and, subsequently, in the “checklist”, pointing out that not all AI systems are ML systems.
Maffulli replied that ML systems are addressed specifically because they need special and urgent attention, whereas rule-based AI systems can already fit under the Open Source Definition. This needs to be addressed in the FAQ.
Town hall announcement
The 9th town hall meeting was held on the 3rd of May. Access the recording here if you missed it!
Source: opensource.org