FSF is working on freedom in machine learning applications

https://www.fsf.org/news/fsf-is-working-on-freedom-in-machine-learning-applications

BOSTON (October 22, 2024) – The Free Software Foundation (FSF) has announced today that it is working on a statement of criteria for free machine learning applications, which will require the software, as well as the raw training data and associated scripts, to grant users the four freedoms.

Machine learning (ML) applications raise the issue of whether they respect users’ software freedom. The Free Software Foundation (FSF) is preparing a statement of criteria to determine when a machine learning application is free (as in freedom). The statement is being prepared by a working group consisting of FSF’s board members, staff, and management, and they have consulted various external experts.

Freedom challenges in ML beyond software

Machine learning applications are only partially software. Each one includes software, plus data that is the result of training. Such data can be referred to generally as “model parameters.” In the case of neural-network-based applications, these are known as the model weights. Model parameters are both outputs of and inputs to the software of a machine learning system, and they influence or control the system’s responses.

Model parameters are made by running training software over training data, but they are not necessarily the result of translating anything written by a human author. So, training data is not the “source code” of model parameters in the usual sense. Humans cannot comprehend model parameters as such, so it is not practical to study or adapt an ML application by analyzing or editing its model parameters directly. Also, computing model parameters often requires processing a large number of examples gathered in the training data, and the influence of any one example on the control data thus generated can be subtle and indirect. So, in practice, an ML application is usually studied and adapted by running it over different sets of prompt data, analyzing its training data, and incrementally training the model or retraining it from scratch.

The FSF’s conclusion is focused mainly on what must be distributed to users of an ML application so that they are able to control their own computing. Such an ML application could be called a free (or libre) machine learning application.

Close to a conclusion

After several conversations about the responsibility of the FSF in this discussion, serious work to come to a unanimous conclusion started in May of this year. That work has now concluded, and the working group is currently working to draft the exact text that will form the definition of a free machine learning application.

All software included in a free ML application has to offer every user the four freedoms that define free software. This applies both to the software that processes training data and to the software that interprets model parameters as context for prompts to produce human-usable output. This is necessary but not sufficient. Additionally, given our current understanding of ML applications, we believe that we cannot say an ML application “is free” unless all its training data and the related scripts for processing it respect all users, following the four freedoms. Granting users the four freedoms may also translate into a demand that the ML application’s release include the model parameters that represent its training, and that users be permitted to use and redistribute those parameters and modified versions of them.

ML applications that do not offer the four freedoms to all users are, by definition, nonfree, even if their software components are free.

Freedom may not equal justice

The FSF considers all nonfree software to be unjust to its users because it denies them the freedom to control their own computing. A further question is whether all nonfree ML applications are ethically unjust. Some nonfree ML applications may have valid moral reasons for not releasing their training data, for example when that data is personal medical data. In that case, we would describe the application as a whole as nonfree. But using it could be ethically excusable if it helps you do a specialized job that is vital for society, such as diagnosing disease or injury. For the FSF to consider use of such a nonfree ML application to be just, its component software must be free, and the ML application as a whole would have to be distributed to users in a form and manner that reasonably and flexibly supports incremental training, retraining differently from scratch, or both.

The FSF will continue to deepen the discussion on these topics during the drafting process. If you are interested in sharing your thoughts, please email the FSF; associate members can also join the FSF member forum. To further support this work, join the FSF or donate.

About the Free Software Foundation

The FSF was founded in 1985 to promote and protect computer users’ right to use, study, copy, modify, and redistribute computer programs. Over the years it has addressed new challenges, which has resulted, for example, in new versions of the FSF’s GNU family of licenses. The current rapid development of machine learning applications, and the public interest in them, is another opportunity for the FSF to explore a moral and ethical question: what it takes for users to be able to control their own computing when using these applications.

Donations to support the FSF’s work can be made at https://donate.fsf.org.

More information about the FSF, as well as important information for journalists and publishers, is at https://www.fsf.org/press.

Media Contacts

Zoë Kooyman
Executive Director
[email protected]
+1 (617) 542-5942

This work is licensed under a Creative Commons Attribution-No Derivative Works 3.0 license (or later version)


This news has been posted on OSI’s board too.

Carlo Piana, one of the OSI board members, replied.

I replied to Carlo, but I was censored.
(that’s why I’m linking to snapshots on archive.is)

Here was my argument: