Source instead of Source Code

samj · November 3, 2024, 2:24pm

The OSD was originally written for the source code of a program, and while it does a good job on openness, that focus on code is one of the main reasons it has failed on completeness.

This issue did not start with AI; it has been a growing problem for data-dependent applications for a decade or more. For example, the release of the Quake game engine by id Software is not a release of the Quake game if the data (models, textures, etc.) is not included. Similarly, a travel assistant app that does not include its database is little more than a database viewer, while one that ships with, e.g., a copy of the Wikivoyage database could be considered Open Source.

This results in discussions (such as “Welcome diverse approaches to training data within a unified Open Source AI Definition”) that separate the source code from the source data (hence the term “data source”):

With that in mind, we need to be clear that an AI model is not purely software, it is the result of applying an algorithm (source code) to a specific training data set (source data).

We could improve the completeness of the OSD, which is already strong on openness, simply by removing the term “code” and covering “source” in all its forms.