The Free Software Definition talks about the “preferred form” for modification being source code:
Source code is defined as the preferred form of the program for making changes in.
This seems obvious but it needed to be explicit to prevent, among other things, vendors coding in a high-level language like C and then shipping “Open Source” in a less accessible intermediary form like assembly.
Unfortunately, it’s recently proven subjective, with the OSI itself using it as a loophole big enough to drive the OSAID through. First, they declared “nobody had a clear answer to what is the preferred form to modify an AI system [and] offered to find one with the communities involved in a co-design process”, not withstanding OSD author Bruce Perens’ public exhortations that the “training data IS the source code” for AI.
From there they defined four categories of data that “should be required”, including “unshareable non-public training data”, and that “all these classes of data can be part of the preferred form of making modifications to the AI system”. The idea that something you cannot obtain could possibly be the “preferred form” for modifications is patently absurd, but that’s what was ultimately published over objections.
The Free Software Definition, in the very next sentence, goes on to clarify that the source code is what the “developer [actually] changes to develop the program”:
Thus, whatever form a developer changes to develop the program is the source code of that developer’s version.
This objective terminology could be easily verified simply by looking over the shoulder of a programmer/practitioner as they use, study, and/or modify the system, and indeed for AI systems the result would certainly include training data. Note that it is important this cover any activity the developer may want to do (e.g., re-architecting), not a subset of activities defined by the observer (e.g., fine-tuning).