I think this is a very nice amendment and covers the intended cases (including open source AI) well.
However, even though as programmers we permanently function in DRY mode, I think the message would be more clear if we would repeat the ‘2. Source Code’ in a data variant (‘2b. Data’) rather than forcing ourselves to read the entire ‘2. Source Code’ with the insertion in the introduction in the back of our mind and imagining what it would mean for data.
I made a draft below - feel free to shoot at it (or disagree with the less pretty, less clever approach of ‘duplicating’ content specifically focusing on the data).
2b. Data
In cases where software relies on data — including databases, models, or media — for its creation, modification, or operation, the program needs to include the data, and must allow distribution in the original form of the data in which the program relies on it (‘source data’) as well as distribution in derived forms. Where some form of a product is not distributed with source data, there must be a well-publicized means of obtaining the source data for no more than a reasonable reproduction cost, preferably downloading from the Internet without charge. The source data must be the preferred form in which a programmer would modify the data the program relies on. Deliberately obfuscated source data is not allowed.