Why OSAID RC1 takes oxygen out of the room & expands the attack surface for open-washing

On the now-censored OSI forums, user quaid proposed a useful minimal consensus: leave out the two murkiest data types and keep only full access and shared access. OSI’s Maffulli commented,

This would relegate Open Source AI to a niche, and it would be a tactical mistake, as explained on How we passed the AI conundrums – Open Source Initiative.

Here’s a post I wrote in response, which for murky reasons never made it out of moderation.

I want to move the conversation forward by contributing a different viewpoint. Reducing the data classes to the clearly defined Open Data and Open Access types, as @quaid proposes in his constructive contribution, would create a clear goal to aspire to: a part of the larger possibility space that is already inhabited by a few systems working to the highest standards (e.g. AI2’s OLMo), and that gives others something to work towards. This would make the OSAID similar to the early days of the GNU GPL (the archetypal example in the FOSS domain), which legal experts have noted had a positive effect. A definition that points to this goal is clearly in the spirit of FOSS, defends the four freedoms, and will create incentives to maximise transparency and minimise open-washing.

RC1 takes the oxygen out of the room for the most open players
Consider the alternative: allowing four data types, two of which are murky and not actually open, makes it much easier to comply by choosing less transparent forms of release, and leaves the most open players unrewarded. As @samj noted elsewhere, water seeks its own level. For instance, what is to stop a corporation from simply bundling all training data into a “new” named dataset that happens to be available only from a third party for a fee (explicitly allowed by RC1)? They would then comply with OSAID simply by listing that “new”, now-unreachable and unauditable dataset as the source.

The effect is that OSAID will take the oxygen out of the room for model providers (like AI2) that strive for true openness and show it can actually be done. Under RC1, OSI-defined Open Source AI would not be a badge of honour, but simply a box to tick for model providers wanting to appear open.

How can this be remedied? I think the proposal in the OP, to leave out the murkiest categories of data, gets us an important part of the way.

RC1 expands the attack surface for open-washing
I am not convinced at all by Stefano’s repeated declarations that this would be a tactical mistake. A tactical mistake for whom, and for which purposes? Right now, the tactical mistake that looms largest is that RC1 will expand the attack surface for open-washing.

It does so by:

  • asking for “a sufficiently detailed summary” that is not itself described in sufficient detail,
  • asking for “data information” rather than data sharing,
  • allowing unshareable data,
  • hedging all of the above with the ill-specified notion of “OSI approved terms” (which many have questioned).

An open source AI definition that has this many loopholes constitutes an open invitation to take open-washing to the max.


Hi Mark and welcome!

As you might guess, I can understand your frustration with the OSI censorship, as I have been silenced and censored myself several times over these past months.

I’ve read your post and pretty much agree with it (you might remember that I even made a similar proposal before), but what about a much simpler change to the OSD?
