I’m struggling a bit with what data is.
Say that a statistics bureau publishes a dataset consisting of summary statistics (means, maxima, minima, medians, etc.) computed from personal data on citizens, where the summary statistics are computed in such a way that the privacy of individuals cannot be compromised. Then it seems pretty clear that if everything else is in place (licenses, etc.), the resulting dataset would be open data.
We could take that a slight step further, where least squares has been applied to the data and resulted in a slope and an intercept. Then we have a linear model. Both the summary statistics and the linear model are works, possibly derived works, but works nonetheless. And they would be data.
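To make that concrete for myself, here is a minimal sketch in Python. The numbers are toy data I made up, and the function names `summary` and `least_squares` are just illustrative. It shows that both the summary statistics and the fitted linear model boil down to a handful of numbers derived from the underlying data.

```python
from statistics import mean, median

# Hypothetical toy data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]


def summary(values):
    """Return a few summary statistics of a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def least_squares(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


stats = summary(ys)                       # four numbers
slope, intercept = least_squares(xs, ys)  # two numbers: the whole "model"
print(stats, slope, intercept)
```

In this sketch the entire linear model is the pair `(slope, intercept)`, which is no different in kind from the dictionary of summary statistics.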
Then, we take a big jump to a deep neural network model. Conceptually, it has much in common with the very simple linear model, but I’m looking for exactly what sets it apart.
As long as it is just math, then it isn’t terribly interesting, because, to paraphrase Bruce Schneier way out of context, math doesn’t have agency, code has agency.
Somehow, there’s something about the DNN that gives it agency, to the extent that it can’t be trusted unless we have access to the training data, because we know that it will have adverse consequences.
This has been bothering me for some time, because I have a feeling that there’s something important here that I don’t understand, and that wasn’t answered in the OSAID process (which indeed left many questions open).
Is there something about the agency that is somehow given to this math that would need to be explicitly addressed in the OSD?