This whole conversation seems to be born of a confusion of terms between the model (the end product of machine learning, which does contain the biases inherent in the dataset it was trained on, in the form of mathematical coefficients) and the training algorithm (the mathematical process defining exactly how a model is fit to data, e.g. how the model's coefficients change when it correctly or incorrectly classifies a piece of input data during training). The question of whether a model reflects bigotry is very easy to assess, but the question of whether a training algorithm contains inherent bigotry is kind of philosophically heavy.
Edit: hold this thought I should actually read through the article first lmao
Edit2: all right yeah, as I figured the article fails to disambiguate that shit. Anyway, although it's a tougher question, it's definitely possible for training algorithms to be effectively biased, but I think to really assess that you'd have to run a lot of studies on them with EXTREMELY controlled datasets, to see if the models they produce still tend to reflect the kinds of biases the article is talking about even when the datasets are controlled for those common biases. Honestly, that's a solid premise for an entire thesis project.
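To make the model vs. training algorithm split concrete, here's a rough sketch (not anything from the article): one and the same training algorithm, a bare-bones logistic regression, is run on two synthetic datasets that differ only in how the labels were assigned. The `skill` and `group` features, the hiring rule, and all the numbers are made-up assumptions purely for illustration.

```python
import math
import random

def train(rows, labels, epochs=200, lr=0.1):
    """The 'training algorithm': plain logistic regression fit by gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            pred = 1.0 / (1.0 + math.exp(-z))
            err = pred - y  # how wrong the model was on this example
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def make_dataset(biased, n=400, seed=0):
    """Synthetic hiring records; 'skill' and 'group' are hypothetical features."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        skill = rng.random()            # the thing that should matter
        group = rng.choice([0.0, 1.0])  # made-up group membership flag
        hired = 1 if skill > 0.5 else 0
        if biased and group == 1.0:
            hired = 0  # biased labels: group 1 was never hired
        rows.append([skill, group])
        labels.append(hired)
    return rows, labels

# Same algorithm, two datasets, two different models (sets of coefficients).
fair_weights, _ = train(*make_dataset(biased=False))
biased_weights, _ = train(*make_dataset(biased=True))

print("group coefficient, fair labels:  ", fair_weights[1])
print("group coefficient, biased labels:", biased_weights[1])
```

The training code never changes between the two runs; only the labels do. The coefficient on `group` is where the difference shows up, which is why checking a model is the easy part. Checking the algorithm itself would mean repeating this kind of comparison across many carefully controlled datasets.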
Uncle hoe basically answered it, but I want to emphasize:
Most of these new algorithms use machine learning. Roughly speaking, that means they feed a machine data, and it tries out huge numbers of candidate algorithms (starting more or less at random), scores them against the data, and keeps whatever scores best. Rinse and repeat.
The people creating the test keep feeding it data and computation until they like the results. The data IS the algorithm.
The algorithm can end up being human-unreadable, basically a black box.
But somewhere in that algorithm, in a completely incomprehensible chunk of code, it decides that its masters really love it when it fires people who buy shampoo for kinky hair.
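Here's a toy version of that "generate a pile of random candidates and keep the best one" loop, just to show how the bias rides in on the data. Big caveat: real systems mostly adjust coefficients with gradient-based optimization rather than literal random guessing, and everything below (the `buys_x` purchase flag, the firing records, the scoring rule) is invented for illustration.

```python
import random

rng = random.Random(42)

# Hypothetical historical records: (performance, buys product X) -> was fired.
# The labels are made up so that past firings track the purchase flag,
# not performance. That is where the bias lives.
records = []
for _ in range(300):
    performance = rng.random()
    buys_x = rng.choice([0, 1])
    fired = buys_x
    records.append(((performance, buys_x), fired))

def predict(candidate, features):
    """A candidate 'algorithm' is just three numbers: two weights and a threshold."""
    w_perf, w_buys, threshold = candidate
    score = w_perf * features[0] + w_buys * features[1]
    return 1 if score > threshold else 0

def accuracy(candidate):
    """Fraction of historical decisions the candidate reproduces."""
    hits = sum(predict(candidate, feats) == label for feats, label in records)
    return hits / len(records)

# Random search: generate many random candidates, keep whichever matches history best.
candidates = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(5000)]
best = max(candidates, key=accuracy)

print("best candidate (w_perf, w_buys, threshold):", best)
print("agreement with past decisions:", accuracy(best))
```

Nobody wrote a rule saying "fire people who buy product X". The search just kept whichever three numbers reproduced the past decisions, and here that means a positive weight on the purchase flag. With three numbers you can still read that off; with millions of coefficients you basically can't.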
How would an algorithm be sexist? That doesn't make any sense. Of course it's about what data it's given.