That's really interesting, simultaneously more ingenious and less impressive technically than I was imagining. I'm sure insisting on sanitized data sets would make it extremely limited, but sometimes I think if that's the trade-off maybe just don't make the thing. Thanks for effort posting, even though you did ruin my joke.
Of course the real problem IMO is not that some terminally online nazi manipulated it like this, but that it will uncritically regurgitate whatever the most orthodox (for the internet) opinion is on any subject, with no context and the veneer of impartiality.
So far, the bigger the training data set the better, and that's one of the biggest things that determines how well the model works. That's not an absolute - I'm sure that removing all the racist shit from the internet corpus would make it better overall. But the problem is how to get "the internet minus racist shit" instead of "this much smaller dataset that we've manually screened all the racist shit out of." You could make an AI do it, but where are you going to get a non-racist AI to go do that for you?
simultaneously more ingenious and less impressive technically than I was imagining
If you really want to dig into it, Andrej Karpathy did a video on how to make a GPT from scratch. It's fewer than 1000 lines of Python code, maybe 200 of which are actually the machine learning model (the rest being training, stuff to make using and training it easier, etc). The full-sized GPT-3 is still the same model, just with bigger numbers and even more scaffolding stuff (to get it to run nicely on computing clusters, etc).
In terms of technical background needed: Understanding matrix multiplication is really important. You need at least a vague idea of how computer programming works, but it's short and Python and mostly math, so you could puzzle it out. Karpathy's video also treats it as a given that you can just optimize a matrix. That's possible because PyTorch, the library he uses, has an automatic differentiator built in, so you can just move vaguely towards some min/max (called gradient descent now because they want to sound fancy; back in my day they called it hill climbing).
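If it helps to see "optimize a matrix with an automatic differentiator" in miniature, here's a rough sketch in plain Python. To be clear, this is not Karpathy's code and not how PyTorch does it: the `Dual` class, the toy data, `TARGET`, and the learning rate are all made up for illustration, and it uses forward-mode differentiation (one pass per matrix entry), where real frameworks use the more efficient reverse mode. The idea is the same though: compute a loss, ask the differentiator which way is downhill, take a small step, repeat.

```python
# Toy sketch: forward-mode automatic differentiation + gradient descent.
# All names here (Dual, TARGET, DATA, train, ...) are hypothetical illustrations.

class Dual:
    """A number that carries its derivative along with its value."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        # Treat plain numbers as constants (derivative 0).
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__


def matvec(W, x):
    """Matrix-vector product, written out so it works on floats or Duals."""
    return [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]


# The matrix we pretend not to know, and a few (input, output) pairs made from it.
TARGET = [[2.0, -1.0], [0.5, 3.0]]
INPUTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
DATA = [(x, matvec(TARGET, x)) for x in INPUTS]


def loss(W):
    """Sum of squared errors between W @ x and the known outputs."""
    total = 0.0
    for x, y in DATA:
        for pred, want in zip(matvec(W, x), y):
            err = pred - want
            total = total + err * err
    return total


def gradient(W):
    """d(loss)/d(W[i][j]) for every entry, one forward pass per entry."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            probe = [row[:] for row in W]
            probe[i][j] = Dual(W[i][j], 1.0)  # "wiggle" only this entry
            grad[i][j] = loss(probe).deriv
    return grad


def train(steps=200, learning_rate=0.1):
    W = [[0.0, 0.0], [0.0, 0.0]]  # start from a matrix of zeros
    for _ in range(steps):
        g = gradient(W)
        # Step a little way downhill along the gradient.
        W = [[W[i][j] - learning_rate * g[i][j] for j in range(len(W[0]))]
             for i in range(len(W))]
    return W


W_fit = train()
print(W_fit)
```

The fitted matrix should land very close to `TARGET`. A GPT does the same loop with vastly more numbers in the matrices and a fancier loss, which is roughly where all the extra scaffolding comes in.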