• 0 Posts
  • 8 Comments
Joined 6 months ago
Cake day: January 14th, 2024




  • It's not any different from how it already was. The GenAI models were initially trained on masses of unlicensed data, including data from Reddit. The problem is that some companies, like the New York Times, are suing over LLMs being trained on their data. In response, companies like OpenAI are now trying to reach partnerships that effectively license the use of that data (data they already had). These partnerships also guarantee them continued access to the data for as long as the partnership is in place, whereas companies without a partnership could start banning scraping activity or update their terms to forbid training AI on their data.

    Overall, these partnerships are a good thing. Licensed training data is good. But from a privacy standpoint, the AI models were already trained on Reddit data; this is just formalizing the relationship.



  • Idk about self-study workbooks, but I used CS50P.

    https://cs50.harvard.edu/python/2022/ has free lectures from Harvard's David Malan, problem sets to work through that let you test your answers, and a free certificate at the end if you care about that.

    I know it's not quite what you asked for, but I thought I'd share in case it sounds interesting. I found the lectures very engaging, and the problem sets were a great way to practice what was being taught.