The big AI models are running out of training data (and it turns out most of the training data was produced by fools and the intentionally obtuse), so this might mark the end of rapid model advancement

  • bazingabrain
    ·
    5 months ago

    I fail to see how synthetic data is good if all it does is make the AI that's used to justify job cuts "better".

    • Amerikan Pharaoh@lemmygrad.ml
      ·
      5 months ago

      That's where I'm at. Sure, there might be moderately-beneficial use-cases, maybe; but it doesn't change the fact that there's no such thing as an ethically-trained model, and there's still no such thing as a model that wasn't created based on rampant theft by capitalists, so I consider anything that comes of it fruit of the poison tree.

      AI bad until the base that comprises it radically changes, across the board.

      • lurkerlady [she/her]
        ·
        5 months ago

        Sure, there might be moderately-beneficial use-cases, maybe; but it doesn't change the fact that there's no such thing as an ethically-trained model, and there's still no such thing as a model that wasn't created based on rampant theft by capitalists, so I consider anything that comes of it fruit of the poison tree.

        I mean, that's just the case with everything, really. There are a lot of very good use cases that mostly have to do with data manipulation, but the coolest ones are in translation. I think we're approaching a point where small models provide very accurate translations and even carry over tone and intent properly, which is far superior to simple dictionary translation methods. I think it's very possible that new phones could be outfitted with tensor cores and you could have a real-time universal translator in your hand, though it'll likely only add 'subtitles' irl for you. AI speech recognition has also gotten very good and can be miniaturized. This is the use case I'm most excited for, personally, as a communist. Currently, translating in a foreign country requires a lot of typing (if you don't have a perfect grasp of the language), and I feel it removes a very human element from conversation. If everyone could locally run a subtitle-translation app, it'd be amazing for all of humanity.

        There are of course plenty of manufacturing use cases as well. China is spearheading those, though some work is being done in the US too, in the few industries that remain.

      • bazingabrain
        ·
        5 months ago

        AI bad until the base that comprises it radically changes, across the board.

        Which won't happen, hence why me and 650k others moved to Cara and gave Meta the finger.

    • lurkerlady [she/her]
      ·
      5 months ago

      Synthetic data is basically a fancy way of saying 'I'm properly formatting data and reinforcing the AI's good outputs'. Rearranging words, fixing or adding tags, that sort of thing. It's generated with various tools that usually have an LLM or VLM plugged in, though some are as simple as a regex script.
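
To give a rough idea of the "as simple as a regex script" end of that range, here is a minimal sketch in Python. It is not taken from any particular tool: the function names `normalize_tags` and `format_record`, the comma-separated tag format, and the record layout are all assumptions made up for illustration.

```python
import re


def normalize_tags(raw: str) -> str:
    """Clean a comma-separated tag string (hypothetical format).

    Trims each tag, lowercases it, turns runs of whitespace or
    underscores into single spaces, and drops empties and duplicates
    while keeping the original order.
    """
    seen = set()
    cleaned = []
    for tag in raw.split(","):
        tag = re.sub(r"[\s_]+", " ", tag).strip().lower()
        if tag and tag not in seen:
            seen.add(tag)
            cleaned.append(tag)
    return ", ".join(cleaned)


def format_record(text: str, tags: str) -> dict:
    """Build one tidy training record (assumed layout: text plus tags)."""
    # Collapse stray newlines and repeated spaces in the text itself.
    text = re.sub(r"\s+", " ", text).strip()
    return {"text": text, "tags": normalize_tags(tags)}


if __name__ == "__main__":
    print(format_record("A  cat\n sits. ", "  Cat ,dog,, CAT, long_hair "))
```

The point is only that part of the work is mechanical cleanup. The LLM or VLM tools mentioned above would handle the parts a regex can't, such as rewording text or describing an image.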