This is going to wreck society even more.

Please, for the love of Marx, do not take ChatGPT at its word on anything. It has no intelligence. No ability to sort truth from fiction. It is just a very advanced chatbot that knows how to string words together into sentences and paragraphs.

DO NOT BELIEVE IT. IT DOES NOT CITE SOURCES.

I feel like my HS English/Science teacher begging kids to not use Wikipedia, right now.

But even Wikipedia is better than ChatGPT because. Wikipedia. Cites. Sources.

      • RION [she/her]
        ·
        2 years ago

        Citing predictive suggestions for my new piece, tentatively titled "The Last of us is the best use of the time of the year and I have a lot of work to do with the kids and other people"

    • FunkyStuff [he/him]
      ·
      2 years ago

      I see a lot of people saying "it doesn't think," and while that's obviously true, I think it's important to expand on it a little. Not only do these models not think in the classical sense of having sentience, they literally don't attach any meaning to the words. They don't put words together to express some idea or fact or anything like that. They are simply statistical engines that link words together by their likelihood of appearing in some proximity to one another in their training data. It's not quite that simple, because it doesn't see words the way we see words: it sees tokens, transformed versions of the words that do carry some level of "meaning," but nowhere near the complexity of meaning in human speech. There's a reason you can get ChatGPT to contradict itself very easily: it never assigned meaning to what it was saying in the first place.
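
      To make the "statistical engine" part concrete, here's a toy sketch (Python, with a made-up corpus and whole words instead of tokens, purely for illustration) of the kind of next-token frequency model this boils down to. Real LLMs learn transformer weights over subword tokens rather than a literal lookup table, but the objective is the same: pick whatever tends to come next.

      ```python
      import random
      from collections import Counter, defaultdict

      # Toy "training data" -- purely illustrative
      corpus = "the cat sat on the mat . the dog sat on the rug .".split()

      # Count which token tends to follow which one (a bigram table; real models
      # learn far richer statistics, but the training objective is the same idea)
      follows = defaultdict(Counter)
      for prev, nxt in zip(corpus, corpus[1:]):
          follows[prev][nxt] += 1

      def next_token(prev):
          """Sample the next token weighted by how often it followed `prev`."""
          tokens, counts = zip(*follows[prev].items())
          return random.choices(tokens, weights=counts)[0]

      # Generate text: no meaning anywhere, just "what usually comes next"
      token = "the"
      output = [token]
      for _ in range(8):
          token = next_token(token)
          output.append(token)
      print(" ".join(output))
      ```

      Nothing in that objective ever asks whether the output is true, only whether it's likely, which is the contradiction problem in a nutshell.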

    • Coca_Cola_but_Commie [he/him]
      ·
      2 years ago

      What is the use case for something like this? If it doesn’t know anything then surely it can’t be used to communicate anything to another person in any meaningful way? Education, fact-checking, journalism, narrative, that’s all out the door. What, is it only useful for generating better sounding scam emails? Easily creating SEO content? Making fake social media posts? That’s all online bullshit anyway.

      Is Hallmark going to fire their writers and get AI to write their greeting cards? Put a few marketers out of business because suits don’t give a shit about well written copy? If those are the limits of the technology I’m thoroughly unimpressed.

      • CriticalOtaku [he/him]
        ·
        edit-2
        2 years ago

        What bosses want: replace customer service with an AI chatbot

        What it can actually do right now: replace automated "please press 1 to talk to customer service" phone recordings

  • BabaIsPissed [he/him]
    ·
    2 years ago

    As much as I like to dunk on "journalists" and think everyone should know better by now since ChatGPT has been around for months, the blame for behaviour like this is primarily on Microsoft/OpenAI for pushing it as a search engine/research assistant. They know better. They know it's a stochastic parrot.

    • supermangoman [he/him, they/them]
      ·
      2 years ago

      From my current understanding, I'm not sure referring to GPT models as "stochastic parrots" is accurate. There is evidence the LLM builds internal "world models," even if it emerges through probabilistic mechanisms: https://thegradient.pub/othello/

      • BabaIsPissed [he/him]
        ·
        edit-2
        2 years ago

        Let me preface this by saying I'm stupid, I can't even do my own research work well, let alone comment on cutting edge stuff with any degree of confidence:

        I don't think this is incompatible with the concept of the stochastic parrot. Like, by the time "On the dangers of stochastic parrots" came out, it was already known that language models have rich representations of language structure:

        (from the stochastic parrot paper): There are interesting linguistic questions to ask about what exactly BERT, GPT-3 and their kin are learning about linguistic structure from the unsupervised language modeling task, as studied in the emerging field of ‘BERTology’ [...]

        So I don't think we can take this, or the probing/interpretability work later in the paper as a refutation of LLMs as stochastic parrots, because it was never about memorization:

        (From the ICLR paper the blog post is based on): A potential explanation for these results may be that Othello-GPT is simply memorizing all possible transcripts. To test for this possibility, we created a skewed dataset of 20 million games to replace the training set of synthetic dataset[...] Othello-GPT trained on the skewed dataset still yields an error rate of 0.02%. Since Othello-GPT has seen none of these test sequences before, pure sequence memorization cannot explain its performance.

        the concept is useful primarily as a way of delimiting how far this "understanding" really goes:

        (stochastic parrot paper again) If a large LM, endowed with hundreds of billions of parameters and trained on a very large dataset, can manipulate linguistic form well enough to cheat its way through tests meant to require language understanding, have we learned anything of value about how to build machine language understanding or have we been led down the garden path?

        the metaphor of the crow is kind of apt, I think. Like an LLM, it is working only with form, not meaning:

        (blog post) At this point, it seems fair to conclude the crow is relying on more than surface statistics. It evidently has formed a model of the game it has been hearing about, one that humans can understand and even use to steer the crow's behavior. Of course, there's a lot the crow may be missing: what makes a good move, what it means to play a game, that winning makes you happy, that you once made bad moves on purpose to cheer up your friend, and so on.

        (stochastic parrot paper) Furthermore, as Bender and Koller argue from a theoretical perspective, languages are systems of signs, i.e. pairings of form and meaning. But the training data for LMs is only form; they do not have access to meaning. Therefore, claims about model abilities must be carefully characterized.

        Someone smarter please feel free to correct/dunk on me.

        • dat_math [they/them]
          ·
          2 years ago

          A potential explanation for these results may be that Othello-GPT is simply memorizing all possible transcripts. To test for this possibility, we created a skewed dataset of 20 million games to replace the training set of synthetic dataset[…] Othello-GPT trained on the skewed dataset still yields an error rate of 0.02%. Since Othello-GPT has seen none of these test sequences before, pure sequence memorization cannot explain its performance.

          I'm going to maybe dunk on the authors of that ICLR paper (even giving them the benefit of the doubt, they should really know better if they got into ICLR). You can't conclude a lack of memorization from the observation that the model in question maintains training accuracy on the holdout set. I really hope they meant to say that they tested on the skewed dataset and saw that the model maintained performance (without seeing any of the skewed data in training). However, if they simply repeated the training step on the skewed data and saw the same performance, all we know is that the model might have memorized the new training set.

          I also agree with your conclusions about the scant interpretability results not necessarily refuting the mere stochastic parrot hypotheses.

          • BabaIsPissed [he/him]
            ·
            2 years ago

            really hope they meant to say that they tested on the skewed dataset and saw that the model maintained performance (without seeing any of the skewed data in training).

            Yeah, that's it. I should have provided the full quote, but I thought it would make no sense without context, so I abbreviated it. They generate the synthetic training and test data separately, and for the training dataset those games could not start with C5.

            A potential explanation for these results may be that Othello-GPT is simply memorizing all possible transcripts. To test for this possibility, we created a skewed dataset of 20 million games to replace the training set of synthetic dataset. At the beginning of every game, there are four possible opening moves: C5, D6, E3 and F4. This means the lowest layer of the game tree (first move) has four nodes (the four possible opening moves). For our skewed dataset, we truncate one of these nodes (C5), which is equivalent to removing a quarter of the whole game tree. Othello-GPT trained on the skewed dataset still yields an error rate of 0.02%. Since Othello-GPT has seen none of these test sequences before, pure sequence memorization cannot explain its performance

            I don't know much about Othello so don't really know if this is a good way of doing it or not. In chess it wouldn't make much sense, but in this game's case the initial move is always relevant for the task of "making a legal move" I think(?) It does seem to make sense for what they want to prove:

            Note that even truncated the game tree may include some board states in the test dataset, since different move sequences can lead to the same board state. However, our goal is to prevent memorization of input data; the network only sees moves, and never sees board state directly.
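
            For anyone trying to picture that split, here's roughly what it amounts to (a toy Python sketch, not the authors' actual code; `generate_random_game` is just a stand-in for their synthetic Othello generator):

            ```python
            import random

            OPENINGS = ["C5", "D6", "E3", "F4"]  # the four legal first moves in Othello

            def generate_random_game(length=10):
                """Stand-in for the paper's synthetic game generator: a random opening
                followed by placeholder moves (illustrative only, no real Othello logic)."""
                return [random.choice(OPENINGS)] + [f"move_{i}" for i in range(length - 1)]

            # Skewed training set: throw out every game that opens with C5, which
            # prunes a quarter of the game tree right at the root.
            train_games = [g for g in (generate_random_game() for _ in range(20_000))
                           if g[0] != "C5"]

            # The test set is generated independently, so it still contains C5 openings
            # the model never saw in training. If the legal-move error rate stays low on
            # those games, plain memorization of training sequences can't explain it.
            test_games = [generate_random_game() for _ in range(1_000)]

            print("train games opening with C5:", sum(g[0] == "C5" for g in train_games))
            print("test games opening with C5: ", sum(g[0] == "C5" for g in test_games))
            ```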

            Anyway, I don't think it's that weird that the model has an internal representation of the current board state, as that is directly useful for the autoregressive task. Same thing for GPT picking up on syntax, semantic content etc. Still a neat thing to research, but this kind of behavior falls within the stochastic parrot purview in the original paper as I understand it.

            The term amounts to: "Hey parrot, I know you picked up on grammar and can keep the conversation on topic, but I also know what you say means nothing because there's really not a lot going on in that little tiny head of yours" :floppy-parrot:

            • dat_math [they/them]
              ·
              2 years ago

              Actually, the more I think about the experiment and their conclusions, the worse it gets. They synthesized the skewed dataset by sampling from a distribution that they assumed for both the synthetic training and synthetic testing set, so in a way, they've deliberately engineered the result.

            • dat_math [they/them]
              ·
              2 years ago

              They generate synthetic training and test data separately, and for the training dataset those games could not start with C5.

              hmmm, still feels like not as strong a result as the authors want us to read it as. I'd be much more impressed if they trained on the original training set and observed that the model maintains the performance observed on the original test set when tested on the skewed test set, but I bet they didn't find that result.

              Anyway, I don’t think it’s that weird that the model has an internal representation of the current board state, as that is directly useful for the autoregressive task. Same thing for GPT picking up on syntax, semantic content etc. Still a neat thing to research, but this kind of behavior falls within the stochastic parrot purview in the original paper as I understand it.

              Totally agree. At least with biological parrots, they learn in the physical world and maybe have some chance of associating the sounds with some semantically relevant physical processes.

              Transformers trained after 2022 can't cook, all they know is match queries to values, maximize likelihoods by following error gradients, eat hot chip and lie

        • supermangoman [he/him, they/them]
          ·
          edit-2
          2 years ago

          Showing a correlation between board states and internal representations on in-distribution data doesn’t disprove that it’s surface statistics at all.

          The article doesn't make such a bold claim. It presents its goal as "exploring" the question, so not sure why the redditor started off with that.

          If they had done an experiment where they change the distribution of the data, like a larger board, or restrict training to a part of the board, and it still works, that would show something.

          Why?

          They then “intervene” on all the later layers to get the result they wanted, proving that “intervening” on a single intermediate representation isn’t enough.

          This seems to be the core claim of the post - that the author "led the witness," so to speak, to get the desired outcome. Despite using the word "gradient," the redditor doesn't really explain this, but it could be true. It's definitely worth going through the appendices.

          Appreciate the (semi-anonymous?) critique regardless.

    • CanYouFeelItMrKrabs [any, he/him]
      ·
      edit-2
      2 years ago

      Microsoft incorporated GPT into Bing, and that version does use the internet and provides sources for the websites it accesses. I think the dunk goes to the journalist here.

  • HamManBad [he/him]
    ·
    2 years ago

    Even worse, it DOES cite sources. And they all sound believable. But half of them are completely made up


  • TheBeatles [any]
    ·
    2 years ago

    the real threat to society is not machine intelligence but human credulousness

    it sucks because ML tech is genuinely fascinating and useful for many things but these tech companies marketing it as "AI" is incredibly stupid and dangerous

  • nat_turner_overdrive [he/him]
    ·
    2 years ago

    I have already dealt with dumb shit customers who asked a chatbot how to do something and broke their shit instead of just reading our documentation, and numerous hexbear users have commented about using ChatGPT for similar purposes

    There will be no avoiding incredibly stupid shit generated by some asshole chatbot and asshole people following it unquestioningly

    • abc [he/him, comrade/them]
      ·
      edit-2
      2 years ago

      I have already dealt with dumb shit customers who asked a chatbot how to do something and broke their shit instead of just reading our documentation, and numerous hexbear users have commented about using ChatGPT for similar purposes

      :yea: and my manager keeps asking why I get so annoyed anytime we mention 'AI' now.

      I've handled like 3-4 tickets like this from high-volume clients and it is so funny because each time I'm like 'where did you find this support article that said to do it like this??' and they go :shocked-pikachu: 'I used ChatGPT'

      • nat_turner_overdrive [he/him]
        ·
        2 years ago

        oh wild did the chatbot that only has outdated public information make a completely wrong guess and make up information that "seems" correct? crazy, it's almost like you should look at up-to-date documentation or open a ticket to get valid information

        somebody in a past chatbot thread said they were using it for dating advice, and...

        :yea:

        • Lovely_sombrero [he/him]
          ·
          2 years ago

          It is not about being outdated; it will just use random stuff from the internet that has some required keywords in it. And it will sound very confident. And as more people publish its stupid answers to make fun of it (or because they believe the answer is correct), that wrong text will be used as part of its source material for the next iteration of ChatGPT.

          • nat_turner_overdrive [he/him]
            ·
            2 years ago

            That is a good point. The important bit of the shit these doofuses used is not public, so the bot just guessed based on other companies and it was wrong and bad.

            The only time using a chatbot makes any sense is if you are going to vet the output and you know enough to spot incorrect shit. Nine tenths of users will not vet or know anything about the subject it's outputting on.

        • abc [he/him, comrade/them]
          ·
          2 years ago

          Yeah, they enabled ChatGPT for answering tickets on our end too - which I thought was fucking stupid because it isn't trained on anything relevant to our platform, so it just spits out fake ass instructions for things that almost sound right like "of course you can do X thing on our platform, here are the steps to do so:" but X is actually something explicitly forbidden.

          "Oh so we're implementing ChatGPT for saving time answering tickets? Cool! Is it trained on the 50,000 tickets/cases from Salesforce we've accumulated in the past 5 years since we switched CRMs?" No. "Oh, so what's the fucking point then? It won't get anything right and I certainly am never going to trust it enough to answer anything for me" Well, we hoped it would help the team with getting tickets solved.......

          Like I can almost forgive the customers who ask ChatGPT how to do X or Y on our platform & then schedule a phone call with me where I have to explain that our company has nothing to do with ChatGPT and if you wanted to do X or Y, you should've looked in the support center which has relevant articles/information about doing X and Y. But when we're training new people and actively have to tell them "yeah don't trust the ChatGPT thing that spits out a response for every new ticket, it has never been correct"?? lol

          • nat_turner_overdrive [he/him]
            ·
            2 years ago

            I am increasingly convinced that the description I have seen here before about ChatGPT is correct - MBAs trained a chatbot to talk like an MBA and they assume that means it's intelligent rather than that they are not

        • GreenTeaRedFlag [any]
          ·
          2 years ago

          It was deemed ableist by some on this site to suggest computer prompts should not be part of dating

    • Bloobish [comrade/them]
      ·
      2 years ago

      IBM: But AI will change the world and so we are firing our workforce!!!!

      ChatGPT: Lol lmao I make shit up and don't worry about consequences cuss I'm a program made of bleep bloops

      • nat_turner_overdrive [he/him]
        ·
        2 years ago

        me, sweating: cut the red wire

        some dipshit, listening to music I don't understand, ignoring me and asking a chatbot: cuts the green wire

        :this-is-fine:

  • FourteenEyes [he/him]
    ·
    2 years ago

    Most of this "this is so scary" stuff boils down to "absolute fucking moron uses chatbot with extra bells and whistles, believes it to be God"

    • axont [she/her, comrade/them]
      ·
      2 years ago

      All these tech journalists falling to their knees in praise of machine gods are literally more credulous than medieval peasants praying to a solar eclipse

      • CriticalOtaku [he/him]
        ·
        2 years ago

        Me screaming "Warhammer 40K was supposed to be satire!" as the tech journalists start chanting hymns to appease the Machine Spirit.

    • VILenin [he/him]
      ·
      2 years ago

      Love how the definition of “intelligence” has devolved into a glorified calculator stringing words together

    • RION [she/her]
      ·
      2 years ago

      Throwback to that post about Black Wall Street where they were taking ChatGPT's responses as fact and wanted to learn more about what it told them

      They are amongus

      (Sorry to the OP but come onnnnn)

      • kristina [she/her]
        ·
        2 years ago

        chatgpt is just like the line. put all your faith in it like it is god

    • GarfieldYaoi [he/him]
      ·
      edit-2
      2 years ago

      I wouldn't be surprised if that's the point of ChatGPT: tell people what they want to hear and turn information into what is essentially a dietary choice. It won't be about truth anymore, but more about popularity (not that it ever really was about truth).

  • iridaniotter [she/her]
    ·
    2 years ago

    People using ChatGPT be like

    Source: it occurred to me in a dream :galaxy-brain:

  • ssjmarx [he/him]
    ·
    edit-2
    2 years ago

    So I've been writing with Novel AI pretty much every day for the past week, and while this particular problem doesn't exist since I'm generating fiction, by the time I'm done with literally anything I go back over it and, no hyperbole, 99% of the content ends up just being my own words anyway. I really have to say that if you have any kind of AI output in front of you and you're okay with sending that repetitive shit as your draft, the problem exists within you.

  • corgiwithalaptop [any, love/loves]
    ·
    2 years ago

    "President Biden, is it true that you have been quoted as saying 'uphold Mao Zedong thought, death to the imperial core?'"

    "Jack, i don't even know where I am whats a Mao?"

  • sjonkonnerie [any, they/them]
    ·
    2 years ago

    This is actually my real concern with AI language models. I'm not worried they'll become sentient and take over all our jobs and the world.

    I'm more concerned people will think AI language models can do things they actually can't do, and then try to use them for things they're not suitable for, all the while thinking they have some new kind of all-powerful, all-knowing intelligence on their hands that they can make do all the work for them.

    • GreenTeaRedFlag [any]
      ·
      2 years ago

      They want the computer from Star Trek, and won't acknowledge they have a speechwriter without morals or a career to worry about.

  • D61 [any]
    ·
    2 years ago

    From a future Onion article...

    Man asks ChatGPT best way to give oral sex, rushed to hospital hours later.

  • mkultrawide [any]
    ·
    edit-2
    2 years ago

    All of these people at my job are talking about using ChatGPT to write code or formulas and all I can think is "Great, can't wait for you to run to me to fix your work when it breaks."

    • Comp4 [she/her]
      ·
      edit-2
      2 years ago

      I'm not a coder, but as a humble office drone I do use it to write emails, and it is useful for that since I usually only need to adjust 1-2 things.

        • Comp4 [she/her]
          ·
          edit-2
          2 years ago

          I basically use it to answer requests. Really nothing wild. Ofc I could do it by hand, but it saves me some time that I can then use to browse the internet at work. I can understand being skeptical about using it for code, but for simple formal writing it's useful. I also find it a bit ehhh if you want to use it in a creative manner, since you have to redo a lot.

  • supermangoman [he/him, they/them]
    ·
    2 years ago

    Honestly, we're already awash in capitalist disinformation. I really don't see anything "game changing" with GPT. Assuming Wikipedia is accurate is just as dangerous.

      • supermangoman [he/him, they/them]
        ·
        2 years ago

        Sources that it disingenuously misquotes to frame its desired narrative, like on its article about the Holodomor. Honestly I think that's worse. At least ChatGPT has the message pasted on its UI that says "guys this might be totes inaccurate." If only Wikimedia was so thoughtful.

        ChatGPT doesn’t even know what the word “accurate” means outside of a grammatical and statistical context.

        Depends. If you ask it to check its previous output for errors it will find them, some of the time. If you give it access to sources within its "context window" it's able to assess accuracy much better - it just can't replace a database all on its own.
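
        To picture what "access to sources within its context window" means in practice, here's a minimal sketch (Python; `ask_model` is a placeholder for whichever chat API you'd actually call) of grounding the question in pasted source text instead of trusting the model's own recall:

        ```python
        # Minimal sketch of "grounding" a question in source text pasted into the
        # context window, rather than relying on the model's own recall.
        # `ask_model` is a placeholder -- swap in whatever chat API you actually use.

        def ask_model(prompt: str) -> str:
            print("--- prompt sent to the model ---")
            print(prompt)
            return "(model output would appear here)"

        sources = [
            "Source A: ... relevant excerpt pasted here ...",
            "Source B: ... another excerpt ...",
        ]

        question = "Does the claim below match what the sources actually say?"
        claim = "... the statement you want checked ..."

        prompt = (
            "Answer ONLY from the sources below. If the sources don't support an "
            "answer, say so instead of guessing.\n\n"
            + "\n\n".join(sources)
            + f"\n\nQuestion: {question}\nClaim: {claim}"
        )

        answer = ask_model(prompt)
        ```

        The sources still have to fit in the context window and it can still misread them, so this narrows the failure modes rather than removing them.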