English • 9 months ago

Google Search Is Now a Giant Hallucination

Google rolled out AI overviews across the United States this month, exposing its flagship product to the hallucinations of large language models.


@[email protected] • 9 months ago
I don’t even think “hallucination” is the right word for this. It’s got a source, and it’s giving you information from that source. The problem is that it treats the words at that source as completely factual even though they aren’t. From what I’ve read, a hallucination is more like when it queries its dataset, can’t find an answer, and then generates nonsense to provide an answer it doesn’t have. I don’t think that’s the same thing.
- Balder • 9 months ago
  I don’t even think it’s correct to say it’s querying anything, in the sense of a database. An LLM predicts the next token with no regard for the truth (training has no notion of factual truth that could penalize it, since truth is a very hard thing to measure).
  
  Keep in mind that the same characteristic that allows it to learn the language also allows it to sort of come up with facts. It’s just a statistical distribution conditioned on the whole context, and it needs a bit of randomness so it can be “creative.” So coming up with facts isn’t something LLMs were designed to do; it’s just something we noticed happening as they learn the language.
  
  So it learned from a specific dataset, but whether it learns any particular piece of information depends on how well represented that information is in the dataset. Information that appears repeatedly on the web is quite easy for it to answer, since it was reinforced during training. Information that doesn’t show up much is just not gonna be learned consistently.[1]
  
  [1] https://youtu.be/dDUC-LqVrPU
  - @[email protected] • 9 months ago
    I understand the gist, but I don’t mean that it’s actively looking up facts. I mean that it’s using bad information to produce a result: if the information it was trained on says 1+1=5, it gives that result because that’s what the training data had. The hallucinations, as the people studying them call them, aren’t that. They happen when the training data doesn’t have an answer for 1+1, so the LLM can’t work out that the most likely next word is 2. It doesn’t have a result at all, but it’s built to always give one, so it gives nonsense.
    - Balder • 9 months ago
      Yeah, I think the real problem is that language is ambiguous, and LLMs can get confused about certain features of it.
      
      For example, I often ask different models when the Go programming language was created, just to compare them. Some say 2007 most of the time and others say 2009, which isn’t all that wrong: work on the language started in 2007, and 2009 is when it was officially announced.
      
      This gives me a hint that LLMs can mix up things that are “close enough” to the concept we’re looking for.
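
Balder describes the output as a probability distribution over the next token, sampled with “a bit of randomness.” Temperature sampling is one common way decoders add that randomness, and the minimal Python sketch below makes the idea concrete. It is not any particular model’s implementation. The candidate tokens and their scores (logits) are made up for illustration, and `softmax` and `sample_next_token` are hypothetical helper names.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Divide the scores by the temperature, then normalize them into probabilities.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max keeps math.exp from overflowing
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(tokens, logits, temperature=1.0, rng=random):
    # Draw one token at random, weighted by its probability.
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical candidate tokens and scores after a prompt like "The capital of France is".
tokens = ["Paris", "Lyon", "beautiful"]
logits = [5.0, 2.0, 1.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax(logits, t)
    print(t, [round(p, 3) for p in probs])

rng = random.Random(0)
print([sample_next_token(tokens, logits, 1.0, rng) for _ in range(5)])
```

Lower temperatures concentrate probability on the top-scoring token. At 0.5, “Paris” gets nearly all of it. Higher temperatures flatten the distribution, so less likely tokens are sampled more often. Nothing in this step checks whether a token is true; it only reflects the scores.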
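
Balder also argues that well-represented information is learned reliably while rare information is not. A deliberately crude counting analogy can illustrate this. It is not how an LLM stores knowledge, since an LLM learns network weights rather than a lookup table. Even so, it shows why repetition matters: in this toy model, the share of examples that agree on an answer is exactly how often that answer gets sampled back. The data below is hypothetical, and `answer_probs` is an illustrative helper.

```python
from collections import Counter

def answer_probs(observed_answers):
    # Convert raw counts of the answers seen for one question into probabilities.
    counts = Counter(observed_answers)
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}

# Hypothetical toy data: a fact repeated all over the web versus an obscure
# fact mentioned only three times, with one mention contradicting the others.
well_known = ["Paris"] * 98 + ["Lyon"] * 2
obscure = ["1931", "1913", "1931"]

print(answer_probs(well_known))  # {'Paris': 0.98, 'Lyon': 0.02}
print(answer_probs(obscure))     # '1931' gets only about 0.67, '1913' about 0.33
```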
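
The 1+1=5 comment distinguishes two cases: faithfully repeating bad training data, and producing something when the data has no answer. A toy bigram model can sketch both. It falls back to overall token frequency for contexts it has never seen. This is a hypothetical illustration, far simpler than an LLM. The training text, `train`, and `predict_next` are made up for it. It uses greedy decoding (always the most frequent option), so the output is deterministic.

```python
from collections import Counter, defaultdict

def train(tokens):
    # Bigram counts (how often each token follows each previous token),
    # plus overall unigram counts to fall back on.
    bigrams = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[prev][nxt] += 1
    unigrams = Counter(tokens)
    return bigrams, unigrams

def predict_next(model, prev):
    bigrams, unigrams = model
    if prev in bigrams:
        # Seen context: return whatever most often followed it in training,
        # whether or not it is true.
        return bigrams[prev].most_common(1)[0][0]
    # Unseen context: the data holds no answer, but the model still has to
    # emit something, so it falls back to the most common token overall.
    return unigrams.most_common(1)[0][0]

# Hypothetical training text in which "1+1=" is wrongly followed by "5".
training_text = "1+1= 5 ; 1+1= 5 ; 2+2= 4 ;".split()
model = train(training_text)

print(predict_next(model, "1+1="))  # 5: repeats the bad training data
print(predict_next(model, "3+3="))  # ;: the most common token overall, unrelated to 3+3
```

In both cases the function simply returns a token. Nothing in its output distinguishes “repeating wrong data” from “no data at all.”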
