Obviously Google has been basically unusable for the past five years, but this is... this is something else. Any subject you look up will have its first three pages of results made up entirely of procedurally generated fake blogs and websites. And there's nothing there. Just scraped text from articles mashed together with varying amounts of hallucination on top. Not even selling a product, just harvesting ad revenue.

What's maybe even worse is the steady degradation of image search too. Joe Everyman generates some slop, and the image's metadata tags include a historical artist's name. Someone on Pinterest pins it while trawling the web. Now there are fifty gooey generated sludge images when you look up the historical painter.

And it's not just artists. Historical figures and animals too. Look up 'baby peacock' if you want a clear example.

It's a pretty funny bit, Google. I hope it keeps going.

  • Awoo [she/her]
    ·
    1 year ago

    Part of the problem is their shift away from using backlinking signals.

    Google was massively better when it used backlinks as a signal, even if that sometimes meant blackhat content showed up in results, since blackhats could farm backlinks to trick the algo. The overall quality and relevance of the results was enormously better.

    • rubpoll [she/her]
      ·
      1 year ago

      Could you explain what backlinking signals are and why they matter?

      • Awoo [she/her]
        ·
        edit-2
        1 year ago

        In the past, the entire web was categorised via backlinks. A backlink is something like me linking to this wiki article about Losurdo, who wrote about Joseph Stalin. The text of that link, "wiki article about Losurdo who wrote about Joseph Stalin", carries a lot of information that can be used as a signal for search results.

        In the past, the internet was mapped by bots. Search engine crawlers went out onto the web, visited sites, read the pages, then read all the links on those pages and followed them, building a vast database of links to pages along with the anchor text of each of those links.

        This information, aggregated from thousands of entries all pointing at the same page, would then be used alongside crawling the page's own content to categorise what the topic of that page likely was.
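
        A rough sketch of that aggregation step in Python, if it helps; the toy data and field names are mine, not any real crawler's schema:

```python
# Rough sketch: collect the anchor text of every link pointing at a page and
# use the most common terms as cheap topic signals for that page.
# The toy data and names are invented for illustration.
from collections import Counter, defaultdict

# (anchor_text, target_url) pairs a crawler might have collected
backlinks = [
    ("wiki article about Losurdo who wrote about Joseph Stalin", "wiki.example/Domenico_Losurdo"),
    ("Losurdo's book on Stalin", "wiki.example/Domenico_Losurdo"),
    ("Domenico Losurdo, Italian Marxist philosopher", "wiki.example/Domenico_Losurdo"),
]

anchor_terms = defaultdict(Counter)
for text, url in backlinks:
    anchor_terms[url].update(text.lower().split())

# The most frequent anchor terms describe what the page is "about",
# independent of the page's own content.
print(anchor_terms["wiki.example/Domenico_Losurdo"].most_common(5))
```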

        All of this information would then be used to assign trust values to pages, and those trust values determined which pages could appear as search results. In the past this was primarily weighted by backlinks, whereas today it's primarily driven by the content itself.

        The backlink method meant you could sometimes trick the algo by faking thousands of backlinks. The content method means you can trick it with the bullshit way pages are written today.

        In my opinion the older methodology gave significantly higher quality results. But some search pages would be static, because certain pages were so well established as high-value (linked millions of times) that they would always occupy the top 1-10 results. Google doesn't like this; they want results to lean towards NEW CONTENT, because that's where the ad game is. It doesn't matter to them if a 10-year-old page might have the best and most valuable answer to a person's query; they want to serve ads. They'd rather serve mid content and make cash from it. And that's why search engines suck ass today: they're so heavily weighted toward new and regularly updated content.
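
        To make the weighting point concrete, here's a hedged little sketch of a blended ranking score. The weights and fields are invented for illustration and obviously not Google's actual formula:

```python
# Toy ranking score blending link-based trust, content relevance, and freshness.
# All weights, fields, and numbers are invented for illustration.
import math
import time

def score(page, query_relevance, w_trust=0.2, w_content=0.5, w_fresh=0.3):
    age_days = (time.time() - page["last_updated"]) / 86400
    freshness = math.exp(-age_days / 365)  # decays over roughly a year
    return (w_trust * page["backlink_trust"]
            + w_content * query_relevance
            + w_fresh * freshness)

old_canonical = {"backlink_trust": 0.95, "last_updated": time.time() - 10 * 365 * 86400}
week_old_page = {"backlink_trust": 0.10, "last_updated": time.time() - 7 * 86400}

# With freshness weighted this heavily, the week-old page edges out the
# ten-year-old, heavily linked one even though it's less relevant.
print(score(old_canonical, query_relevance=0.9))  # ~0.64
print(score(week_old_page, query_relevance=0.7))  # ~0.66
```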

          • zifnab25 [he/him, any]
            ·
            1 year ago

            Shit costs money.

            I do wonder if Chinese search engines are outpacing their American peers by being more publicly oriented. Or if they're just blindly cribbing American techniques as Best Practices out of habit, and getting similarly degraded results.

            Or if this really is just a Cold War of bullshit, and the prior method would ultimately be contaminated by spammers in the same way new shit is.

            • robot_dog_with_gun [they/them]
              ·
              1 year ago

              people were gaming pagerank back in the day. anyone who ever hired an SEO person should probably be shot.
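
              for anyone who hasn't seen it, here's a toy pagerank in python showing how a link farm inflates a page's score. the graph, damping factor and iteration count are made up for illustration:

```python
# toy pagerank via power iteration, to show how a link farm inflates a score.
# the graph, damping factor and iteration count are made up for illustration.

def pagerank(links, damping=0.85, iters=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            targets = outs if outs else pages  # dangling page: spread its rank evenly
            for q in targets:
                new[q] += damping * rank[p] / len(targets)
        rank = new
    return rank

# the same spam page without and with three farm pages pointing at it
honest = {"spam_page": [], "blog": ["news"], "news": ["blog"]}
farmed = {"spam_page": [], "blog": ["news"], "news": ["blog"],
          "farm1": ["spam_page"], "farm2": ["spam_page"], "farm3": ["spam_page"]}

print(pagerank(honest)["spam_page"])  # ~0.07, nothing real links to it
print(pagerank(farmed)["spam_page"])  # ~0.18, roughly 2.5x higher purely from the farm
```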

              • zifnab25 [he/him, any]
                ·
                1 year ago

                Hate the game, etc, etc. You can't really blame folks for wanting their content to be at the front of the Google queue.

                I'd argue the real root problem is discrete list-rankings as a means of presenting information. That kind of search result implies a certain empirical authority in the higher-ranked sources.

                I might argue that a significant step forward for search would be to present data not as a ranked list with the Top Item treated as definitive, but as a graph with the Center Item being the result that most closely matches your query. Then you could move in 2D space to navigate results along multiple axes of relationship, and zoom in/out to reveal broader or more granular characteristics of the results.

                So, perhaps a search for "horse" gives you the dictionary definition. That result is then broken into quadrants organized as "horse: biological", "horse: fictional", "horse: historical", and "horse: metaphorical". Moving in a given direction gives you more refined data on that topic (so horse: biological might give you the Wikipedia article on horse breeds and a veterinary website on horse health). You can zoom in to get a more granular look at horses broken up by breed, or zoom out and get categories of animal within the Equidae family.
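
                A very rough sketch of how those facets and zoom levels might hang together; everything here (the facets, the zoom levels, the toy corpus) is invented for illustration, and a real version would come from embeddings or a knowledge graph:

```python
# Toy model of "navigate by direction (facet) and zoom (granularity)" instead
# of a ranked list. All facets, levels, and results are made up.
from dataclasses import dataclass

@dataclass
class Result:
    title: str
    facet: str         # e.g. "biological", "fictional", "historical", "metaphorical"
    specificity: int   # 0 = the broad centre result, higher = more granular

results = [
    Result("Dictionary definition of 'horse'", "general", 0),
    Result("Wikipedia: List of horse breeds", "biological", 1),
    Result("Veterinary guide to horse health", "biological", 2),
    Result("Horses in fantasy fiction", "fictional", 1),
    Result("Horses in medieval warfare", "historical", 1),
    Result("'Dark horse' and other idioms", "metaphorical", 1),
]

def navigate(results, facet=None, zoom=0):
    """Move toward a facet (direction) and zoom to a granularity level."""
    return [r for r in results
            if (facet is None or r.facet in (facet, "general"))
            and r.specificity <= zoom]

print(navigate(results, zoom=0))                      # just the centre: the definition
print(navigate(results, facet="biological", zoom=2))  # centre plus breeds and vet pages
```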

                This kind of navigation would inevitably get gamed too. But it would de-emphasize the first result and turn it into a starting point for a search rather than the definitive answer.

                • robot_dog_with_gun [they/them]
                  ·
                  1 year ago

                  completely changing web design would've been cool before smartphones. you can still see the legacy of 800x600 everywhere, but if they did it now it would just be shit.

                  your idea is a little interesting, but people would just click the top left, and that kind of movie UI goes wrong real fast in actual use.

                  • zifnab25 [he/him, any]
                    ·
                    1 year ago

                    Maybe. But if I had an abundance of free time and/or some infinite cash spigot, I'd give it a shot regardless.

                    If nothing else, I think the novelty of a spatial search over a linear search would get people's attention and give the platform more engagement than the Bing approach of being just like Google but pushier.

      • chickentendrils [any, comrade/them]
        ·
        edit-2
        1 year ago

        Reputable sites link to other sites; they form small-world graphs with bridges to other sites. Backlinks were primarily useful in early WWW search because sites had a narrow focus. Eventually news sites became the hubs, Wikipedia (which anyone can edit) got mixed in, and then things started getting centralized in big forum hosts while news sites degraded and got bought up and operated by grifters. There were fewer independent sites, blogs, etc., so most links just go between "news" and "social media", with a few surviving special-interest sites still operating but mostly turned into places that link to news, social media, and Wikipedia. You can kind of get a similar effect between participants in social networks, but there's so much linking just to mock or argue that it muddies the signal.

        Backlinks are still useful; they just naturally got less useful as human activity on the Web evolved. If you run a local spider/index like YaCy (and I assume SearX), you can look at the network graphs showing the connections between sites. That's usually a good indicator of the trustworthiness of a page you don't recognize.
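
        A rough sketch of the kind of site-to-site link graph I mean, plus a crude trust check; the graph and the trusted seed list are made up, and this isn't YaCy's actual API:

```python
# Sketch of a site-to-site link graph a local spider might build, plus a crude
# "does anything I already trust link to or from it?" check.
# The graph, domains, and trusted set are invented for illustration.
site_links = {
    "news.example":        ["wikipedia.org", "social.example", "hobbyforum.example"],
    "hobbyforum.example":  ["news.example", "wikipedia.org"],
    "wikipedia.org":       ["news.example"],
    "sloppy-seo.example":  ["sloppy-seo2.example"],   # only links within its own network
    "sloppy-seo2.example": ["sloppy-seo.example"],
}

def neighbours(domain, graph):
    inbound = {src for src, outs in graph.items() if domain in outs}
    outbound = set(graph.get(domain, []))
    return inbound | outbound

def looks_trustworthy(domain, graph, trusted=frozenset({"wikipedia.org", "news.example"})):
    # Crude heuristic: is the domain connected to anything already trusted?
    return bool(neighbours(domain, graph) & trusted)

print(looks_trustworthy("hobbyforum.example", site_links))  # True
print(looks_trustworthy("sloppy-seo.example", site_links))  # False
```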