Google's search results have become pretty much useless for anything but like... how-to videos and song lyrics. I know there are better alternatives out there in the year of our lord 2022, but I've been using Google since before it was a verb. Any suggestions?
Also, how difficult is it to build an open source search engine? I imagine it's not something that a group of hexbear coders could manage, but if I'm wrong... we could really use a search engine with an explicitly leftwing bias.
It pains me to say this, but googling any topic with +reddit at least gets you actual results. It's why I hope the site doesn't shut down but gets reformed or something; it's genuinely the last bastion of the internet where actual hobbyists gather for shit
This is my go-to as well. It actually works, for a wide range of topics.
Also, fuck you google. You don't know better than me - if I included a search term, I fucking want it in the results. None of that "these search terms are excluded from this result" shit should've ever been implemented. I'm tired of having to use quotes around each and every search term.
Ah, it's a product, and I reckon most people don't google "fixing sram p7 instructions" but instead do "how do I fix an sram p7 gear hub"
But since the tech doesn't bend that way, ya gotta exclude some shit. Whenever something gets more user friendly, it also gets a lot worse for anybody with half an idea of what they're doing
that's my fault sorry. i google things like i'm asking someone a question lol
I've actually started doing this one, too, because I get more results from actual people asking the same question. At some point, dumb guy googling became so common that it actually became more efficient.
reddit works as a filter because it covers a wide range of topics (though still not everything) and because the bulk of the posting is human-generated and not motivated by profit (there are bots and paid actors, but they're generally small and focused on a limited range of topics)
You can use
site:reddit.com
as part of your search query to ensure the results only come from the domain reddit.com. You can take it a step further, like
site:reddit.com/r/shitliberalssay
to get an even narrower result.
edit: my bad i see this advice was already offered.
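If you want to script this, here's a minimal sketch of tacking a site: restriction onto a query and turning it into a search URL. The helper names (site_query, google_url) are made up for illustration, and it assumes the usual https://www.google.com/search?q=... URL form.

```python
from urllib.parse import urlencode


def site_query(terms: str, site: str) -> str:
    """Append a site: restriction to a plain search query (hypothetical helper)."""
    return f"{terms} site:{site}"


def google_url(query: str) -> str:
    """Build a search URL, assuming the common /search?q=... form."""
    return "https://www.google.com/search?" + urlencode({"q": query})


q = site_query("sram p7 gear hub repair", "reddit.com")
print(q)
print(google_url(q))
```

The same query string should work pasted straight into the search box; the URL form is only handy for bookmarks or browser keyword searches.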
I use Ecosia. It uses the ad revenue to plant trees in developing countries such as Brazil, Burkina Faso and Indonesia.
These are always the most endangered species, too. Not monocultures. They're a non-profit and their financial records are public for all to see. For example, in September they made €3m, paid €770k in taxes, and planted 1.2m trees.
The search engine itself is decent. Usually finds what I’m looking for. There are a lot more search engines, of course. See this article for a good list/overview. Of them, I’ve personally used DDG, Startpage, SearX, and Ecosia.
how difficult is it to build an open source search engine? I imagine it’s not something that a group of hexbear coders could manage
The bare bones of a search engine are pretty easy. You can make 1998's Google in an afternoon.
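To give a sense of how small the bare bones are, here's a rough sketch of the core data structure, an inverted index, with AND matching on every query word. This is a toy under my own assumptions (in-memory dict, made-up documents, no crawling and no ranking), not how any real engine is built.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up documents standing in for crawled pages.
docs = {
    "bike": "How to fix an SRAM P7 gear hub",
    "moon": "Sailor Moon fansite and web ring",
    "hub": "Gear hub maintenance for beginners",
}
index = build_index(docs)
print(search(index, "gear hub"))
print(search(index, "fix hub"))
```

Note that this version only returns pages containing every search term. The hard and expensive parts are everything around it: crawling, keeping the index fresh, and ranking.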
Crawling and indexing can get expensive, especially if you want your results to be up-to-date. Even Google can't afford to keep all the sites up to date all the time, and has to prioritize based on which sites change most often.
Having good-quality results is an open question, really. PageRank hasn't actually worked in over a decade due to SEO. Later most of the internet's information moved to walled gardens (Facebook, Pinterest, Instagram, and the ultimate death of hobby information on the internet: Discord) and video (indexing information in videos costs approx a bazillion dollars). Then more recently, AI-generated SEO spam sites cropped up, which are really hard for a computer to tell apart from useful information.
To make a better search engine, I think you'd need some mix of:
- custom crawlers to sneak into all the biggest walled gardens (they have anti-crawler defenses though, not sure how you could pull it off without crowdsourcing it somehow).
- a hand-curated list of trusted sites
- a collection of bullshit rules to exclude the major SEO spam farms. It'd be an ongoing process. The good news is that it's easier while you're small enough that they aren't specifically targeting you. The bad news is that you really can't open source your techniques.
- [if you somehow have a huge budget] Get video contents into the index using AI captioning and image recognition.
Later most of the internet’s information moved to walled gardens (Facebook, Pinterest, Instagram, and the ultimate death of hobby information on the internet: Discord) and video (indexing information in videos costs approx a bazillion dollars).
This is precisely why forums, mailing lists, and public-visible membership sites like Mastodon or even Twitter are far, far better than fucking Discord. I hate when a FOSS project does development/support based on Discord. That information becomes so difficult to retrieve and share later.
PageRank hasn’t actually worked in over a decade due to SEO.
That's absolutely fascinating. I have a vague notion of what SEO does, but I'd love to hear more about how this breaks PageRank, considering how central that famous algorithm is to Google's monopoly
PageRank just counts how much a website is linked to by other websites with good PageRank. It makes a ton of sense in the old internet where links were manually curated. It'd find everyone in your web ring, notice that all of you are also linking to the same Sailor Moon fansite, and rank that site highly because it's clearly well-respected in whatever niche this is.
It's easy to game this by just making a bunch of trivial websites that all link to a few of each other plus the site you actually want to boost.
Obviously you then try to exclude things that look like fake web rings, but now you're in an arms race with the SEO goblins.
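Here's a rough sketch of that, using the textbook power-iteration form of PageRank with a damping factor of 0.85. The graph is made up: a three-page web ring that all links to "fansite", and then a 20-page link farm that all links to "spam". It's the simplified published algorithm, not whatever Google actually runs.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a dict of page -> outlinks."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # Page with no outlinks: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new[target] += share
        rank = new
    return rank


# Honest web ring: a, b, c link to each other and to the fansite.
honest = {
    "a": ["b", "fansite"],
    "b": ["c", "fansite"],
    "c": ["a", "fansite"],
    "spam": [],
}
before = pagerank(honest)

# Link farm: 20 trivial pages linking to each other and to the spam site.
farmed = dict(honest)
for i in range(20):
    farmed[f"farm{i}"] = [f"farm{(i + 1) % 20}", "spam"]
after = pagerank(farmed)

print("before:", before["fansite"] > before["spam"])
print("after: ", after["spam"] > after["fansite"])
```

Before the farm exists, the fansite outranks the spam page. After it, the spam page comes out on top, even though nobody real links to it.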
You should use an AI for detecting & excluding SEO spam farms.
Spam detection is extremely well studied to the point that it's often the textbook example of a machine learning classification task.
SEO spam farms can optimize against AI detection by training their content generators against the AI detectors. Ultimately you need detection algorithms that they don't know about, which are sufficiently different from the ones they do. That's best if it's an ensemble approach, since it's harder to replicate. Some of the detectors in there may be AI based.
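As a very rough sketch of the ensemble idea: several independent detectors each vote, and a page is flagged only if a majority agree. The three detectors and their thresholds here are toy heuristics I made up for illustration; in practice some of the voters would be trained classifiers, and you'd keep the exact mix private.

```python
import re
from collections import Counter


def words(text):
    """Lowercase word tokens (URLs get split into pieces, which is fine here)."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_stuffing(text, threshold=0.2):
    """Flag pages where a single word makes up too much of the text."""
    tokens = words(text)
    if not tokens:
        return False
    top_count = Counter(tokens).most_common(1)[0][1]
    return top_count / len(tokens) > threshold


def link_heavy(text, threshold=0.1):
    """Flag pages where too many whitespace-separated tokens are URLs."""
    tokens = text.split()
    if not tokens:
        return False
    links = sum(1 for t in tokens if t.startswith(("http://", "https://")))
    return links / len(tokens) > threshold


def low_variety(text, threshold=0.5):
    """Flag longer pages that keep reusing a small vocabulary."""
    tokens = words(text)
    if len(tokens) < 10:
        return False
    return len(set(tokens)) / len(tokens) < threshold


DETECTORS = [keyword_stuffing, link_heavy, low_variety]


def is_spam(text, detectors=DETECTORS):
    """Majority vote across all detectors."""
    votes = sum(1 for detect in detectors if detect(text))
    return votes * 2 > len(detectors)


spam = ("cheap hubs cheap hubs cheap hubs buy cheap hubs now "
        "https://example.com/a https://example.com/b cheap hubs cheap hubs")
ham = ("I rebuilt my gear hub last weekend and the trickiest part was "
       "seating the clutch spring without bending it.")
print(is_spam(spam), is_spam(ham))
```

The point of voting is that a spam farm tuned to slip past one detector still has to beat the others, and it can't tune against detectors it doesn't know exist.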
they have a long list of all the cool things here, but most importantly they have their own search index like google and it isnt based on bing like literally everything else.
it seems to work well for english results, results may vary if you go for something more niche in something that isnt english
Forgot that they were working on this, neat. I'll give it a try for a few days.
Google's results are trash but they're still better than everybody else's trash, barely. I've found DDG to be basically useless unfortunately. I think everybody who isn't Yandex or Baidu is getting their results from Google or Bing.
So I mostly use startpage, which is just Google with some of the serial numbers and privacy violations filed off. I only go back to Google if I need an up-to-date currency conversion or their better weather widget or something.
DuckDuckGo's video search is so much better than Google's. It's so easy to find low quality bootleg streams on there.
i typically use searx because privacy, but honestly, it's very hit or miss for some things and i'll resort to ddg or google. it's very bad at pulling up stack overflow results for programming issues for whatever reason
Despite all its imperfections I still use DuckDuckGo. For what I need the results aren't bad, and I don't have to use !g very often.