How I Search

Multi-engine search process for internal and public data

September 29, 2024

888 words, ~ 4 min read

perspective

Information retrieval is a crucial part of any large system.

Describing the Internet as a large system is a gross understatement. An estimated 150 zettabytes of data will be generated in 2024. This is mind-bogglingly big. To get a feel for the prefix: a million seconds is about 12 days, and a billion seconds is just over 31 years. One zettasecond would be almost 32 trillion years, roughly 2,300 times the age of the universe. And there are 150 zettabytes of data in this year alone.
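The scale comparison above is easy to check by hand. Here is a minimal sketch of the arithmetic, assuming a Julian year of 365.25 days and an age of the universe of about 13.8 billion years (both are my assumptions, not figures from a cited source):

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumed: Julian year
AGE_OF_UNIVERSE_YEARS = 13.8e9               # assumed: commonly cited estimate


def seconds_to_days(seconds: float) -> float:
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY


def seconds_to_years(seconds: float) -> float:
    """Convert a duration in seconds to years."""
    return seconds / SECONDS_PER_YEAR


million_in_days = seconds_to_days(1e6)     # mega: 10^6 seconds
billion_in_years = seconds_to_years(1e9)   # giga: 10^9 seconds
zetta_in_years = seconds_to_years(1e21)    # zetta: 10^21 seconds
universe_multiples = zetta_in_years / AGE_OF_UNIVERSE_YEARS

print(f"A million seconds  ~ {million_in_days:.1f} days")
print(f"A billion seconds  ~ {billion_in_years:.1f} years")
print(f"A zettasecond      ~ {zetta_in_years:.2e} years")
print(f"That is about {universe_multiples:,.0f} times the age of the universe")
```

The numbers land close to the rounded ones in the text: roughly 11.6 days, 31.7 years, and a little under 2,300 universe lifetimes.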

Of course, not all of this data is exposed to every individual. However, a lot of it is. We just don’t realize it because of how good search engines have become and how much we use them. Google Search is the most-visited website in the world.

AI is changing the way we search by doing the searching for us to find what we’re really looking for: answers. In recent months, I’ve experimented with several AI search tools and settled on a workflow that combines them with traditional search. Nothing too complex, but still worth thinking about.

Search Space
Multi-Engine Search Procedure
Future

Search Space

The original stores of accessible information were libraries, dating back to ancient times with the Library of Alexandria. The Gutenberg printing press later made it possible to mass-produce books.

The Dewey Decimal System organized books in a hierarchy, grouping things together based on their topic. For example, a book on travel in Europe would be found under Class 900 for History and Geography, then 910 for Geography and Travel, then 914 for Europe, and so on.

Consider if there were no numbering on these books. Instead, there would just be a bunch of books, in any order. The search space is all of the books, so it grows linearly with the size of the collection: on average, finding the one you’re looking for requires looking at half the books.

Instead, we split the space into multiple categories to make it far easier to find a book. The same principle applies almost universally - supermarkets have a large sign called Produce where all fruits and vegetables can be found, and every aisle has a sign describing the items in it.
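To make the difference concrete, here is a minimal sketch comparing the two approaches. The shelf of 1,000 books split evenly into 10 categories is made up for illustration, and the count ignores the (small) cost of walking to the right category.

```python
from collections import defaultdict


def linear_search(shelf, title):
    """Scan books in order; return how many were examined before a match."""
    for examined, book in enumerate(shelf, start=1):
        if book["title"] == title:
            return examined
    return len(shelf)  # not found: every book was examined


def build_index(shelf):
    """Group books by category, like shelving them by classification."""
    index = defaultdict(list)
    for book in shelf:
        index[book["category"]].append(book)
    return index


def categorized_search(index, category, title):
    """Search only within the section for the given category."""
    return linear_search(index[category], title)


# Hypothetical collection: 1,000 books spread evenly over 10 categories.
shelf = [{"title": f"book-{i}", "category": i % 10} for i in range(1000)]
index = build_index(shelf)

average_unsorted = sum(
    linear_search(shelf, book["title"]) for book in shelf
) / len(shelf)
average_categorized = sum(
    categorized_search(index, book["category"], book["title"]) for book in shelf
) / len(shelf)

print(average_unsorted)     # 500.5, about half the collection
print(average_categorized)  # 50.5, about half of one category
```

With 10 evenly sized categories, the average search is 10 times shorter. Nesting categories, as Dewey does, repeats the same trick at every level.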

The big insight that Google had was to go a step further than categorization and rank results by how relevant their content is to what you are looking for.

Multi-Engine Search Procedure

For my searching, I use 3 engines: Perplexity, Google, and Glean.

I primarily use Google Chrome, with Perplexity as the default search engine. I also set up g as a shortcut keyword to trigger Google and gl as a shortcut keyword to trigger Glean.
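The shortcut setup behaves like a small keyword router: the first word of what I type picks the engine, and anything else falls through to the default. Here is a rough sketch of that idea. The URL templates are assumptions for illustration (the Glean host in particular is a placeholder, since it depends on the company deployment), and `route` is a hypothetical helper, not something Chrome exposes.

```python
from urllib.parse import quote_plus

# Assumed URL templates, for illustration only.
DEFAULT_ENGINE = "https://www.perplexity.ai/search?q={query}"  # assumed URL
KEYWORD_ENGINES = {
    "g": "https://www.google.com/search?q={query}",
    "gl": "https://glean.example.com/search?q={query}",  # hypothetical host
}


def route(address_bar_text: str) -> str:
    """Return the search URL for the text typed into the address bar."""
    keyword, _, rest = address_bar_text.partition(" ")
    if keyword in KEYWORD_ENGINES and rest:
        # A known shortcut followed by a query: use that engine.
        return KEYWORD_ENGINES[keyword].format(query=quote_plus(rest))
    # No shortcut: the whole text goes to the default engine.
    return DEFAULT_ENGINE.format(query=quote_plus(address_bar_text))


print(route("g dewey decimal"))
print(route("gl onboarding doc"))
print(route("what is a zettabyte"))
```

In the browser itself, none of this is code: the same effect comes from adding each engine under the search engine settings with its shortcut keyword.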

This is my decision process:

Am I looking for internal documents?
- If so, use Glean
Does this require internal context to answer?
- If so, use Glean Assistant
Am I looking for a link, website, or a map?
- If so, use Google
Am I looking for an answer?
- If so, use Perplexity
Is what I’m looking for tricky to understand?
- If so, use Perplexity (or Glean Assistant)
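The questions above can be read as an ordered checklist where the first "yes" wins. Here is a minimal sketch of that reading. The `Query` flags and the `choose_engine` name are hypothetical, and treating the list as first-match-wins is my simplification of what is really a judgment call.

```python
from dataclasses import dataclass


@dataclass
class Query:
    """Hypothetical flags describing what a search is after."""
    wants_internal_documents: bool = False
    needs_internal_context: bool = False
    wants_link_site_or_map: bool = False


def choose_engine(query: Query) -> str:
    """Walk the questions in order and return the first matching engine."""
    if query.wants_internal_documents:
        return "Glean"
    if query.needs_internal_context:
        return "Glean Assistant"
    if query.wants_link_site_or_map:
        return "Google"
    # Everything else is a request for an answer, tricky or not.
    return "Perplexity"


print(choose_engine(Query(wants_internal_documents=True)))  # Glean
print(choose_engine(Query(needs_internal_context=True)))    # Glean Assistant
print(choose_engine(Query(wants_link_site_or_map=True)))    # Google
print(choose_engine(Query()))                               # Perplexity
```

Putting the internal checks first matters: a tricky question that needs company context lands on Glean Assistant rather than Perplexity, which matches the last item in the list.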

At the time of writing, certain tasks are still better suited to Google. In my usage, these are typically finding a link (perhaps for a document), going to a website directly (perhaps to explore), or going to Google Maps.

Most tasks, however, require me to get an answer. Traditionally, I would look through many websites, find ones that are relevant, and then synthesize across them to get an answer. Perplexity does all of that for me.

Complicated search queries are better understood by AI search than by the keyword matching of traditional search engines. Quite often, I describe what I am looking for and what I am trying to do - the engine then does the hard work of figuring out a multi-stage search plan (available in Perplexity Pro). It’s incredibly useful and saves a lot of preliminary time, which is the value proposition.

Perplexity Pro is slower than Google; otherwise, I’d just use Perplexity for link and website searches too. The difference is noticeable: Perplexity is very obviously considering options, while Google returns results instantly.

I doubt Glean can hit that level of external search, however. Techniques are constantly shifting, and Glean seems to be focusing more on internal corporate search, which requires building connectors and understanding terms that are rarely used in public data (internal company data is out of distribution relative to public data).

Future

Searching will fundamentally change. That’s not a crazy statement. The hard part is predicting how.

SEO is a major component of any large website’s strategy. It’s an arms race between search engines and websites, where websites try to game analytics, engines update algorithms, and so on. The current state emphasizes keywords, good content, domain authority, and a number of other metrics.

AI search engines are different, though. Now, websites need to figure out how these engines index and find similarity in their websites.

This is just the start. Google Lens is a foray into using a camera for visual search. However, this has been around for a while, and hasn’t had a game-changing impact.

My guess is that the UX of search will change to become more like talking to ancient scholars instead of finding websites. I have no idea how people will upload content, and there probably will still be a place for people to view information. There is no source or rationale for this prediction.
