How do you build a search engine?

Sarabjeet Singh
7 min read · Jun 7, 2021


A lot of folks at small and medium companies (particularly in retail and related industries) have reached out to ask how to build a search engine, so I am writing a series on the topic. This first post covers:

  • an introduction to search engines
  • search engines in the context of retail and business
  • building blocks of a search engine

In the last five years, I have worked across search indexing, query understanding, ranking, front end, and supply chain services, and developed knowledge and expertise across the stack. In this article, I will share what I have learnt from developing these systems and services, and from reading about how the industry builds them.

An introduction to search engines

Before I begin, it is important to define what a search engine is. Search has evolved from a text-in, text-out service to an experience that cuts across voice, video and conversations. For the purposes of this article, search includes all the popular formats and mediums of search engines that exist in the market today. Let me start by acknowledging a few key facts about search as a problem:

  • Search is an infinite problem to solve. Google has been around for over two decades now, organizing the world’s information and making it easily accessible for people around the world, and the scale they have created is astounding (they send visitors to 100M+ sites every day — more information here)
  • Modes and formats of questions, conversations and knowledge sharing are changing. With the proliferation of voice mediums, chat assistants and video conversations, users expect more from search engines and queries continue to follow much more natural and long forms (this article talks about how query patterns have been evolving)
  • Technologies for information retrieval, language understanding and experience mediums are also improving at a fast pace. Whether it is the rise of large language models (the likes of GPT), the use of ultra-fast processing cores and massive computation power by large corporations, or fast touch and information exchange on your latest smartphone or voice assistant, these technologies are speeding up progress across the industry (this article talks about the latest language technology wars between big tech companies and associated issues)

If you want to learn more about search engines and the latest relevant technologies, I’d recommend the following books and blogs:

  • AI Powered Search by Manning (book link) — slightly academic but caters to both product leads and data scientists
  • Information Retrieval (book link) — great for newbie ML or DS geeks who want to learn about search; relevant for product and technical managers too (I had the honor of being a student reviewer of this book in 2008)
  • Medium Data Science blog (link) — a collection of blogs and articles on AI, data science, ML and related technologies (authors include academics and professionals from across the industry)

Search in the context of retail or any other business

Before talking about developing a search engine, it is important to understand the context you are operating in and what your service needs. Are you running a restaurant booking engine, a niche clothing site, a large-scale product search engine or an internal job search page? Who are your users and what do they need? How often do they use your service and what is important to them?

Let’s assume you are a large retailer with stores across the American West Coast that sells grocery and household items. You have thousands of customers who shop with you regularly, and you would like to build your own website or app to make shopping easier for them. To do that, you need a search engine that lets your customers find the products and services they need. If you are a small business, you might use Shopify or Instacart to sell your goods online, and both provide search solutions that might be good enough for your needs. But if you are slightly bigger, have a large assortment (say over a few thousand items), offer a variety of pickup or delivery options, or want to offer additional services, you might need to think about custom solutions, including a custom search engine.

In this article, we will assume you are interested in developing your own search engine and walk you through the core components of common search engines. Before we get into that, we will lay out what will be absolutely critical for your customers:

  • When shopping with you, customers want an accurate view of which items are in stock at local stores; for items that are out of stock, they want to know when they will be back
  • Finding items and seeing results quickly is critical; often, customers also want to filter or sort by department, brand, or top-rated options
  • Prices are important and customers want to know what counts as a low or a high price in a category; in addition, information about discounts, deals and price changes helps customers make better decisions
  • Making it easy for customers to search for products by delivery or pickup options, and providing transparency on any associated fees, pickers or drivers involved and tips can be helpful
  • Customers want to look for substitutes or replacements or similar items particularly when their choices are not in stock
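
To make a few of these needs concrete, here is a minimal Python sketch of in-stock filtering, facet filtering, sorting and substitute lookup. It is an illustration under assumptions, not a production design: the `Product` fields, the store id `store-sf`, the sample catalog and the "same department, closest price" substitute rule are all hypothetical choices of mine.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Product:
    # All field names are hypothetical, chosen only for illustration.
    sku: str
    name: str
    department: str
    brand: str
    price: float
    rating: float
    stock_by_store: Dict[str, int] = field(default_factory=dict)
    restock_date: Optional[str] = None  # shown to customers when out of stock


def in_stock(product, store_id):
    """True if the given store has at least one unit of the product."""
    return product.stock_by_store.get(store_id, 0) > 0


def filter_and_sort(products, store_id, department=None, brand=None, sort_by="rating"):
    """Keep in-stock items that match the chosen facets, then sort them."""
    matches = [
        p for p in products
        if in_stock(p, store_id)
        and (department is None or p.department == department)
        and (brand is None or p.brand == brand)
    ]
    if sort_by == "price":
        return sorted(matches, key=lambda p: p.price)  # cheapest first
    return sorted(matches, key=lambda p: p.rating, reverse=True)  # top rated first


def substitutes(target, products, store_id, limit=3):
    """In-stock items from the same department, closest in price first."""
    candidates = [
        p for p in products
        if p.sku != target.sku
        and p.department == target.department
        and in_stock(p, store_id)
    ]
    candidates.sort(key=lambda p: abs(p.price - target.price))
    return candidates[:limit]


# Hypothetical sample catalog for one store.
CATALOG = [
    Product("sku-1", "Whole milk 1 gal", "dairy", "BrandA", 4.29, 4.6, {"store-sf": 12}),
    Product("sku-2", "2% milk 1 gal", "dairy", "BrandB", 3.99, 4.2, {"store-sf": 0}, "2021-06-09"),
    Product("sku-3", "Oat milk 64 oz", "dairy", "BrandC", 4.79, 4.8, {"store-sf": 5}),
    Product("sku-4", "Paper towels 6 ct", "household", "BrandA", 8.49, 4.1, {"store-sf": 30}),
]
```

For example, `filter_and_sort(CATALOG, "store-sf", department="dairy")` returns the two in-stock dairy items with the top-rated one first, and `substitutes(CATALOG[1], CATALOG, "store-sf")` suggests in-stock alternatives for the out-of-stock 2% milk. A real system would likely use richer similarity signals than department and price.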

Building blocks of a search engine

When developing any software system, including a search engine or service, it is important to consider:

  • the needs of users and business as shared above
  • the rules of good software design (fast, flexible, scalable, fault tolerant, secure, monolith vs micro-services, etc.)
  • any constraints and tradeoffs (timelines, resources, costs including hardware, storage, maintenance, etc.)

For smaller companies, it is useful to look at off-the-shelf solutions for a search service. It is also good to look at the search stack designs or architectures commonly used by leading players in the industry. Typically, a search service has the following components or core blocks:

  • Search index to store and manage all products or listings, whether it is restaurants, apartments, grocery items or jobs. The index must hold accurate and up-to-date information on listings, structured so that retrieval engines can easily match user needs and queries and find the latest and most relevant information
  • Query understanding and ranking models and services, to decipher user queries, match them to the most relevant listings and rank the list so users find the most useful listing at the top. Query understanding services extract words or tokens from queries to identify the category or product type and any additional characteristics a user might be looking for, while retrieval and ranking services are optimized for high precision and recall to ensure users see the most relevant listings and have many options to view if they want to
  • Federators and blender services, which “divide and conquer” by calling multiple retrieval/ranking services or different indices and then collecting the listings from all these sources (for example, when you search for “san francisco” on Google, you will see news, videos, pictures, weather and a list of relevant websites; federators and blenders are used in such situations, where the same query is sent to different indices, like the videos one, to gather relevant listings)
  • Real-time information services and orchestration layers to provide up to date information on live or in-stock listings based on the user’s context and show accurate prices and convenience options like making a reservation or purchase; and pass this information to the client layer
  • Front-end (FE) clients and network/security/caching layers, which load the most relevant listings and information onto the user’s web or app pages or voice devices so that the experience is always available, fast, accurate, responsive and secure
  • Adjacent services and experiences including typeahead (or the query typing and autocomplete service) and spell-corrector, guided filters or facets (to help narrow a set of listings), related queries (to help guide the user to the next step), sponsored results (to show relevant new brands and products), other forms of advertising, and more
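
As a rough illustration of how the first two blocks fit together, here is a toy Python sketch of an inverted index, a very simple query understanding step and a TF-IDF style ranker. Everything in it is an assumption made for illustration: the sample `LISTINGS`, the `KNOWN_ATTRIBUTES` vocabulary and the scoring formula are hypothetical, and production systems typically rely on dedicated search engines and learned ranking models instead.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical listings; a real index would be fed from a product catalog.
LISTINGS = {
    1: "organic whole milk",
    2: "almond milk unsweetened",
    3: "organic bananas",
    4: "milk chocolate bar",
}

# Hypothetical attribute vocabulary for the toy query understanding step.
KNOWN_ATTRIBUTES = {"organic", "unsweetened"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(listings):
    """Inverted index: token -> {listing_id: term frequency}."""
    index = defaultdict(dict)
    for listing_id, text in listings.items():
        for token, count in Counter(tokenize(text)).items():
            index[token][listing_id] = count
    return index


def understand(query):
    """Split a query into attribute tokens and remaining product tokens."""
    tokens = tokenize(query)
    return {
        "attributes": [t for t in tokens if t in KNOWN_ATTRIBUTES],
        "product_terms": [t for t in tokens if t not in KNOWN_ATTRIBUTES],
    }


def search(query, index, num_listings):
    """Retrieve listings matching any query token, ranked by summed TF-IDF."""
    parsed = understand(query)
    scores = defaultdict(float)
    for token in parsed["attributes"] + parsed["product_terms"]:
        postings = index.get(token, {})
        if not postings:
            continue  # token does not appear in any listing
        # Rarer tokens get a higher weight than common ones.
        idf = math.log(1 + num_listings / len(postings))
        for listing_id, tf in postings.items():
            scores[listing_id] += tf * idf
    # Highest score first; ties are broken by listing id for stable output.
    return sorted(scores, key=lambda i: (-scores[i], i))
```

With `index = build_index(LISTINGS)`, calling `search("organic milk", index, len(LISTINGS))` ranks the listing that matches both tokens first, followed by listings that match only one. Matching any token favors recall; requiring all tokens would favor precision, which is one of the tradeoffs a ranking service has to manage.
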

[Figure: LinkedIn Search Stack]

The articles below will give you a sense of how big and small companies like LinkedIn, Dropbox and Cliqz have built their search engines.

  • LinkedIn Search Stack: Galene (link)
  • Architecture of the new Dropbox search engine (link)
  • Architecture of a large scale web search engine, Cliqz (link)

In future posts, I will go deeper into each of the building blocks above and talk about the latest technologies, what might work for small or large companies, and the challenges and tradeoffs you will have to work through. My goal is to share my knowledge and expertise on search engines with all of you. If you have feedback, or there is a topic within search you would like me to talk about, please leave a comment.
