open search design

From IndieWeb


Open Search Design is a discussion about ways in which the component pieces of search engines (i.e. crawler, indexer, and web UI) can be developed separately and link together.

Why design open search projects?

Good search engines need to solve a lot of problems to provide relevant, accurate, and appropriate results to users. When you visit a search engine, you are looking for a quick result that meets your intent, without having to click through many spam links, outdated posts, or other pieces of irrelevant material.

Solving problems like relevance takes a lot of work and many parts of search are their own fields entirely. With a distributed approach, those interested in search could create systems that encouraged open participation from others so as to create an environment where collective contributions would help advance search as a whole.

How

There are many areas of search that could be made open or distributed, including:

  • Crawling, where an indexing service agrees on a distributed system through which people who want to show their content on the search engine can submit a crawled version of their site.
  • User interface representation, where a search engine has an open index that can then be built on top of by other parties.
  • Relevance algorithms, where a search engine provides an open index that can then be queried using custom algorithms.

Search has traditionally been taken on by individual businesses (i.e. Google, Yahoo, Yandex) so the vocabulary and exploration around ways in which search could be distributed or designed to support open programmatic access requires deep discussion.

IndieWeb Examples

IndieWeb Search

  • capjamesg is building IndieWeb Search in the open on GitHub. The search engine is marked up with h-feeds and has an API that returns direct answers and search results. This creates a system upon which custom search UIs could easily be built.
    • More exploration is needed in terms of how the search engine can be made more open. How could IndieWeb search accept crawls from other community members? How would these crawls be validated?

Considerations

Distributed crawling

  • How does one verify the integrity of crawl results openly? Can this be done? There would have to be a high bar for validation set in order to ensure that people do not create manipulated indexes of their site that they use to rank higher for certain terms.
  • Setting up a web crawler doesn't take too much computing power at a small scale. At what scale does a distributed approach make sense?
    • IndieWeb Search can index over 200,000 URLs in two days on a generic cloud server which is find for a search engine of its size.

Search UIs

  • What problems would an open index upon which one can build UIs solve?
    • People could easily build native site search on their website.
    • One could synthesize search results from multiple standard-compliant search engines.
  • What are the problems with existing search UIs that need solved?
    • Maybe a new search UI could come up with novel ways of extracting "direct answers" from snippets.
      • This would open up further NLP research on bigger datasets without having to set up a crawler.

Criticism

The term "open search" may be confusing as it is reminiscent of the OpenSearch specification.

Brainstorm

Add your notes here.