search

From IndieWeb


search in the context of the IndieWeb refers to being able to search your personal site for your own content. Personal site searchability is a requirement for IndieMark Level 1.

Why


Why should your site be searchable?

  • You want your independently created and owned content to be found, preferably ranking above content on silos.

Why should your site have a search UI?

  • Convenient indieweb site search is a very commonly requested feature among readers / users of indieweb sites.
  • Readers don't want to have to remember to go to Google (and take the extra steps) to search your site.
  • Make it easy for friends who read your site to find things there: put a simple search box in the top right of every page (a common UI convention) where they can type a query and search your site. You can of course use a 3rd-party search engine for this, even returning results directly from it, e.g. a Google search box on your site.

Why Not

There are sometimes reasons you want a particular page or section of your site not to be indexed for searching, e.g.:

  • Private / private by obscurity URLs
  • Dynamic aggregations, e.g. tag aggregation pages and archives (by date etc.), because you'd rather only the post permalinks themselves be indexed, to reduce noise in search results.

How

How to implement search on your site.

searchability - level 1

Make sure your site is at least searchable (IndieMark search level 1). This means:

  • allow robots to index. Permissive or no robots.txt. Either don't have /robots.txt (easiest), or if you have one, it MUST allow search engines to index public posts on your site.
  • post content in HTML. Your post content MUST be in the visible HTML of the page retrieved from your post permalink. No depending on JavaScript to render your post content: if you can't curl it, it's not on the web.
  • site-specific searchability. You should be able to search for "site:yoursite.example.com search-term" in Google and other search engines that support site-specific searches, and have your posts found and displayed in the results.
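One way to check the robots.txt requirement yourself is Python's standard-library urllib.robotparser. This is a hypothetical sketch (the function name robots_allows and the default user agent "Googlebot" are illustrative assumptions); it only evaluates robots.txt rules, not meta robots tags or whether your content is actually in the HTML.

```python
from urllib.robotparser import RobotFileParser


def robots_allows(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`.

    Hypothetical helper: pass the contents of your /robots.txt
    (or "" if you don't have one) and the URL of a public post.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines; a file with no rules allows everything.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A permissive (empty) robots.txt allows indexing of posts ...
print(robots_allows("", "https://yoursite.example.com/2024/01/post"))  # True
# ... while a blanket Disallow blocks every search engine.
print(robots_allows("User-agent: *\nDisallow: /",
                    "https://yoursite.example.com/2024/01/post"))  # False
```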

search box - level 2

Add a simple search box to your site using a static form that submits to a search engine to provide time ordered (most recent first) results! (IndieMark search level 2)

E.g.
<form class="search" action="http://www.google.com/search" method="get">
<!-- restrict results to pages on your own domain -->
<input type="hidden" name="as_sitesearch" value="tantek.com"/>
<!-- sort by date (sbd:1) within a custom date range (cdr:1) starting 1/1/1970 -->
<input type="hidden" name="tbs" value="sbd:1,cdr:1,cd_min:1/1/1970"/>
<input type="search" name="q"/>
<button type="submit">Search</button>
</form>

And change tantek.com to your own domain! This HTML has been selfdogfooded live on tantek.com since 2012-07-06.
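For reference, submitting the form is a plain HTTP GET whose query string is just the form's fields, URL-encoded in document order. A hypothetical Python helper (site_search_url is an illustrative name, not part of any library) that builds the same URL, e.g. for linking to a canned search:

```python
from urllib.parse import urlencode


def site_search_url(site: str, query: str,
                    action: str = "http://www.google.com/search") -> str:
    """Build the URL the static search form above submits to (hypothetical helper)."""
    params = {
        "as_sitesearch": site,                 # restrict results to your domain
        "tbs": "sbd:1,cdr:1,cd_min:1/1/1970",  # same hidden value as the form
        "q": query,                            # what the reader typed
    }
    # urlencode() escapes ':', ',' and '/' and turns spaces into '+',
    # as a browser does for a GET form submission.
    return action + "?" + urlencode(params)


print(site_search_url("tantek.com", "indie web"))
# http://www.google.com/search?as_sitesearch=tantek.com&tbs=sbd%3A1%2Ccdr%3A1%2Ccd_min%3A1%2F1%2F1970&q=indie+web
```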

Search form styling is left as an exercise for the creator.

site search with 3rd party backend - level 3

Search where your site uses a 3rd party search service (e.g. Google), but still provides the results on your own domain. (IndieMark search level 3)

How to TBD.

Third Party Search Services

site search with site backend - level 4

Search where your site handles all the indexing and search queries. (IndieMark search level 4)

How to TBD.
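As a starting point, the core of a self-hosted backend is an inverted index: a map from each word to the posts containing it. The sketch below is a hypothetical, in-memory Python illustration (Post, SiteIndex and tokenize are made-up names); the sites listed under IndieWeb Examples instead rely on database or search-engine features such as PostgreSQL or MySQL full-text search, MongoDB, or Elasticsearch.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    url: str
    published: date
    text: str  # plaintext of the post


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; a real engine would also stem and drop stopwords.
    return re.findall(r"\w+", text.lower())


class SiteIndex:
    """Hypothetical in-memory inverted index: term -> set of post URLs."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.posts: dict[str, Post] = {}

    def add(self, post: Post) -> None:
        self.posts[post.url] = post
        for term in tokenize(post.text):
            self.postings[term].add(post.url)

    def search(self, query: str) -> list[Post]:
        terms = tokenize(query)
        if not terms:
            return []
        # AND semantics: a post must contain every query term.
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        # Most recent first, like the level 2 search form's results.
        return sorted((self.posts[u] for u in matches),
                      key=lambda p: p.published, reverse=True)


index = SiteIndex()
index.add(Post("https://yoursite.example.com/a", date(2020, 1, 1), "IndieWeb search box"))
index.add(Post("https://yoursite.example.com/b", date(2021, 1, 1), "Search your own site"))
print([p.url for p in index.search("search")])  # newest first: .../b, then .../a
```

A real implementation would persist the index, stem words, and update postings when a post is edited or deleted.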

Software

self-hacked engines

client side search

Possible level 5.

How to TBD.

Start with lunr.js (http://lunrjs.com/), i.e.:

  • Fetch your recent content storage files to the client in the background (XHR)
  • Feed them in structured form to lunr.js to build an index
  • Enhance your search box / form with JS to do the search entirely client-side when a lunr.js search index is available

How To Avoid

For all the reasons above in Why Not, here's how to keep specific pages from being indexed:

Put this in the head of each page you don't want indexed:

<meta name="robots" content="noindex,follow" />
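If your site (or your own indexer) needs to detect or honor this tag, it is a small parsing job. A hypothetical sketch using Python's built-in html.parser (is_noindex and RobotsMetaParser are illustrative names); it only looks at meta robots tags, not X-Robots-Tag HTTP headers or robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; also called for <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def is_noindex(html: str) -> bool:
    """True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


print(is_noindex('<head><meta name="robots" content="noindex,follow" /></head>'))  # True
print(is_noindex('<head><title>A post</title></head>'))  # False
```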

IndieWeb Examples

IndieWeb sites that have search interfaces.

Tantek

Tantek Çelik has had a search interface on his site tantek.com since 2012-07-06 which uses a simple static form that submits to Google search (IndieMark search level 2).

Aaron Parecki

Aaron Parecki had a search interface on his site aaronparecki.com from 2012-07 until 2016-01, using a simple static form that submitted to Google search scoped to the website, with query parameters asking Google to return posts in reverse date order (IndieMark search level 2). Since 2016-08 his site has had a search interface which searches a local index of posts and returns the matching posts rendered in the normal list format, in reverse date order (IndieMark search level 4).

Ben Werdmuller

Ben Werdmüller has had a search interface on his site werd.io since 2013-06-20 which uses his own site's backend, MongoDB in particular (IndieMark search level 4).

Barnaby Walters

Barnaby Walters added a simple static search form (based on Tantek’s code) to waterpigs.co.uk on 2014-02-24 which submits to a site-scoped Google search (IndieMark search level 2).

Also experimenting with a local search engine, using Elasticsearch, which indexes the archive of all the pages I’ve linked to as well as mentions of my own pages.

Screenshot (2014-03-01-indie-search-halsway.png): UI as of 2014-03-01, showing authorship information, page name, excerpt and URL.

Dan Lyke

Dan Lyke has had locally hosted search since March 2001, and currently has a simple search which uses the text indexes in his PostgreSQL back end, orders search results partly based on phrases, and supports "+" and "-" to require and exclude terms (IndieMark search level 4).

Since I'm scanning various other sites for inbound links, I'd like to, at some point, index those other sites as well for additional search options.

Ben Roberts

Ben Roberts has had a search box on his site since 2014-09-30 which simply submits to Google search, ordered by most recent post (IndieMark search level 2).

Kyle Mahan

Kyle Mahan previously had local search (as of 2015-01-16) backed by Postgres full-text matching (the @@ operator). Posts were presented as a standard h-feed, but I had wanted to eventually style them more like "search results" (and show more results per page).

In 2016-01, I converted my site to Known, which uses MySQL full text search by default.

Christian Weiske


Christian Weiske wrote his own generic search engine, phinde, which has powered his blog search since 2016-02-05. The search engine runs at http://search.cweiske.de/ and supports site-specific search (IndieMark search level 4).

phinde supports faceted browsing on tags, domain, language and file type. It indexes not only the blog but also all linked URLs. Crawling, indexing and the HTML frontend are written in PHP; data storage and searching are done in Elasticsearch.

The search form is available on the blog index and tag index pages, as well as on single blog posts.

phinde is also used for indieweb chat log search at https://indiechat.search.cweiske.de/

gRegor Morrill

gRegor Morrill added local search to gregorlove.com for articles and notes on 2016-06-05 (IndieMark search level 4).

  • As of 2017-04-21, the search form has been moved to only appear at https://gregorlove.com/search. This page is linked from the footer of each page. I moved it in part because page footers were looking cluttered, and my usage of search is relatively infrequent. My current thinking is that it's fine to have it only on the /search URL.
    • Previously: the search form was at the bottom of each page and had fields for filtering by before/after date.
  • Uses the ProcessWire API to search. Defaults to full-text queries, but uses LIKE queries for shorter text.

Design

If in doubt: copy Google.

Search UI

To start with, there's no need for anything more than a single-line text box and a “Search” button. Keep things focused.

Indexing

Due to the widespread use of microformats on the indieweb, each page being indexed is rich in semantics that can be indexed, e.g. explicit name, publication datetime, authorship information, relations like in-reply-to, representative image, etc.

Some properties can be faked if microformats markup isn’t present:

  • name can be substituted with the contents of the title element
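A minimal sketch of that fallback, assuming well-formed HTML and using Python's html.parser. It is a simplification, not a microformats parser: it takes the first element with class p-name anywhere on the page rather than resolving it inside its h-entry, and entry_name / NameParser are hypothetical names. A real indexer would use a proper microformats2 parser.

```python
from html.parser import HTMLParser
from typing import Optional

# Elements that never have an end tag, so they must not affect nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class NameParser(HTMLParser):
    """Records the text of the first p-name element and of the <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self.p_name: Optional[str] = None
        self.title: Optional[str] = None
        self._capturing: Optional[str] = None  # "p_name" or "title" while inside one
        self._depth = 0
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self._capturing:
            self._depth += 1  # nested element inside the captured one
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.p_name is None and "p-name" in classes:
            self._capturing = "p_name"
        elif self.title is None and tag == "title":
            self._capturing = "title"
        else:
            return
        self._depth, self._buf = 1, []

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self._capturing:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse whitespace and store the captured text.
            setattr(self, self._capturing, " ".join("".join(self._buf).split()))
            self._capturing = None

    def handle_data(self, data):
        if self._capturing:
            self._buf.append(data)


def entry_name(html: str) -> Optional[str]:
    """p-name text if present, otherwise the <title> text (hypothetical helper)."""
    parser = NameParser()
    parser.feed(html)
    parser.close()
    return parser.p_name or parser.title


print(entry_name('<title>Just a title</title><p>No microformats here</p>'))  # Just a title
```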

Many semantics similar to microformats properties can also be found in invisible metadata such as OGP meta elements. Whether that metadata can be trusted, or gives a better search experience, requires further experimentation.

To get the best results, a plaintext representation of each page should be indexed. In lieu of a full HTML-to-plaintext algorithm, some steps to follow include:

  • Remove the head element from the page
  • Remove any other script and style elements from the page
  • Replace embedded content (e.g. images, videos, audio) with its text-based accessible fallback, e.g. the alt attribute for images
  • …

Result Display

Results should be displayed in order of relevance by default. Having the option to order search results by datetime might also be useful.
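A sketch of offering both orderings, assuming each result already carries a relevance score from whatever engine you use and a published datetime (Result and order_results are hypothetical names):

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Result:
    url: str
    score: float         # relevance score from your search engine (assumed)
    published: datetime


def order_results(results: list[Result], by: str = "relevance") -> list[Result]:
    """Order by relevance (default) or by datetime, newest first."""
    if by == "relevance":
        # Highest score first; ties go to the newer post.
        return sorted(results, key=lambda r: (r.score, r.published), reverse=True)
    if by == "date":
        return sorted(results, key=lambda r: r.published, reverse=True)
    raise ValueError(f"unknown ordering: {by!r}")


results = [Result("a", 0.5, datetime(2020, 1, 1)),
           Result("b", 0.9, datetime(2019, 1, 1)),
           Result("c", 0.5, datetime(2021, 1, 1))]
print([r.url for r in order_results(results)])             # ['b', 'c', 'a']
print([r.url for r in order_results(results, by="date")])  # ['c', 'a', 'b']
```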

Each result should form a block with a clear visual boundary separating it from other results. A click anywhere on the block should navigate immediately to the search result URL, which should be shown in its entirety at the bottom of the block.

  • The only reason I can think of not to navigate immediately (i.e. to navigate via an intermediate redirect) is to check whether or not the page exists and show an archived copy instead. Perhaps that check is better left to a browser extension which acts only on a 404. --Barnaby Walters 09:51, 28 February 2014 (PST)


Brainstorming

Faceted scoped search

Beyond raw searching of the contents of your site, it may also be useful to index and be able to search within:

  • a time window (from/to dates/times)
  • a geography (location proximity, within an area / polygon)
  • person mentions
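For the time-window, proximity and person-mention facets, filtering can be as simple as the sketch below. It is a hypothetical illustration: IndexedPost and facet_filter are made-up names, proximity uses the haversine great-circle distance with a mean Earth radius of 6371 km, and searching within an arbitrary area or polygon would need a point-in-polygon test that isn't shown.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class IndexedPost:
    url: str
    published: datetime
    lat: Optional[float] = None
    lon: Optional[float] = None
    mentions: set = field(default_factory=set)  # URLs of people mentioned in the post


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def facet_filter(posts: Iterable[IndexedPost],
                 start: Optional[datetime] = None,
                 end: Optional[datetime] = None,
                 near: Optional[tuple] = None,
                 radius_km: Optional[float] = None,
                 person: Optional[str] = None) -> list:
    """Keep posts inside a time window, within radius_km of `near`, and/or mentioning `person`."""
    kept = []
    for post in posts:
        if start is not None and post.published < start:
            continue
        if end is not None and post.published > end:
            continue
        if near is not None and radius_km is not None:
            if post.lat is None or post.lon is None:
                continue  # no location, so it can't match a geographic facet
            if haversine_km(near[0], near[1], post.lat, post.lon) > radius_km:
                continue
        if person is not None and person not in post.mentions:
            continue
        kept.append(post)
    return kept
```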

Site plus links search

Beyond searching just the contents of what you publish on your own indieweb site, it may be useful to *also* index:

  • every page that you link to in your posts

And then provide results from those as well as your own site.
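Building this starts with collecting the outbound links from each of your posts. A hypothetical sketch (outbound_links and LinkCollector are illustrative names) that resolves relative hrefs against the post URL, drops #fragments and links back to your own host, and keeps only http(s) links, ready to queue for fetching and indexing:

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href> in a post's HTML."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def outbound_links(post_url: str, html: str) -> list[str]:
    """Unique http(s) links from a post that point away from your own host."""
    parser = LinkCollector(post_url)
    parser.feed(html)
    parser.close()
    own_host = urlparse(post_url).netloc
    result: list[str] = []
    for link in parser.links:
        link, _fragment = urldefrag(link)  # fetch each page once, ignoring #fragments
        parts = urlparse(link)
        if parts.scheme in ("http", "https") and parts.netloc != own_host and link not in result:
            result.append(link)
    return result


html = '<p>See <a href="https://example.org/a#intro">this</a> and <a href="/about">me</a>.</p>'
print(outbound_links("https://yoursite.example.com/post/1", html))  # ['https://example.org/a']
```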

Site plus linked sites

Beyond searching just the contents of what you publish on your own indieweb site, it may be useful to *also* index:

  • the entire site of every page that you link to in your posts, perhaps using PuSH discovery for those sites.

And then provide results from those as well as your own site.

Social search

Beyond searching just the contents of what you publish on your own indieweb site, it may be useful to *also* index:

  • every indieweb site of any person you mention
    • in your posts
    • sidebar
    • friends lists
    • etc. anywhere on your site.
  • every site you follow in your indie reader

And then provide results from their sites as well as your own site.

More brainstorms

See additional brainstorms at:

See Also

Search related sessions at past IndieWebCamps:

Retrieved from "http://indieweb.org/search"