routing

routing (or URL routing) is a way to configure a website to have some (or all) of its URLs handled by software rather than static files; an IndieWeb site may have no routing (a static site), some routing like one directory handled by PHP or other software, or all URLs handled by a framework like Ruby on Rails with its own routing system.

Why

Why use routing? Any kind of dynamic site (anything more than serving static files) needs some form of routing so that the web server hands a URL to some code that figures out what content to serve, rather than just mapping the URL onto the file system.

IndieWeb Examples

Tantek

Tantek Çelik has used some hybrid routing on his site tantek.com since at least 2010-01-01, when he posted his first "note" that was dynamically handled by PHP. Details:

  1. Apache .htaccess used to route some URL patterns to falcon.php
  2. PHP falcon.php used to route those URLs to specific types of pages (a rough sketch of this kind of dispatch follows the list):
    • home page (using an index.html template)
    • Atom feed (100% generated by code)
    • permalink pages (100% generated by code)
    • archive pages (100% generated by code, for all posts in a day or month, or all posts of a particular type in a single day)
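
This is not the actual falcon.php, but a minimal PHP sketch of that kind of page-type dispatch; the URL patterns, feed path, and output are assumptions for illustration only:

<?php
// Hypothetical sketch of page-type dispatch; not the actual falcon.php.
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

if ($path === '' || $path === '/') {
    // home page, filled in from an index.html template
    readfile(__DIR__ . '/index.html');
} elseif ($path === '/updates.atom') {
    // Atom feed, 100% generated by code (feed path is an assumption)
    header('Content-Type: application/atom+xml');
    echo '<feed xmlns="http://www.w3.org/2005/Atom"><title>example</title></feed>';
} elseif (preg_match('#^/(\d{4})/(\d{3})(/.+)$#', $path, $m)) {
    // permalink pages, 100% generated by code
    echo '<!DOCTYPE html><title>post ' . htmlspecialchars($m[3]) . '</title>';
} elseif (preg_match('#^/(\d{4})/(\d{3})/?$#', $path, $m)) {
    // archive pages for a day or month, 100% generated by code
    echo '<!DOCTYPE html><title>archive for ' . $m[1] . ' day ' . $m[2] . '</title>';
} else {
    http_response_code(404);
    echo '<!DOCTYPE html><title>not found</title>';
}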

Tools

Brainstorming

PHP-only routing

Tantek Çelik: On day two of IndieWebCamp Brighton (2019-10-20), Jeremy Keith showed me that it is possible to launch a PHP webserver on a Mac for localhost, using e.g.

php -S localhost:8000 falcon.php

That made me realize that the routing in my .htaccess file was being ignored, and thus untestable, when using a local PHP webserver.
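
With the built-in server, the named script acts as a router: it runs for every request, and returning false tells the server to serve the requested file itself from the file system. A minimal sketch of such a router script (the file name router.php and the echo body are placeholders; falcon.php plays this role in the command above):

<?php
// router.php — minimal sketch of a router script for the PHP built-in server,
// run as e.g. `php -S localhost:8000 router.php`
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

// Returning false tells the built-in server to serve the file as-is,
// so real static files still work without any .htaccess rules.
if ($path !== '/' && is_file(__DIR__ . $path)) {
    return false;
}

// Every other URL is handled by PHP, which is where the rewrites that
// .htaccess does on the live server would have to move.
echo 'routed by PHP: ' . htmlspecialchars($path);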

So to fix that, I'm thinking of migrating the routing in my .htaccess into PHP, with perhaps the following goals:

  1. Use .htaccess for "server config" type stuff that only needs to be handled "on the open internet" (a rough .htaccess sketch follows this list), e.g.
    • subdomains, e.g. permanent redirect from www.tantek.com to tantek.com
    • HTTPS-only for admin UI
    • defensive blocking of abusive IPs, e.g. for excessive non-human requests
    • defensive bot blocking, e.g. for misbehaving bots ignoring robots.txt
      • AhrefsBot
      • EasouSpider
      • Egress
      • FAST Enterprise Crawler
      • FyberSpider
      • Gigabot
      • ichiro
      • Java
      • Mubidi-bot
      • MJ12bot
      • OutfoxBot
      • SpiderMan
      • Wavefire
      • wume_crawler
  2. Use PHP "routing" for anything "content" related, including URL paths (a rough PHP sketch also follows this list)
    • content types, returning the HTTP Content-Type header (text/html vs. text/css vs. image/png vs. image/jpeg vs. application/atom+xml etc.) for different file extensions
    • static file passthroughs for files that should be returned from the file system as-is, e.g. .css .png .jpeg .jpg .js .ico
    • paths without extensions, like /contact rather than /contact.html
    • 404 redirects to actual/new locations (sometimes repairing others' inbound links)
    • specific shortnames for blog posts, presentations, pages, or redirects like affiliate links or profile accounts (e.g. /github to github.com/tantek) or comms URLs (e.g. /txt to sms:tantek@...)
      • defensive shortname (or path) banning, e.g. requests from bots (presumably) for things I've never had like: /_vti_bin, /MSOffice, /wp-admin. Keeping this blocklist here should help prevent actually putting something useful there.
    • algorithmic shortpaths e.g. Whistle shortlinks
    • algorithmic permalinks
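
Not the actual configuration, but a rough .htaccess sketch of the "server config" rules in item 1 above; the domain is from this page, while the specific rules and the abbreviated bot list are assumptions:

# Hypothetical .htaccess sketch of "server config" rules (item 1 above)
RewriteEngine On

# permanent redirect from www.tantek.com to tantek.com
RewriteCond %{HTTP_HOST} ^www\.tantek\.com$ [NC]
RewriteRule ^(.*)$ https://tantek.com/$1 [R=301,L]

# defensive bot blocking, e.g. for misbehaving bots ignoring robots.txt
RewriteCond %{HTTP_USER_AGENT} (AhrefsBot|EasouSpider|MJ12bot) [NC]
RewriteRule .* - [F,L]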

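And a rough PHP sketch of the "content" routing in item 2; the extension map, the /contact-style fallback, and the redirect table are illustrative assumptions (real code would also need to guard against things like ../ path traversal):

<?php
// Hypothetical sketch of "content" routing (item 2 above), not actual code.
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

// defensive path banning for probes like /wp-admin
foreach (['/_vti_bin', '/MSOffice', '/wp-admin'] as $banned) {
    if (strpos($path, $banned) === 0) {
        http_response_code(403);
        exit;
    }
}

// static file passthrough with explicit Content-Type by extension
$types = [
    'css'  => 'text/css',
    'png'  => 'image/png',
    'jpg'  => 'image/jpeg',
    'jpeg' => 'image/jpeg',
    'js'   => 'text/javascript',
    'ico'  => 'image/x-icon',
];
$ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
if (isset($types[$ext]) && is_file(__DIR__ . $path)) {
    header('Content-Type: ' . $types[$ext]);
    readfile(__DIR__ . $path);
    exit;
}

// paths without extensions, e.g. /contact served from contact.html
if ($path !== '/' && is_file(__DIR__ . $path . '.html')) {
    header('Content-Type: text/html');
    readfile(__DIR__ . $path . '.html');
    exit;
}

// specific shortnames and redirects, e.g. /github to a profile URL
$redirects = ['/github' => 'https://github.com/tantek'];
if (isset($redirects[$path])) {
    header('Location: ' . $redirects[$path], true, 302);
    exit;
}

// anything left: algorithmic shortpaths/permalinks, or a 404
http_response_code(404);
echo 'not found';
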
See Also