URL design



URL design is the practice of deliberately designing URLs, in particular, permalinks, typically for a better UX for everyone who creates, reads, and shares content. The guidance on this page refers specifically to designing URLs for personal social content.

Why

By deliberately designing your URLs and URL structure, especially permalinks, you can:

  • make them more usable
  • communicate rough topic of publication
  • make it easier for you to change your permalink policies over time (without breaking, or even having to change, past years' permalinks)
  • communicate date of publication (if desired; debatable)
  • URLs are UI. 2017-07-08 from Scott Hanselman:

    You care a lot about the evocative 2meg jpg hero image on your website. You change fonts, move CSS around ad infinitum, and agonize over single pixels. You should also care about your URLs.

    Emphasis in original.

Why short URLs

Short, clean URLs avoid the disadvantages of long URLs (e.g. compare short Etherpad or wiki URLs with long Google Docs URLs).

How

More Usable

Make your permalinks more usable by:

  • keeping your URLs human readable for simpler conveyance of information with URLs
  • keeping your URLs short for easier sharing, less mental overload, and greater reliability[1]
  • avoiding giant long strings of numbers or characters, e.g. database IDs

Dates

Many people communicate date of publication by using a top-level structure that starts with a date:

  • /YYYY/MM/DD/ - the date in order of hierarchical significance
  • /YYYY/DDD/ - ISO ordinal date order, saves two characters (shorter is better, though may not be immediately understood by human readers), and communicates a linearity to your year of posts.
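
A minimal sketch of how both date prefixes above could be generated from a publication date. The function names calendar_path and ordinal_path are hypothetical and chosen only for this illustration.

```python
from datetime import date

def calendar_path(d: date) -> str:
    """Return /YYYY/MM/DD/ with zero-padded month and day."""
    return f"/{d.year:04d}/{d.month:02d}/{d.day:02d}/"

def ordinal_path(d: date) -> str:
    """Return /YYYY/DDD/ where DDD is the zero-padded day of the year."""
    day_of_year = d.timetuple().tm_yday
    return f"/{d.year:04d}/{day_of_year:03d}/"

published = date(2011, 7, 22)
print(calendar_path(published))  # /2011/07/22/
print(ordinal_path(published))   # /2011/203/
```

Zero-padding keeps every segment a fixed width, so the paths sort chronologically as plain strings.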

Avoid the "literal" ISO 8601 date patterns /YYYY-MM-DD/ or /YYYYMMDD/ because, by omitting path separators, they are less URL friendly (less hackable), e.g., https://chat.indieweb.org/2015-07-26.

Putting the date in permalinks is a very common practice, but it is possibly more of an implementation detail and convenience than a recommended design.

Publication date can be useful at a glance, but should be communicated in the page itself regardless. It's probably not so critical that it needs to be in the URL, especially if even the more useful topic is considered optional. In particular, some pages are "evergreen" and regularly updated, e.g. the list of books you've read, so their publication date isn't very interesting.

As an implementation detail, if you use the date (bim, etc) to identify your posts and URLs and ignore any slug, it's easier to change the slug (and your internal identifier) later. You can redirect old slugs to new ones without doing this, though. The important thing is that you create and maintain the redirects so that your permalinks keep working. How you do it is less important.

Non-publication dates

Though the predominant default in date-based URLs is the publication date of the page, there are some notable exceptions:

Issue date: magazines/periodicals typically identify a specific issue with a date (or month) that is often in the future. E.g. as of 2020-11-27, the following pages are valid and viewable

DRY violation

Though useful, putting the date in the URL for a post which contains its publication date in visible text is a DRY violation, and thus vulnerable to inconsistencies, e.g.

If you put the date of your posts in your permalinks, please take extra care to keep it accurate, and in the case of errors, make sure to fix the permalink and redirect from the errant version.

Topic

Communicate topic by using a "slug" somewhere after the date, e.g.

  • ../tag1-tag2-tag3

Also:

  • Many put the slug at the end of their permalinks
  • Make the slug optional for identifying the post (i.e. not a required part of a permalink) if at all possible, since
    • it contains human written/readable/editable content
    • you may want to change it after the fact without any need for maintenance or URL redirects,
    • it may be inadvertently truncated (like in email, or in IRC).
    • it may be inadvertently extended by a bad autolinker that is errantly including a subsequent character, like a "." or a ","
  • Some have chosen to make the slug required:
    • gRegor Morrill: I like the method of slugs being optional and redirecting to the canonical URL, but I thought about it for a while and decided that I preferred having only the year and month in my URL hierarchy. If, in rare situations, a URL slug gets truncated, I plan to perform a search on the partial slug and present possible matches on the 404 error page.
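
One way the optional-slug approach could work is sketched below, assuming a /YYYY/MM/DD/N/slug permalink shape where the date and sequence number alone identify the post. The POSTS index, the PERMALINK pattern, and the resolve function are hypothetical names for this illustration, not any particular site's implementation.

```python
import re
from typing import Optional, Tuple

# Hypothetical post index: (year, month, day, sequence) -> current slug.
POSTS = {
    (2015, 12, 23, 1): "h-card",
}

# Date and sequence are required; whatever follows is treated as an optional slug.
PERMALINK = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})/(\d+)(?:/([^/]*))?/?$")

def resolve(path: str) -> Optional[Tuple[int, str]]:
    """Return (HTTP status, canonical path), or None if no post matches."""
    m = PERMALINK.match(path)
    if m is None:
        return None
    key = tuple(int(g) for g in m.groups()[:4])
    slug = POSTS.get(key)
    if slug is None:
        return None
    canonical = "/{:04d}/{:02d}/{:02d}/{}/{}".format(*key, slug)
    # The slug in the request is never used for lookup, only compared.
    return (200, canonical) if path == canonical else (301, canonical)

print(resolve("/2015/12/23/1/h-car"))   # truncated slug  -> (301, '/2015/12/23/1/h-card')
print(resolve("/2015/12/23/1/h-card.")) # trailing "."    -> (301, '/2015/12/23/1/h-card')
```

Because the slug is ignored for lookup, a truncated, extended, or renamed slug still reaches the post, and the redirect keeps a single canonical permalink.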

Content Type

Individuals with large quantities of different content types may want to differentiate in the URL what type of content to expect, as it primes the user for the subsequent interactions with the content. For example, take this comparison:

  • /2014/11/10/url-design - This URL could be anything about "URL design." It could be an article, a favorite, a reply, or even a photo about "URL design."
  • /2014/11/10/reply/url-design - This URL gives the reader an immediate understanding of the content to expect at that URL: a reply. If a person was not looking for this type of content, it would allow them to skip over this content or be ready for a threaded conversation around "URL design."

Ordinal

One of the drawbacks of having an optional topic slug as mentioned above is that individual posts can become difficult to pinpoint when you post multiple times per day. Adding a time-relative ordinal to the end of a given date allows for better pinpointing while still maintaining relevancy to readers of the URL. Take the two date formats listed above and simply add the ordinal (N) at the end. This maintains hierarchical significance in both types of date URL structure.

  • /YYYY/MM/DD/N/
  • /YYYY/DDD/N/
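
A minimal sketch of assigning the ordinal N, assuming the site can list the paths it has already published. The function name next_ordinal_path and the existing_paths set are hypothetical.

```python
from datetime import date
from typing import Set

def next_ordinal_path(d: date, existing_paths: Set[str]) -> str:
    """Return /YYYY/MM/DD/N/ using the first N not yet taken on that day."""
    prefix = f"/{d.year:04d}/{d.month:02d}/{d.day:02d}/"
    n = 1
    while f"{prefix}{n}/" in existing_paths:
        n += 1
    return f"{prefix}{n}/"

existing = {"/2015/12/23/1/", "/2015/12/23/2/"}
print(next_ordinal_path(date(2015, 12, 23), existing))  # /2015/12/23/3/
print(next_ordinal_path(date(2015, 12, 24), existing))  # /2015/12/24/1/
```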

Author

As web publishers began to publish more and more "snackable content," or short meme-driven content, feeds became overwhelming. Few publishers offer a per-author feed for readers who want to follow a specific journalist and avoid clickbait content.

  • BuzzFeed article URLs follow content type, author, and then article title:

/article/alexkantrowitz/how-the-retweet-ruined-the-internet

However, each author's page does not have a corresponding RSS feed.

Avoid

See everything listed in this article and expand here inline:

Long URLs

Long URLs are fragile and break in many places, e.g.

  • email - auto-wrapping at 70 characters etc.
  • IRC / terminal UIs[2]
  • ...

Long URLs look less trustworthy, especially when they have a bunch of utm_... tracking parameters.
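
As a rough illustration, tracking parameters can be dropped before a URL is shared. The sketch below assumes that any query parameter whose name starts with utm_ is safe to remove; the function name strip_tracking is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_tracking(url: str) -> str:
    """Return the URL without query parameters whose names start with utm_."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.startswith("utm_")
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

long_url = "https://example.com/2015/12/23/1/h-card?utm_source=feed&utm_medium=rss"
print(strip_tracking(long_url))  # https://example.com/2015/12/23/1/h-card
```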

Long URLs look ugly when copy/pasted into IM/PM/DMs.

IndieWeb Examples

See: permalinks#IndieWeb_Examples

Perspectives and Experience

Aaron Parecki

I've tried a number of different permalink formats over the years. Below is a list of some of my past attempts as well as a description of why I eventually moved away from it.

Pre-2012

/{year}/{ordinal day}/{type}/{sequence}/{optional slug}

Examples:

  • /2011/203/article/1/enabling-ssh-on-the-seagate-blackarmor-nas-220
  • /2011/203/note/9

Issues:

  • The type in the middle of the permalink is a strange mixing of hierarchy. The original intent was to provide a hint at the content to expect at the URL in case there was no slug
  • The ordinal day is not easily readable

2012-2015

/{type}/{year}/{month}/{day}/{sequence}/{optional slug}

Examples:

  • /notes/2015/12/23/1/h-card
  • /notes/2015/12/23/1

I originally had the type first to be able to create a feed for specific post types without querying a database, just reading the files on disk, since all notes were stored in a folder called "notes", articles in "articles", etc. The addition of a database as an index meant that this was no longer solving a problem. The moral of the story: don't let implementation details affect URL design.

2016-Present

/{year}/{month}/{day}/{sequence}/{optional slug}

Examples:

  • /2015/12/23/1/h-card
  • /2015/12/23/1

Ryan Barrett

snarfed.org has two main types of URLs:

(Both have created and updated timestamps in the page contents themselves.)

FAQ

Why Not US Date Order

Q: Why not US date order like /march/15/2014/ ?

A: Lots of problems with this:

  1. US-centric - the web is world-wide
  2. English-centric month name "march" is not international friendly (again, *world-wide* web)
  3. does not follow hierarchical significance - "march" and "15" are less significant than 2014.
  4. makes it harder to change URL policies every year (since the year is the 3rd component instead of the first).

Time

Q: What is a good way to represent time in a URL?

A: There are several reasonable approaches. Using zero-padded hours HH (24hr), minutes MM, and seconds SS:

  1. Immediately after the date with separators, e.g.

    /YYYY/MM/DD/HH/MM/SS

    or even

    /YYYY/MM/DD/HH:MM:SS

  2. Immediately after the date without separators - less readable but ok

    /YYYY/MM/DD/HHMMSS

Omitting the seconds SS is ok too if you don't find yourself posting more than once a minute.

/YYYY/MM/DD/HH:MM

Alternatively if you post more than once a second (e.g. automatic metrics), you may want to include a digit or two of decimal seconds.

/YYYY/MM/DD/HH:MM:SS.ss
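
The variants above can be produced with standard date formatting. This is a minimal sketch; the function name time_paths and the dictionary keys are hypothetical labels for the formats described in this answer.

```python
from datetime import datetime

def time_paths(t: datetime) -> dict:
    """Build each time-in-URL variant using zero-padded 24hr time."""
    day = t.strftime("/%Y/%m/%d")
    clock = t.strftime("%H:%M:%S")
    return {
        "separated": day + t.strftime("/%H/%M/%S"),
        "colons": f"{day}/{clock}",
        "compact": day + t.strftime("/%H%M%S"),
        "minutes": day + t.strftime("/%H:%M"),
        # Two digits of decimal seconds, truncated from microseconds.
        "centiseconds": f"{day}/{clock}.{t.microsecond // 10000:02d}",
    }

posted = datetime(2014, 3, 15, 12, 57, 9, 250000)
for name, path in time_paths(posted).items():
    print(name, path)
# separated /2014/03/15/12/57/09
# colons /2014/03/15/12:57:09
# compact /2014/03/15/125709
# minutes /2014/03/15/12:57
# centiseconds /2014/03/15/12:57:09.25
```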

Why not AM PM

Q: Why not times with AM and PM like /12:57pm/?

A: Some problems with this:

  1. AM/PM are easily confused / misread (bad for usability)
  2. makes the URL longer unnecessarily (compared to 24hr time)
  3. less international - 24hr time is more readily recognized when reading world-wide.

Why not content type first

Q: Why not put the type of content first in the URL structure, e.g.

/reply/2014/11/10

 ?

A: Many reasons:

  1. The more known/stable aspect should go first. Dates are much more well understood (stable) and well known than "kinds" of posts, which are still squishy and growing. Permalinks and URLs in general are supposed to be stable, thus putting the more stable pieces first makes sense. Less change if you do have to change the squishy parts.
  2. Year first allows changing URL policies more easily, like once a year. If your year is first, you can set a policy for how your URLs work each year and change it, not having to go back and change past years.
  3. Keeps the URL distinct from type implementation. The type usually has to be defined somewhere in the implementation of a site or in a list of allowed types. Changes to the names, classification, or implementation of types can be made without changing the URL.
  4. Makes it easier to experiment with new post kinds. If the type is not in the URL, it is easier to change a post's type later. For example, a new post could originally be published as a note and later changed to a more specific type like an RSVP.
  5. Experience. Aaron Parecki in particular started with a type-first URL structure, and is now having to convert it all to date first because of various scaling, storage, and other reasons.[3] Anthony Ciccarello also started with type first when the kinds of posts were more distinct but changed to a generic /posts root once he added more types. [4]

Alternatively:

  1. Ben Roberts puts content type first as a user feature, /note/2015/8/12/4/, but does not depend on it for identifying posts (the problem Aaron Parecki was having). Putting in an incorrect type will automatically redirect to the correct URL. This allows for more intuitive URLs for streams of type-specific posts: /note/ will list all notes, /photo/ only photos, etc.

URL in a URL

Q: How can I put a URL within a URL as in http://example.com/other/http://example.org/post?

Aside: note common use-cases for a URL within a URL:

  • Syndicating in some content from another site, e.g. sites that accept content submissions like IndieNews
  • Archiving a copy of content from another site, e.g. Internet Archive Wayback Machine, or personal sites that archive/serve a copy of content from defunct or lost sites or zombies.

A: Current best practice: remove the protocol part of the second URL as http://example.com/other/example.org/post. For example:

The problem with the example in the question is that URLs should have only one instance of http:// in them for readability.

Alternatives:

  • Pass the second URL as an encoded parameter. Disadvantage: URL encoded parameters are uglier and somewhat obfuscate the other URL.
  • Keep the protocol part. Disadvantage: this makes your URL uglier (and may confuse some autolinkers). E.g. https://web.archive.org/save/http://example.org/post/ will save a snapshot of the given URL to the Wayback Machine.
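
A minimal sketch contrasting the scheme-less form with the encoded-parameter alternative. The function names and the url parameter name are hypothetical.

```python
from urllib.parse import quote

def embed_without_scheme(base: str, other: str) -> str:
    """Append `other` to `base` with its scheme (e.g. http://) removed."""
    scheme, separator, rest = other.partition("://")
    if not separator:
        rest = other  # `other` was already scheme-less
    return base.rstrip("/") + "/" + rest

def embed_as_parameter(base: str, other: str) -> str:
    """Pass `other` as a percent-encoded query parameter (assumed name: url)."""
    return base + "?url=" + quote(other, safe="")

print(embed_without_scheme("http://example.com/other/", "http://example.org/post"))
# http://example.com/other/example.org/post
print(embed_as_parameter("http://example.com/other", "http://example.org/post"))
# http://example.com/other?url=http%3A%2F%2Fexample.org%2Fpost
```

Note that dropping the scheme also drops the distinction between http and https, so a site using the first form has to decide how to recover it.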

Articles

  • 2016-05-29 Daniel Appelquist: In defense of the URL

    I bet if you presented people with a URL and asked them “what is this?” they would tell you something like “it’s a web address,” “it’s a web site,” “it’s an internet address,” “it’s a link” or something that indicated they basically knew what it was.

These articles need to list their dates of publication explicitly and article titles, with perhaps a brief blockquote specific to what aspects of URL design they are covering

More thoughts on (potentially additional aspects of) URL design:

Semantic Web related:

See also permalinks e.g. for why

Brainstorming

Posting UI

  • I use /new (or /new/note , /new/checkin) for my posting UI. Tantek Çelik asked why not /create instead of /new. The URL is slightly shorter but I normally use /create for the internal site call to actually create the object, not show the creation UI. Additionally I feel this follows the naming scheme for text editors ("New Document") and "create" is more a software action. Ben Roberts

URL Template

Consider creating a URI Template for your site.

E.g.
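
A minimal sketch of simple (Level 1 style) URI Template expansion, assuming a hypothetical template modeled on the /{year}/{month}/{day}/{sequence}/{optional slug} pattern described earlier on this page. The TEMPLATE value and the expand function are illustrations only and do not cover the full URI Template syntax.

```python
import re
from urllib.parse import quote

# Hypothetical template for a date-first permalink structure.
TEMPLATE = "/{year}/{month}/{day}/{sequence}/{slug}"

def expand(template: str, values: dict) -> str:
    """Replace each {name} with its percent-encoded value; unknown names expand to ''."""
    def substitute(match):
        value = values.get(match.group(1), "")
        return quote(str(value), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)

post = {"year": "2015", "month": "12", "day": "23", "sequence": 1, "slug": "h-card"}
print(expand(TEMPLATE, post))  # /2015/12/23/1/h-card
```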

See Also