authorship-spec


The Authorship specification is an algorithm for discovering and determining the author of a post. See authorship for how to publish the author of a post.

Status
This is a Living Specification yet mature enough to encourage implementations
Latest Published Version
https://indieweb.org/authorship-spec
Test Suite
https://authorship.rocks/ — test consuming code
Validator
https://sturdy-backbone.glitch.me/ — test a published h-entry
Participate
issues
discussion on #indieweb-dev (on Libera IRC)
Editor
Tantek Çelik
Authors
Other contributors: revision history and authorship revision history
License
Per CC0, to the extent possible under law, the editor(s) have waived all copyright and related or neighboring rights to this work. In addition, as of 2024-03-19, the editors have made this specification available under the Open Web Foundation Agreement Version 1.0.

Use-case

Code which consumes h-entry, e.g. a CMS that receives Webmentions and parses the source for post information, needs a way to deterministically discover who the author is. The authorship algorithm is how consuming code determines who the author(s) is/are.

Algorithm

How to determine authorship of a post on a page - AKA the Authorship discovery algorithm / processing model for implementations.

  1. start with a particular h-entry to determine authorship for, and no author. if no h-entry, then there's no post to find authorship for, abort.
  2. parse the h-entry
  3. if the h-entry has an author property, use that
  4. otherwise if the h-entry has a parent h-feed with author property, use that
  5. if an author property was found
    1. if it has an h-card, use it, exit.
    2. otherwise if author property is an http(s) URL, let the author-page have that URL
    3. otherwise use the author property as the author name, exit
  6. if there is no author-page and the h-entry's page is a permalink page, then
    1. if the page has a rel-author link, let the author-page's URL be the href of the rel-author link
      • Note: this is for backcompat rel-author support, and not recommended for new sites/posts.
  7. if there is an author-page URL
    1. get the author-page from that URL and parse it for microformats2
    2. if author-page has 1+ h-card with url == uid == author-page's URL, then use first such h-card, exit.
    3. else if author-page has 1+ h-card with url property which matches the href of a rel-me link on the author-page (perhaps the same hyperlink element as the u-url, though not required to be), use first such h-card, exit.
    4. if the h-entry's page has 1+ h-card with url == author-page URL, use first such h-card, exit.
  8. otherwise no deterministic author can be found. Implementations are encouraged to document additional heuristics below for consideration for incorporation into the authorship algorithm.
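
As an illustration only, steps 3 to 5 above might be sketched as follows in Python. The sketch assumes the page has already been parsed into the microformats2 JSON structure (items with "type" and "properties"), and that the caller passes in the parent h-feed if there is one. The names `first_author` and `discover_author` are hypothetical and not part of this specification.

```python
from typing import Any, Optional, Tuple


def first_author(item: Optional[dict]) -> Any:
    """Return the first value of an item's author property, or None."""
    if not item:
        return None
    values = item.get("properties", {}).get("author", [])
    return values[0] if values else None


def discover_author(entry: dict, parent_feed: Optional[dict] = None) -> Tuple[str, Any]:
    """Steps 3-5. Returns ("h-card", item), ("author-page", url),
    ("name", text) or ("none", None)."""
    author = first_author(entry)              # step 3
    if author is None:
        author = first_author(parent_feed)    # step 4
    if author is None:
        return ("none", None)                 # caller continues with step 6
    if isinstance(author, dict):
        if "h-card" in author.get("type", []):
            return ("h-card", author)         # step 5.1
        # Assumption: for an embedded non-h-card item, use its plain value.
        author = str(author.get("value", ""))
    if author.lower().startswith(("http://", "https://")):
        return ("author-page", author)        # step 5.2
    return ("name", author)                   # step 5.3


entry = {"type": ["h-entry"], "properties": {"author": ["https://example.com/"]}}
print(discover_author(entry))  # ('author-page', 'https://example.com/')
```

Returning a tagged tuple keeps the "exit" cases (an h-card or a plain name) apart from the case that still needs steps 6 and 7.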

Note: the steps of checking for "url == uid == page's URL" and "url that's also a rel-me" were incorporated inline from the steps for parsing a representative h-card. Some improvements have been made here due to feedback from implementations in practice, and those improvements should be incorporated into an iteration of representative h-card.
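
The two inlined checks (steps 7.2 and 7.3) could look roughly like this, assuming the h-cards on the author-page have already been collected into a list and the page's rel-me hrefs are available as a list of strings. The function names are hypothetical, and URLs are compared as exact strings here; a real implementation may want to normalize them first.

```python
from typing import List, Optional


def string_values(card: dict, prop: str) -> List[str]:
    """Plain string values of one property on an mf2 item."""
    return [v for v in card.get("properties", {}).get(prop, []) if isinstance(v, str)]


def pick_author_card(cards: List[dict], page_url: str, rel_me: List[str]) -> Optional[dict]:
    # Step 7.2: url == uid == author-page's URL
    for card in cards:
        if page_url in string_values(card, "url") and page_url in string_values(card, "uid"):
            return card
    # Step 7.3: a url property that is also the href of a rel-me link
    for card in cards:
        if any(u in rel_me for u in string_values(card, "url")):
            return card
    return None  # caller falls through to step 7.4
```

Running the 7.2 loop to completion before starting the 7.3 loop preserves the priority order: a later h-card that satisfies 7.2 wins over an earlier one that only satisfies 7.3.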

Questions

  • (moved to github) In step 7, "if author-page has 1+ h-card...", can you clarify the meaning of "has"? In my implementation, I was checking for the presence of this h-card by iterating through all top-level "items" on the page and checking those. However, I believe I found an example where this fails. When the home page h-card is actually the "author" property of the top-level h-feed, my consuming code doesn't find this h-card. Is this sentence intended to match this case? If so, clarification on what it means to say a page "has" an h-card would be helpful. Parsing the HTML for h-card will easily find this h-card, but once you're working with the mf2 parsed result, it is buried a little deeper in the data structure. Aaron Parecki 06:45, 24 February 2017 (PST)
    • "has" like CSS .h-card - Tantek Çelik 13:02, 26 July 2017 (PDT)
    • so, if you're working with the JSON structure, this means you have to iterate through every mf2 object and all levels of nesting to look for an h-card then? Aaron Parecki 08:43, 21 November 2017 (PST)
  • Sven Knebel 2017-05-13: "if there is no author-page and the h-entry's page is a permalink page": What makes a page a permalink page? url == u-url? only one root h-entry on the page (and no feed)?
  • Martijn van der Ven 2017-07-16: could this same authorship algorithm be applied to h-feed? See chat. Are there any obvious problems with this?
    • It would start at step 4.
    • Step 5 is done as normal.
    • Step 6 would need “the h-entry’s page” changed to “the h-feed’s page”.
    • Same for step 7.4.
    • (end algorithm summary)
    • Seems like a good idea, what are the consuming code use-cases for this? Tantek Çelik 13:02, 26 July 2017 (PDT)
  • (moved to github) Aaron Parecki 2018-02-18: Both https://adactio.com and https://miklb.com use a pattern that is not supported by the current authorship algorithm. On those pages, there is a list of h-entrys as well as an h-card at the same level. There is no author property on the h-entrys, so nothing currently links those entries to the h-card. Is this a pattern that should be recognized by the authorship algorithm? Or should we encourage them to change that markup?
    • Tantek Çelik I think we should encourage h-feed markup for such cases because clustering related objects/properties ("list of h-entrys as well as an h-card at the same level") is exactly what a parent object is for. Also worth documenting as an h-feed use-case.
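
On the question above about what it means for a page to "has" an h-card when working with the parsed JSON: if "has" is read like CSS .h-card, consuming code needs to look at every nesting level. A minimal sketch, assuming the standard mf2 JSON shape where nested items appear both as property values and under "children" (the name `find_h_cards` is hypothetical):

```python
from typing import Iterator


def find_h_cards(items: list) -> Iterator[dict]:
    """Yield every h-card found at any nesting depth (depth-first)."""
    for item in items:
        if not isinstance(item, dict):
            continue  # plain string property values
        if "h-card" in item.get("type", []):
            yield item
        # Nested microformats can be property values...
        for values in item.get("properties", {}).values():
            yield from find_h_cards(values)
        # ...or children without a property name.
        yield from find_h_cards(item.get("children", []))


parsed = {"items": [{
    "type": ["h-feed"],
    "properties": {"author": [{"type": ["h-card"], "properties": {"name": ["Home"]}}]},
    "children": [{"type": ["h-entry"], "properties": {"name": ["Post"]}}],
}]}
print(len(list(find_h_cards(parsed["items"]))))  # 1
```

The example input mirrors the case described above, where the home page h-card is the author property of a top-level h-feed and would be missed by a scan of top-level items only.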

Test cases

Previously:

Issues

Spoofing

Authorship can potentially be spoofed, as the current algorithm may only look at the markup within an arbitrary page to determine the author.

For example, any page with markup like http://paste.debian.net/plainh/587c8bb3 would be parsed as being written by Barnaby Walters. Examples of such spoofing are present at:

was not actually written by aaronpk, but posted on a debian pastebin site and linked via a webmention.

is similarly spoofed.

http://checkmention.appspot.com allows you to test receiving a spoofed webmention from Jonathan Ive.

multiple authors

A lot of steps could be expanded (have discovery steps check for a list of authors, do "enrichment" steps for each of them individually) for dealing with multiple authors. Do we have examples of posts published with multiple authors?

  • Sven Knebel: all the steps that turn URLs into h-cards could be run for multiple author-pages. The main question IMHO would be if authors can be discovered from multiple places, or if they have to be in the same place? The first would mean early steps could not exit the algorithm, but would have to check further places for additional authors, so my first instinct would be to not allow this. (Theoretical pattern where this could come up: main author on h-feed level, with a "in collaboration with $Second_Author" on an individual post)
  • David Shanske Cannot find examples of multiple mf2 author displays in the wild. Have found prior art in RSS/Atom feeds and articles marked up with Dublin Core meta tags. However, feed readers, including Feedly and This Old Reader, completely ignored multiple authors. The only reader that displayed them was FeedMe for Android, which displayed them as a semicolon-separated list. As there was only a name property, could not find tests of how this might be displayed with multiple photos, etc.

Theoretical Issues

Issues are moved here when they have no real world examples, or all examples given don't actually illustrate the problem case. Please find real world examples for them.

h-card without photo

Warning: theoretical, needs actual real world examples.

  • https://ma.tt/ has no photo on permalinks or homepage - thus not helpful

Issue:

A common practice, especially in WordPress themes (see example https://ma.tt/), is to have an h-card with name and url only, but no photo, on individual h-entrys inside a feed. The IndieWeb plugin for WordPress allows the URL to point to the homepage, but cannot modify the theme to stop it from adding a mini h-card. The expectation with a minimal h-card would be that you follow the URL inside the h-card to find the photo, assuming your goal is to offer a visual representation of the individual.

  • Sven Knebel: there's also the pattern of having a larger h-card in the sidebar (in WP through the h-card widget). For both, a possible change to the algorithm could take u-uid (or a single u-url) from the mini-card and use it as the author-page in the later stages of the algorithm. For the steps where an additional page is fetched, it'd be interesting how often this actually helps vs how often it doesn't produce a result. This could also benefit from integration into the consuming use case: check if there's actually properties not yet known before going further to find them.
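
The possible change described above (take u-uid, or a single u-url, from the mini-card and treat it as the author-page, but only when something is actually missing) might be sketched like this. This is not part of the current algorithm; `author_page_from_card` and its `wanted` parameter are hypothetical names.

```python
from typing import Optional, Sequence


def author_page_from_card(card: dict, wanted: Sequence[str] = ("photo",)) -> Optional[str]:
    """Return a URL worth fetching for more detail, or None."""
    props = card.get("properties", {})
    if all(props.get(name) for name in wanted):
        return None                  # nothing missing, skip the extra request
    uids = [v for v in props.get("uid", []) if isinstance(v, str)]
    if uids:
        return uids[0]               # prefer u-uid
    card_urls = [v for v in props.get("url", []) if isinstance(v, str)]
    if len(card_urls) == 1:
        return card_urls[0]          # otherwise accept a single u-url
    return None                      # several urls and no uid: ambiguous


mini = {"type": ["h-card"], "properties": {"name": ["Author"], "url": ["https://author.example/"]}}
print(author_page_from_card(mini))  # https://author.example/
```

Checking `wanted` first reflects the suggestion to tie the extra fetch to the consuming use case, so a card that already carries a photo costs no additional request.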

WordPress author archives

These author archives have no h-card of their own, and have the same h-entrys with author property h-cards linking back to the author archive.

Other theoretical

Theoretical issues are grouped here for capturing purposes. If you find a real world example of one of these, feel free to promote it to an actual issue with its own === subhead.

  • string-only author: What to do if page h-entry has an author property which is not an h-card, but a string? Treat as empty and continue to fallback methods?
    • Algorithm updated for handling a string-only author that is a URL. See 5.2.
    • This probably isn’t worth discussing much as I can’t find a single example of this in the wild, but as it’s supported in the h-entry spec (“optionally embedded h-card(s)”) it might be worth considering/specifying for when it does happen --Barnaby Walters 15:26, 17 January 2014 (PST)
    • Aaron Parecki's new site is now an example in the wild, e.g. https://aaronparecki.com/2016/04/06/15/ Kylewm.com 14:04, 7 April 2016 (PDT)
  • Step 7.4 looks for h-cards on the current page and will use the first one where the url property matches the author URL. The current page could have many such h-cards when minimal person cards are used for linking to other people (e.g. for person-tags), and these h-cards may have been created by someone other than the currently sought after author. They could use a nickname for the person rather than the actual name they would have expected.
    • This is an acceptable problem when 7.1–7.3 have all failed, as the algo has tried to get a full name from the author’s own specified page. Moving 7.4 to before 7.1–7.3 was discussed, which may highlight the trade-off.
      • Checking whether the author page is on a different domain could help against this. E.g. by not accepting an h-card on the current page if the current page is on a different domain from the author URL.
      • Another option is to use 7.4 as a value until you get around to parsing the external page as outlined by 7.1–7.3. Then the value will be updated for its actually specified value but the overhead of the web request can be offloaded.
    • Applying the algo to h-entry objects for comments could potentially make for a bigger chance that a minimal person card has been used previously in source order. E.g. someone commenting on a post they were person tagged in.
    • Looking through the current document for h-cards in other places can help out when the document’s mark-up does not allow the h-card to be nested directly under the h-entry, because of layout/DOM limitations. See a discussion on the #microformats IRC channel. (This time brought on by Keith J. Grant.)
      • Keith also wishes “there was a way to say the mf of element X is associated with the h-* defined by element Y.” [1] This is basically what is being done by having the u-author property map to any h-card on the page with a matching u-url. In the mapping use-case it makes sense to do so before retrieving external resources.
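
The domain check suggested above for step 7.4 could be sketched as below, using only the Python standard library. It is one possible reading of "same domain" (exact host match, so subdomains count as different), and `accept_local_card` is a hypothetical name.

```python
from urllib.parse import urlsplit


def same_host(url_a: str, url_b: str) -> bool:
    """True when both URLs have the same (case-insensitive) host."""
    host_a = urlsplit(url_a).hostname
    host_b = urlsplit(url_b).hostname
    return host_a is not None and host_a == host_b


def accept_local_card(card: dict, page_url: str, author_url: str) -> bool:
    """Step 7.4 plus the guard: the card's url must equal the author URL,
    and the current page must be on the author URL's host."""
    card_urls = card.get("properties", {}).get("url", [])
    return author_url in card_urls and same_host(page_url, author_url)


card = {"type": ["h-card"], "properties": {"name": ["Nick"], "url": ["https://alice.example/"]}}
print(accept_local_card(card, "https://bob.example/post", "https://alice.example/"))  # False
```

With this guard, a minimal person card that someone else published for the author (for example as a person-tag) is ignored, at the cost of also ignoring legitimate cross-domain cards.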

Resolved Issues

As issues are resolved, they'll be moved here. Next step is to rewrite their resolutions as FAQs below.

Consider head meta link tags

The algorithm described above relies only on new markup, and does not consider the page's head meta and link tags with rel="author". They have been used since HTML 3.2 and are still defined in HTML5:

  • http://www.w3.org/TR/REC-html32#meta
    • RESOLVED WON'T FIX. meta tags have long been abandoned in practice (with few exceptions, not author). Lacking any real world indieweb examples that depend/need this, makes no sense to burden consuming code with unreliable legacy hidden metadata. Better to encourage the less-bad rel=author link to a page with visible author information marked up with h-card. - Tantek Çelik 12:56, 26 July 2017 (PDT)
  • http://www.w3.org/TR/html5/links.html#linkTypes
    • RESOLVED WORKS ALREADY. rel=author was already in algorithm at step 6.1 when issue noted. - Tantek Çelik 12:56, 26 July 2017 (PDT)

Previously: this issue was originally named Does not implement standards. This is an example of a bad name for an issue because (1) it uses negative framing, complaining about what something is not rather than requesting something positive, and (2) as a general statement it is false, since the authorship algorithm is of course based on standards. It is left here as an example: please avoid such negatively framed, overgeneral statements that are provably false, and instead name your issue after something you want (a use-case) that does not appear to be handled. In this case, the issue was renamed to the actual request, which was to "Consider head meta link tags".

Formal spec and issues

authorship is being supported by many implementations which would benefit from a formal specification (interoperability), test cases, and a more structured issues list.

Resolution:

FAQ

What about meta author

Q: Can the algorithm consider the meta author tag as described in http://www.w3.org/TR/REC-html32#meta ?

A: No. Avoid meta tags. Old style meta tags (including "author") have largely been abandoned due to data rot and spam, two common results of invisible metadata.

Use Cases

Name avatar display in comments

comments-presentation describes how a site that accepts indieweb reply posts via webmention can retrieve those replies and display them as full-fledged comments on a post, including the name and icon/avatar of the commenter.

h-card is the most common way that name, URL, photo information is published about a person on the web. Thus parsing for an author's h-card to retrieve their name and avatar makes the most sense.

Minimal h-cards

Maybe handle u-uid on h-cards for fallback for representative h-card? My site showed no avatars in comments with a minimal h-card like this: <a class="u-author h-card" href="https://fireburn.ru">Vika</a> until I removed h-card class from the markup. -- Fireburn.ru (talk) 16:56, 5 December 2018 (PST)

Name avatar display in a reader

In a reader (feature of an IndieWeb site), it's nice to show the name and icon/avatar of the person whose posts you're reading from their indieweb home page h-feed.

Typically this name/icon information is found via the authorship algorithm.

Unhandled Examples

In some (many?) cases, an indieweb h-feed of h-entry elements does not have explicit author information for a couple of reasons:

  • No author information inside each h-entry, because it would be redundant, since all the entries (the entire h-feed) are from the same author.
  • No rel=author link, because the page itself is likely the home page and thus page representing the author.

IndieWeb examples:

  • Tantek Çelik's home page at http://tantek.com/
    • tried <a class="u-author" href="/"></a> inside h-entry but there doesn't seem to be much interest in handling that. Also am leaning against having to include such seemingly empty / non-visible markup just to work around algorithm limitations.
  • ...

Fallback to page representative h-card

Proposal: we could add one more fallback for when there is no author h-card and no rel-author link: use the page being processed as the author-page if no other author page has yet been found.

I.e. change "7. if there is an author-page URL " to "7. if there is no author-page URL, use the page itself as the author-page URL" and then continue processing the rest of the algorithm accordingly.

This would handle the examples from above:


Fallback to icon for photo

Proposal: if the author doesn't have a photo in their entry h-card, and they also don't have a photo on their representative h-card, but you still want to display a photo as an avatar, how about looking for an icon in the head of the document?

If a rel="icon" value exists that is using type="image/jpg" (or maybe type="image/png", but *not* type="image/ico"), then that image could be used as an avatar. If a sizes attribute exists, a parser can look for the most appropriate size for display as an avatar.

This could be used as a last resort if every other photo discovery mechanism fails to return an image.
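
A rough sketch of this proposal, assuming the rel="icon" links have already been extracted from the document head into dictionaries with "href", "type" and "sizes" keys. The function names, the 64 pixel default target, and the choice to accept "image/jpeg" alongside the "image/jpg" spelling used above are all assumptions made for illustration.

```python
from typing import List, Optional

ALLOWED_TYPES = {"image/jpeg", "image/jpg", "image/png"}  # deliberately no image/ico


def largest_side(sizes: str) -> int:
    """Largest width in a sizes attribute such as "32x32 64x64"; 0 if unknown."""
    best = 0
    for token in sizes.lower().split():
        width, _, height = token.partition("x")
        if width.isdigit() and height.isdigit():
            best = max(best, int(width))
    return best


def pick_icon(links: List[dict], target: int = 64) -> Optional[str]:
    """Return the href of the icon best suited as an avatar, or None."""
    candidates = [link for link in links if link.get("type", "").lower() in ALLOWED_TYPES]
    if not candidates:
        return None
    sized = [(largest_side(link.get("sizes", "")), link["href"]) for link in candidates]
    big_enough = [pair for pair in sized if pair[0] >= target]
    if big_enough:
        return min(big_enough)[1]   # smallest icon that still covers the target
    return max(sized)[1]            # otherwise the largest (or an unsized) icon


links = [
    {"href": "/favicon.ico", "type": "image/ico", "sizes": "16x16"},
    {"href": "/icon-32.png", "type": "image/png", "sizes": "32x32"},
    {"href": "/icon-192.png", "type": "image/png", "sizes": "192x192"},
]
print(pick_icon(links))  # /icon-192.png
```

Links without a type attribute are skipped in this sketch, since the proposal is phrased in terms of the declared type.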

(idea by Loic)

Examples of sites with icons, but without photos in h-cards:

Algorithm Design Notes

Why do we parse for the authorship details in the order that we do?

First, we prefer the p-author of the h-entry because that is the most direct way of specifying the information, visibly, on the page. There's also established practice among indieweb sites of publishing a mini h-card with photo, name (sometimes as the alt text of the photo img), and URL to the person's indieweb site root / home page. Also, it may be possible that the post is a guest post, in which case we really want the post-specific authorship information rather than anything general to the site.

Only if the post itself lacks direct authorship information do we fall back to checking for a rel-author link, which is a fairly well established practice for linking from posts to pages representing authors.

On such sites that use rel-author, they almost always point to a page that has a much richer h-card about the author than the post page itself, including a much higher likelihood of having a good photo / avatar image as part of that h-card. Thus we next prefer to go retrieve that rel-author destination, and look for a representative h-card there (per the "url == uid == page's URL" and "url that's also a rel-me" steps noted above).

Only if the rel-author page lacks an h-card do we then fall back to looking for a likely smaller (if present) h-card on the post page itself that has a u-url of the same value as the destination of the rel-author, thus indicating that it is an h-card for the author.

Implementations

php mf2 getAuthor

barnabywalters/mf-cleaner getAuthor() implements several extra steps whilst missing out the steps above which require fetching another URL — at the moment getAuthor completely lacks side effects:

  • If the given h-entry has an author or reviewer (for compat. with h-review if it doesn’t become more consistent with h-entry) property:
    • if the found value is a string, search all the h-cards on the page for one with a name property equal to the found value, and return it
  • If the found value is a microformat, return it
  • Look for page-scoped rel-author, search all the h-cards on the page for one with a url property equal to the first rel-author value — if found, return it
  • If a page URL is given, or the h-entry has a url property, search all h-cards on the page for one where the domain of their url property is the same as the domain of the found url — if found return that
  • Otherwise return the first found h-card on the page, or null

Converspace

XRay

ProcessWire Webmention

  • The ProcessWire Webmention plugin handles the authorship algorithm as of 2016-04-10 with the exception of step 4
    • gRegor Morrill runs this on gregorlove.com. I'm skipping step 4 for now because it's undecided what the desired behavior would be if an h-feed page sent a webmention.

Dobrado

SimplePie

mf-obj

Ruby

Brainstorming

feed authorship

gRegor Morrill: Here's my attempt at finding the author of a feed. Step 2 is based on this update to XRay.

Parse the page for microformats2 and start with no author.

  1. if an h-feed microformat is found, then
    1. if the h-feed has an author property, use that
    2. if an author property was found
      1. if it has an h-card, use it, exit
      2. otherwise if the author property is an http(s) URL, let the author-page have that URL
      3. otherwise use the author property as the author name, exit
    3. if there is no author-page, then
      1. if the page has a rel-author link, let the author-page have that URL
    4. if there is an author-page URL
      1. get the author-page from that URL and parse it for microformats2
      2. if author-page has 1+ h-card with url == uid == author-page's URL, then use first such h-card, exit.
      3. else if author-page has 1+ h-card with url property which matches the href of a rel-me link on the author-page (perhaps the same hyperlink element as the u-url, though not required to be), use first such h-card, exit.
      4. if the h-feed's page has 1+ h-card with url == author-page URL, use first such h-card, exit.
    5. otherwise no deterministic author can be found
  2. Otherwise if the page has a list of 2+ h-* and exactly one of them is an h-card, then use that h-card, exit
    1. otherwise no deterministic author can be found
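
Step 2 of this brainstormed procedure (no h-feed, a list of 2+ top-level items, exactly one of them an h-card) might be sketched as follows. It assumes "found" and "list" both refer to top-level items of the mf2 parse, which the text above does not state explicitly; `lone_top_level_card` is a hypothetical name.

```python
from typing import Optional


def lone_top_level_card(parsed: dict) -> Optional[dict]:
    """Return the single top-level h-card if the page matches step 2, else None."""
    items = parsed.get("items", [])
    if any("h-feed" in item.get("type", []) for item in items):
        return None  # step 1 applies instead
    if len(items) < 2:
        return None  # not a list of 2+ h-* items
    cards = [item for item in items if "h-card" in item.get("type", [])]
    return cards[0] if len(cards) == 1 else None


parsed = {"items": [
    {"type": ["h-card"], "properties": {"name": ["Jane"]}},
    {"type": ["h-entry"], "properties": {}},
    {"type": ["h-entry"], "properties": {}},
]}
print(lone_top_level_card(parsed)["properties"]["name"])  # ['Jane']
```

Requiring exactly one h-card keeps the result deterministic: with two or more sibling h-cards there is no basis for choosing between them.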

Silo Implementations

none currently

Past Implementations

Google Search

Support dropped 2014-08-28[2]

Google's search spider supported only part of the authorship algorithm, rel=author, in an oddly silo-specific way:

  • if there's rel=author link on a page
    • if it links to a Google+ profile, use that profile for authorship information
    • if it links to a home page
      • if the home page has a rel=me link to a Google+ profile
      • and the Google+ profile has rel=contributor-to link back to the home page
      • then use that profile for authorship information
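
For illustration, the reciprocal check described above can be written generically. This is a reconstruction of the listed behavior, not Google's code: `fetch_rels` and `is_profile` are hypothetical callables supplied by the caller (one returns a page's rel values as a mapping from rel name to hrefs, the other decides whether a URL is a profile on the silo), and URLs are compared as exact strings.

```python
from typing import Callable, Dict, List, Optional

Rels = Dict[str, List[str]]  # rel value -> hrefs, shaped like the "rels" of an mf2 parse


def profile_for_authorship(
    page_rels: Rels,
    fetch_rels: Callable[[str], Rels],
    is_profile: Callable[[str], bool],
) -> Optional[str]:
    """Return the profile URL to use for authorship information, or None."""
    for author_url in page_rels.get("author", []):
        if is_profile(author_url):
            return author_url  # rel=author links straight to a profile
        # Otherwise treat it as a home page and look for rel=me to a profile.
        for me_url in fetch_rels(author_url).get("me", []):
            if not is_profile(me_url):
                continue
            # The profile must link back to the home page with rel=contributor-to.
            if author_url in fetch_rels(me_url).get("contributor-to", []):
                return me_url
    return None
```

Passing the fetcher in as a parameter keeps the sketch free of network code and makes the two-way link requirement easy to test with a dictionary of fake pages.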

See Also