post-type-discovery

From IndieWeb

Post Type Discovery specifies an algorithm for consuming code to determine the type of a post by its content properties and their values rather than an explicit “post type” property, thus better matched to modern post creation UIs that allow combining text, media, etc in a variety of ways without burdening users with any notion of what kind of post they are creating.

Status
This is a living Editor's Draft yet mature enough to encourage implementations and feedback.
Editor's Draft
https://ptd.spec.indieweb.org/
W3C Note
https://www.w3.org/TR/2018/NOTE-post-type-discovery-20180118/
Participate
Wiki (Feedback, Open issues, issues on GitHub)
IRC: discussion on #indieweb-dev #indieweb-dev on Libera
Editor
Tantek Çelik
License
Per CC0, to the extent possible under law, the editor(s) and contributors have waived all copyright and related or neighboring rights to this work. In addition, as of 2024-11-21, the editor(s) and contributors (2015-04-07 onward) have made this specification available under the Open Web Foundation Agreement Version 1.0.

Post Type Discovery was published by the W3C Social Web Working Group as a W3C Note on 2018-01-18, and is currently maintained by the IndieWeb community as a Living Standard at https://ptd.spec.indieweb.org/

Use cases

Both creation user interfaces, and post presentation designs are evolving to directly use the presence or absence of specific properties (and their values) directly, rather than depending on any kind of explicit "post type", thus why bother discovering a post type in the first place? This section documents the (few) use-case(s) that is/are known to date.

Synthesizing explicit type formats

There are existing formats that require explicit post types (e.g. ActivityStreams), and code that consumes them expects explicit post types. Post type discovery enabling automatic synthesizing of such formats from posts that merely have a set of content related properties.

Algorithm

The Post Type Discovery algorithm ("the algorithm") discovers the type of a post given a data structure representing a post with a flat set of properties (e.g. Activity Streams (1.0 or 2.0) JSON, or JSON output from parsing microformats2), each with one or more values, by following these steps until reaching the first "it is a(n) ... post" statement at which point the "..." is the discovered post type.

  1. If the post has an "rsvp" property with a valid value,
    Then it is an RSVP post.
  2. If the post has an "in-reply-to" property with a valid URL,
    Then it is a reply post.
  3. If the post has a "repost-of" property with a valid URL,
    Then it is a repost (AKA "share") post.
  4. If the post has a "like-of" property with a valid URL,
    Then it is a like (AKA "favorite") post.
  5. If the post has a "video" property with a valid URL,
    Then it is a video post.
  6. If the post has a "photo" property with a valid URL,
    Then it is a photo post.
  7. If the post has a "content" property with a non-empty value,
    Then use its first non-empty value as the content
  8. Else if the post has a "summary" property with a non-empty value,
    Then use its first non-empty value as the content
  9. Else it is a note post.
  10. If the post has no "name" property
      or has a "name" property with an empty string value (or no value)
    Then it is a note post.
  11. Take the first non-empty value of the "name" property
  12. Trim all leading/trailing whitespace
  13. Collapse all sequences of internal whitespace to a single space (0x20) character each
  14. Do the same with the content
  15. If this processed "name" property value is NOT a prefix of the processed content,
    Then it is an article post.
  16. It is a note post.

Quoted property names in the algorithm are defined in h-entry.

Methodology

There are two important aspects to the methodology of the Post Type Discovery algorithm: scope (why is something explicitly in the algorithm), and order (why is something where it is in the algorithm).

Scope
The algorithm could attempt to cover innumerable potential hypothetical post types, or take an evidence based approach, focusing on real world publishing practices. This specification does the latter, specifically by placing a minimum bar of documented real world publishing practices of different visually apparent post types on the open web at recent (< 1 year old) permalinks, each with at least three independent implementations that have converged on what properties (and potentially values thereof) they have to imply their visually apparent post types. As a result of being evidence based, it is likely this specification will expand over time as more apparent post types are published by more convergent implementations.
Order
The algorithm must also specify an order (e.g. of precedence) that various properties (and their values) imply various post types. The algorithm is ordered by post types that are in general "richer" in terms of content as well as show greater cognitive effort by the author.

Other Types Under Consideration

Other types are being considered and will be included in the future iterations of the algorithm based on convergence of publishing patterns and critical mass of adoption thereof.

Examples

Like post

Here is an example h-entry post from Activity Streams 2.0 Vocabulary examples:

<div class="h-entry p-name">
  <span class="p-author h-card">Sally</span>
  liked
  <a class="u-like-of"
    href="http://example.org/notes/1">
    http://example.org/notes/1
  </a>
</div>

Following the algorithm, the step "If the post has a "like-of" property with a valid URL" is satisfied and thus the algorithm returns that the post is a "like" post.

Given this semantic, an implementation can generate (or process as if generated and consumed) the following AS2 JSON, in particular the "@type": "Like" in this output is what is determined by this algorithm:

{
  "@type": "Like",
  "actor": {
    "@type": "Person",
    "displayName": "Sally"
  },
  "object": "http://example.org/notes/1"
}


Next steps

This section is for the editor, and may provide hints at updates to come for this specification.

Feedback

Feel free to leave feedback about the draft here, please sign your name with ~~~~!

PTD as a guide, not a rule

As described here, many people have gradually shifted from choosing an explicit post type up front to combining individual parts (photos, location, person tags) and discovering post type after the fact. The IndieWeb community discovers and analyzes new real world post types, catalogues them here on the wiki, and then adds them to this algorithm.

This is great in general, but sometimes tricky. Post type boundaries are often fuzzy and overlap in the real world - quotations are a good example - and algorithms and code aren't good at fuzziness and uncertainty. Given this, we should keep in mind that that post type discovery is high level and sometimes imprecise and not expect too much of it beyond that.

-Snarfed.org

Re: above about "fuzziness" of PTD, should we have some means of allowing content to give a "hint" to what kind of post type that they're intending to represent? Jacky Alciné 2022-07-12

Issues

Open issues on this specification are documented here.

Challenges from summaries

note detection name content details

  • The article vs. note check currently says to check if name is a prefix of content, but this fails the common auto-generated "name" case, where the name contains everything in the post including the content. I have had better luck checking name.contains(content) instead of content.startswith(name) Kylewm.com 21:24, 29 September 2015 (PDT)
    • Note with an implied "name" that contains the "content": https://shanehudson.net/2015/09/24/2042
    • What if we made it an OR? That is "name.contains(content) OR content.startswith(name)" means this is a note? Tantek 22:15, 29 September 2015 (PDT)
      • I can't think of anyone who gives their notes an explicit title that is a prefix of the content. The markup would be odd <div class="e-content"><span class="p-name">This is note</span> with a title that is a prefix</div>. I don't feel too strongly about it, but if we can't come up with any real-world examples, it's probably not worth complicating the spec with content.startswith(name)
        • It was more anticipatory, as those of us doing name==content on notes start truncating the name for HTML title element readability, e.g. dropping leading URLs. Tantek 14:00, 16 February 2016 (PST)

How to handle new types

  • The post type discovery algorithm needs some way for new types to be discovered. For example I am publishing bike rides, runs, and food consumption on my site right now, and it would be better to have a way to add these types as a possible result of the algorithm without having to change the core algorithm.
    • Jacky Alciné I've considered reworking the PTD parser in Rust such that one can define the 'rules' for a type and the parser will attempt to find the best one from the provided set. It'd match on things like properties in an object and property values (namely to help with finding it an article is being used)

Entry only or events too

  • Is this limited in scope to "h-entry" only? Are "h-event" always "event"s and never replies, invites, etc.? Kylewm.com 09:21, 28 October 2015 (PDT)
    • Quite right, the top level microformat object can help with post type discovery. Will look into real world examples and detection steps. Tantek 16:26, 16 February 2016 (PST)

Feature request type invitation

  • Add "invitee" property implies post type "invite"? Above or below "rsvp"? Kylewm.com 09:21, 28 October 2015 (PDT)

Feature consideration checkin

Since Known had to coordinate with client software to make this explicit, written to be inserted into the current algorithm

  • if there is a u-photo property, then it is a photo (any venue information is info about the photo)
  • if there is u-location to a venue h-card with name plus street-address and/or geo lat+long then is a checkin (any note/content is a "shout" or comment about the checkin)
  • ...
  • if there is a content and/or name that "match" then it is a note
  • ...
  • if there is a "location" property (but no content nor name) then it is a checkin

In summary:

  • photo -> photo
  • named location with adr street-address and/or geo lat+long -> checkin
  • ...
  • content~=name -> note
  • ...
  • location geo lat+long or adr -> checkin

Resolved

Summary vs content details

  • In an h-feed, an entry may have a name and summary but no content. In this case it is an article, but step 6 would have already classified it as a note since there is no content. Perhaps update this step to read "If the post has no "content" property and no "summary" property"?
    • Steps 6 and following have been updated to use "summary" as a fallback for "content". Tantek 21:02, 29 September 2015 (PDT)

Note as fallback

  • It appears that a post with no content and no name would be classified as a note. Would it be better for a possible result of this algorithm to be "unknown"?
    • No. By having the known fallback of a note post, implementation for both publishing and consuming code is more predictable, and reliable. Tantek 21:40, 29 September 2015 (PDT)

Empty comments

  • For example, when determining the post type of a comment, I may not want to display anything if the content and name are empty since it would just look like an empty post.
    • What you display is up to you (outside the scope of this specification), and "post type" should only be one aspect of what you use to decide what to display or present. Tantek 21:40, 29 September 2015 (PDT)

Feature Request video

  • There are several examples of people publishing video posts. Can we add "If the post has a "video" property with a valid URL, then it is a video post." before checking for the "photo" property? (Video posts often have a photo property as a fallback for consumers that don't understand video posts and to show a thumbnail of the video in a feed)
    • There is only one recent example of people publishing video posts with u-video (Ben Roberts), one person no longer publishing (last was 2+ years ago), and no other examples with u-video.[1] One < three - thus video stays as "under consideration" for now. Tantek 22:08, 29 September 2015 (PDT)

Theoretical objection to AS2 compat

  • Specific to http://microformats.org/ vocabulary, while incorrectly suggests support of http://www.w3.org/TR/activitystreams-vocabulary/ For example as:Like does NOT recommend using 'like-of' property (which doesn't even exist in AS2.0 Vocabulary) instead it uses generic 'object' property. Wwelves.org perpetual-tripper 14:04, 6 October 2015 (PDT)
    • The algorithm shows how to discover a "like" post via the 'like-of' property. Once that discovery has occurred, the implementation can treat it as an AS2 as:Like. Tantek 11:24, 6 October 2015 (PDT)
      • AS2.0 Vocabulary does NOT define 'like-of' property and uses generic 'object' instead. This algorithm will not work on data following canonical example of as:Like in JSON-LD serialization. AS2.0 specs editor warns in both spec to consider Microformats HTML only informative and that they don't represent recommended modeling. Also current state of Microformat vocabulary doesn't allow using it in JSON-LD serialization, which includes 'like-of' property defined in it. Based on above, claiming compatibility of this draft with ActivityStreams 2.0 looks very misleading! Wwelves.org perpetual-tripper 14:04, 6 October 2015 (PDT)
        • Closing as theoretical only, not a problem in practice. No implementations have had any problems with this. You may reopen if you implement it, or other implementers may re-open if they have the same problem. Tantek 14:00, 16 February 2016 (PST)

FAQ

What about a photo reply

Q: What about a reply that includes a photo?

A: It's a reply.

Q2: Should that show up as a "photo" post?

A2: It should show up as a "reply" and not be in a user's published feed of their photos. The user-centric design here is to treat replies separately, because in practice, when users post replies to others' posts, and include a photo, the photos typically assume the context of that other post, and would look odd outside of it (e.g. in a generic "photos" feed). In addition, by not including reply photos in a user's feed of their photos, it gives the user the freedom to reply to other posts with whatever they wish, including photos, and not have those reply-specific photos pollute their streams of "their stuff" that their followers subscribe to.

A2a: From a presentation perspective, a reply should primarily be displayed as a reply first, and then adapt accordingly to whatever other properties it may have.

Is a video tag sufficient

Q: Is a video tag sufficient to imply a video post?

A: No, video tags can be used for additional content e.g. inside an article. Only relying on video tag markup would lead to false positives. E.g. http://cweiske.de/tagebuch/kdenlive-lower-third.htm has a video tag but is an article, not a video post.[2]

Implementations

Implementations, in progress, partial, or complete, of Post Type Discovery.

Granary

Granary synthesizes ActivityStreams, microformats2, and Atom from various input feeds and sources, and as such has some code that can be considered in progress or even a partial implementation of Post Type Discovery:

p3k

p3k (a CMS) implements Post Type Discovery internally within its micropub endpoint to automatically add posts to various collections. E.g.: if this post is a reply, it goes in the "replies" collection. if it's an RSVP, it goes in the "rsvps" and "replies" collections.

mf2util

mf2util exposes a function for post_type_discovery that takes an h-entry and returns "like", "reply", "note", "article", etc.

IndieWeb Utils

IndieWeb Utils has a get_post_type function that returns the type of a post per the Post Type Discovery specification.

indielib

indielib exposes a function for post discovery algorithm. This algorithm is augmented with other post types that are not in the original algorithm, such as watches, listens, etc.

Normative References

Informative References

History

The following was originally part of posts, and moved here.

Alternative perspective: instead of explicit post kinds, infer the post kind (and thus presentation/UI) by what aspects/properties a post has. E.g.:

The above list is very roughly ordered and likely requires implementation experience (of multiple post types/kinds) to better understand which properties should more primarily determine a post type than others.

Combinations of the above will also have to be explored to see if they make sense, and given multiple possible inferred post kinds, which should "win" in terms of driving the presentation/UI. E.g.

  • with an embedded image and with a p-location venue -> checkin with photo. E.g. this post by aaronpk should be considered a check-in. Reasoning:
    • the p-location applies to the whole h-entry, not the specific u-photo, thus making the h-entry primarily a checkin, and secondarily, happening to have a photo
    • the alternative, a photo post tagged with a venue location, should be expressed by having the p-location be a property of an object nested on the u-photo, e.g. a "u-photo h-item" or "u-photo h-photo" which then has a "p-location h-card" inside it. The actual nested root class (h-item or h-photo, these are (mostly) just brainstorms) matters less than there happening to be such a nested root class.

Articles about implied post types

See Also