Webmention-brainstorming

From IndieWeb
Jump to: navigation, search
Main article: Webmention

Contents

Brainstorming

using x-forwarded-for to mitigate DDoS abuse

When verifying a webmention by checking the source site, include an X-Forwarded-For HTTP header in your request, with the value set to the IP address of the client that notified your server of the alleged webmention.

That is, to verify, do something like:

   curl -H "X-Forwarded-For:$sender_ip" $source

where $sender_ip is the IP address of the incoming request that informed you of the webmention and $source is the source page of the alleged webmention. Then check the response to see whether a link to $target is present.

For more, see DDOS#Forward_originating_IP_address.

Note: WordPress is using X-Pingback-Forwarded-For to mitigate abuse of pingbacks. And IETF is also in process of standardizing a Forwarded header (with a more detailed semantics). If someone feels strongly, we could pick one of those or mint something new, but in the meantime, use X-Forwarded-For.

Asynchronous status polling

Asynchronous status polling could be achieved via the webmention receiver returning a LINK header with rel=status with a URL for querying the status of a specific pending webmention request.

Example:

LINK: <http://example.com/m/10> ; rel="status"

Then the sending of the webmention could poll that URL for updates on the status of that particular webmention.

  • It's nice to reply to a webmention request with HTML including a link to the status location (particularly useful for mentions sent via a form). I would prefer to put the rel="status" on that visible link instead of the header Kylewm.com 19:07, 19 August 2014 (PDT)

Status Query Results

In order to distinguish between the layers of a request to the status page and the actual current status of the webmention, the page could return an additional header to explain that status of the webmention processing. This could be the same code that would have been returned if the webmention were just submitted.

Example:

HTTP/1.1 200 OK
Webmention-Status: 202

This would say that the webmention is still in the state of "accepted" but has not processed yet. This also leaves the page body open to any formatting for the user's view.

Implementations

IndieWeb community members who have implemented this

aaronparecki-webmention-status-example.png

Asynchronous status notification

Asynchronous status notification could be achieved by using a webhook in the initial webmention request.

I.e. in addition to:

  • source=URL
  • target=URL

add:

  • callback=URL

to the webmention POST request.

Webmention receivers should optionally support the "callback" parameter by calling it back (doing an HTTP POST to it) upon completion of processing of the webmention, with a "status" query parameter indicating the outcome of webmention processing. See "More status codes" below for possible values for that parameter.

Example payload sent to the callback URL:

source=URL&
target=URL&
status=200


Variable Response Body Problem

To deal with the variable response body problem:

  • Make response body optional
  • Send the error information in a response header, probably similar format to status
    • E.G. X-Webmention: 16 Source URI Not Found or X-Webmention-Response: 17 Not Linking
    • Re-use the semantics + error codes from the pingback spec [1]

propagation to a mobile device

To make webmentions more useful, it would be great if there was some way to have them propagate as notifications to whatever devices the user has setup to receive such notifications.

homepage webmentions

It is possible you may want a separate webmention endpoint to handle homepage webmentions

See: homepage: webmention to homepage.

retrying/error handling

Good retry logic can be hard, regardless of what you're doing, and webmentions are no exception. Here are some current practices we use to retry HTTP fetches for webmention endpoint discovery, sending webmentions, and fetching source pages to check backlinks.

  • Bridgy gives up immediately on 4xx errors and DNS lookup failures. On other errors (mostly 5xx and connection failures), it retries 30s later, backs off exponentially down to once an hour, and gives up after 3d.


rate-limiting

A webmention endpoint getting too many requests, potentially from a single source, should be able to communicate this to senders, asking them to retry later. (note that it might be necessary to rediscover the endpoint, maybe comparable to caching)

If a single sender is sending to many requests, HTTP 429 (Too Many Requests) seems like the apropriate response. In other cases, HTTP 503 (Service Unavailable) can be used to signal temporary failure. Both allow a Retry-After header to give a hint when new submissions are accepted.

The webmention specification currently doesn't mention how responses other than 400 & 500 should be treated.

For senders that are directly triggered by a user that can not retry later automatically, it might make sense to displays those temporary errors differently to the user than more permanent errors (like a missing WM endpoint), prompting them to retry manually later.

Endpoint By Page on Same Domain

Pages that do not support webmentions should not advertise an endpoint, even if the domain in general does.

proof-of-work

Webmention spam is not (at time of writing) really a problem; it'd be nice to keep it that way. A proposal: add a hashcash-like "proof of work" to a webmention request. In essence, this is a calculation which takes a little time to do; not enough to impact an individual sender, but enough to make large-scale webmention-sending (i.e., spam) problematic.

epoch hashing

Proposed approach (suggestions from crypto people are welcome!) In addition to source=http://source/article&target=http://target/whatever POST parameters to the webmention endpoint, also pass two new parameters: time=1417292519&nonce=5684712. The time parameter is UTC epoch time (seconds since 1970), and the nonce is calculated such that sha256(target + "-" + time + "-" + nonce).hexdigest() startswith "000000". The number of zeroes required can be varied before specifying this approach to "tune" the difficulty of creating a nonce; the above (6 zeroes) takes some seconds on a reasonably modern machine.

A webmention endpoint then checks that sha256(target + "-" + time_parameter + "-" + nonce).hexdigest() really does start with "000000", and that time is not too long ago (to prevent replay attacks; "not too long ago" is intentionally vague here because one must consider clock drift and so on).

hashcash-for-vouches

Possible alternative: set up hashcash-for-vouches.com where one submits the proof of work to h-f-c.com and if it verifies, h-f-c then vouches for you as per Vouch. What should then happen is that if people generally agree that h-f-c is requiring a proper proof-of-work, everyone whitelists it. So it's all done on the back of Vouch, but it doesn't have Vouch's clique problem.

A working implementation of this is at Hash for Vouch and described in detail there and on kryogenix.org.

plain hashcash

Hashcash for Vouches seems like overengineering.

Why not just attach a hashcash to the webmention itself?

If you were sending a webmention from http://example.org/a to http://another.example.org/b, you'd take the target URL, calculate a hashcash for that URL, then send that as an extra parameter in the webmention.

To specify the hashcash bit level, we could use a meta tag like this:

<meta name="webmention-hashcash" content="20" />

If you get a hashcashed webmention and you do support it:

  • check to see if it follows the expected text format of a hashcash and that it contains a URL you care about and/or control
  • if it doesn't, treat it as if it has not been hashcashed (or discard it if the URL doesn't match)
  • if it passes the expected format test, check the hashcash
  • if it passes, post it to your site
  • if it fails, discard it

If you get a hashcashed webmention and you don't support hashcash, simply ignore it.

Performance testing: https://gist.github.com/tommorris/d57178fd02ee991e8c08

webmention permalinks

Several individuals have implemented webmention status pages, but what about a permalink for the webmention itself?

verifying private webmentions

If you receive a webmention from a web page that requires authentication to access it, the normal webmention verification process will fail. What is the best way to have the webmention endpoint authenticate as the receiver's website to be able to fetch the private page?

Main article: private-webmention

checking endpoint once per domain/caching the endpoint

Currently the spec requires webmention discovery on every URL you want to send a webmention to. If your post links to three separate notes on the same website, for every webmention you need to check the note’s page to discover the correct endpoint even though these are likely to be the same. Would it be safe to assume only one endpoint per domain? (From IRC 2015-06-15.)

Notes from related IRC discussion 2016-04-11:

  • bridgy caches endpoints per domain for 2 hours, timing of exponential backoff for retries means if an endpoint changes the webmention is still delivered later
  • caching fails with sites implementing expiring endpoints as anti-DDOS/spam/CSRF measure
  • caching implementations will likely send Webmentions for URLs that don't accept them, receiving HTTP 400 responses and creating "unnecessary" load on both sides. They should not use high error rates as indication that the receiver is defective.
  • implementations should not cache negative discovery results -> just because one page on a domain does not support WMs and doesn't announce an endpoint doesn't mean there aren't other pages doing so.
  • If an implementation caches the endpoint and receives a 400 error when sending to the cached endpoint, it SHOULD NOT treat that as a failure, and SHOULD go re-discover the endpoint. This allows senders to use cached endpoints for receivers that don't implement CSRF protection, and will not fail for those that do.

storage

Tantek Çelik, Sandro Hawke and I ( Shane Becker) discussed what to store (irrespective of how one stores data, i.e. database, html+mf2, etc) one night after IndieWeb Summit 2016. This is what we came up with.

Store:

  • source (url)
  • target (url)
  • datetime received
  • validation state (pass/fall/pending/etc)
  • datetime validated
  • source blob (whole html payload or just a section of the html payload)

Then you should process that blob from the source. If you can learn additional information about the source's post type, you should store that (rsvp, photo, note, etc). Then process the blob for type specific properties.

Optionally store:

  • source's post type
  • source's post specific properties (either inline or in a referenced store (e.g., a photos table in a database)

Even after processing the source blob and saving parsed properties, you should keep the source blob in case you want or need to process it again in the future with some new understanding or new requirements.

sending status

Webmention senders may wish to keep track of the status of webmentions they send.

  • ready to send - the post exists on your site, but you have not yet sent the webmention
  • discovery
    • no webmention endpoint was found
    • a webmention endpoint was found
  • request received
    • the webmention endpoint accepted the request (returned 2xx)
    • the webmention endpoint requested a vouch (returned 449)
    • the webmention endpoint rejected the request because of rate limiting (returned 429)
    • the webmention endpoint rejected the request for some other reason (returned 4xx)
    • the request failed due to a server error (returned 5xx)
  • webmention displayed - the target URL sent a webmention back to you after displaying your comment

Alternatives

Alternatives to webmention.

LINK verb

James Snell brings up a suggestion to use the HTTP LINK verb to send the webmention notification. This would remove the need to first make the discovery request to find the webmention endpoint, since it could immediately send the LINK request to the URL itself.

Problems with LINK verb:

  • It's way less likely to actually get adoption using the LINK verb. The discovery step adds an opportunity to add a layer of indirection, which lets us separate the server that handles the webmention. While discovery is an extra step, but at the same time it provides a lot of benefits.
  • For example, it wouldn't be possible to have a static HTML site handle the HTTP LINK request, whereas it could delegate to a webmention service by using the <link> tag.
  • Additionally, some hosting environments may not provide a way to handle the LINK request where a normal POST would work instead.

Because of the benefits and flexibility the discovery step adds, we need to stick with it and can't use the LINK verb as cool as that might be.

http://aaronparecki.com/replies/2013/08/08/2/webmention

Issues

DDOS attack vector

Webmention, like Pingback, is a vulnerable vector for DDOS attacks. See DDOS for more discussion and proposed solutions.

3rd party tracking exploits

Third party visitor tracking (or possible exploit) via nocache icons. See [4] for details.

  • Known implements a fix for this where icons are locally saved as of September 7, 2014. Ben is interested in other approaches.

client unsure can read response

Not specifying a required response format and/or structures of the response formats is problematic as a client can never be sure they’ll be able to read the response.

  • Fortunately, as Aaron Parecki noted, if there is an error, knowing exactly what caused it isn’t really very helpful apart from debugging, where the response will be read by a human anyway
  • Rough consensus seems to be to default to a plantext error/success message, or if an accept header is given then maybe serialise it as requested, e.g. HTML or JSON. [5] Use case for HTML is in-browser sending --Waterpigs.co.uk 08:58, 22 October 2013 (PDT)

should bookmarks send webmentions

Should bookmarking (i.e. publishing a bookmark post) send a webmention? What if there are 1000's of bookmarks? how do you distinguish between normal links and bookmarks? - http://indiewebcamp.com/irc/2013-06-11#t1370966945

  • I think bookmarking should yes, just as favorites and likes do. - Tantek 14:53, 13 August 2013 (PDT)
  • As of 06-2017, several sites show bookmarks similar to likes, e.g. in facepiles

asynchronicity

Asynchronicity: a server may want to queue webmentions for source retrieval processing and do that asynchronously with the request. It's not always practical or advisable to handle all that processing in the actual HTTP POST thread. Thus receivers of webmentions must not be required to return no_link_found errors. - Tantek 14:53, 13 August 2013 (PDT)

repeat webmentions not an error

Updates: it's acceptable to send a repeat webmention, e.g. for updating or deleting comments. Thus the already_registered error must not be required, and should likely not even be an error in the first place. - Tantek 14:53, 13 August 2013 (PDT)

de-duplication

Main article: deduplication#How_to_deduplicate_responses

De-duplication (AKA de-duping deduping): replies and other responses are often duplicated in different places, e.g. via backfeed of POSSEd replies by Bridgy. Ideally, recipients should try to de-dupe webmention sources, preferring an original post.

prefer original posts

prefer original posts - an alternative to implementing de-duplication is to simply always prefer original posts (replies) rather than their POSSE copies. You can implement this by:

  • always do original post discovery on the "source" URL
  • if there is an original post,
  • use it instead of the source URL.
  • thus: if/when your post receives Bridgy webmentions of POSSE copies of replies, you can find, retrieve, and display the original replies instead (i.e. treat it as if you received a webmention for the original).
  • if you also receive a webmention for the original later (or before), you merely treat it as an update webmention using your existing update handling code path.

bad interpretation of h-feed source

  • if the source URL contains multiple posts where only one references the target post (e.g., a homepage h-feed, sent accidentally or "maliciously"), target may interpret it as many comments (see [6] for an example)
  • mitigation: webmention receivers should look for a backlink in the first h-entry on a given source page only.

Resolved Discussions

rel webmention addition to registry

  • rel=webmention should be added to the HTML5 rel registry (supercedes/obsoletes IETF/IANA 'Web Linking').
    • I'll take this up if there is enough consensus that we need it. I prefer the current URL based rel because it can be followed easily to get more info. Www.sandeep.io 06:08, 30 June 2013 (PDT)
      • I'm happy to draft-up a rel-webmention spec and add it to the HTML5 rel registry if we decide we want this. Also ok with just keeping the URL for now. - Tantek 17:46, 30 June 2013 (PDT)
      • rel=webmention is now registered in the HTML5 rel registry, and points to webmention.org as the spec. - Tantek 14:32, 13 August 2013 (PDT)
      • Let's have rel=webmention supersede the .org URL value as rel=webmention is shorter, more readable in rel values / HTTP headers, and follows more standard term based rel value naming conventions. - Tantek 21:40, 17 October 2013 (PDT)
      • rel=webmention in the rel registry now points to this wiki as living spec. - Kevinmarks.com 22:41, 26 November 2015 (PST)

rel-webmention on <a> tags

Does it make sense to allow rel=webmention on anchor tags in addition to link tags?

  • pro: wordpress.com (and maybe other hosted platforms?) prevent users from adding custom <link>s, but allow <a>s
  • con: webmentions are basically machine-only; no user-facing data
  • pro: GET request to webmention endpoint could return a helpful page with a link to webmention ("visible endpoint that teaches" [8])
  • See this pull request to the webmention spec

These libraries and implementations support <a> webmention endpoint discovery

  • PHP: mention-client library [9]
  • Ruby: webmention gem [10]
  • Python: webmention-tools [11]
  • Python: Red Wind [12]
  • ...

These webmention endpoints show human-friendly information, when visited with a GET request

More status codes

Synchronous webmention receivers should return the following status codes:

  • 200 OK - the webmention was somewhat processed (i.e. preconditions checked out), but any of the following conditions may apply:
  • 202 Accepted - the webmention is still queued for processing
  • 400 Bad Request - for any of the following conditions:
    • Source URL not found
    • Specified target URL not found
    • Source URL does not contain a link to the target URL
    • Specified target URL does not accept webmentions

Asynchronously:

Similarly, in an asynchronous interaction, the callback webhook should include a "status" URL query parameter with the status codes noted above.

Personal tools
Namespaces
Variants
Actions
Recent & Upcoming
Resources
Toolbox