Internet Archive

From IndieWeb
(Redirected from
Jump to: navigation, search

The Internet Archive is a non-profit organization that is building a digital library, including archival copy of much of the public web, and since 2019 is the primary location the IndieWeb community hosts videos of IndieWebCamp and Summit sessions.

How to

Upload an Archive

The Internet Archive wants to create a library of publicly available files. If you have legal rights to share they will host your content. Create an account and upload your files.

When you upload files you make a page each item. An item can contain multiple files. A concert, for example, would be one item with each song in a file.

To "hotlink" to the files, to embed a podcast for example, you need to use the following[IDENTIFIER]/[FILENAME] instead of[IDENTIFIER]/[FILENAME]

To upload files to an existing page you need to edit the item. Directions for editing items

Trigger an Archive

NOTE: As of some time around 2020-08 this no longer works, and returns an error that says "You need to be logged in to use Save Page Now."

Updated API

There is a new API documented in a Google Doc

Capture request

SPN2 runs on which requires authentication using two alternative methods: S3 API Keys (highly preferable). Get your account’s keys at Use HTTP Header "authorization: LOW $accesskey:$secret" in your requests. Cookies: Get logged-in-sig and logged-in-user from your browser when you log in to and add them to your SPN2 HTTP requests. Cookies are not desirable because they tend to expire after a few days so you would need to login again to to get new cookies.

There is a limit of 7 concurrent save sessions per user.

To capture a web page via the API, you can use an HTTP POST or GET request as follows:

curl -X POST -H "Accept: application/json"  -H “Authorization: LOW myaccesskey:mysecret” -d'url='
curl -X GET -H "Accept: application/json" --cookie "logged-in-sig=AAAAAAAAAA;;"

The method described below is supposed to call through to the new API and enqueue it, but it may be slower - the current quoted backlog is 671 minutes

Previous API

You can tell to crawl and archive a specific URL immediately.

$ curl -I -H "Accept: application/json"{url to archive} | grep Content-Location

and you'll get a response like:

Content-Location: /web/20160715203015/

(We have wiki entries for the -I option and the -H option to curl.)

The response includes the path to the archived page on Append this path to to build the final URL for the archived page.

Trigger Archive in PHP

PHP snippet to call the curl_exec() function with appropriate options/params to trigger an archive:

$options = 
array(CURLOPT_URL => ('' . $url_to_save),
      CURLOPT_HEADER => true,
      CURLOPT_HTTPHEADER => array('Accept: application/json'),
$ch = curl_init();
curl_setopt_array($ch, $options);
$response = curl_exec($ch);
$info = curl_getinfo($ch);

You can then check $info['http_code'] for a numeric HTTP return code to see if there was an error (e.g. >= 400) and take action accordingly.

Trigger Archive in Python (with requests)

This method requires the 'requests' Python library.

pip install requests
import requests

# fire and forget call
    verify = requests.get(
        '%s%s' % ('', target),

Trigger Archive in Ruby

This is a Ruby method that triggers archiving by the Internet Archive for a given URL. It returns the URL that you can visit to see the archived page. It uses the rest-client Ruby gem to make a GET request.

require 'rest-client'

def web_archive(url)
  archive_request_response = RestClient.get("{url}")
  "" + archive_request_response.headers[:content_location]

Trigger Archive in Browser

You can manually prepend to a URL to save it on-demand. You can also set up a bookmarklet to archive the current page: [1]

javascript:var''+location.href,'', 'scrollbars=1,status=0,resizable=1,location=0,toolbar=0');


(new!) There is a Known plugin to auto-archive your posts and edits to your posts, as well as pages that you bookmark:



There is a useful little plugin Post Archival in the Internet Archive which will not only archive the user's post, but will also archive all the links within the post. has a setting (off by default) to send new blog posts to the Internet Archive. It does not currently archive other links found in the blog post.

IndieWeb Examples

Jeremy Keith

Jeremy Keith has been pinging to archive pages that posts link to since 2016-09-26

Aaron Parecki

Aaron Parecki has been pinging with every URL that a Webmention is sent to since 2016-09-26


Tantek Çelik implemented (2016-11-13) pinging with every URL in a link/reply-to and has been doing it from since 2016-11-15.

Chris Aldrich

Chris Aldrich implemented (2017-01-07) pinging the archive with URLs of both his own posts as well as links within posts using Post Archival in the Internet Archive.


Kevin Marks implemented pinging the archive with URLs of the SVG display posts, and thus indirectly the SVGs themselves


Kevin Marks implemented (2017-01-24) pinging the archive with both the source and target URLs of the webmention so that they get preserved too.

Jonathan LaCour

Jonathan LaCour implemented on 4/04 (HAH!) in 2018 using the Known plugin.



While likely only the blog, downtime has happened:

  • 2019-04-23:


And attempting to archive that page failed!


See Also