archival copy

An archival copy is a copy of a web page made (often by someone other than the author) at a particular point in time, that can be used as a reference if the original disappears or is temporarily unavailable.

Archival copies are sometimes called "archives" for short. However, in the context of independent publishing, the term "archive" specifically refers to personal historical archives and the navigation to use them on a personal site.[1][2][3]

Why? / Use cases

  1. Store HTML to parse the microformats in the page to render comments and reply context [4]
  2. Store a screenshot of the page to preserve how it looked, without needing to store all related files (CSS, JavaScript, images)
  3. Save external content for longevity (to refer to later) in case of site-death or deletion

IndieWeb Examples

Chris Aldrich

In late December 2016, with the help of PressForward, a WordPress plugin, I began self-archiving read posts, and I am contemplating doing the same with bookmark posts on a selective basis. (Additional details)

I also use the Post Archival plugin for WordPress to create additional backups of my content and content to which I link.

Services

Internet Archive

Main article: Internet Archive

You can curl http://web.archive.org/save/{url} to archive a post.[5]
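As a rough illustration of the request described above, the following Python sketch builds the save URL and issues a plain GET, much as curl would. The function names, the User-Agent string, and the timeout are assumptions made for this example; the source only documents the URL pattern, so treat the response handling as a guess and expect that the service may rate-limit or change its behavior.

```python
from urllib.request import Request, urlopen

# URL pattern taken from the text above; everything else here is illustrative.
SAVE_ENDPOINT = "http://web.archive.org/save/"


def build_save_url(url: str) -> str:
    """Return the save URL for a page, by appending it to the endpoint."""
    return SAVE_ENDPOINT + url


def request_snapshot(url: str, timeout: float = 30.0) -> int:
    """Ask for a snapshot of `url` and return the HTTP status code.

    Raises urllib.error.URLError (or a subclass) if the request fails.
    """
    req = Request(
        build_save_url(url),
        headers={"User-Agent": "archival-copy-sketch/0.1"},  # hypothetical name
    )
    with urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Performs a real network request when run directly.
    print(request_snapshot("https://example.com/"))
```

Keeping the URL construction separate from the network call makes it easy to log or queue save URLs without contacting the service immediately.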

Perma.cc

WebCite

WebRecorder

  • https://webrecorder.io/ - works client-side (e.g. with logged-in sites), but saves "recordings" to their service, which you can later share (assuming you named them). May require login for recordings to persist.

archive.is

LOCKSS

The LOCKSS Program is an open-source, library-led digital preservation system built on the principle that “lots of copies keep stuff safe.” Many versions of these are "dark archives" which may not be publicly available.

Link Archiver on Twitter

"I make sure the Internet Archive's Wayback Machine has a current snapshot of the links my friends tweet. Follow me and I'll follow you! Experimental, by @xor. If I follow you, then any time you tweet a link I'll quietly make sure there's a backup in the Internet Archive's Wayback Machine."

Savemy.News

Ben Walsh of the LA Times Data Desk has created a simple web interface at www.SaveMy.News that journalists can use to archive their stories to The Internet Archive and WebCite. One can log into the service via Twitter and later download a .csv file with a running list of all their works with links to the archived copies.

Tools

Tools, and approaches to tools, for keeping archival copies of others' pages.

IndieArchive

Main article: indiearchive

IndieArchive is an open source project to collaboratively grow collective archives of public pages using indieweb sites.

Owark

Owark is short for Open Web Archive and is a WordPress plugin for archiving pages you link to, and then upon linkrot, automatically showing the archived copy instead of linking to a broken page.
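The fallback behavior described above can be sketched in Python as follows. This is not Owark's implementation (Owark is a WordPress plugin); it only illustrates the general idea of checking a link and substituting the archived copy when the original appears dead. The function names and the HEAD-request liveness check are assumptions for this example, and some servers reject HEAD requests, so a real checker would likely need to be more forgiving.

```python
from typing import Callable, Optional
from urllib.request import Request, urlopen


def link_is_alive(url: str, timeout: float = 10.0) -> bool:
    """Best-effort liveness check: True if a HEAD request succeeds."""
    try:
        req = Request(url, method="HEAD")
        with urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError (4xx/5xx), and timeouts;
        # ValueError covers malformed URLs.
        return False


def choose_href(
    original: str,
    archived: Optional[str],
    is_alive: Callable[[str], bool] = link_is_alive,
) -> str:
    """Return the archived URL if one exists and the original looks dead."""
    if archived is not None and not is_alive(original):
        return archived
    return original
```

Passing the liveness check in as a parameter lets a site cache results or run checks on a schedule, rather than probing every link at render time.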

ArchiveBox

ArchiveBox is a self-hosted tool that "takes a list of website URLs you want to archive, and creates a local, static, browsable HTML clone of the content from those websites (it saves HTML, JS, media files, PDFs, images and more)."

Archivy

Archivy is an open source "self-hosted knowledge repository that allows you to safely preserve useful content that contributes to your own personal, searchable and extensible wiki." (source code)

Perkeep

Perkeep is an open source personal storage system for storing, syncing, sharing, modelling and backing up content. (source code)

Wallabag

Wallabag is a self-hostable application for saving web pages. (source code)

LinkAce

LinkAce is a self-hosted bookmark archive. (source code: https://github.com/Kovah/LinkAce/)

Brainstorming

Local archives

Another approach is to save external content purely locally, not online or in "the cloud", for your personal reference (see use-case above "to refer to later"). This may also help offline reading / browsing use-cases.
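One possible shape for such a local archive, sketched in Python: each fetched page is written under a directory derived from its host and a hash of its URL, with a UTC timestamp as the filename so that repeated snapshots of the same page sit side by side. The directory layout and function names are assumptions for illustration, not an established convention, and fetching the page is left to the caller.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urlsplit


def local_copy_path(root: Path, url: str, fetched_at: datetime) -> Path:
    """Map a URL and fetch time to a path like root/host/<hash>/<timestamp>.html.

    `fetched_at` should be timezone-aware; it is converted to UTC.
    """
    host = urlsplit(url).hostname or "unknown-host"
    # A short hash keeps filenames safe regardless of what the URL contains.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    stamp = fetched_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return root / host / digest / f"{stamp}.html"


def save_local_copy(root: Path, url: str, html: bytes, fetched_at: datetime) -> Path:
    """Write the raw page bytes to disk and return where they were stored."""
    path = local_copy_path(root, url, fetched_at)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(html)
    return path
```

Because everything lives in ordinary files, the archive can be browsed offline and backed up with whatever tools already cover the rest of the disk.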

Related discussions:

See Also