archival copy


An archival copy is a copy of a web page made at a particular point in time, often by someone other than the author, that can serve as a reference if the original disappears or is temporarily unavailable.

Archival copies are sometimes called "archives" for short. In the context of independent publishing, however, the term "archive" refers specifically to personal historical archives and the navigation for browsing them on a personal site.[1][2][3]

Why? / Use cases

Store HTML so that the microformats in the page can be parsed to render comments and reply context [4]
Store a screenshot of a page to preserve how it looked, without needing to store all related files (CSS, JavaScript, images)
Save external content for longevity (to refer to later) in case of site-death or deletion

IndieWeb Examples

Chris Aldrich

In late December 2016, with the help of PressForward, a WordPress plugin, I began self-archiving read posts, and I am contemplating doing the same with bookmark posts on a selective basis. (Additional details)

I also use the Post Archival plugin for WordPress to create additional backups of my content and content to which I link.

Services

Internet Archive

Main article: Internet Archive

You can curl http://web.archive.org/save/{url} to archive a post.[5]
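
The same request can be sketched in Python. This is a minimal illustration, assuming the save endpoint above accepts a plain GET with the target URL appended, as the curl form suggests. The helper names build_save_url and request_archival_copy are made up for this example and are not part of any Internet Archive library.

```python
import urllib.request

# Prefix taken from the curl example above; the page URL is appended as-is.
SAVE_PREFIX = "http://web.archive.org/save/"


def build_save_url(page_url: str) -> str:
    """Return the save URL for page_url (hypothetical helper)."""
    return SAVE_PREFIX + page_url


def request_archival_copy(page_url: str, timeout: float = 60.0) -> int:
    """Ask the Wayback Machine to capture page_url and return the HTTP status.

    Assumption: a plain GET is enough to trigger a capture. Network errors
    and HTTP error statuses are raised by urllib and left to the caller.
    """
    with urllib.request.urlopen(build_save_url(page_url), timeout=timeout) as response:
        return response.status
```

Calling request_archival_copy("https://example.com/post") would issue the same request as the curl command. Captures can be slow, so a generous timeout is used here.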

Perma.cc

https://perma.cc/ has an API, documented at https://perma.cc/docs/developer

WebCite

http://www.webcitation.org/

WebRecorder

https://webrecorder.io/ works client-side, so it can capture sites you are logged in to. It saves "recordings" to its own service, which you can later share (assuming you named them). A login may be required for recordings to persist.

archive.is

https://archive.is/

LOCKSS

The LOCKSS Program is an open-source, library-led digital preservation system built on the principle that “lots of copies keep stuff safe.” Many LOCKSS installations are "dark archives", which may not be publicly available.

Link Archiver on Twitter

"I make sure the Internet Archive's Wayback Machine has a current snapshot of the links my friends tweet. Follow me and I'll follow you! Experimental, by @xor. If I follow you, then any time you tweet a link I'll quietly make sure there's a backup in the Internet Archive's Wayback Machine."

Savemy.News

Ben Welsh of the LA Times Data Desk has created a simple web interface at www.SaveMy.News that journalists can use to archive their stories to the Internet Archive and WebCite. One can log into the service via Twitter and later download a .csv file with a running list of all one's works, with links to the archived copies.

Tools

Tools, and approaches to building tools, for keeping archival copies of others' pages.

IndieArchive

Main article: indiearchive

IndieArchive is an open source project to collaboratively grow collective archives of public pages using indieweb sites.

Owark

Owark is short for Open Web Archive and is a WordPress plugin for archiving pages you link to, and then upon linkrot, automatically showing the archived copy instead of linking to a broken page.

Details: http://owark.org/2012/04/14/whats-next-for-owark/
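
The fallback behavior described above can be sketched as a small decision function. This is not Owark's actual implementation. LinkRecord, is_rotten, and choose_href are hypothetical names, and treating only 404, 410, and unreachable hosts as linkrot is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption for this sketch: only "not found" and "gone" count as linkrot.
ROTTEN_STATUSES = {404, 410}


@dataclass
class LinkRecord:
    """One outgoing link and what is known about it (hypothetical structure)."""
    original_url: str
    archived_url: Optional[str]   # None if no archival copy was made
    last_status: Optional[int]    # None if the host could not be reached


def is_rotten(status: Optional[int]) -> bool:
    """Treat unreachable hosts and 404/410 responses as linkrot."""
    return status is None or status in ROTTEN_STATUSES


def choose_href(link: LinkRecord) -> str:
    """Return the archived copy when the original is rotten and a copy exists."""
    if link.archived_url is not None and is_rotten(link.last_status):
        return link.archived_url
    return link.original_url
```

A link whose last check returned 200 keeps pointing at the original. The same link with a 404 and a stored copy would be rendered with the archived URL instead.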

ArchiveBox

ArchiveBox is a self-hosted tool that "takes a list of website URLs you want to archive, and creates a local, static, browsable HTML clone of the content from those websites (it saves HTML, JS, media files, PDFs, images and more)."

Archivy

Archivy is an open source "self-hosted knowledge repository that allows you to safely preserve useful content that contributes to your own personal, searchable and extensible wiki." (source code)

Perkeep

Perkeep is an open source personal storage system for storing, syncing, sharing, modelling and backing up content. (source code)

Wallabag

Wallabag is a self hostable application for saving web pages. (source code)

LinkAce

LinkAce is a self-hosted bookmark archive. (source code: https://github.com/Kovah/LinkAce/)

Brainstorming

Local archives

Another approach is to save external content purely locally, not online or in "the cloud", for your personal reference (see use-case above "to refer to later"). This may also help offline reading / browsing use-cases.
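
One possible layout for such a local archive is sketched below, assuming the page HTML has already been fetched. The directory scheme (one folder per URL hash, one file per capture time) and the names archive_path and save_local_copy are illustrative choices, not an established convention.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def archive_path(archive_dir: Path, url: str, fetched_at: datetime) -> Path:
    """Build <archive_dir>/<url hash>/<UTC timestamp>.html for one capture.

    fetched_at should be timezone-aware; it is converted to UTC.
    """
    url_key = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    stamp = fetched_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return archive_dir / url_key / f"{stamp}.html"


def save_local_copy(
    archive_dir: Path,
    url: str,
    html: str,
    fetched_at: Optional[datetime] = None,
) -> Path:
    """Write html to the local archive and record which URL it came from."""
    if fetched_at is None:
        fetched_at = datetime.now(timezone.utc)
    path = archive_path(archive_dir, url, fetched_at)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    # Keep the original URL next to the captures, since the folder name is a hash.
    (path.parent / "url.txt").write_text(url + "\n", encoding="utf-8")
    return path
```

Keeping every capture under its own timestamp means repeated saves of the same page accumulate as a history rather than overwriting each other.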

Related discussions:

2023-08-25 A note to young folks: download the things you love
- Hacker News discussion: https://news.ycombinator.com/item?id=37304125

See Also

Internet Archive
indiearchive
site archive
archives
backfill
commonplace book
PASTA aka Publish Anywhere, Save To (private) Archive
https://arxiv.org/abs/1806.00871
https://twitter.com/archive_tweet
The Dodging the Memory Hole conferences, run via the Reynolds Journalism Institute, have some interesting resources and lists of researchers working on archiving "born digital" news.
2017/Berlin/bookmarks
2019/Austin/webio
2019-03-04 Steven Melendez: Delete Never: The Digital Hoarders Who Collect Tumblrs, Medieval Manuscripts, and Terabytes of Text Files (has links to resources for archiving content)
https://www.reddit.com/r/DataHoarder/
Preserve this Podcast
Web Archiving Community
DigiPres Commons Community-owned digital preservation resources
2020-11-04 New Twitter UI: Replaying Archived Twitter Pages That Never Existed from Himarsha Jayanetti at Research and Teaching Updates from the Web Science and Digital Libraries Research Group at Old Dominion University.
WebMemex: a browser extension for Firefox and Chrome, developed by treora, that snapshots web pages and saves copies locally on your computer
https://amberlink.org/ - plug-ins for WordPress and Drupal that automatically archive local copies of what you link to
ArchiveBox
https://github.com/iipc/warc2html tool to turn a WARC recording into standalone static files
Resources from the Library of Congress on personal archiving https://www.digitalpreservation.gov/personalarchiving/
Kemp, Angie, Lee Skallerup Bessette, and Kris Shaffer. “What Do You Do with 11,000 Blogs? Preserving, Archiving, and Maintaining UMW Blogs—A Case Study.” _The Journal of Interactive Technology and Pedagogy_, May 16, 2019. https://jitp.commons.gc.cuny.edu/what-do-you-do-with-11000-blogs-preserving-archiving-and-maintaining-umw-blogs-a-case-study/.