archival copy
An archival copy is a copy of a web page made (often by someone other than the author) at a particular point in time, that can be used as a reference if the original disappears or is temporarily unavailable.
Archival copies are sometimes called "archives" for short. However, in the context of independent publishing, the term "archive" refers specifically to personal historical archives and the navigation used to browse them on a personal site.[1][2][3]
Why? / Use cases
- Store HTML to parse the microformats in the page to render comments and reply context [4]
- Store a screenshot of a page to preserve how it looked, without needing to store all related files (CSS, JavaScript, images)
- Save external content for longevity (to refer to later) in case of site-death or deletion
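As a rough illustration of the first use case, here is a minimal Python sketch that scans a stored HTML copy for microformats2 root class names (classes starting with "h-", such as h-entry). It is not a full microformats parser; the names RootClassFinder and find_root_classes and the sample markup are hypothetical, and a real site would more likely use a dedicated parser such as mf2py.

```python
from html.parser import HTMLParser


class RootClassFinder(HTMLParser):
    """Collect class names that look like microformats2 roots ("h-*")."""

    def __init__(self):
        super().__init__()
        self.roots = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        for name, value in attrs:
            if name == "class" and value:
                self.roots.extend(
                    cls for cls in value.split() if cls.startswith("h-")
                )


def find_root_classes(stored_html):
    """Return the "h-*" class names found in a stored HTML string."""
    finder = RootClassFinder()
    finder.feed(stored_html)
    finder.close()
    return finder.roots


# Hypothetical archived copy of a reply post.
stored_html = (
    '<article class="h-entry">'
    '<a class="u-in-reply-to" href="https://example.com/post">a post</a>'
    '<p class="e-content">Nice!</p>'
    "</article>"
)
print(find_root_classes(stored_html))
```

Because the check is only a prefix match, unrelated classes that happen to start with "h-" would also be reported, so treat the result as a hint that the stored copy is worth parsing properly.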
IndieWeb Examples
Chris Aldrich
In late December 2016, with the help of PressForward, a WordPress plugin, I began self-archiving read posts, and I am contemplating doing the same with bookmark posts on a selective basis. (Additional details)
I also use the Post Archival plugin for WordPress to create additional backups of my content and content to which I link.
Services
Internet Archive
You can curl http://web.archive.org/save/{url} to archive a post.[5]
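The same request can be sketched in Python using only the standard library. This assumes nothing beyond the URL pattern quoted above; the function names build_save_url and request_snapshot are hypothetical, and the service's response format, rate limits, and any authentication options are not covered here.

```python
import urllib.request

# URL pattern taken from the text above: http://web.archive.org/save/{url}
SAVE_ENDPOINT = "http://web.archive.org/save/"


def build_save_url(page_url):
    """Return the save URL for page_url by appending it to the endpoint."""
    return SAVE_ENDPOINT + page_url


def request_snapshot(page_url, timeout=30):
    """Request a snapshot of page_url and return the HTTP status code.

    This performs a real network request, like the curl command does.
    """
    with urllib.request.urlopen(build_save_url(page_url), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Only prints the URL that would be requested; no network access.
    print(build_save_url("https://example.com/2016/12/a-post"))
```

Calling request_snapshot may raise urllib.error.URLError (or its subclass HTTPError) if the request fails, so a caller that archives posts automatically would probably want to catch that and retry later.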
Perma.cc
- https://perma.cc/, has an API documented here: https://perma.cc/docs/developer
WebCite
WebRecorder
- https://webrecorder.io/ - works client-side, so it can capture sites you are logged in to. It saves "recordings" to its own service, which you can share later (assuming you named them). Recordings may require a login to persist.
archive.is
LOCKSS
The LOCKSS Program is an open-source, library-led digital preservation system built on the principle that "lots of copies keep stuff safe." Many versions of these are "dark archives" which may not be publicly available.
Link Archiver on Twitter
"I make sure the Internet Archive's Wayback Machine has a current snapshot of the links my friends tweet. Follow me and I'll follow you! Experimental, by @xor. If I follow you, then any time you tweet a link I'll quietly make sure there's a backup in the Internet Archive's Wayback Machine."
Savemy.News
Ben Welsh of the LA Times Data Desk has created a simple web interface at www.SaveMy.News that journalists can use to archive their stories to the Internet Archive and WebCite. One can log into the service via Twitter and later download a .csv file with a running list of all their works with links to the archived copies.
Tools
Tools and approaches for keeping archival copies of others' pages.
IndieArchive
IndieArchive is an open source project to collaboratively grow collective archives of public pages using indieweb sites.
Owark
Owark is short for Open Web Archive and is a WordPress plugin for archiving pages you link to, and then upon linkrot, automatically showing the archived copy instead of linking to a broken page.
ArchiveBox
ArchiveBox is a self-hosted tool that "takes a list of website URLs you want to archive, and creates a local, static, browsable HTML clone of the content from those websites (it saves HTML, JS, media files, PDFs, images and more)."
Archivy
Archivy is an open source "self-hosted knowledge repository that allows you to safely preserve useful content that contributes to your own personal, searchable and extensible wiki." (source code)
Perkeep
Perkeep is an open source personal storage system for storing, syncing, sharing, modelling and backing up content. (source code)
Wallabag
Wallabag is a self-hostable application for saving web pages. (source code)
LinkAce
LinkAce is a self-hosted bookmark archive. (source code: https://github.com/Kovah/LinkAce/)
Brainstorming
Local archives
Another approach is to save external content purely locally, not online or in "the cloud", for your personal reference (see use-case above "to refer to later"). This may also help offline reading / browsing use-cases.
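One possible shape for such a local archive, sketched in Python with only the standard library: each copy is written under a folder named after the host, with a file name built from a hash of the URL and the capture time. The layout and the names snapshot_path and save_snapshot are assumptions made for illustration, not an established convention.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urlsplit


def snapshot_path(root, url, when):
    """Build the local path for a copy of url captured at `when`.

    `when` should be a timezone-aware datetime; it is converted to UTC.
    """
    host = urlsplit(url).hostname or "unknown-host"
    # A short hash of the full URL avoids unsafe characters in file names.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return Path(root) / host / f"{digest}-{stamp}.html"


def save_snapshot(root, url, html_bytes, when=None):
    """Write html_bytes to the local archive and return the file path."""
    when = when or datetime.now(timezone.utc)
    path = snapshot_path(root, url, when)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(html_bytes)
    return path


# Shows where a copy would be stored; nothing is written to disk here.
print(snapshot_path(
    "archive",
    "https://example.com/post",
    datetime(2023, 8, 25, 12, 0, tzinfo=timezone.utc),
))
```

Including the capture time in the file name means repeated saves of the same URL accumulate as separate copies rather than overwriting each other.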
Related discussions:
- 2023-08-25 A note to young folks: download the things you love
- Hacker News discussion: https://news.ycombinator.com/item?id=37304125
See Also
- Internet Archive
- indiearchive
- site archive
- archives
- backfill
- commonplace book
- PASTA aka Publish Anywhere, Save To (private) Archive
- https://arxiv.org/abs/1806.00871
- https://twitter.com/archive_tweet
- The Dodging the Memory Hole conferences, via the Reynolds Journalism Institute, have some interesting resources and lists of researchers working on archiving "born digital" news.
- 2017/Berlin/bookmarks
- 2019/Austin/webio
- 2019-03-04 : Delete Never: The Digital Hoarders Who Collect Tumblrs, Medieval Manuscripts, and Terabytes of Text Files (has links to resources for archiving content)
- https://www.reddit.com/r/DataHoarder/
- Preserve this Podcast
- Web Archiving Community
- https://jitp.commons.gc.cuny.edu/what-do-you-do-with-11000-blogs-preserving-archiving-and-maintaining-umw-blogs-a-case-study/
- DigiPres Commons Community-owned digital preservation resources
- 2020-11-04 New Twitter UI: Replaying Archived Twitter Pages That Never Existed from Himarsha Jayanetti at Research and Teaching Updates from the Web Science and Digital Libraries Research Group at Old Dominion University.
- WebMemex, a browser extension for Firefox and Chrome developed by treora, that snapshots web pages and saves copies locally on your computer
- https://amberlink.org/ - plug-ins for WordPress and Drupal that automatically archive local copies of what you link to
- ArchiveBox
- https://github.com/iipc/warc2html tool to turn a WARC recording into standalone static files
- Resources from the Library of Congress on personal archiving https://www.digitalpreservation.gov/personalarchiving/
- Kemp, Angie, Lee Skallerup Bessette, and Kris Shaffer. "What Do You Do with 11,000 Blogs? Preserving, Archiving, and Maintaining UMW Blogs - A Case Study." _The Journal of Interactive Technology and Pedagogy_, May 16, 2019. https://jitp.commons.gc.cuny.edu/what-do-you-do-with-11000-blogs-preserving-archiving-and-maintaining-umw-blogs-a-case-study/.