2018/Berlin/datalake

Digital Archiving was a session at IndieWebCamp Berlin 2018.

Notes archived from: https://etherpad.indieweb.org/datalake

IndieWebCamp Berlin 2018 Session: Digital Archiving
When: 2018-11-03 14:10

Participants

 * Sebastion Dümcke
 * Sneha Belkhale
 * Toni Mattis

How to store

 * physical media vs. digital format
 * licensing is interesting: publicly, permissively licensed material has a better chance of having additional copies kept elsewhere
 * media:
   * reliability of the media vs. reliability of the interfaces (readers, drives) needed to access them
   * tape is awesome in reliability/storage duration, but readers are expensive and need to be maintained too
   * optical: CD quality is hit-or-miss; DVDs are less good (except maybe M-DISC?). Readers should remain available long-term. Blu-ray: archival-grade discs are available, while cheap ones use organic storage layers; the archival media are expensive too.
   * flash (SD, SSD, CF, ...) loses data (charge) if left unpowered for an extended period (~5 years); for SSDs I've seen numbers in the months if not stored in cold places
   * HDDs are relatively cheap and sturdy, even when stored powered off
 * digital:
   * plain text is great
   * common, simple formats are good (JPEG, PNG)
   * PDF is complex; if you want to store something for printing, use LaTeX (https://www.latex-project.org/) or PostScript
   * potentially: archive the software too, if possible (trickier for modern, cloud-connected software); if archiving source code, archive the compiler and all dependencies as well
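A minimal sketch of the "archive all dependencies" point, assuming the code being archived is Python: it records the interpreter version and the name/version of every installed package, so you know exactly what has to be kept next to the source. It only writes a list; fetching actual copies of the packages (for example with `pip download`) is a separate step. The function name and the idea of saving the output as a manifest file are hypothetical choices.

```python
from __future__ import annotations

import json
import platform
import sys
from importlib import metadata


def environment_manifest() -> dict:
    """Record interpreter and installed-package versions (names and versions only)."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a Name field
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        # Sorted case-insensitively so manifests from different runs diff cleanly.
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


if __name__ == "__main__":
    # Print the manifest; redirect it into a file stored alongside the archived source.
    json.dump(environment_manifest(), sys.stdout, indent=2)
```

Storing this as plain JSON text fits the "text is great" advice: the list stays readable even without the tool that produced it.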

How to archive (structure)

 * who wants to find it?
 * indexed by structure vs. search-based, flexible data
 * automatic tagging can work surprisingly well nowadays for images
 * petermolnar: images archived by topic, date, occasion
 * storing metadata in the files themselves is attractive (easy to move, etc.), but can be tricky with non-text formats
 * a simple index evolved into an ontology/tagging system
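The structure-vs-tags trade-off can be sketched in a few lines of Python. The directory layout here (year, then topic, then a dated occasion folder) is only one hypothetical arrangement, and all function names are made up for illustration. The tag index shows the flexible alternative: the same file can be reached from any of its tags instead of through a single path.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date
from pathlib import PurePosixPath


def slug(text: str) -> str:
    """Lower-case and hyphenate free text so it works as a simple directory name."""
    return "-".join(text.lower().split())


def archive_path(taken: date, topic: str, occasion: str, filename: str) -> PurePosixPath:
    """Hypothetical layout: <year>/<topic>/<YYYY-MM-DD>-<occasion>/<filename>."""
    folder = f"{taken.isoformat()}-{slug(occasion)}"
    return PurePosixPath(str(taken.year), slug(topic), folder, filename)


def build_tag_index(tags_by_file: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert file -> tags into tag -> files, so every tag becomes an entry point."""
    index: dict[str, set[str]] = defaultdict(set)
    for path, tags in tags_by_file.items():
        for tag in tags:
            index[tag.lower()].add(path)
    return dict(index)


if __name__ == "__main__":
    photo = archive_path(date(2018, 11, 3), "Travel", "IndieWebCamp Berlin", "IMG_0001.jpg")
    print(photo)  # 2018/travel/2018-11-03-indiewebcamp-berlin/IMG_0001.jpg
    tags = build_tag_index({str(photo): {"Berlin", "indieweb"}})
    print(tags["berlin"])
```

A shallow, human-readable hierarchy still works when no index is available, while the tag index can be rebuilt from metadata whenever it is needed.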

How to find stuff

 * again, audience matters: for external users, a central, somewhat standardized hierarchy is useful
 * full-text search
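A minimal Python sketch of full-text search over plain-text notes: an inverted index that maps each word to the documents containing it, assuming documents are identified by a path string. It is a toy with no ranking, stemming or persistence, and the class and method names are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict

# \w is Unicode-aware for str patterns, so words like "dümcke" stay one token.
WORD = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Split text into lower-case word tokens."""
    return WORD.findall(text.lower())


class FullTextIndex:
    """Minimal inverted index: each word maps to the set of document IDs containing it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        """Index every word of a document under its ID (e.g. a file path)."""
        for word in tokenize(text):
            self._postings[word].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return IDs of documents containing all query words (AND semantics)."""
        words = tokenize(query)
        if not words:
            return set()
        # .get avoids inserting empty entries into the defaultdict for unknown words.
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result


if __name__ == "__main__":
    index = FullTextIndex()
    index.add("notes/tape.txt", "Tape is awesome in reliability")
    index.add("notes/disks.txt", "HDDs are relatively cheap; tape readers are expensive")
    print(sorted(index.search("tape")))        # ['notes/disks.txt', 'notes/tape.txt']
    print(sorted(index.search("tape cheap")))  # ['notes/disks.txt']
```

Because it only needs the text itself, this kind of search works without any agreed-upon hierarchy, which is why it complements the structured approach for personal archives.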


How to share/publish

How to gather all

 * a browser hierarchy would be great: worldbrain.io, webmemex

References
 * "As We May Think" (Vannevar Bush, 1945): article describing the idea of the Memex machine
 * Zotero also works for saving websites