2018/Berlin/datalake
Digital Archiving was a session at IndieWebCamp Berlin 2018.
Notes archived from: https://etherpad.indieweb.org/datalake
IndieWebCamp Berlin 2018
Session: Digital Archiving
When: 2018-11-03 14:10
Participants
- Sebastion Dümcke
- Calum Ryan
- Peter Molnar
- Jeremy Keith
- David Shanske
- Sebastian Greger
- Sneha Belkhale
- Toni Mattis
- Sven Knebel
- Greg McVerry
Notes
How to store
- media vs format
- licensing matters: public, permissively licensed material has a better chance of additional copies being kept elsewhere
- media:
- reliability of media vs reliability of interfaces
- Tape is awesome in reliability/storage duration, but readers are expensive and need to be maintained too
- optical: CD longevity is unpredictable; DVDs (except maybe M-DISC?) are less good. Readers should stay available long-term. Blu-ray: archival-grade discs are available, but cheap ones use organic storage layers. The media are expensive too.
- Flash (SD, SSD, CF, ...) loses data (charge) if unpowered for an extended period (~5 years); for SSDs I've seen numbers in the months if not stored in cold places
- HDDs are relatively cheap, sturdy even if stored powered off
- digital:
- text is great
- common, simple formats are good (jpg, png)
- PDF is complex; if you do want to store something for print, consider LaTeX (https://www.latex-project.org/) or PostScript
- potentially: archive the software too if possible (trickier for modern, cloud-connected software); if archiving source code, archive the compiler and all dependencies as well
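Since every medium listed above can lose data over time, one way to notice damage is to record checksums when archiving and compare them later. The following is a minimal Python sketch, not something discussed in the session; the manifest layout (relative path mapped to a SHA-256 digest) and the function names are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root, POSIX style) to its digest."""
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are missing or no longer match their digest."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

The manifest could be saved as plain text next to the archive and rechecked whenever the medium is powered up or copied; files added after the manifest was built are deliberately not reported by `changed_files`.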
how to archive (structure)
- who wants to find it
- indexed from structure vs search based, flexible data
- automatic tagging can work surprisingly well nowadays for images
- petermolnar: images archived by topic, data, occasion
- storing metadata in the files themselves is attractive (easy to move etc.), but can be tricky with non-text formats
- simple index evolved into ontology/tagging
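Where a format makes embedded metadata awkward, one common workaround is a plain-text "sidecar" file stored next to the media file so the two can be moved together. This is a different technique from embedding, sketched here in Python under stated assumptions: the `.json` suffix convention, the field names (`topic`, `occasion`, `tags`) and the function names are all hypothetical.

```python
import json
from pathlib import Path


def sidecar_path(media: Path) -> Path:
    """Return the sidecar location, e.g. photo.jpg -> photo.jpg.json."""
    return media.with_name(media.name + ".json")


def write_sidecar(media: Path, topic: str, occasion: str, tags: list[str]) -> Path:
    """Store descriptive metadata as UTF-8 JSON next to the media file."""
    record = {
        "file": media.name,
        "topic": topic,
        "occasion": occasion,
        "tags": list(tags),
    }
    target = sidecar_path(media)
    target.write_text(
        json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return target


def read_sidecar(media: Path) -> dict:
    """Load the metadata for a media file, or an empty dict if none exists."""
    target = sidecar_path(media)
    if not target.exists():
        return {}
    return json.loads(target.read_text(encoding="utf-8"))
```

The trade-off is that a sidecar can be separated from its file by a careless move, which embedded metadata avoids.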
How to find stuff
- again, it depends on the audience: for external users, a central, somewhat standardized hierarchy is useful.
- full-text search
- How to share/publish
- How to gather all
- browser hierarchy would be great (worldbrain.io, webmemex)
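For archives that are mostly plain text, the full-text search mentioned above can be approximated with a small inverted index. This is a rough Python sketch rather than a recommendation from the session; the function names and the simple word tokenizer are assumptions, and a real archive would likely use an existing search tool.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document names that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

For example, `search(build_index({"a.txt": "tape is reliable"}), "Tape")` matches `a.txt` because both the index and the query are lowercased before comparison.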
References
- "As We May Think": article describing the idea of the Memex machine
- Zotero also works for websites