2018/Berlin/datalake


Digital Archiving was a session at IndieWebCamp Berlin 2018.

Notes archived from: https://etherpad.indieweb.org/datalake


IndieWebCamp Berlin 2018
Session: Digital Archiving
When: 2018-11-03 14:10

Participants

Notes

How to store

  • media vs format
  • licensing matters: public, permissively licensed content has a chance of being copied and preserved elsewhere
  • media:
    • reliability of media vs reliability of interfaces
    • Tape is awesome in reliability/storage duration, but readers are expensive and need to be maintained too
    • optical: CD longevity is unpredictable; DVDs (except perhaps M-DISC?) are less good. Readers should stay available long-term. Blu-ray: archival-grade discs are available, but cheap ones use organic storage layers. Media is expensive, too.
    • Flash (SD, SSD, CF, ...) loses data (charge) if left unpowered for an extended period (~5 years); for SSDs I've seen figures in the range of months unless stored in cold places
    • HDDs are relatively cheap and sturdy, even when stored powered off
  • digital:
    • text is great
    • common, simple formats are good (jpg, png)
    • PDF is complex; if you want to store documents for printing, consider LaTeX (https://www.latex-project.org/) or PostScript
    • potentially: archive the software too if possible (trickier for modern, cloud-connected software); if archiving source code, archive the compiler and all dependencies as well
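
Since every medium listed above can silently lose data, one common way to notice damage is to keep a checksum manifest next to the archive and re-check it periodically. This was not discussed in the session; the following is a minimal sketch under that assumption, using only the Python standard library. The function names (`file_sha256`, `build_manifest`, `changed_files`) are made up for illustration.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are missing or whose content changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or file_sha256(target) != expected:
            problems.append(rel_path)
    return problems
```

The manifest itself is plain text once serialized (for example as JSON), which fits the "text is great" point: it stays readable even if the tooling around it disappears.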

How to archive (structure)

  • who wants to find it
    • index derived from structure vs search-based, flexible data
    • automatic tagging can work surprisingly well nowadays for images
    • petermolnar: images archived by topic, date, occasion
    • storage of metadata in the files themselves is attractive (easy to move etc.), but can be tricky with non-text formats
    • simple index evolved into ontology/tagging
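
For formats where embedding metadata is tricky, one possible workaround (an assumption here, not something recorded from the session) is a plain-text "sidecar" file stored next to each media file. A rough sketch, with a hypothetical naming convention of `<filename>.json` and made-up helper names:

```python
import json
from pathlib import Path


def sidecar_path(media_file: Path) -> Path:
    """photo.jpg -> photo.jpg.json, kept in the same directory."""
    return media_file.with_name(media_file.name + ".json")


def write_sidecar(media_file: Path, metadata: dict) -> Path:
    """Store metadata as human-readable JSON next to the media file."""
    target = sidecar_path(media_file)
    text = json.dumps(metadata, indent=2, sort_keys=True, ensure_ascii=False)
    target.write_text(text, encoding="utf-8")
    return target


def read_sidecar(media_file: Path) -> dict:
    """Return the stored metadata, or an empty dict if no sidecar exists."""
    target = sidecar_path(media_file)
    if not target.exists():
        return {}
    return json.loads(target.read_text(encoding="utf-8"))
```

The trade-off is that the sidecar has to be moved together with the file, which is exactly the convenience that in-file metadata would give for free.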

How to find stuff

  • again, consider the audience: for external users, a central, somewhat standardized hierarchy is useful
  • full-text search
  • How to share/publish
  • How to gather all
    • browser hierarchy would be great: worldbrain.io, webmemex
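
To make the full-text search point above concrete: for a personal archive of text files, a small inverted index is often enough. This is only a sketch under simple assumptions (whole-word matching, everything held in memory, all query words must match); the function names are invented for illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the names of the documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)
```

Here `documents` would map a file name to its extracted text; how that text is extracted from non-text formats is left open.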

References

  • "As We May Think" (Vannevar Bush, 1945): article describing the idea of the Memex machine
  • Zotero also works for websites

See Also