Personal Data Warehouses was a session at IndieWebCamp East 2020.
Notes archived from etherpad: https://etherpad.indieweb.org/personaldata on 2020-11-16 at 11:22 PM
IndieWebCamp East 2020
Session: Personal Data Warehouses
When: 2020-11-14 2:50 PM Eastern
- Dogsheep, Nostalgia, Nextcloud, more (scheduled)
- Simon Willison (facilitator)
Dogsheep and Nostalgia are two projects that tackle the personal data warehouse problem: how can we ingest our personal data from multiple sources into a space that we can control, and then run our own queries against that data to learn more about ourselves?
Dogsheep (https://dogsheep.github.io/) builds on SQLite and uses Datasette (https://datasette.io/) as a web interface (more info: https://simonwillison.net/2020/Nov/14/personal-data-warehouses/). Nostalgia is built around Pandas DataFrames (https://github.com/nostalgia-dev/nostalgia) and allows access via the Nostalgia query language or through applications such as Timeline (https://github.com/nostalgia-dev/timeline).
- +1 Maxwell Joslyn Oh yes!
- +1 Gyuri
- +1 Jenna >>> last time slot if possible! >> thanks :)
- +1 Michael Bishop
- +1 Angelo Gladding
- Ryuno-Ki / jaenis.ch (having a personal Nextcloud instance as well)
- +1 Sue Hanen
- +1 Greg McVerry
- +1 Kevin Marks
- +1 Antonio Rodrigues
- +1 kongaloosh
- +1 Johannes Ernst (selling Nextcloud appliances for fun and profit :-))
- Tantek Çelik
Personal Data Warehouse systems: Dogsheep, Nostalgia, Nextcloud, more
How do personal data stores work for the IndieWeb?
- "Meta information from the physical world"
- Many IndieWeb people are adding physical-world data to their entries: location, weather, mood, etc. It would be interesting to discuss the drive behind this, examples of existing ones, their usefulness, etc.
- "Webnative/Indie Web & Web 3.0"
- Fission, AppRun, TrailMarks, TrailHub
- How to make the WebNative world interoperate with IndieWeb standards and like-minded ecosystems
- Simon Willison (Facilitator)
- David Shanske
- Tantek Çelik
- Jeremy Felt
- Sue Hanen
- F. Weil
- Antonio Rodrigues
- Caroline Kuhn
- Gyuri Lajos
- Ian Forrester
- Johannes Ernst
- Martijn van der Ven
- Steve Williams
- Peter Molnar
- Jenna (sorry to have missed most of this; didn't catch that it didn't stay in the last time slot)
Simon Willison demoing Dogsheep:
The key idea is to pull data from many different sources (Twitter, Apple HealthKit, GitHub, Swarm, etc.) and load it into SQLite database files. These can then be browsed with the Datasette web interface, queried through that interface or with custom SQL, and visualized using Datasette plugins.
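To make the load-then-query idea concrete, here is a minimal sketch using only Python's standard-library sqlite3 module. The records, table names and the load() helper are hypothetical; the actual Dogsheep tools (twitter-to-sqlite, github-to-sqlite and friends) fetch real data from each service's API and handle the schemas for you.

```python
import sqlite3

# Hypothetical exported records from two sources (not real Dogsheep output).
tweets = [
    {"id": 1, "created_at": "2020-11-01T10:00:00", "full_text": "Trying a waffle fish ice cream"},
    {"id": 2, "created_at": "2020-11-03T18:30:00", "full_text": "Shipped a new plugin"},
]
checkins = [
    {"id": "c1", "created_at": "2020-11-01T09:45:00", "venue": "Ice cream shop"},
]


def load(conn, table, rows):
    """Create `table` from the keys of the first row (if missing), then insert all rows."""
    columns = list(rows[0])
    column_sql = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_sql})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" ({column_sql}) VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.commit()


# ":memory:" keeps the sketch self-contained; a real warehouse would be a file on disk.
conn = sqlite3.connect(":memory:")
load(conn, "tweets", tweets)
load(conn, "checkins", checkins)

# With every source in one database, plain SQL works across all of them.
ice_cream_tweets = conn.execute(
    "SELECT count(*) FROM tweets WHERE full_text LIKE ?", ("%ice cream%",)
).fetchone()[0]
print(ice_cream_tweets)
```

If you swap ":memory:" for a file path, running `datasette` against that file gives the browse-and-query web UI described above.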
More about Dogsheep (including a video demo and screenshots) here: https://simonwillison.net/2020/Nov/14/personal-data-warehouses/
- issues, genome data, check-ins, etc.: a massive amount of data stored in SQLite and searchable through a web UI. "A complex SQL query is easy to do if you have it all in a database"
(Questions around "why keep certain information?" tie into the #metaIRL topic.)
- Simon: when did I last have a waffle fish ice cream?
- Peter: why would you want to know when? Location, yes; photo, yes; but why the when?
- Simon: oh, 'cause I'm a nerd.
The ultimate goal is to have everything together, in a form that makes it possible to explore it.
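As an illustration of the kind of cross-source query this makes easy, the sketch below joins two hypothetical tables (commits and checkins, with made-up rows) on calendar day using SQLite's date() function. It is an example query written for these notes, not one taken from the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables standing in for two different data sources.
conn.executescript("""
CREATE TABLE commits (sha TEXT PRIMARY KEY, committed_at TEXT, message TEXT);
CREATE TABLE checkins (id TEXT PRIMARY KEY, created_at TEXT, venue TEXT);
INSERT INTO commits VALUES
  ('abc123', '2020-11-01T11:00:00', 'Fix photo import'),
  ('def456', '2020-11-02T15:00:00', 'Add tests');
INSERT INTO checkins VALUES
  ('c1', '2020-11-01T09:45:00', 'Coffee shop'),
  ('c2', '2020-11-05T19:00:00', 'Cinema');
""")

# Which commits happened on a day when I also checked in somewhere?
# date() truncates the ISO 8601 timestamps to YYYY-MM-DD so the two sources line up.
rows = conn.execute("""
    SELECT date(commits.committed_at) AS day, checkins.venue, commits.message
    FROM commits
    JOIN checkins ON date(checkins.created_at) = date(commits.committed_at)
    ORDER BY day
""").fetchall()
print(rows)
```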
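Simon's question reduces to a single aggregate query once the data is in one place. This sketch assumes a hypothetical photos table with a free-text labels column (loosely inspired by the Apple Photos metadata in the dogsheep-photos demo, but not its real schema) and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; real photo metadata would look different.
conn.executescript("""
CREATE TABLE photos (id INTEGER PRIMARY KEY, taken_at TEXT, labels TEXT);
INSERT INTO photos (taken_at, labels) VALUES
  ('2019-07-14T16:20:00', 'ice cream, waffle fish'),
  ('2020-08-02T15:05:00', 'ice cream, waffle fish, street'),
  ('2020-09-10T12:00:00', 'dog, park');
""")

# ISO 8601 strings sort chronologically, so max() gives the most recent match.
last = conn.execute(
    "SELECT max(taken_at) FROM photos WHERE labels LIKE ?", ("%waffle fish%",)
).fetchone()[0]
print(last)
```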
Here's a redacted version of my Dogsheep crontab: https://gist.github.com/simonw/1299d61d17637d1145955ebc019ea3c4 - I run manual export scripts for the photos and HealthKit stuff, but most of the other sources are updated automatically via APIs.
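Because sources are re-fetched on a schedule, each run has to be safe to repeat. One common way to get that, sketched here with hypothetical names (fetch_latest() stands in for a real API call such as Swarm's; none of this is taken from the crontab above), is to key each table on the source's own ID and upsert.

```python
import sqlite3


def fetch_latest():
    """Hypothetical stand-in for an API call; returns the most recent items."""
    return [
        {"id": "c1", "created_at": "2020-11-13T09:00:00", "venue": "Bakery"},
        {"id": "c2", "created_at": "2020-11-14T12:30:00", "venue": "Library"},
    ]


def sync(conn, items):
    """Upsert items keyed by their source ID so re-running never creates duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkins "
        "(id TEXT PRIMARY KEY, created_at TEXT, venue TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO checkins (id, created_at, venue) "
        "VALUES (:id, :created_at, :venue)",
        items,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
sync(conn, fetch_latest())
sync(conn, fetch_latest())  # a second scheduled run sees the same items again
total = conn.execute("SELECT count(*) FROM checkins").fetchone()[0]
print(total)
```

INSERT OR REPLACE overwrites the whole row; SQLite 3.24 and later also support `INSERT ... ON CONFLICT DO UPDATE` if only some columns should change.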
Some public demos:
- https://github-to-sqlite.dogsheep.net/github - some of my GitHub data
- https://dogsheep-photos.dogsheep.net/public/photos_with_apple_metadata - selected public photos
Nostalgia is built around Pandas DataFrames - https://github.com/nostalgia-dev/nostalgia and allows for access via the Nostalgia query language or through applications such as Timeline https://github.com/nostalgia-dev/timeline
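A recurring idea in time-based personal data is asking what every source says about one moment (what was playing, where was I). The sketch below shows that "as-of" lookup with the standard library's bisect over hypothetical, sorted per-source timelines. It is a generic illustration, not Nostalgia's implementation or query language.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical per-source timelines, each sorted by timestamp.
listening = [
    (datetime(2020, 11, 14, 14, 0), "Song A"),
    (datetime(2020, 11, 14, 14, 55), "Song B"),
    (datetime(2020, 11, 14, 16, 10), "Song C"),
]
locations = [
    (datetime(2020, 11, 14, 13, 0), "Home"),
    (datetime(2020, 11, 14, 15, 30), "Park"),
]


def as_of(timeline, moment):
    """Return the latest value recorded at or before `moment`, or None if none exists."""
    times = [t for t, _ in timeline]
    i = bisect_right(times, moment)
    return timeline[i - 1][1] if i else None


def context_at(moment):
    """Combine what each source says about a single moment."""
    return {"playing": as_of(listening, moment), "location": as_of(locations, moment)}


print(context_at(datetime(2020, 11, 14, 15, 0)))
```

With DataFrames, `pandas.merge_asof` performs this kind of as-of join across whole tables; these notes don't say whether Nostalgia uses it.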
Gyuri talks about Fission (https://guide.fission.codes/), IPFS, and their possibilities for running services from inside a browser
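IPFS is content-addressed: data is located by a hash of its bytes rather than by the server holding it, which is part of what lets browser peers fetch and verify data from anyone. The toy store below shows only that principle. It uses a plain SHA-256 hex digest as the address, which is a simplification (real IPFS CIDs add multihash/multibase encoding and chunking), and the ContentStore class is hypothetical.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: each key is derived from the bytes themselves."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Whoever served the block, the reader can check it against its address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
addr = store.put(b"my check-in history")
print(addr, store.get(addr))
```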
Moving on to: why keep/collect/attach data?
- Ian Forrester: people keep it to have access to their legacy, ideally without having to search for it. How do we present it to them?
- angelogladding.com: an "AI mind" that shows my memories to myself
- maxwelljoslyn.com: do Facebook, but ethical: send birthday messages to each other, send/show memories
- Simon: this stuff is genuinely a superpower; it's super-memory
- has 80 parallel projects on GitHub and needs to treat them as if they were someone else's projects; turn yourself into a cyborg: "memory doesn't matter any more"
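The "show memories" idea can be as simple as an "on this day" query. This is a minimal sketch over a hypothetical memories table with made-up rows, using SQLite's strftime() to match month and day from earlier years.

```python
import sqlite3
from datetime import date


def on_this_day(conn, today):
    """Return items from previous years that share today's month and day."""
    return conn.execute(
        """
        SELECT created_at, title FROM memories
        WHERE strftime('%m-%d', created_at) = ?
          AND strftime('%Y', created_at) < ?
        ORDER BY created_at
        """,
        (today.strftime("%m-%d"), str(today.year)),
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical data for illustration only.
conn.executescript("""
CREATE TABLE memories (created_at TEXT, title TEXT);
INSERT INTO memories VALUES
  ('2018-11-14T10:00:00', 'Photo at the beach'),
  ('2019-11-14T20:00:00', 'Birthday dinner'),
  ('2019-11-15T09:00:00', 'Hike'),
  ('2020-11-14T14:50:00', 'Today');
""")
print(on_this_day(conn, date(2020, 11, 14)))
```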
- Is IRL metadata useful for the human factor?
- aaronpk includes an emoji that indicates a mood, so posts can carry the emotional differences embedded in them
- Steve Williams: this is really personal and varies from one person to another, so memory posts need to be sensational (post mood)
- Simon Willison: that's why I want all the data: context!
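Attaching IRL metadata such as a mood emoji only pays off if it is stored as structured data you can query later. This sketch uses a hypothetical Entry record (the field names are assumptions, not an IndieWeb or microformats vocabulary) and counts posts per mood.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical post record with optional IRL metadata attached."""
    published: str
    content: str
    mood: Optional[str] = None  # e.g. a mood emoji like the ones aaronpk adds to posts
    location: Optional[str] = None
    weather: Optional[str] = None


def mood_summary(entries):
    """Count entries per mood, skipping entries that have no mood recorded."""
    return Counter(e.mood for e in entries if e.mood)


# Made-up entries for illustration.
entries = [
    Entry("2020-11-12", "Finished the migration", mood="😌"),
    Entry("2020-11-13", "Server down again", mood="😤", location="Home"),
    Entry("2020-11-14", "Great session today", mood="😌", weather="Rain"),
    Entry("2020-11-14", "Lunch"),
]
print(mood_summary(entries))
```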