2020/East/personaldata

From IndieWeb

Personal Data Warehouses was a session at IndieWebCamp East 2020.

Watch: ▶️00:56:33s

Notes archived from etherpad: https://etherpad.indieweb.org/personaldata on 2020-11-16 at 11:22 PM


IndieWebCamp East 2020
Session: Personal Data Warehouses
When: 2020-11-14 14:50 PM Eastern

Proposal

  • - Dogsheep, Nostalgia, Nextcloud, more (scheduled)
  • Simon Willison (facilitator)

Dogsheep and Nostalgia are two projects that tackle the personal data warehouse problem: how can we ingest our personal data from multiple sources into a space that we can control, and then run our own queries against that data to learn more about ourselves?

Dogsheep https://dogsheep.github.io/ builds on SQLite and uses Datasette as a web interface https://datasette.io/ - More info here: https://simonwillison.net/2020/Nov/14/personal-data-warehouses/ Nostalgia is built around Pandas DataFrames - https://github.com/nostalgia-dev/nostalgia and allows for access via the Nostalgia query language or through applications such as Timeline https://github.com/nostalgia-dev/timeline

I am looking at using Nextcloud I use a lot of OwnCloud apps now but they limit you to 1,000 folders. but not personal data beyond gpx and want to build a jukebox

  1. personaldatawarehouses


Personal Data Warehouse systems: Dogsheep, Nostalgia, Nextcloud, more

How do personal data stores work for the IndieWeb?

Combined with:

  • "Meta information from the physical world"
    • Many indieweb people are adding physical world data to their entries - location, weather, mood, etc. It'd be interesting to discuss the drive behind them, examples for the existing ones, their usefullness, etc.
  • "Webnative/Indie Web & Web 3.0"
    • Fission, AppRun TrailMarks TrailHub
    • How to make make the WebNative World interoperate with Indie Web Standard and likeminded ecosystems

Participants

Notes

Simon Willison demoing Dogsheep:

Dogsheep is a personal data warehouse built on top of Datasette: https://dogsheep.github.io/ and https://datasette.io/

The key idea is to pull data from many different sources (Twitter, Apple HealthKit, GitHub, Swarm, etc) and load it into SQLite database files. These can then be browsed using the Datasette web interface, queried using that interface or custom SQL queries and visualized using Datasette plugins.

More about Dogsheep (including a video demo and screenshots) here: https://simonwillison.net/2020/Nov/14/personal-data-warehouses/

- issues, genom, checkins, etc - massive amount of data stored in a web-ui searchable SQLite - "complex SQL query is easy to do if you have it all in a database"

https://dogsheep.github.io/

(questions around the "why keep a certain information" #metaIRL topic ties)

- Simon: when did I last have a waffle fish ice cream? - Peter: why would you want to know when? Location, yes, photo, yes but why the when? - Simon, oh, cause I'm a nerd.

The ultimate goal it to have everything together, to have it in a form that makes it possible to explore it

Here's a redacted version of my Dogsheep crontab: https://gist.github.com/simonw/1299d61d17637d1145955ebc019ea3c4 - I run manual export scripts for the photos and HealthKit stuff, but most of the other sources are updated automatically via APIs.

Some public demos:

Nostalgia is built around Pandas DataFrames - https://github.com/nostalgia-dev/nostalgia and allows for access via the Nostalgia query language or through applications such as Timeline https://github.com/nostalgia-dev/timeline

Gyuri talks about https://guide.fission.codes/ , IPFS, and it's possibilities to run services from inside a browser

Moving on to why keep/collect/attach data? - Ian Forrester: people keep it to have access to their legacy - if they don't have to search for it - how to present present it to them? - angelogladding.com: "AI mind" show memories to myself - maxwelljoslyn.com: do Facebook, but ethical - send birthday things to eachother, send/show memories - Simon: this stuff is genuinely superpower - it's super-memory

  - has 80 parallel projects in github, needs to treat them as someone else's project
  - turn yourself into a cyborg - "memory doesn't matter any more'

- IRL metadata is useful for the human factor? - aaronpk includes an emoji that indicates a mood so posts can differentiate between emotional differences embedded in posts - Steve Williams: this is really personal and vary from one person to another, so memories-posts need to be sensational - post mood - Simon Willison: that's why I want all the data - context!


See Also