2011/Federating Systems with Peer-to-Peer Architectures
Notes from Amber Case
The main problem with architectures is that they no longer scale for the large amount of people on them. So what we've been doing with Client-Server is no logner good enough.
Does not scale
- It is expensive
- It has a single point of failure
- It is subject to congestion
- It requires polynomial Storage
Is hard to secure (we saw this lately with some systems and people's personal information. If there's a single
Places control of data in the hands of a small minority
Does not reflect the shape of real networks (Amber: it should be a more of an emergent system that adapts to the shape of the data stored, otherwise, there are giant database tables that can tip over (as in the case of the massive Foursquare failure a year ago).
If we have a bunch of peers, we imagine that on Facebook we are all talking with each other. We imagine that the information is actually sent to each other. But that is wrong! What's really happening is that we're talking up to Facebook and Facebook is talking down to all of our friends for us.
No matter what shape we try to project on the system, it's always going to look like this up and down structure. Is that the best way for a system to work? No!
People are talking to each other for different reasons. For us, our kind of ad hoc solution is to use all sorts of different networks for whatever solution we think we can use them for. For instance, Facebook for family, Twitter for socialization, etc.
Grigsby: It seems like it's also pretty difficult to secure distributed networks -- will you go into that?
Brian: Yes, I will go WAYYY into that in a moment. Just trying to give some history first.
Brian: SO here's the concept of overlay networks. An overlay network is a network system built on top on another network system. The internet is built on top of the telephone. Twitter is built on top of TCP.IP. Encompasses private network layers. Distinguishes sociological network protocols.
We're planning for the world Internet. Where we have various smaller networks built on top of a large network, where there are overlays on overlays -- for sub-interests and so on.
- DHT's combine the infuriating difficult of networking with the mathematical obscurity of hash functions.
A hash table is most like a dictionary - you have keys, and you have its definition to look it up. They break down into three things - keys, hash functions, and buckets. This gives you help when you're trying to create fast applications with fast lockups.
Analogy: Keys use a hash function to associate keys to buckets in the same way that addresses use a network protocol to associate addresses to computers.
When the hash community got into networking they made chord rings, which were extremely primitive. When you're doing these you're doing modular arithmetic. It abstracted very nicely into a ring.
Then the chord ring occurred and there were some connections there that could stretch across the circle, and this decreased the diameter of the circle instead of having to go around the circle. With the old m
Network diameter is very important because you're trying to minimize the phone calls to get to any point on the circle.
Then when there is two rings, you're dealing with toroids now. Instead of one address you have two, you find anywhere I am on the toroids by finding where I am on the circle, and how far I am from the center. If you cut that doughnut into a square and flatten it out, you get a landscape.
But this is a lot of complicated mathematics to explain really a simple concepts of where there's 2 addresses instead of one. And this model doesn't work really well. But in a sense this is very useful because we're making a neat little abstraction about what's happening.
GeoPeer: A Location Aware P2P System
GeoPeer says "I don't talk to the server, I talk to the people nearest me". In other situations where you have to talk to people far away from you, this won't be as useful. But there will be solutions made for that as well.
So why haven't these things been implemented yet? With these corporate investments in infrastructure, we get ossification, and we're not able to experiment with peer to peer networks. The environment that would've been producible in the 80's and 90's are impossible to build now, because there are a giant number of people dependent on these networks being used.
But now there are these cloud solutions are great spaces for experimentation, because there are a giant bunch of computers set up together. If you want to test it without paying for all of the computers, just try it with different port numbers. In my mind, these Peer to Peer systems are exactly what these Federated social web systems need.
What about Torrents? Well, they don't count really as federated social structures because you still have to go to a site and find your file.
Cubit: Approximate Search
Cubit, which was written by one of my professors, is a great way of approximate search. When you realize the hash functions don't have to be randomly distributed. All you have to do is satisfy some lesser constraints. At any point in the network, you should be able to get closer to what you're looking for in one jump.
Approximate search: every node keeps track of different keywords and they're separated by edit distance. If I change one word to another, I have to add letters or subtract letters. For each one add or subtract, that's one jump away.
So this created an approximate search for torrents, and remove the need for the websites to do the search on. This augmented torrents into a distributed system in a way.
If you don't have anybody in the network then you can't do anything. So you have to think what do people do in empty networks! You do have to know the address of someone in the network already. You also need to be able to put information in a system. You're trying to share something with the other addresses on the network, be it a photo or a status. And then finally, how to we find paths on the Internet to the things we want to get to. I have to be able to get this information without having to ask global questions on the network.
Dann Stayskal: Going on the neural psychology. The tuning curves are where the kind of safety networks are in that kind of mapping.
Amber: More discussion needs to occur for this.
Notes from aaronpk
Client server problems
- does not scale
- hard to secure
- places control of data in the hands of small minority
- does not reflect the shape of the real networks -- most important
On facebook we imagine that we're talking to each other, sending data to each other. But actually the network is shaped where everyone is talking to Facebook and Facebook is talking to our friends for us.
Overlay network - a network built on top of another network. Twitter/Facebook built on top of TCI/IP.
Hash function - often the challenge is finding a good hashing algorithm to evenly distribute data into buckets by the hash.
Keys use a hash function to associate with buckets, Addresses use network protocols to find computers. (analogous)
Filesharing paved the way for distributed systems because you had people going after file sharers and there wa sa strong motivation to work around it.
Content-addressible network, http://en.wikipedia.org/wiki/Content_addressable_network an improvement over ring networks. Torus-shaped.
GeoPeer - a location aware p2p system. Rather than making up a 2D address space, use your 2D coordinates on the earth as your address.
Sometimes difficult to experiment with new p2p networks because existing ones are often in use and in production. Virtual hosting environments are facilitating experimentation now with being able to quickly instance new machines on Amazon/Rackspace/whatever.