Blogging about our lives online.


Git And The Future Of The Internet.

I've recently taken a detour into philosophizing about where technology is going. What does the future look like and what does it mean for humanity, life and the current business models as we know them?

It all started with a bit of research into Linus Torvalds' latest project, Git. I've been thinking about trying some kind of content management system for personal use. I've looked at a lot of personal-database-type tools (Bento, FileMaker, MySQL, ...) and they just seem like format-specific black holes to drop your content into. I'm still not sure Git is right for what I'm thinking, but I watched Linus' Google tech talk followed by Kevin Kelly's TED talk and had a vision of a web that is so much more than what it is right now.

They're both pretty long, but I've had a bit of time on my hands lately... Linus brings up two important points in his talk: one is the notion of working in a "network of trust" and the other is the sacredness of one's own data. Both are extremely important, and both are often missing from the emerging technologies of our day. The network of trust is the only way to do collaborative work on open source development right now.

I think this model is reaching critical mass and will soon be the only way to do any kind of work. Monolithic organizations cannot keep up with the changing landscape of information growth. Git is an interesting project because it takes this model and implements it in a practical way, using some sophisticated algorithms to let software projects grow organically in a social environment. A lot of the metaphors that surround software development are hard, physical metaphors like construction, building and engineering, but the emerging metaphors are about growth, evolution and adaptation to environment.

The benefits of collaborative networked projects are obvious, but the sacredness of one's data is a more veiled concept. Linus outlines the use of the SHA-1 hash as a means to ensure that the entire history of a project, or set of data, can be verified to be accurate and traceable throughout its lifespan. This has obvious benefits when dealing with buggy network connections or failing hard drives, but it's more interesting to me in its wider application.
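To make that concrete, here's what the content-addressing looks like from the command line (a small sketch, assuming git is installed; the id is the same on every machine because it's derived purely from the content):

```shell
# Git names every object by the SHA-1 of its contents (plus a small
# header), so changing any byte anywhere in history changes the ids
# of everything built on top of it.
echo 'hello world' | git hash-object --stdin
# prints 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

# Inside a repository, the entire object database can be re-verified:
# git fsck --full
```

That stability is what makes a history verifiable: if the top-level id matches, every byte beneath it matches too.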

Where's My Information?

As a person who has used a computer for a number of years, I'm already seeing the breakdown of continuity in my archived information. As data gets moved around, archived to CD-ROM, uploaded to Google Docs, downloaded as PDFs and transferred between operating systems, it all ends up in a soup of data without context or history. I have no idea if the timestamps are accurate, or what the context and related content might be. As soon as you add cloud computing to the mix, the problems amplify greatly.

This very blog post is being submitted to the vast expanse of content controlled and managed by the cloud. I have no simple way of traversing the internet and picking up all the odds and ends that I have put there.

This is the real direction of Git, I think, and I want to figure out how to use it for more than just source code management, because I think it could change the way the internet works. What if this blog was simply a mirror of the "Blog" folder on my hard drive, which was mirrored on every machine I use and was also shareable to other collaborators who mirrored their own unique versions? And what if my photo pages on Flickr and Facebook were simply mirrors of a folder called "Published Photos" on my hard drive, which were mirrors of... and so on.
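As a rough sketch of how that could work with git as it exists today (every path and name here is invented, and a local directory stands in for the public server):

```shell
# A throwaway demo directory; in real life this would be ~/Blog.
demo=$(mktemp -d)
cd "$demo"

# Track the "Blog" folder as a repository.
mkdir Blog && cd Blog
git init -q
git checkout -q -b master
git config user.email "me@example.com"   # hypothetical identity
git config user.name "Me"
echo "Git and the future of the internet" > first-post.txt
git add .
git commit -q -m "Import existing posts"

# A bare repository standing in for the public mirror (the web host).
git init -q --bare ../blog-mirror.git
git remote add web ../blog-mirror.git
git push -q web master

# Every other machine, and every collaborator, keeps a full copy.
git clone -q ../blog-mirror.git ../Blog-on-another-machine
```

The post is then just a file under version control: publishing is a push, and getting it all back is a clone.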

Vapor Trails

The fundamental problem of cloud computing is the owner's right to their content, and the ability to track it. This is generally possible with today's technology, but never practical. I have 65 documents in Google Docs at the moment, and I could download all of them in one go as plain text files, but all the metadata would be garbage, and I couldn't easily merge them with the existing contents of my hard drive. Sure, I could spend a bit of time diff-ing them with my files and organizing them into logical places, but imagine if I was talking about the entire contents of my home directory. Running find ~ -type f | wc -l shows 5,627 files in my home directory (du | wc -l would only count the directories), and I don't even have my music collection on this computer!

Yes, the data is basically safe in the cloud, but what if I want to take it with me or move it elsewhere? What if I want to host this blog from my own server? How would I transfer it? The current cloud model only takes uploading and viewing seriously and neglects personal ownership rights. Google Docs has special code written for exporting; Blogger doesn't, Facebook and Flickr don't, YouTube doesn't.
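As a sketch of the kind of manual reconciliation I mean (every path and file here is an invented stand-in, and this only finds the differences; it recovers none of the lost metadata):

```shell
# Invented stand-ins for an exported dump and the local documents.
demo=$(mktemp -d)
export_dir="$demo/google-docs-export"
local_dir="$demo/documents"
mkdir -p "$export_dir" "$local_dir"
echo "draft a"  > "$export_dir/notes.txt";  echo "draft a"  > "$local_dir/notes.txt"
echo "new text" > "$export_dir/essay.txt";  echo "old text" > "$local_dir/essay.txt"
echo "unsynced" > "$export_dir/todo.txt"

# Walk the export and flag anything that can't be merged blindly.
find "$export_dir" -type f -name '*.txt' | while read -r f; do
  twin="$local_dir/$(basename "$f")"
  if [ ! -f "$twin" ]; then
    echo "only in export: $(basename "$f")"
  elif ! diff -q "$f" "$twin" > /dev/null; then
    echo "needs merging: $(basename "$f")"
  fi
done
# flags essay.txt (changed) and todo.txt (no local copy); notes.txt stays silent
```

And that's the easy case: identical filenames, plain text, one directory level. The timestamps, authorship and revision history are already gone.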

They are all greedy information-gathering tools, concerned only with collecting your content and storing it on their own sites. There are "sync" tools for most platforms, but their only intent is to gather your content more easily and seamlessly.

Git looks promising in that it allows you to publish your information, yet still control the source of it.


  1. I am a year behind you, but on the same page. What are your latest thoughts in this vein?

  2. Thanks for your comment! I'm sorry I haven't been keeping up with this blog; I moved to a new blog but haven't really been posting much.

    I wish I could say that I've found some silver-bullet solution. The truth is that even though the tools are getting better at maintaining valuable metadata, the tools themselves are multiplying. This makes it harder to consolidate things.

    Git and similar tools are invaluable for software development, but they're not really designed for the average user maintaining a digital collection.