Blogging about our lives online.


Photo Archiving


Backups and Archives are two different things. Ideally you should have a good archiving system, and your backup would be a redundant copy of that.

In photography, a robust archiving system has between two and four copies of any file. These copies are:

  • Camera Card
  • Working Copy
  • Source
  • Backup

In this case, Source and Backup are read-only copies that are exact duplicates of the file off the camera. The Backup copy should be on separate physical media and ideally in a separate physical location.

The working copy is the one you edit, view and share. Once a file is edited, it is important to make Source and Backup copies of it as well.

It is good to view these copies as layers of mutability.

Camera Card: Changes the most. Every shooting session will create and erase content from the card.

Working Copy: Can change with each editing session and these files can be safely erased once backups are made.

Source Copy: Should never change; it is the copy that is accessed for viewing and for making working copies. Any Source copy that is deleted is gone forever.

Backup Copy: A mirror image of your Source copy. It will only be accessed in the rare event that your source copies are compromised (HD failure, fire, theft, etc.).


The way to implement this structure will vary with the tools that you are using. The first step might be to disable automatic importing when you connect your camera. A better solution is to make Source and Backup copies immediately when connecting your camera.

If you are using DVDs as your Backup copy, it probably makes sense to use a USB thumb drive as an intermediate backup until you have enough files to fill a DVD.

It is important, though, to do regular integrity checks on your Backup copies. You need to make sure that they haven't been compromised without your knowing. If you use an external hard drive and are command-line savvy, you might run "diff -r PictureSource/ PictureBackup/" to compare the entire contents of the two folders. If you are using DVDs, you either have to trust the quality of your media or do periodic checks of your media.
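If the manual diff habit feels tedious, it can be wrapped in a tiny script. This is only a sketch, assuming GNU coreutils; the folder names (and the `verify_backup` name itself) are stand-ins for your own:

```shell
# Sketch of an integrity check between Source and Backup folders.
# Folder names are placeholders; diff/sha256sum are GNU coreutils.
verify_backup() {
  src="$1"; bak="$2"
  # Byte-for-byte comparison of the two folder trees
  diff -r "$src" "$bak" || { echo "MISMATCH between $src and $bak"; return 1; }
  # Record checksums so a future check doesn't need both drives mounted
  ( cd "$src" && find . -type f -exec sha256sum {} + ) > "$src.sha256"
  echo "backup verified"
}
```

Running something like `verify_backup PictureSource PictureBackup` after every sync makes silent corruption much easier to catch.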

  1. Connect Camera.
  2. Make Source copy, check file count.
  3. Organize Source photos into events, deleting any Absolute Garbage.
  4. Make Backup copies, check file count.
  5. Make Working Copy from Source copy by importing into your editor of choice.
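The steps above can be sketched as a small shell function. This is a hypothetical routine, not a finished tool: the `import_shoot` name, the `.JPG` extension and the flat card layout are all assumptions, and step 3 (sorting into events) stays manual:

```shell
# Hypothetical import routine: Camera Card -> Source -> Backup.
# Paths and the .JPG extension are assumptions; adjust to taste.
import_shoot() {
  card="$1"; source="$2"; backup="$3"
  stamp=$(date +%F)                     # one folder per import date
  mkdir -p "$source/$stamp"
  cp "$card"/*.JPG "$source/$stamp/"    # step 2: make the Source copy
  # step 2 (cont.): check that the file counts match
  n_card=$(ls "$card"/*.JPG | wc -l)
  n_src=$(ls "$source/$stamp"/*.JPG | wc -l)
  [ "$n_card" -eq "$n_src" ] || { echo "count mismatch"; return 1; }
  # step 4: make the Backup copy, ideally on separate media
  mkdir -p "$backup"
  cp -R "$source/$stamp" "$backup/"
  echo "imported $n_src files"
}
```

A run might look like `import_shoot /media/camera/DCIM ~/Pictures/Source /media/backupdrive`.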

Resist the temptation to organize your Source folder too much. Use only large, time-based, linear chunks.

  • DO: Pete's Wedding, Reception, Saloman Bay Beach...
  • DON'T: Flowers, Trees, Close-ups, Saloman Bay Beach...

The attentive reader will notice that the beach name is both in the do and don't list. This is because in the first, I've assumed that it was an afternoon of shooting at the beach. In the second it is a couple shots of that subject interspersed with shots of other subjects.

On a related note, don't rename your files. The filenames from your camera are a very good shooting record. If you want to attach descriptions and tags, do it in the file's metadata.

Now you are ready to edit your files. A typical editing session might look like this.

  1. Make a Working Copy from Source copy (if it's not already done).
  2. Edit the file(s).
  3. Make a new Source copy next to the original (e.g. DSC1255-edit1.jpg).
  4. Make a Backup of what has changed in the Source.

If you are using DVDs, you probably want to make a new folder for any edited files, because the original folder might already be burned to disc. Again, use the original name with a standard suffix (Pete's Wedding-edited).
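The numbered-suffix convention in step 3 can be automated so you never clobber an earlier edit. A sketch, assuming .jpg files and the suffix style shown above (the `save_edit` name is made up):

```shell
# Sketch: file an edited Working Copy back into the Source folder
# as DSCxxxx-edit1.jpg, -edit2.jpg, ... without overwriting anything.
save_edit() {
  work="$1"; srcdir="$2"
  base=$(basename "$work" .jpg)
  n=1
  while [ -e "$srcdir/$base-edit$n.jpg" ]; do n=$((n+1)); done
  cp "$work" "$srcdir/$base-edit$n.jpg"
  echo "$srcdir/$base-edit$n.jpg"       # report where it landed
}
```

After the copy lands, it is a Source file like any other, so it gets backed up on the next sync.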

Generalized Strategy

Photos are actually quite easy to organize compared to the myriad of other file types out there. But the same systematic logic can be transferred to other types of files as well.

By being systematic, a lot of confusion can be avoided. It also makes it possible to work toward automating the process entirely. But as the number of file types increases, so does the tendency to use different sorting methods or to mix methods.

What The Internet Needs Now


I've been blogging a fair bit lately about metadata, archiving and dealing with the masses of information that we produce and consume today.

It all comes from an idea that has been alternately percolating and distilling in my mind: the idea of stability.

The internet age is highly energized and highly transient. The next big thing comes along, is adopted by huge masses of people, and is discarded in a matter of months or years. Examples abound, but my working example is waning interest in Facebook.

I recently printed a PDF of my entire Facebook history because I'm a dork and find that type of thing interesting. I just kept scrolling down and clicking "Older Posts" until I reached the "Andy joined Facebook" post. Here are my findings:

  • It's only a 36-page PDF (small font, though). I thought it would be more.
  • I joined in June 2007.
  • My first friend is someone I haven't spoken to since.
  • My first status update is still one of my favorite quotes: "The chief enemy of creativity is good taste. - Pablo Picasso"
  • I was really into "Graffiti" in the early days. I printed those off too.
  • I "connected" with many people that I had forgotten about for years. Then promptly forgot about them again! Some I genuinely want to keep in touch with, though, so that's good.
  • I wish there were a graph of my friend count over time. I think it would resemble a logarithmic curve: rapidly rising, but plateauing at around 220.

Okay, it's a silly game, but I find it alarming that we invest so much time into something that is probably doomed to extinction or, more likely, habituation. Facebook has become an e-mail replacement, and when was the last time you were truly excited about e-mail?

The tool has become stable: less exciting, but stable. Yet the end result is a tool that is technically far less robust and stable than the one it replaced. E-mail may be getting archaic, but it is a completely open and robust mechanism that is largely independent of any specific implementation. Facebook, on the other hand, is completely tied to corporate interest and a locked-down API.

I think we must all make a conscious effort to aim for stability in the turbulent age we live in: to research the tools we use and the long-term strategy behind them. And for those in the business of creating the tools, to think long and hard about how robust the system is in the long term. Is a locked, site-specific format necessary? (No, never!) Is your user policy going to limit the long-term viability of the system?

In the case of this blog, I've made a conscious decision to keep control of the content. The site is just the publishing medium, the actual product, for me, is just a folder of plain-text files on my hard drive in chronological order. It may not look flashy with fixed-width fonts and markdown formatting, but it's simple, stable and guaranteed to work long into the future.


Auto Metadata


One interesting part of the weekend photographic seminar I just got back from was the emphasis on metadata.

It's important to take great photos, but it's just as important to know how you got those results. In the days of film, a photographer would keep a shooting log, recording aperture, speed, lens, etc. A decent digital camera will record all of that information for you, as long as you know where to look for it.

Photography is one of the best examples of how useful metadata can be. The reason it is so useful is that it is unambiguous and immutable. Any metadata that has these qualities is easy to work with.

Pure Metadata

This is the simplest kind of metadata to work with and often the most useful. It usually has a direct connection to something tangible and concrete. Some types of pure metadata:

  • File size
  • File Attributes (resolution, color space)
  • Physical Settings (ISO, aperture, shutter speed)
  • Date*

Dates are very useful, but only if they are relevant. If the date is not the actual shooting date, it is actually less help than no date at all. This can happen with downloaded files or files that are saved to a new location.
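One concrete defence when moving files around: preserve timestamps while copying. A plain cp gives the copy a fresh modification time; the POSIX -p flag keeps the original one, so date-based sorting still reflects reality. A minimal sketch (the function name is mine):

```shell
# Copy a photo without destroying its date metadata.
# cp -p preserves mode and timestamps (POSIX); plain cp does not.
copy_preserving_date() {
  cp -p "$1" "$2"
}
```

The EXIF shooting date lives inside the file and survives any copy; it's the file system date that needs this kind of care.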

Impure Metadata

File names are probably the least valuable indexing tool out there. They don't have to have any relevance to the content: an identical file can have different names, and totally different files can have the same name.

Tags are good, but I don't find them as useful as they could be. Again, there isn't necessarily any correlation with the content, and there is no standard set of tags or naming conventions enforced. But the primary reason they don't work for me is that tagging is a manual process. Any metadata that isn't automatically applied to all content is too much work.

Folder hierarchies are actually metadata, and we use them every day. It's important to understand this when backing up or rearranging files, because the hierarchy probably has meaning that would be lost if you moved those files.

The biggest problem with impure metadata, like the kinds I've mentioned, is that it can be very hard to normalize. There may be information attached as file names, tags and folder structures, but it has all been manually assigned by what makes sense at the time. This manual metadata will always be very fragmented and incomplete.


Meota Summer Photographic Seminar


Just finishing up the weekend photographic seminar at the lake. Got some good shots and some good practice at composition, lighting and rhythm.


Chaotic Data


Data is chaotic. Our attempts to tame it are largely attempts at dehumanizing ourselves. Here's a transcript of a piece of cardboard on my Grandfather's wall:

|                         |
|  WALLET    LIST         |
|            ------       |
|   EARS                  |
|             CANE        |
|  WATCH                  |
|                         |
|   SWIM                  |
|                         |

Even by typing it here, I am imposing a fair bit more order than the original had. The original was written with a Sharpie and, although it was very legible, the intent and structure were difficult to parse. "List" may or may not have been the title, or one of the items to remember. "Ears", I assume, meant his hearing aid. All the items seemed like a general checklist for leaving the house until "Swim", which makes the list seem very specific to a certain day, or day of the week. He has had this list posted by the door for quite some time.

Grandpa's list was not very Search Engine Optimized, but he didn't seem too bothered by the shortcoming.

This list has a few other interesting qualities that don't transfer well to the digital domain:

  1. It's taped to the door. Location and size mean a lot in the physical world.
  2. It has no definite structure. Anything could be added or crossed off the list with ease.
  3. Its author is obvious. The handwriting is an echo of the mind of the author.

My point is not that my Grandpa has a habit of making strange artifacts; my point is that we are more unique, creative and human when we're not forced to order our data along the way.

Our attempts to structure and control our data are effectively dehumanizing us. Yes, parsing the messy world of human language and thought is not a simple task for computer systems. It is much easier to build rules and frameworks and force humans to fit their ideas into them; to meet the computer halfway. This has a few effects:

  1. It allows the less creative minds to be efficiently less creative and feel organized.
  2. It encourages independent thinkers to constantly sabotage the system for entertainment.
  3. It erodes the capacity for beautiful, unstructured and creative enterprises.

I think we too often forget the distinction between the technologies that are built for enterprise and those built for personal use. This is probably because all of the tools are built for enterprise. All of them. The ones that aren't are garbage. Web2.0 or 3.0 or whatever, only really make sense for businesses and consumers. Not for people.


Champions Of Order


Software Engineers assume that your data can be ordered logically. This assumption is built into the file system. They assume that it's an easy task, but one best left to the user: better than imposing an order that doesn't make sense for some users.

But the software engineers are wrong and here's why:

Case Study: Plain Text Library

This is the simplest of all systems to represent digitally. All the items are in the same format, and each is a distinct entity. So you start ordering your items by Author into a hierarchy. The index itself is also an item, but not like the other items, so you decide to put it at the top of the hierarchy.

It is simple and unambiguous. But, like all libraries, there will come a day when you want to extend it a bit. You might want to add a scientific paper, a scientific journal, a DVD box-set or an untitled and anonymous poem. You might want to separate fiction from non-fiction or keep a commentary together with its source. At every step, you must decide how to incorporate these new elements, and whatever decision you make must be applied unerringly in the future.

From the start, the hierarchy must be robust enough to handle any future additions. The end result: it never works. It's hard enough for libraries with full-time staff, training and documented processes to keep things in order. The average computer user has no chance of making a logical hierarchy that will make sense now or into the future.