Wednesday, May 16, 2007

DNA Data Storage

Some ideas inspire awe, some ideas inspire the comment, "Damn, why didn't I think of that?"

A recent paper in Biotechnology Progress provoked the latter comment from me. In it, Yachie et al. (2007) describe a living data storage system. The ability of DNA to encode information is well known; DNA sequences code for the amino acid sequences of proteins. Combine that with the ability to construct custom DNA sequences and to clone specified DNA sequences into organisms, and you've got a super data storage system.

Yachie et al. write, "Duplicated data encoded by different oligonucleotide sequences was inserted redundantly into multiple loci of the Bacillus subtilis genome. Multiple alignment of the bit data sequences decoded by B. subtilis genome sequences enabled the retrieval of stable and compact data without the need for template DNA, parity checks, or error-correcting algorithms."

Oh that's pretty cool. Say you want to save the information, "Mary had a little lamb." You could have nucleotide base triplets signify the letters of the alphabet, e.g. aaa = A, aac = B, aag = C and so on. Construct a DNA sequence reflecting your code for "Mary had a little lamb." Insert the oligonucleotide sequences redundantly into multiple locations in the non-coding regions of the B. subtilis genome. When you want to read your message at a later time, use primers to amplify the coded regions, then sequence and enjoy! The redundancy means that errors (i.e. mutations) in any one copy can be detected when the copies are compared by multiple sequence alignment.
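To make the idea concrete, here is a minimal Python sketch of the toy scheme described above. It is only an illustration of the triplet code in this post, not the encoding Yachie et al. actually used. The alphabet (A to Z plus space and period), the function names, and the per-position majority vote are all my own assumptions; the vote stands in for real multiple alignment and only copes with substitutions in equal-length copies, not insertions or deletions.

```python
from collections import Counter
from itertools import product

BASES = "acgt"
# Hypothetical alphabet for this sketch: 26 letters plus space and period.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ."

# All 64 triplets in the order aaa, aac, aag, aat, aca, ...
TRIPLETS = ["".join(p) for p in product(BASES, repeat=3)]
ENCODE = dict(zip(ALPHABET, TRIPLETS))          # 'A' -> 'aaa', 'B' -> 'aac', ...
DECODE = {triplet: ch for ch, triplet in ENCODE.items()}


def encode(message):
    """Turn text into a DNA string, one base triplet per character."""
    return "".join(ENCODE[ch] for ch in message.upper())


def decode(dna):
    """Turn a DNA string back into text; unknown triplets become '?'."""
    usable = len(dna) - len(dna) % 3
    return "".join(DECODE.get(dna[i:i + 3], "?") for i in range(0, usable, 3))


def consensus(copies):
    """Majority vote at each position across equal-length copies.

    A crude stand-in for multiple alignment: it handles point
    substitutions only, not insertions or deletions.
    """
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))
```

With three or more copies read back from different loci, something like `decode(consensus(copies))` would recover the message as long as no position is mutated in a majority of the copies.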

Oh and there's more. B. subtilis can be induced to form spores that can survive for millions of years. Compare that to DVDs and CDs. Seriously... cassette tapes were more reliable than CDs. Of course storing info this way is a bit expensive, but when the costs drop, perhaps we will use bacteria or viruses instead of silicon chips!

Photo by Yoshiaki Ohashi.

6 comments:

  1. First of all, congrats on your blog, I really enjoy it. I just read your post and I think it is a little bit hilarious. I can imagine many better ways to store info than coding it in DNA… like you said, processes like mutation or recombination (via conjugation, transduction or transformation) will obstruct the decoding of our info. Even more, due to other properties of the genome like transposons or genome evolution, this storage will be very difficult.

    In my opinion we are so far away from creating a data storage that emulates DNA…

  3. Ha, what a perfect application for synthetic biology.

  4. nice thought, especially the one with the spores, though indeed, bacteria are way too sloppy in the way they handle their genetic code. i would not like my harddrive leaking its contents to my neighbors. however: DNA is the stuff to think about for the time being. it has the potential of a bit to the power of two in saving information - that's what we ought to investigate further. put dna in our HDDs ;)

  5. This is very possible. As the technology advances we may some day use this for long term storage.

  6. by the way, someone has done the Mary had a little lamb!

    http://www.biotechniques.com/BiotechniquesJournal/2009/September/An-improved-Huffman-coding-method-for-archiving-text-images-and-music-characters-in-DNA/biotechniques-176744.html
