Big data meets tiny storage!
Data can be a drag. Whether you work in weather, satellite surveillance, astronomy or particle physics, data is stacking up. Even as hard disks get larger and cheaper, some say DNA, the information reservoir of life, could offer a dramatically better storage mechanism.
In Nature this week, information scientists demonstrated what they called a practical mechanism for encoding computer data in artificial DNA, and then reading it with perfect accuracy at the other end.
First author Nick Goldman, at the European Bioinformatics Institute in the United Kingdom, says that in genetics, like many other fields, “storage is a real problem. Databases are growing exponentially, but budgets sadly aren’t.”
A computer can store every letter, number and punctuation mark on our keyboards in one “byte” (a string of eight ones and zeroes). Likewise, the four “bases” of DNA (labeled C, G, A and T) can serve as the symbols of a code that carries the same information.
In biology, a three-letter string of DNA codes for a single amino acid, one of the building blocks of proteins.
Theoretically, a string of five DNA bases could carry 1,024 distinct meanings (four possible bases at each of five positions, or 4 × 4 × 4 × 4 × 4), but because placing identical bases side by side raises the chance of errors during artificial synthesis, Goldman’s string had considerably lower, but still formidable, capacity.
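For readers who want to check the arithmetic, here is a small Python sketch that counts five-base strings with and without a no-repeat rule. The rule as coded here (no two identical neighbors within a single string) is an assumption made for illustration; the published scheme may apply the constraint differently.

```python
from itertools import product

BASES = "ACGT"


def has_adjacent_repeat(strand: str) -> bool:
    """Return True if any two neighboring bases in the strand are identical."""
    return any(a == b for a, b in zip(strand, strand[1:]))


# Every possible five-base string: 4 choices at each of 5 positions.
all_strands = ["".join(p) for p in product(BASES, repeat=5)]

# Keep only strings in which no base sits next to a copy of itself
# (illustrative assumption, not necessarily the paper's exact rule).
no_repeat = [s for s in all_strands if not has_adjacent_repeat(s)]

print(len(all_strands))  # 1024
print(len(no_repeat))    # 324
```

Under this assumption 324 strings survive (4 choices for the first base, then 3 for each of the next four), which is still more than the 256 values a byte can take.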
By translating digital data into this DNA code, Goldman and company created a 739-kilobyte cache containing one photo, all 154 Shakespearean sonnets, audio from Martin Luther King’s “I have a dream” speech, and a PDF of the 1953 journal article that unveiled the structure of DNA.
After each byte of data was converted into a five-base string of DNA, a machine in California squirted out more than 150,000 strings of DNA holding the encoded data. The DNA then flew across the globe via FedEx to Europe, no cooling needed.
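One simple way to turn bytes into DNA letters while keeping identical bases apart is sketched below in Python. It is not necessarily the code Goldman's team used: the function names, the `start` parameter and the fixed width are all hypothetical. The sketch writes each byte as six base-3 digits, rather than the five-base strings described above, because a fixed-length code of five base-3 digits offers only 243 combinations, short of the 256 a byte needs. Each digit then picks one of the three bases that differ from the base before it.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256; a choice made for this sketch


def byte_to_trits(value: int) -> list[int]:
    """Write one byte (0-255) as six base-3 digits, most significant first."""
    trits = []
    for _ in range(TRITS_PER_BYTE):
        value, digit = divmod(value, 3)
        trits.append(digit)
    return trits[::-1]


def encode(data: bytes, start: str = "A") -> str:
    """Map bytes to bases so that no base ever follows a copy of itself.

    `start` is a notional "previous base" that is not written out.
    """
    prev = start
    out = []
    for byte in data:
        for digit in byte_to_trits(byte):
            choices = [b for b in BASES if b != prev]  # three allowed bases
            prev = choices[digit]
            out.append(prev)
    return "".join(out)


def decode(strand: str, start: str = "A") -> bytes:
    """Invert encode(): recover the base-3 digits, then rebuild the bytes."""
    prev = start
    trits = []
    for base in strand:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for digit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + digit
        out.append(value)
    return bytes(out)


strand = encode(b"DNA")
print(len(strand))                # 18
print(decode(strand) == b"DNA")   # True
```

Because every base is chosen from the three that differ from its predecessor, the output can never contain two identical letters in a row, even where one byte ends and the next begins.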
Super string theory
Instead of making one giant molecule, the synthesizer created strings of 117 bases each. Each string contained data (a run of those five-letter codes) and an “index” section marking where that data belongs in the output.
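The index idea can be illustrated with a short Python sketch. Everything here is hypothetical: the `Fragment` type, the function names and the payload length are chosen for the example, and the index is kept as a plain number rather than written in bases as it would be on a real strand.

```python
import random
from typing import NamedTuple


class Fragment(NamedTuple):
    index: int    # position of this piece in the original sequence
    payload: str  # the bases carried by this piece


def split_into_fragments(sequence: str, payload_len: int) -> list[Fragment]:
    """Cut a long base sequence into numbered pieces of fixed payload length."""
    return [
        Fragment(i // payload_len, sequence[i:i + payload_len])
        for i in range(0, len(sequence), payload_len)
    ]


def reassemble(fragments: list[Fragment]) -> str:
    """Rebuild the sequence by sorting pieces on their index."""
    ordered = sorted(fragments, key=lambda f: f.index)
    return "".join(f.payload for f in ordered)


sequence = "ACGTGCATGACT" * 5
fragments = split_into_fragments(sequence, payload_len=10)
random.shuffle(fragments)  # strands arrive in no particular order
print(reassemble(fragments) == sequence)  # True
```

Since every piece carries its own position, the order in which the strands are read back does not matter.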
Goldman says the coding process could be used to store any digital information from a computer. “DNA has a very dense rate of information storage; it’s light and small, and our coding scheme could be used for a zettabyte — a million million gigabytes, which is pretty much the total amount of digital information estimated to be around today.”
However, as Goldman admits, this “would be breathtakingly expensive right now,” due largely to the cost of synthesizing DNA. But he says that cost has fallen 100-fold over a decade, and an equal drop in the next decade could make the technique competitive with other storage technologies for data that needs to be stored for 50 years.
To avoid crashes, hard disks generally spin full-time, sucking up electricity. Magnetic tapes are used for larger amounts of data, and although the tapes are usually idle, they need periodic rewriting, and are clumsy to handle.
If the cost of DNA synthesis continues to fall, “there must be some point in time when it is cheaper to store information in DNA than in something that requires electricity or other maintenance costs,” says Goldman. “A great property of DNA is that you don’t need electricity to store it. If it’s cold, dry and dark, DNA lasts for a very long time. We can routinely sequence woolly mammoth DNA that has been kept in those conditions for thousands of years.”
Although a group at Harvard announced data storage in DNA last fall, the current effort introduces error correction, says coauthor Ewan Birney. “That was part of trying to think of this as a realistic technology. Error correction is ubiquitous, in hard disks, in mobile phones. In almost every circumstance, the information gets a little bit corrupted; the point is … to recover and correct.”
Because each stretch of DNA is created multiple times, it can be read multiple times during decoding. If data fails to correspond on strings that are supposed to be identical, the correct version is chosen by majority vote.
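A majority vote of this kind might look like the following Python sketch. It assumes the reads are already lined up, have equal length and differ only by substituted letters; the function name is hypothetical, and voting position by position is an assumption of the example, not a detail confirmed by the paper.

```python
from collections import Counter


def majority_vote(reads: list[str]) -> str:
    """Return a consensus strand by taking the most common base at each position."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must have the same length")
    # zip(*reads) walks the reads column by column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


reads = ["ACGTAC", "ACGTAC", "ACCTAC"]  # third read has one wrong letter
print(majority_vote(reads))  # ACGTAC
```

With an odd number of reads and a single bad copy, the wrong letter is simply outvoted. If a position is tied, this sketch falls back on the letter seen first, which a real decoder would want to handle more carefully.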
True data democracy.
So what would a Shakespeare sonnet weigh, once encoded in DNA? Less than a millionth of a millionth of a gram. Do the math: at that weight, a gram of engineered DNA could hold more than a trillion sonnets.
Note to William S: Buy another quarto of foolscap: Now is no time for writer’s block!
— David J. Tenenbaum
- Towards a practical, high-capacity, low-maintenance information storage system in synthesized DNA, Nick Goldman et al., Nature, published online 23 Jan. 2013.
- Book on tape? No, book on DNA!
- Harvard delves into DNA data storage
- A brief history of DNA storage
- Wait, what’s DNA again?
- DNA, photographed with an electron microscope