A startup named Bitcasa made a splash at the TechCrunch Disrupt conference last week by promising unlimited backups to the cloud for just $10 a month. Bitcasa said it provided privacy and security by encrypting all files, but was able to offer a very inexpensive service because it avoided the storage of duplicate files, especially music and movies. The savings from de-duplication could be considerable because with entertainment content, large numbers of people tend to have copies of a small number of movies or songs.
At first glance, these claims seem contradictory. If files are encrypted, no one but the owner, or someone else given the key, has any idea what they contain. And if you don’t know what is in the files, the sort of de-duplication Bitcasa promises seems impossible.
But, as Bitcasa CEO Tony Gauda explains in this TechCrunch interview, the company is taking advantage of a very clever trick called convergent encryption. If you want to get deep into the weeds of the technology, it is explained in detail in this paper, but here is how it works.
The trick is to use the file to create its own encryption key. A mathematical function called a one-way hash reduces the file to a relatively short string of bits, typically 256 or 512. It’s called one-way because there is no practical way to reconstruct the original file from the hash, even though each file generates an effectively unique hash (there is a vanishingly small possibility that two different files could generate the same one). The hash is then used as the key to encrypt the file using the Advanced Encryption Standard.
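To make the idea concrete, here is a minimal Python sketch of deriving the key from the file itself. It is an illustration under stated assumptions, not Bitcasa's code: the function names are hypothetical, SHA-256 is assumed as the hash, and because the Python standard library has no AES, a toy XOR keystream built from SHA-256 stands in for it. That stand-in is not secure and is there only to show that the same input always yields the same key and the same ciphertext.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256(key || counter). Stand-in for AES; NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def convergent_encrypt(plaintext: bytes):
    """Derive the key from the file's own hash, then encrypt with that key."""
    key = hashlib.sha256(plaintext).digest()  # 256-bit key taken from the content
    stream = keystream(key, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    return key, ciphertext

def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same keystream reverses the toy encryption."""
    stream = keystream(key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))

# Two users encrypting the same bytes independently get identical results.
song = b"the same song on two hard drives" * 50
alice_key, alice_ct = convergent_encrypt(song)
bob_key, bob_ct = convergent_encrypt(song)
```

Because no random key or nonce is involved, the output is fully determined by the input, which is exactly the property that makes de-duplication of encrypted files possible.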
Say Alice and Bob each have the same song on their hard drives and both use Bitcasa. Alice backs up first, so the song is reduced to a hash and then encrypted on her computer using that hash. When it’s Bob’s turn, his computer goes through the same process and creates an encrypted file identical to Alice’s. Bitcasa checks its server records, finds a match, and realizes it doesn’t need another copy of the file. In fact, it could save the cost and trouble of transmitting the file to its data center, but it’s not clear whether it actually does that.
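The server side of the Alice-and-Bob scenario can be sketched in a few lines of Python. This is a guess at the general shape, not a description of Bitcasa's service: `DedupStore` and its methods are hypothetical names, and identifying a blob by the SHA-256 digest of its ciphertext is an assumption. The `needs_upload` check shows how a client could skip transmitting a file the server already holds, which the article notes Bitcasa may or may not actually do.

```python
import hashlib

class DedupStore:
    """Hypothetical server-side store that keeps one copy per distinct ciphertext."""

    def __init__(self):
        self.blobs = {}   # ciphertext digest -> ciphertext bytes
        self.owners = {}  # ciphertext digest -> set of user names

    def needs_upload(self, digest: str) -> bool:
        """A client can ask this first and skip sending bytes the server has."""
        return digest not in self.blobs

    def upload(self, user: str, ciphertext: bytes) -> bool:
        """Record ownership; return True only if new bytes were actually stored."""
        digest = hashlib.sha256(ciphertext).hexdigest()
        is_new = self.needs_upload(digest)
        if is_new:
            self.blobs[digest] = ciphertext
        self.owners.setdefault(digest, set()).add(user)
        return is_new

# The same bytes stand in for the identical ciphertext both users produce.
ciphertext = b"identical encrypted song bytes"
store = DedupStore()
alice_stored = store.upload("alice", ciphertext)  # True: first copy is kept
bob_stored = store.upload("bob", ciphertext)      # False: duplicate detected
```

The server never needs the decryption key for any of this; it only compares ciphertexts.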
Of course, it’s not quite as simple as that. With normal encryption, all files are scrambled using the same key, so the user only has to hang on to one vital chunk of information. In the convergent technique, each file uses its own key so the system has to include a separate map file that links each user’s keys to the files they will decrypt. Then all the user needs is the key to the map file.
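One way to picture the map file is as a small per-user manifest that is itself encrypted with a single master key. The Python sketch below is an assumption about the general shape, not Bitcasa's format: the JSON layout, the function names, and the field names `blob` and `key` are all hypothetical, and a toy XOR keystream built from SHA-256 again stands in for a real cipher such as AES. It is not secure.

```python
import hashlib
import json

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 keystream. Stand-in for AES; NOT secure. Self-inverse."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

def seal_manifest(master_key: bytes, manifest: dict) -> bytes:
    """Serialize the file-to-key map and encrypt it under the user's one key."""
    return toy_cipher(master_key, json.dumps(manifest, sort_keys=True).encode())

def open_manifest(master_key: bytes, sealed: bytes) -> dict:
    """Decrypt and parse the map so each file's own key can be looked up."""
    return json.loads(toy_cipher(master_key, sealed).decode())

# Each entry links a file name to its stored blob and its per-file key.
file_bytes = b"contents of song.mp3"
manifest = {
    "song.mp3": {
        "blob": hashlib.sha256(b"ciphertext of song.mp3").hexdigest(),
        "key": hashlib.sha256(file_bytes).hexdigest(),
    }
}
master_key = hashlib.sha256(b"alice's passphrase").digest()
sealed = seal_manifest(master_key, manifest)
restored = open_manifest(master_key, sealed)
```

The user keeps only `master_key`; every per-file key lives inside the sealed manifest.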
Convergent encryption has some weaknesses compared with conventional encryption. One is that it makes possible a sort of traffic analysis. An adversary who has access only to the encrypted files could still learn that Alice, Bob, Carol, and Dave all had copies of the same group of files because they would be storing identical ciphertexts (or, more accurately, identical keys to those files). They might just have the same taste in music, or they might be collaborating on a secret project. More sophisticated attacks may also be possible, but assuming that it is properly implemented (always a huge assumption when dealing with encryption), the approach does seem good enough for most encryption needs.
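The traffic-analysis weakness is easy to demonstrate. The Python sketch below assumes an adversary who can see only which opaque ciphertext identifiers each user stores; the function name and the sample data are hypothetical. Without decrypting anything, grouping users by shared identifiers reveals who holds the same files.

```python
from collections import defaultdict

def shared_blobs(holdings: dict) -> dict:
    """Map each ciphertext ID to the users storing it, keeping IDs held by 2+ users."""
    owners = defaultdict(set)
    for user, blob_ids in holdings.items():
        for blob_id in blob_ids:
            owners[blob_id].add(user)
    return {blob: sorted(users) for blob, users in owners.items() if len(users) > 1}

# Hypothetical view of the store: opaque IDs only, no file contents.
holdings = {
    "alice": {"blob-a", "blob-b"},
    "bob": {"blob-a", "blob-b"},
    "carol": {"blob-b", "blob-c"},
}
overlap = shared_blobs(holdings)
```

Here `overlap` links Alice and Bob through `blob-a` and all three users through `blob-b`, even though the adversary never learns what either file contains.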