All the crypto code you’ve ever written is probably broken
tl;dr: use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption.
Do you keep up on the latest proceedings of the IACR CRYPTO conference? No? Then chances are whenever you have tried to use a cryptographic library you made some sort of catastrophic mistake which would lead to a complete loss of confidentiality of the data you’re trying to keep secret.
The most important question is: are you using an authenticated encryption mode? If you don’t know what authenticated encryption is, then you’ve probably already made a mistake. Here’s a hint: authenticated encryption has nothing to do with authenticating users into a webapp. It has everything to do with ensuring the integrity of your data hasn’t been compromised, i.e. no one has tampered with the message.
Why is authenticated encryption so poorly known despite being so important? I don’t know. Perhaps it’s because the need for it wasn’t formally proven until the year 2000. Whatever the reason, chances are you’ve never heard of it: despite the best efforts of the cryptographic community, it remains a relatively obscure concept.
Most of the cryptographic APIs you’ve ever encountered have probably made you run a gauntlet of choices for how you want to encrypt data. You might think AES-256 is the way to go, but by default your crypto API might select ECB mode, which is so terribly insecure it isn’t even worth talking about. Perhaps you select CBC or CTR mode, but your crypto API doesn’t make you specify a random IV and will always encrypt everything with an IV of all zeroes, which, if you ever reuse the same key, will compromise the confidentiality of your data.
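To make the all-zero IV problem concrete, here is a minimal sketch of the CTR case. It assumes a toy keystream built from SHA-256 (the hypothetical `toy_keystream` below) as a stand-in for AES-CTR, since the Python standard library ships no AES; it is for demonstration only and is not a real cipher. The point it illustrates is general to any stream-style mode: same key plus same IV means same keystream, and the keystream cancels out when two ciphertexts are XORed.

```python
import hashlib

def toy_keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy CTR-style keystream: SHA-256(key || iv || counter). Demo only, NOT a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    return xor(plaintext, toy_keystream(key, iv, len(plaintext)))

key = b"k" * 32          # fixed key, reused for every message
zero_iv = bytes(16)      # the "IV of all zeroes" default

p1 = b"attack at dawn!!"
p2 = b"retreat at noon!"
c1 = toy_encrypt(key, zero_iv, p1)
c2 = toy_encrypt(key, zero_iv, p2)

# The attacker sees only c1 and c2. XORing them cancels the shared keystream,
# leaving the XOR of the two plaintexts.
leaked = xor(c1, c2)

# If the attacker knows (or guesses) one plaintext, the other falls out directly.
recovered_p2 = xor(leaked, p1)
```

No key material is needed for the recovery step, which is why a fresh random IV per message matters.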
Let’s say you’ve gotten through all of that and are now using something like AES-CTR mode with a random IV per message. Great. Do you think you’re secure now? Probably not. A sophisticated attacker might attempt a man-in-the-middle attack, which gives them the ability to modify ciphertexts in transit and execute “chosen ciphertext” attacks (CCAs). To defend against these you must also ensure the integrity of your data, or otherwise confidentiality might be lost.
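A small sketch of why a random IV alone doesn’t save you: ciphertext produced by a CTR-style mode is malleable, so an attacker who can modify it in transit can flip chosen bits of the plaintext without knowing the key. As an assumption for illustration, the code uses a toy SHA-256 keystream (the hypothetical `toy_ctr`) in place of AES-CTR, and assumes the attacker knows the message format and the offset of the field they want to change.

```python
import hashlib
import os

def toy_ctr(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy CTR-style cipher (SHA-256 keystream). Encrypts and decrypts. Demo only."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

key = os.urandom(32)
iv = os.urandom(16)            # fresh random IV, as recommended
plaintext = b"pay alice $0100"
ciphertext = toy_ctr(key, iv, plaintext)

# --- attacker's side: no key, only the ciphertext and a guess at the format ---
offset = 10                    # assumed known position of the amount field
old, new = b"$0100", b"$9999"
forged = bytearray(ciphertext)
for i, (a, b) in enumerate(zip(old, new)):
    forged[offset + i] ^= a ^ b   # XOR in the difference between old and new

# --- receiver's side: decrypts without any integrity check ---
tampered = toy_ctr(key, iv, bytes(forged))
```

The receiver decrypts `forged` successfully and has no way to tell the amount was changed, which is the gap a MAC or an authenticated mode closes.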
You may have learned you need to use a MAC to do this (and if you didn’t you’re most likely insecure!). You may have selected HMAC for this purpose. But you’re still left with three options here! Do you compute the MAC of the plaintext or the ciphertext? If you compute the MAC of the plaintext, do you encrypt it along with the plaintext, or do you append it to the end of the ciphertext? Or to spell it out more precisely, which of the following do you do?
- Encrypt and MAC: encrypt the plaintext, compute the MAC of the plaintext, and append the MAC of the plaintext to the ciphertext
- Encrypt then MAC: encrypt the plaintext, compute the MAC of the ciphertext, and append the MAC of the ciphertext to the ciphertext
- MAC then Encrypt: MAC the plaintext, append the MAC to the plaintext, then encrypt the plaintext and the MAC
Edit: (this is important enough I feel the need to edit it retroactively)
If you have answered any of the above questions incorrectly (the correct answer to the above question is “encrypt then MAC”) you’ve quite likely created an insecure cryptographic scheme. Unless you really know what you’re doing and can answer all these questions correctly (and even then!), you probably shouldn’t be trying to build your own cipher/MAC constructions and should defer to cryptographic experts who specialize in that sort of thing. These cipher/MAC constructions are called authenticated encryption modes.
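To make the ordering concrete, here is a minimal sketch of the MAC half of encrypt-then-MAC using only Python’s standard `hmac` module. It is an illustration of the idea, not something to deploy in place of a vetted authenticated encryption mode. Assumptions: the names `seal` and `open_sealed` are hypothetical, the ciphertext is a random stand-in for the output of a real cipher, the MAC key is independent of the encryption key, and the tag covers the IV as well as the ciphertext.

```python
import hashlib
import hmac
import os

IV_LEN = 16    # assumed IV size for this sketch
TAG_LEN = 32   # HMAC-SHA256 output size in bytes

def seal(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Encrypt-then-MAC framing: iv || ciphertext || HMAC(iv || ciphertext)."""
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag

def open_sealed(mac_key: bytes, message: bytes) -> tuple:
    """Verify the tag first; only then return (iv, ciphertext) for decryption."""
    if len(message) < IV_LEN + TAG_LEN:
        raise ValueError("message too short")
    body, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    # Constant-time comparison, so the check itself does not leak tag bytes.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC check failed")
    return body[:IV_LEN], body[IV_LEN:]

mac_key = os.urandom(32)       # separate from the encryption key
iv = os.urandom(IV_LEN)
ciphertext = os.urandom(48)    # stand-in for the output of a real cipher

sealed = seal(mac_key, iv, ciphertext)

# Flip one bit of the ciphertext portion, as an active attacker might.
tampered = bytearray(sealed)
tampered[IV_LEN] ^= 0x01

try:
    open_sealed(mac_key, bytes(tampered))
    rejected = False
except ValueError:
    rejected = True
```

Because the tag is checked before anything is decrypted, a modified message is rejected without the cipher ever touching attacker-controlled input.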
If you find yourself reaching for any form of encryption that isn’t an authenticated encryption mode, you’re probably doing it wrong. You shouldn’t ever be choosing between CBC or CFB or CTR (or god forbid ECB). Unless you’re a cryptographer, these should be considered dangerous low-level primitives not for the consumption of mere mortals.
That said, what should you be using?
NIST-approved AEAD block cipher modes: AEAD stands for Authenticated Encryption with Associated Data, and describes modes that simultaneously provide confidentiality and integrity of data. Examples of these modes include GCM and CCM; EAX is a similar mode, though it is not itself a NIST standard. (Edit: some cryptographers have suggested I probably shouldn’t even recommend using these directly, as there are still a number of attacks you will probably be susceptible to unless you know what you’re doing, especially if you have a service accessible to an active attacker)
djb’s authenticated encryption modes in NaCl: there are two authenticated encryption modes available in the Networking and Cryptography library by Daniel J. Bernstein: crypto_secretbox and crypto_box, which respectively provide symmetric and pubkey modes of encryption and integrity checking.
(Edit: adding this retroactively) Google Keyczar also provides a high-level cryptographic toolkit with authenticated encryption modes.
(Edit: also adding this retroactively) GPG is one of the easiest cryptographic tools to use that provides high-level functionality intended for cases where you would like authenticated encryption.
EAX is one of the recommended modes and is relatively easy to understand: it’s a combination of AES-CTR mode and CMAC (a.k.a. OMAC1), which is a MAC derived from a block cipher (in this case AES). While EAX mode is relatively simple to understand and you may be tempted to implement it yourself if it’s unavailable in your language environment, you probably shouldn’t. There are a number of potential pitfalls that await you, and unless you know what you’re doing (and even then!) you’re likely to get it wrong.
If I’ve scared you enough by now, you may be googling around to discover if there’s an implementation of any of the above modes in your programming language environment, and sadly in many language environments you may turn up empty. In these cases, there’s not much you can do except petition your language maintainers who specialize in cryptography to expose APIs to authenticated encryption modes.
Authenticated encryption is something you should use as a complete package, implemented as a single unit by a well-reputed open source cryptographic library and not assembled piecemeal by people who do not specialize in cryptography.
Bottom line: unless you’re using authenticated encryption, you are opening yourself up to all sorts of attacks you can’t even anticipate, and shouldn’t consider the data you’re storing confidential.
Edit: several people have asked about more information on everything I’ve described here, most notably why various MACing schemes are secure or insecure. If you are really interested in this topic, I strongly recommend you take the Stanford Crypto class on Coursera which is what inspired me to write this blog post to begin with.