Passwords, user models, and Adobe’s mistake

The following is a phenomenal story for illustrating how real-life cybersecurity disasters come from a combination of technical and social failures. In this case, both were necessary for making things as catastrophic as they were.

A couple days ago, it was announced that 130 million Adobe account credentials were compromised by a cyberattack. (If you are an Adobe customer, please make sure you’ve changed your password on any account that shared the same password.) There’s been a file circulating around the Internet that contains the email address, encrypted password, and unencrypted password hint for all these accounts. It’s not too hard to find.

The first interesting thing we learned from this file is that Adobe didn’t salt and hash their passwords before storing. Instead, they used a well-known symmetric encryption algorithm (3DES, ECB-mode) with the same secret key for every account. Even without knowing the secret key, it’s not hard to recover plaintext passwords with high confidence using basic statistics. Simple example: The encrypted password that appears most often in the dataset is probably going to decipher to “123456” or “password”. For this particular dataset, knowing that alone gives you the password for 2+ million accounts. (I’ll explain how you can avoid statistical attacks like this one by salting/hashing in a footnote below.)

The other interesting part is the password hints. A staggering number of people have password hints that are literally, “pwd is 123456”, which helps confirm some of the password guesses that we can make by statistical analysis. My friend Nick Semenkovich posted a sanitized version of the full user dataset (.gz, 1.1G) with emails redacted, which I filtered into a much shorter list of 8262 lines-that-might-contain-the-actual-plaintext-password using grep -Ei ‘\s+(password|pwd?)\s*(is|==?|:)’.

If you take a look at the list, you’ll see an astonishing number of password hints that either seem to give the actual password or say something like, “Password is the same as for Gmail.” The latter is especially bad because if your password is fairly common, I can probably figure it out from statistical analysis and then login to your email account.

Essentially, this is a situation where if EITHER Adobe engineers implemented secure password and password hint obfuscation OR every Adobe user created perfectly random passwords and secure password hints (== probably no hint at all), things would be mostly okay despite the database breach.

I don’t want to be presumptuous about why Adobe didn’t follow recommended password storage protocols, but it seems that the twofold lesson here is that engineers need to have a realistic user model and users need to have less trust in engineers to protect them.

====

A quick primer on salting and hashing passwords:

Hashing for password storage is a pretty magical thing. Unlike encryption (which is an invertible function once you have the secret key), usually the only way to get the input to a secure hash function (aka: the user password) from the output (aka: the password ciphertext) is to brute-force. Hash functions are at least half-magical because similar inputs get mapped to completely different outputs except by coincidence.

If all the users of your website had completely random passwords all the time, you could feel free to post the hashes of their passwords publicly as long as you trusted the hash function, because there’s no easy way to undo a secure hashing process. Something like bcrypt or scrypt with a high work factor.

The problem is that humans tend to reuse strings exactly for passwords, so if we get a database dump we can do statistical deductions and grab the nearest table of precomputed hashes for common strings. So we need adjust our user model from “people who make perfectly random passwords” to “people who might just use ‘password’.”

Then the solution is pretty simple: we just generate a random string for each user when they make their password (a “salt”) and append/prepend the salt to their password before hashing. We have to store the salt in our database alongside each user’s salted-hashed password, but that’s fine because it reveals no information about the plaintext password.

Note that you should never use the same salt for multiple users; otherwise you end up in basically the same situation as Adobe when someone gets your password database dump.