I was doing various things, for various reasons (as one does), when I stumbled upon the 150M Adobe password leak file.
Let's take a look!
The quotes around "encrypted" are there because Adobe stated that all of the passwords were encrypted. The password values themselves are encoded in base64, and I wouldn't put it past a company to say the data was encrypted when they really meant it was base64-encoded. Decoding the base64 doesn't give us much: it's certainly not plaintext passwords. Interestingly, the decoded values' lengths are all multiples of 8 bytes (8, 16, 24, 32...). To me, if we assume these passwords really were encrypted, this implies a block cipher with an 8-byte (64-bit) block. No matter how long your password is, it gets padded up to the next multiple of 8: an 11-character password is encrypted to 16 bytes. If they had instead used a stream cipher, each encrypted password would typically be the same length as the password itself. We are getting somewhere!
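The length check is easy to reproduce. Here is a minimal sketch, assuming an 8-byte block size (inferred from the lengths, not confirmed by Adobe); the function name `split_blocks` is my own, and the sample value is the one quoted later in this post:

```python
import base64

BLOCK_SIZE = 8  # assumed block size in bytes, inferred from the observed lengths


def split_blocks(b64_value: str, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Decode a base64 password field and cut it into fixed-size cipher blocks."""
    raw = base64.b64decode(b64_value, validate=True)
    if len(raw) % block_size != 0:
        raise ValueError(f"length {len(raw)} is not a multiple of {block_size}")
    return [raw[i:i + block_size] for i in range(0, len(raw), block_size)]


blocks = split_blocks("EQ7fIpT7i/Q=")
print(len(blocks), [b.hex() for b in blocks])
```

If a value ever decoded to a length that is not a multiple of 8, the block cipher guess would be in trouble, which is why the sketch raises instead of silently truncating.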
Can we learn anything from this? Normally passwords are hashed: if you hash a large collection of candidate passwords, eventually one of your hashes matches a leaked hash and you know you have broken that password. If this data is encrypted, we would have to determine both the algorithm and the key used to encrypt it. Depending on the cipher, recovering the key could prove to be almost impossible. So for now, I have ruled out that path.
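For contrast, this is roughly what the usual attack on hashed passwords looks like. It is only a sketch: unsalted SHA-256 and the tiny word list are assumptions for illustration, and nothing in the leak says a hash was involved at all.

```python
import hashlib


def crack_unsalted(target_hex: str, candidates):
    """Return the candidate whose SHA-256 digest equals target_hex, or None."""
    for word in candidates:
        if hashlib.sha256(word.encode("utf-8")).hexdigest() == target_hex:
            return word
    return None


# Hypothetical word list and "leaked" hash, for illustration only.
wordlist = ["password", "letmein", "123456", "qwerty"]
leaked = hashlib.sha256(b"123456").hexdigest()
print(crack_unsalted(leaked, wordlist))
```

No key is needed for this, only guesses. With encryption, the same loop would also need the right algorithm and the right key before a single guess could be checked.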
Scrolling through the file, I see what appears to be a glitch in the matrix: two different users have exactly the same encrypted password. What?! It seems like Adobe used ECB mode. The classic example of how ECB fails is on Wikipedia. Essentially, under a given key, the same plaintext block always encrypts to the same ciphertext block.
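To see why that matters, here is a toy model of the ECB property. It is not a real cipher and not what Adobe ran: I'm using a truncated HMAC per block as a stand-in (so it can't even be decrypted), and the key, the zero-byte padding, and the name `toy_ecb_encrypt` are all assumptions. The only thing it mimics is that each block is transformed independently with the same key.

```python
import hashlib
import hmac

BLOCK_SIZE = 8  # assumed block size in bytes


def toy_ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stand-in for ECB: each block is transformed on its own, same key every time."""
    # Assumed padding: zero bytes up to the next block boundary.
    padded = plaintext + b"\x00" * (-len(plaintext) % BLOCK_SIZE)
    out = b""
    for i in range(0, len(padded), BLOCK_SIZE):
        block = padded[i:i + BLOCK_SIZE]
        # Keyed, deterministic, per-block transform (not invertible, demo only).
        out += hmac.new(key, block, hashlib.sha256).digest()[:BLOCK_SIZE]
    return out


key = b"hypothetical-key"
a = toy_ecb_encrypt(key, b"letmein12345")
b = toy_ecb_encrypt(key, b"letmein1abcd")
c = toy_ecb_encrypt(key, b"letmein12345")

print(a == c)          # same password, same ciphertext
print(a[:8] == b[:8])  # same first 8 characters, same first block
print(a[8:] == b[8:])  # different tails, different second block
```

A mode that mixes in a per-record IV or chains blocks together would break both of the first two equalities, which is exactly the leak ECB leaves open.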
This is big! We can't know what the passwords are, because they are encrypted, but we can find patterns. Let's look at an example:
|email@example.com-||-YieH39Fn3eedenMflfXBxzA==-||-what am I?||--|
|firstname.lastname@example.org-||-YieH39Fn3eefioxG6CatHBw==-||-What are you?||--|
|email@example.com-||-YieH39Fn3eefioxG6CatHBw==-||-What am I?||--|
All of these people use the same password, or at least the same first 8 characters of their password, since the first block matches in all three rows. And many of the password hints are the same! We're cooking with fire!
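Finding these clusters by eye doesn't scale to 150M rows, so here is a sketch of grouping records by their first ciphertext block. The records below are made up (the bytes, emails, and third hint are placeholders, not values from the leak), and `group_by_first_block` is a hypothetical helper name:

```python
import base64
from collections import defaultdict

BLOCK_SIZE = 8  # assumed block size in bytes


def group_by_first_block(records):
    """Map first ciphertext block -> list of (email, hint) that share it."""
    groups = defaultdict(list)
    for email, b64_password, hint in records:
        raw = base64.b64decode(b64_password)
        groups[raw[:BLOCK_SIZE]].append((email, hint))
    return groups


def enc(raw: bytes) -> str:
    """Helper to build base64 test values from raw bytes."""
    return base64.b64encode(raw).decode("ascii")


# Made-up records: the first two share a first block, the third does not.
shared = bytes.fromhex("0011223344556677")
records = [
    ("user1@example.com", enc(shared + bytes.fromhex("8899aabbccddeeff")), "what am I?"),
    ("user2@example.com", enc(shared + bytes.fromhex("0123456789abcdef")), "What are you?"),
    ("user3@example.com", enc(bytes.fromhex("ffeeddccbbaa9988")), "pet's name"),
]

for first_block, users in group_by_first_block(records).items():
    if len(users) > 1:
        print(first_block.hex(), users)
```

Every group with more than one user pools its hints, so one careless hint can give away the first 8 characters for everyone in the group.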
What else can we tease out? The most common value in the password field is "EQ7fIpT7i/Q=". Looking at the password hints for people who used this password we see things like:
We know all these people used "123456" as their password. Even though we can't decrypt the passwords, Adobe has leaked a great deal of information by using crypto improperly. Now if we see this block anywhere, we know what plaintext it contains: 123456. Do this for a few more blocks and you slowly start to "unlock" everyone's password.
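The whole approach boils down to counting and a lookup table. A rough sketch follows; the records are hypothetical (only the "EQ7fIpT7i/Q=" value and its "123456" meaning come from this post, the hints and emails are invented), and `most_common_with_hints` is my own name:

```python
from collections import Counter, defaultdict


def most_common_with_hints(records, top=3):
    """Return the most frequent password fields with the hints attached to them."""
    counts = Counter(b64 for _, b64, _ in records)
    hints = defaultdict(set)
    for _, b64, hint in records:
        if hint:
            hints[b64].add(hint)
    return [(b64, n, sorted(hints[b64])) for b64, n in counts.most_common(top)]


# Hypothetical records in (email, password field, hint) form.
records = [
    ("a@example.com", "EQ7fIpT7i/Q=", "1 to 6"),
    ("b@example.com", "EQ7fIpT7i/Q=", "numbers in order"),
    ("c@example.com", "EQ7fIpT7i/Q=", ""),
    ("d@example.com", "AAECAwQFBgc=", "my dog"),
]

# Ciphertexts we have "unlocked" so far by reading hints.
known = {"EQ7fIpT7i/Q=": "123456"}

for b64, n, hint_list in most_common_with_hints(records):
    print(b64, n, hint_list, known.get(b64, "?"))
```

Each new entry in `known` applies to every user with that ciphertext, including the ones who left no hint at all.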
As always, a relevant XKCD.