Archive for 'Crypto'
Note: Post has been updated below
Salted hashes? Have I decided to blog about breakfast?
No. By “Hash”, I mean “cryptographic hashes” and by “Salt”, I mean “additional input added to a one-way hashing function”. Back in Episode 4 of my Podcast, I talked about a system that was written from the ground up to manage users, passwords, and permissions. During my little rant, I talked about storing passwords as the result of a one-way hash, but I didn’t really elaborate.
I realize that many of my regular readers may know this information, but I’ve been surprised at how many people I’ve found who do not. Hopefully, I can shed some light for those who don’t know and also become a viable source in search engine results when the question is asked.
Let’s get the easy part out of the way first. We KNOW not to store plain text passwords, right? Some people know that and choose instead to store the passwords via two-way cryptography, meaning they can encrypt and then decrypt the password to compare it or email it to you. That is also a terrible idea. Now, your entire system is only as secure as the security around your decryption key or decryption certificate. You’ve just made an attacker’s job very easy.
The better way to store passwords is to only store the result of a one-way hash. Then, when someone presents their password for authentication, you just hash the input and compare that to what you have stored in the database. However, even though this is good, it is still not right.
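The hash-and-compare flow described above can be sketched in a few lines of Python. This is only an illustration: the `users` dictionary stands in for a database table, and the function names are made up for the example. It is deliberately unsalted, which is the weakness the rest of this post addresses.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    """Return the hex SHA-3 (256) digest of the password (no salt yet)."""
    return hashlib.sha3_256(password.encode("utf-8")).hexdigest()

# Hypothetical stand-in for a users table: username -> stored hash.
users = {"pete": hash_password("password")}

def authenticate(username: str, attempt: str) -> bool:
    """Hash the presented password and compare it to the stored hash."""
    stored = users.get(username)
    if stored is None:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(stored, hash_password(attempt))

print(authenticate("pete", "password"))  # correct password
print(authenticate("pete", "Password"))  # wrong password
```

The plain text password is never stored; only the digest is.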
Take this for instance. Here is a sample table with hashed passwords.
Right away, you should be able to see a problem. The hashes for pete, jeff, and ron are all the same. A common attack against hashed passwords is a rainbow table. In that case, dictionary words (or commonly known phrases) are pre-hashed and those hashes can then be compared against a compromised database. Let’s take a look.
[Table: dictionary word | SHA-3 (256) value]
Now, by comparing, we can see that the password for pete is the word password. That means that the passwords for jeff and ron are also “password”. By cracking only one hash, we gain access to two other accounts. This is not good.
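Here is a rough Python sketch of that comparison. It uses a plain precomputed lookup table, which is the simplest form of the attack (a true rainbow table is a more compact variation on the same idea). The word list and the extra user "amy" are invented for the example.

```python
import hashlib

def sha3(text: str) -> str:
    """Hex SHA-3 (256) digest of a string."""
    return hashlib.sha3_256(text.encode("utf-8")).hexdigest()

# Hypothetical compromised table. pete, jeff, and ron share a password;
# "amy" is a made-up user with a password that is not in the dictionary.
leaked = {
    "pete": sha3("password"),
    "jeff": sha3("password"),
    "ron": sha3("password"),
    "amy": sha3("Xq7!vR2#kLm9"),
}

# The attacker pre-hashes a (tiny, made-up) dictionary once: hash -> word.
dictionary = ["letmein", "123456", "password", "qwerty"]
lookup = {sha3(word): word for word in dictionary}

# Every leaked hash found in the lookup is cracked immediately.
cracked = {user: lookup[h] for user, h in leaked.items() if h in lookup}
print(cracked)
```

One dictionary hit exposes every account that shares the same password, because identical passwords produce identical hashes.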
The fix is to “salt” the password before hashing it. You want that salt to be a unique value. Some people create a random value and then store the salt alongside the password in another database column. Others derive the salt from something like the row’s primary key, etc. Either way is fine (as long as your derived value won’t change).
Now, let’s examine our user table.
We notice right away that none of the users’ hashes are the same. I didn’t change the passwords, but the salt values made the passwords unique so that they all hashed differently. We can no longer tell whose passwords are identical. Also, our plain dictionary attack no longer works. Even though we’ve telegraphed to the attacker what salt to use, the attacker would have to generate rainbow tables across their entire dictionary for each individual salt.
This isn’t 100% secure (nothing is), but this is a best practice and certainly will slow the attackers down. This method of storage, combined with strong passwords, should keep your data as safe as it can be.
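A minimal sketch of the salted approach in Python, assuming a random 16-byte salt stored next to each hash and the salt simply prepended to the password before hashing. Those choices, and the function names, are assumptions for illustration; the post only requires that the salt be unique per user.

```python
import hashlib
import hmac
import os

def salted_hash(password: str, salt: bytes) -> str:
    """SHA-3 (256) over salt + password, returned as hex."""
    return hashlib.sha3_256(salt + password.encode("utf-8")).hexdigest()

def new_record(password: str):
    """Return (salt, hash) for storage; the salt is random per user."""
    salt = os.urandom(16)  # assumed salt length of 16 bytes
    return salt, salted_hash(password, salt)

def verify(attempt: str, salt: bytes, stored_hash: str) -> bool:
    """Re-hash the attempt with the stored salt and compare."""
    return hmac.compare_digest(stored_hash, salted_hash(attempt, salt))

# Same password for all three users, yet each stored hash is different.
records = {user: new_record("password") for user in ("pete", "jeff", "ron")}
for user, (salt, digest) in records.items():
    print(user, salt.hex(), digest)
```

Because each salt is different, a table of pre-hashed dictionary words is useless unless it is rebuilt for every individual salt.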
Thoughts? Disagreements? Share them in the comments section below.
EDIT (5/16/2014): On the podcast referenced above, I talked about how easy it is to get behind or to overlook things if you do your own security, as yet another reason NOT to do it. I recommended just using existing products or frameworks that have already been hardened over rolling your own. As a perfect example, I talked about doing all of this, but forgot about bcrypt (and others), which are much more secure, salt the value for you, and already have libraries in all of the major languages.
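bcrypt itself lives in third-party packages, so this sketch swaps in PBKDF2 from Python's standard library to show the same general idea: a deliberately slow function that handles the salt for you. The iteration count and the `iterations$salt$digest` storage format are assumptions for the example, not recommendations from the post.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your own hardware

def hash_password(password: str) -> str:
    """Generate a salt, run PBKDF2-HMAC-SHA256, and pack everything in one string."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"

def verify_password(password: str, stored: str) -> bool:
    """Unpack the stored string and recompute the digest with the same parameters."""
    iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest.hex(), digest_hex)

stored = hash_password("password")
print(stored)
print(verify_password("password", stored))
```

The slowness is the point: each guess costs the attacker the full iteration count, and the per-password salt comes along for free.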
Last time, I showed you guys a method of encoding and decoding values that I created and used to send “secret” messages back and forth. It was stupid and naive, but didn’t hurt anyone because it was only used privately. However, I did step it up a notch the next time and it turns out that I knew just enough to be dangerous.
In a production system (albeit an internal one), we had to do our own authentication. I was “smart” enough to know not to store passwords in plain text in the DB. I also knew that storing them with my weak system wouldn’t be good enough. Somewhere I had come across the idea that you store the passwords as the result of some one-way mechanism and then, when you want to authenticate, you perform your mechanism on the input and compare the results.
That was all well and good.
What I didn’t know was that this was basically what hashing was. What I also didn’t know was that I had several built-in ways to hash values. So, what I did was modify my original encoding code to make it so that I could no longer reverse the process to get the original values. I figured that I could just do some multiplication or division and ditch the remainder, which would ensure that I could never actually recreate the original value.
I don’t remember exactly what I did, but this code below follows the same general idea and is just as dumb.
In this case, the values Abcdef1 and Abcdef2 both “hash” out to 6199818961914390671, which is called a “collision” and which is BAD. When done this way, it means that someone with a password of Abcdef1 could also use Abcdef2 to get into their account. Any number of valid passwords greater than 1 is a FAIL!
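To make the failure concrete, here is a hypothetical Python sketch of the same kind of mistake. It is not the original code and its output will not match the number above; the multiply-by-4 encoding and the divisor of 1000 are assumptions chosen only to show how ditching a remainder creates collisions.

```python
def bad_hash(password: str) -> int:
    """A deliberately broken 'one-way' function. Do not use this for anything."""
    # Encode each character as its ASCII value times 4 and mash the digits together.
    digits = "".join(str(ord(ch) * 4) for ch in password)
    # "Ditch the remainder": integer division throws away the trailing digits,
    # so the original can't be rebuilt, but different inputs now land on the same value.
    return int(digits) // 1000

print(bad_hash("Abcdef1"))
print(bad_hash("Abcdef2"))
print(bad_hash("Abcdef1") == bad_hash("Abcdef2"))
```

Dividing by 1000 wipes out the last three digits, which is exactly the last character, so any two passwords that differ only in their final character collide.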
I realize that there are collisions in MD5 and SHA1, but even those would have been more secure than my nonsense. However, at this time, I had SHA256 available to me and could have been reasonably safe (given the limits of computing power at the time). The worst part is that my “solution” was audited. We explained that we were one-way hashing and that was good enough. The auditors didn’t know enough to realize that errors could be there.
The moral of the story is that you should NEVER try to write your own cryptography or cryptographic hashes. You probably aren’t smart enough. Even the people who are smart enough publish their work and their very very smart peers try like crazy to break their work. I mean, if Bruce Schneier wouldn’t even use his own algorithms without strenuous peer review, then you shouldn’t either.
Be smart and learn from my mistakes. Use safe, tested, tried and true solutions and never ever roll your own crypto.
Let’s just begin with the obvious. I’m an idiot. Fortunately, I believe that I’m less of an idiot now than I was over a decade ago. I mean, I see why this stuff is dumb, so that has to count for something, right? I sleep better at night believing that that is the case.
I’ve always been fascinated by security and encoded/encrypted messages ever since I was little, even before I was interested in programming computers. I used to play the game Hacker on my Commodore 64 and pretend that was me doing things for real. I used to pretend that I was a spy who could get into anything. I used to make up “unbreakable” secret codes so that my friends and I could pass “secret messages” at drops around the neighborhood and school. You get the point.
Well, as soon as I learned anything about programming when I was older, one of the first things I did was “invent” a way to encode messages back and forth. I decided to take a page out of the old A=1, B=2 code book and use the ASCII values for characters. The problem was that if they were left as a string of 2- and 3-digit numbers, it would soon become obvious what they were. I decided that I would just mash them all together and make one long string of numbers to kind of disguise what they were (yay, security through obscurity!).
My first issue was that while A is 65 and Z is 90, a is 97 and z is 122. I couldn’t easily figure out from a long string of numbers how they should be chunked. I needed them to always be available in a predictable chunk. I figured out that if I multiplied the ASCII value by 4, every character that I cared about would become a 3-digit number. Finally, I had my chunking.
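The chunking trick is easy to sketch in Python. This is a reconstruction of the idea as described, not the original VB6 code, and the function names are made up. It assumes every character has an ASCII value from 25 to 249, so that multiplying by 4 always gives exactly three digits.

```python
def encode(message: str) -> str:
    """Multiply each ASCII value by 4 and mash the 3-digit results together."""
    return "".join(str(ord(ch) * 4) for ch in message)

def decode(digits: str) -> str:
    """Split into 3-digit chunks and divide each by 4 to recover the characters."""
    chunks = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return "".join(chr(int(chunk) // 4) for chunk in chunks)

secret = encode("Az")
print(secret)          # A is 65 * 4 = 260, z is 122 * 4 = 488
print(decode(secret))
```

Since every chunk is the same width, the decoder never has to guess where one character ends and the next begins.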
I created a VB6 program that had two textboxes and two associated buttons that encoded and decoded messages for you. I don’t have the source code for that program handy (I’m sure it is on a backup somewhere), but it was easy enough to recreate the important methods here below:
The results of running that program are here:
You see that it basically works as advertised. I used it over IM with my brother-in-law a few times to prove the concept and was pretty happy with myself for the results.
Any of you who have your thinking caps on are already starting to see several problems here. If someone got ahold of the program, they could try some things to see if there is a predictable pattern, and there is. For instance, A always shows up as 260. Once you know that, you can easily figure out any message with a simple decoder key. You don’t even need a computer at any point. Even if you don’t know that, the encoded messages are still vulnerable (for that reason) to frequency analysis and every other basic code breaking trick.
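As a quick illustration of how little work that takes, here is a Python sketch of the "decoder key" plus a chunk count of the kind frequency analysis starts from. The intercepted message is invented for the example, and the key assumes printable ASCII characters multiplied by 4, as described above.

```python
from collections import Counter

# The mapping never changes, so the "decoder key" is just a fixed table
# that could be written out once on paper: "260" -> "A", "264" -> "B", ...
decoder_key = {str(code * 4): chr(code) for code in range(32, 127)}

intercepted = "260336336260268300"  # made-up intercepted message

# Split into 3-digit chunks and look each one up in the key.
chunks = [intercepted[i:i + 3] for i in range(0, len(intercepted), 3)]
print("".join(decoder_key[chunk] for chunk in chunks))

# Even without the key, repeated chunks stand out, just like repeated
# letters do in any simple substitution cipher.
counts = Counter(chunks)
print(counts.most_common(2))
```

Every letter always maps to the same chunk, so the scheme is a simple substitution cipher with extra digits.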
Pretty harmless exercise as it stands now, but next time I’ll cover how I parlayed this into something that was actually colossally stupid.
Part 2 is located here