Category: Crypto

Code Tips

Why Salt Your Hashes?

Note: Post has been updated below

Salted hashes? Have I decided to blog about breakfast?

No. By “Hash”, I mean “cryptographic hashes” and by “Salt”, I mean “additional input added to a one way hashing function”. Back in Episode 4 of my Podcast, I talked about a system that was written from the ground up to manage users, passwords, and permissions. During my little rant, I talked about storing passwords as the result of a one-way hash, but I didn’t really elaborate.

I realize that many of my regular readers may know this information, but I’ve been surprised at how many people I’ve found who do not. Hopefully, I can shed some light for those who don’t know and also become a viable source in search engine results for when the question is asked.

Let’s get the easy part out of the way first. We KNOW not to store plain text passwords, right? Some people know that and choose instead to store the passwords via two-way cryptography, meaning they can encrypt and then decrypt the password to compare it or email it to you. That is also a terrible idea. Now, your entire system is only as secure as the security around your decryption key or decryption certificate. You’ve just made an attacker’s job very easy.

The better way to store passwords is to only store the result of a one-way hash. Then, when someone presents their password for authentication, you just hash the input and compare that to what you have stored in the database. However, even though this is good, it is still not right.
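To make the hash-and-compare flow concrete, here is a minimal sketch in Python. The `stored` dictionary is a hypothetical stand-in for a users table, the function names are mine, and SHA-256 from the standard library is an assumed choice of hash. It deliberately has no salt yet, so it has exactly the weakness discussed next.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Return the hex digest of a one-way hash of the password (no salt yet)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical in-memory stand-in for the users table: only the digest is kept.
stored = {"pete": hash_password("password")}


def authenticate(user: str, attempt: str) -> bool:
    """Hash the presented password and compare it to the stored digest."""
    expected = stored.get(user)
    if expected is None:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, hash_password(attempt))
```

The plain text password never touches storage; only the digest does, and authentication is a second hash plus a comparison.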

Take this for instance. Here is a sample table with hashed passwords.

user	password
pete	b68fe43f0d1a0d7aef123722670be50268e15365401c442f8806ef83b612976b
bill	59dea5f67aea4662c26a5ac6452233e783407d55c4f96d6c4df6f0d7c06c58af
jeff	b68fe43f0d1a0d7aef123722670be50268e15365401c442f8806ef83b612976b
andy	b6642c42bd670b0c070dd45d087877a4bc8d6ee29c88df59273ea48ed72b76c4
ron	b68fe43f0d1a0d7aef123722670be50268e15365401c442f8806ef83b612976b

Right away, you should be able to see a problem. The hashes for pete, jeff, and ron are all the same. A common attack against hashed passwords uses precomputed tables, of which the rainbow table is the best-known form. In that attack, dictionary words (or commonly known phrases) are pre-hashed and those hashes are then compared against a compromised database. Let’s take a look.

password	SHA-3 (256) Value
password	b68fe43f0d1a0d7aef123722670be50268e15365401c442f8806ef83b612976b
letmein	ceaa5fd0a764ad8202f43f2efc860d8c7472911ca9d1ccea2dc232713ae1fc0d
blink182	aadfce5bdba224673c168fb861f45cdd6ebf4e34d35001ae933bd53b7f6b337f
password1	abbe6325ea0d23629e7199100ba1e9ba2278c0a33a9c4bfc6cd091e5a2608f1a

Now, by comparing, we can see that the password for pete is the word password. That means that the passwords for jeff and ron are also “password”. By only cracking one hash, we gain access to two other accounts. This is not good.
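The comparison above can be sketched in a few lines of Python. This is an illustration under assumptions: SHA-256 stands in for the hash function (so the digests will not match the tables in this post), the `leaked` table is made up for the example, and bill's password is invented.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex digest of an unsalted SHA-256 hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical compromised table of unsalted hashes (user -> digest).
leaked = {
    "pete": sha256_hex("password"),
    "bill": sha256_hex("Tr0ub4dor&3"),  # invented; not in the word list below
    "jeff": sha256_hex("password"),
    "ron": sha256_hex("password"),
}

# The attacker hashes a word list once and can reuse it against any database.
wordlist = ["password", "letmein", "blink182", "password1"]
precomputed = {sha256_hex(word): word for word in wordlist}

# Every stored digest that appears in the precomputed table is cracked at once.
cracked = {
    user: precomputed[digest]
    for user, digest in leaked.items()
    if digest in precomputed
}
```

One dictionary lookup per row is all it takes, and every user who shares a password falls together.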

The fix is to “salt” the password before hashing it. You want that salt to be a unique value. Some people create a random value and then store the salt alongside the password in another database column. Others derive the salt from something like the row’s primary key, etc. Either way is fine (as long as your derived value won’t change).

Now, let’s examine our user table.

user	salt	password
pete	I7Yrs9THQyLxpVllSwbf	9b7ec6d82075a9e7d8227897e8919785031b9a7cdab5750dea044390d1fd1f46
bill	K0kJJCQcVVqfLzykcpbP	297d00ae29ff3c32fe874c00d0154085ac862a154b061c17cd465de7f1cdee9a
jeff	NwV7PdmPUKY6GgScEUqu	c2936d36583d0513980e496005872e4954d142ed823b7b0b1abf28211efc538f
andy	GpHrXjbQRTjObZWM7jbd	0338bd60f7d761ce9c8922087e87c9ccb7936bb5f9c5c28d72fd28f4d8708e6b
ron	iHh8SX7fQEF2WFUOfxEp	07f459276c9be7d63aa8d57dac7468c8b16dd4367e91615fb9972543a707c403

We notice right away that none of the users’ hashes are the same. I didn’t change the passwords, but the salt values made the hashed inputs unique so that they all hashed differently. We can no longer tell whose passwords are identical. Also, our plain dictionary attack no longer works. Even though we’ve telegraphed to the attacker what salt to use, the attacker would have to generate rainbow tables across their entire dictionary for each individual salt.
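Here is a rough sketch of the salted version. The function names are hypothetical, SHA-256 is again an assumed hash, and prepending the salt to the password is just one way to combine them; the post does not say how the values in the table were produced.

```python
import hashlib
import hmac
import secrets


def salted_hash(password: str, salt: str) -> str:
    """Hash the salt and password together (salt-first is an assumption)."""
    return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()


def create_record(password: str) -> dict:
    """Build a row with a fresh random salt stored next to the digest."""
    salt = secrets.token_hex(16)  # 16 random bytes, hex encoded
    return {"salt": salt, "password": salted_hash(password, salt)}


def verify(record: dict, attempt: str) -> bool:
    """Re-hash the attempt with the stored salt and compare digests."""
    return hmac.compare_digest(
        record["password"], salted_hash(attempt, record["salt"])
    )


# Same password, different salts, different digests.
pete = create_record("password")
jeff = create_record("password")
```

The salt is not a secret. Its only job is to make each user's hash input unique, so one precomputed table can't be reused across rows.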

This isn’t 100% secure (nothing is), but this is a best practice and certainly will slow the attackers down. This method of storage, combined with strong passwords should keep your data as safe as it can be.

Thoughts? Disagreements? Share them in the comments section below.

EDIT (5/16/2014): I talked on my podcast referenced above about how easy it is to get behind or to overlook things if you do your own security as yet another reason NOT to do it. I recommended just using existing products or frameworks that have already been hardened over rolling your own. As a perfect example, I talked about doing all of this, but forgot about bcrypt (and others) that are much more secure, salt the value for you, and already have libraries in all of the major languages.
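As a hedged illustration of the “use something already hardened” advice: bcrypt is a third-party package in Python, so this sketch uses PBKDF2 from the standard library as a stand-in from the same family of deliberately slow, salted password hashes. The iteration count and function names are my assumptions, not a recommendation from the post.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune it for your own hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, key


def verify_password(password: str, salt: bytes, key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, key)
```

The point of the high iteration count is that each guess costs the attacker real time, which a single fast hash does not.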

Classic POS

Dumb Things I’ve Done in the Name of Crypto (part 2)

Last time, I showed you guys a method of encoding and decoding values that I created and used to send “secret” messages back and forth. It was stupid and naive, but didn’t hurt anyone because it was only used privately. However, I did step it up a notch the next time and it turns out that I knew just enough to be dangerous.

In a production system (albeit an internal one), we had to do our own authentication. I was “smart” enough to know not to store passwords in plain text in the DB. I also knew that storing them with my weak system wouldn’t be good enough. Somewhere I had come across the idea that you store the passwords as the result of some one way mechanism and then when you want to authenticate, you perform your mechanism on the input and compare the results.

That was all well and good.

What I didn’t know was that this was basically what hashing was. What I also didn’t know was that I had several built-in ways to hash values. So, what I did was modify my original encoding code to make it so that I could no longer reverse the process to get the original values. I figured that I could just do some multiplication or division and ditch the remainder, which would ensure that I could never actually recreate the original value.

I don’t remember exactly what I did, but this code below follows the same general idea and is just as dumb.

(Screenshot: the results of the horrible 'hash' function.)

In this case, the values Abcdef1 and Abcdef2 both “hash” out to 6199818961914390671, which is called a “collision” and which is BAD. When done this way, it means that someone with a password of Abcdef1 could also use Abcdef2 to get into their account. Any number of valid passwords greater than 1 is a FAIL!
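The original listing is not reproduced in this copy of the post, so here is a hypothetical stand-in in Python that follows the same general idea (encode, then divide and ditch the remainder). It is not the code that was actually deployed, the divisor of 16 is an arbitrary assumption, and it does not produce the exact number quoted above, but it collides on the same pair of inputs.

```python
def toy_hash(password: str) -> int:
    """A deliberately bad 'hash': do NOT use this for anything real."""
    # Encode each character as its ASCII value times 4, padded to three digits.
    digits = "".join(f"{ord(ch) * 4:03d}" for ch in password)
    # Integer-divide and throw away the remainder so it can't be reversed.
    # Dropping the remainder is exactly what makes nearby inputs collide.
    return int(digits) // 16


first = toy_hash("Abcdef1")
second = toy_hash("Abcdef2")
collides = first == second
```

Because the discarded remainder is the only place the last character's difference lived, both passwords land on the same value.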

I realize that there are collisions in MD5 and SHA1, but even those would have been more secure than my nonsense. However, at this time, I had SHA256 available to me and could have been reasonably safe (given the limits of computing power at that given time). The worst part is that my “solution” was audited. We explained that we were one-way hashing and that was good enough. The auditors didn’t know enough to realize that errors could be there.

The moral of the story is that you should NEVER try to write your own cryptography or cryptographic hashes. You probably aren’t smart enough. Even the people who are smart enough publish their work and their very very smart peers try like crazy to break their work. I mean, if Bruce Schneier wouldn’t even use his own algorithms without strenuous peer review, then you shouldn’t either.

Be smart and learn from my mistakes. Use safe, tested, tried and true solutions and never ever roll your own crypto.

Classic POS

Dumb Things I’ve Done in the Name of Crypto (part 1)

Let’s just begin with the obvious. I’m an idiot. Fortunately, I believe that I’m less of an idiot now than I was over a decade ago. I mean, I see why this stuff is dumb, so that has to count for something, right? I sleep better at night believing that that is the case.

I’ve always been fascinated by security and encoded/encrypted messages ever since I was little, even before I was interested in programming computers. I used to play the game Hacker on my Commodore 64 and pretend that was me doing things for real. I used to pretend that I was a spy who could get into anything. I used to make up “unbreakable” secret codes so that my friends and I could pass “secret messages” at drops around the neighborhood and school. You get the point.

Well, as soon as I learned anything about programming when I was older, one of the first things I did was “invent” a way to encode messages back and forth. I decided to take a page out of the old A=1, B=2 code book and use the ASCII values for characters. The problem was that if they were left as a string of 2 and 3 digit numbers, it would soon become obvious what they were. I decided that I would just mash them all together and make one long string of numbers to kind of disguise what they were (yay, security through obscurity!).

My first issue was that while A is 65 and Z is 90, a is 97 and z is 122. I couldn’t easily figure out from a long string of numbers how they should be chunked. I needed them to always be available in a predictable chunk. I figured out that if I multiplied the ASCII value by 4, every character that I cared about would become a 3 digit number. Finally, I had my chunking.

I created a VB6 program that had two textboxes and two associated buttons that encoded and decoded messages for you. I don’t have the source code for that program handy (I’m sure it is on a backup somewhere), but it was easy enough to recreate the important methods here below:
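The VB6 listing did not survive in this copy of the post. As a rough sketch of the scheme described above (ASCII value times 4, mashed together in 3 digit chunks), here is a Python version. The function names and the range check are my own assumptions, not the original methods.

```python
def encode(message: str) -> str:
    """Turn each character into its ASCII value times 4, as a 3 digit chunk."""
    chunks = []
    for ch in message:
        value = ord(ch) * 4
        # Only characters whose scaled value has exactly three digits fit.
        if not 100 <= value <= 999:
            raise ValueError(f"character {ch!r} does not fit in three digits")
        chunks.append(str(value))
    return "".join(chunks)


def decode(encoded: str) -> str:
    """Split into 3 digit chunks and divide each by 4 to recover the character."""
    return "".join(
        chr(int(encoded[i:i + 3]) // 4) for i in range(0, len(encoded), 3)
    )
```

For example, `encode("A")` gives `260`, and decoding an encoded message returns the original text.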

The results of running that program are here:

(Screenshot: the results of my encoding/decoding program.)

You see that it basically works as advertised. I used it over IM with my brother-in-law a few times to prove the concept and was pretty happy with myself for the results.

Any of you who have your thinking caps on are already starting to see several problems here. If someone got ahold of the program, they could try some things to see if there is a predictable pattern and there is. For instance, A always shows up as 260. Once you know that, you can easily figure out any message with a simple decoder key. You don’t even need a computer at any point. Even if you don’t know that, the encoded messages are still vulnerable (for that reason) to frequency analysis and every other basic code breaking trick.
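To show how little work the frequency analysis takes, here is a small sketch in Python. It assumes only what is described above (fixed 3 digit chunks, one per character); the function name and the sample string are made up for illustration.

```python
from collections import Counter


def chunk_frequencies(encoded: str) -> Counter:
    """Count how often each 3 digit chunk appears in an encoded message."""
    chunks = [encoded[i:i + 3] for i in range(0, len(encoded), 3)]
    return Counter(chunks)


# "AAAB" under the multiply-by-4 scheme: A is 260 and B is 264.
sample = "260260260264"
freqs = chunk_frequencies(sample)
most_common_chunk, count = freqs.most_common(1)[0]
```

On a real message, the most frequent chunks map to the most frequent characters (spaces and common letters), and the rest falls out the same way as any simple substitution code.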

Pretty harmless exercise as it stands now, but next time I’ll cover how I parlayed this into something that was actually colossally stupid.

Part 2 is located here

Pete on Software