Hash Functions

Connected: An Internet Encyclopedia
Hash Functions

Up: Connected: An Internet Encyclopedia
Up: Topics
Up: Concepts
Up: Cryptography
Up: Algorithms
Prev: Algorithms
Next: Block Ciphers

Hash Functions

Hash Functions Hash Functions take a block of data as input, and produce a hash or message digest as output. The usual intent is that the hash can act as a signature for the original data, without revealing its contents. Therefore, it's important that the hash function be irreversible - not only should it be nearly impossible to retrieve the original data, it must also be unfeasible to construct a data block that matches some given hash value. Randomness, however, has no place in a hash function, which should completely deterministic. Given the exact same input twice, the hash function should always produce the same output. Even a single bit changed in the input, though, should produce a different hash value. The hash value should be small enough to be manageable in further manipulations, yet large enough to prevent an attacker from randomly finding a block of data that produces the same hash.

MD5, documented in RFC 1321, is perhaps the most widely used hash function at this time (2001). It takes an arbitrarily sized block of data as input and produces a 128-bit (16-byte) hash. It uses bitwise operations, addition, and a table of values based on the sine function to process the data in 64-byte blocks. RFC 1810 discusses the performance of MD5, and presents some speed measurements for various architectures.

The other major hash function is NIST's Secure Hash Algorithm (SHA), specified in FIPS PUB 180-1. It is similar to MD5 in that it does various bit shuffling to process the input data in blocks. SHA's output is a 160-bit hash.

Hash functions can't be used directly for encryption, but are very useful for authentication. One of the simplest uses of a hash function is to protect passwords. UNIX systems, in particular, will apply a hash function to a user's password and store the hash value, not the password itself. To authenticate the user, a password is requested, and the response run through the hash function. If the resulting hash value is the same as the one stored, then the user must have supplied the correct password, and is authenticated. Since the hash function is irreversible, obtaining the hash values doesn't reveal the passwords to an attacker. In practice, though, people will often use guessable passwords, so obtaining the hashes might reveal passwords to an attacker who, for example, hashes all the words in the dictionary and compares the results to the password hashes.

Another use of hash functions is for interactive authentication over the network. Transmitting a hash instead of an actual password has the advantage of not revealing the password to anyone sniffing on the network traffic. If the password is combined with some changing value, then the hashes will be different every time, preventing an attacker from using an old hash to authenticate again. This approach is used in PPP's CHAP authentication scheme. The server sends a random challenge to the client, which combines the challenge with the password, computes the hash value, and sends it back to the server. The server, possessing both the stored secret password and the random challenge, performs the same hash computation, and checks its result against the reply from the client. If they match, then the client must known the password to have correctly computed the hash value. Since the next authentication would involve a different random challenge, the expected hash value would be different, preventing an attacker from using a replay attack.

A slight variant of this scheme can be used to generate data signatures. Assume that both the sender and recipient of some data message share a secret value. Then, by combining the data message with the secret, and running it through a hash function, a signature is generated in the form of the hash value. The data message is transmitted along with the signature. The recipient combines the received message with the secret, generates a hash value, and checks to make sure it's identical to the signature. The message's authenticity is thus verified. The widely used HMAC (RFC 2104) algorithm is based on this idea.

Thus, hash functions, though not encryption algorithms in their own right, can be used to provide significant security services, mainly identity authentication. Furthermore, as the next page discusses, hash functions can be easily converted into full-blown encryption algorithms, and thus form the basis for all modern cryptosystems.

Next: Block Ciphers

Connected: An Internet Encyclopedia
Hash Functions