There are several words that mean different things to technical and
non-technical people. One of these is "mole"; another is "unionized."
Then there's "hash." To the common man, hash is a comestible, a dish
prepared with chopped meat, combined with vegetables, and served with
gravy. The word hash derives from the French word "hacher," which means "to chop." A hash, or hash function,
in the technical sense, is a way to provide a fingerprint to an
electronic file. This fingerprint, or hash value, is generally a string
of between twenty and thirty-two bytes. Changing only a single
character in a file will change its hash value so completely that it
bears no resemblance to the hash value of the original file.
Importantly, it is infeasible to construct a file with the same hash
value as a given file unless it copies the given file exactly,
sequential character for sequential character, a property of hash
functions called "anti-collision" (more commonly known as "collision resistance").
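To see the fingerprint idea in action, here is a minimal sketch in Python using the standard hashlib module. The two sample sentences are my own illustration, not anything from a real file; they differ by a single character.

```python
import hashlib

# Two inputs that differ by exactly one character (illustrative text only).
original = b"The quick brown fox jumps over the lazy dog"
altered = b"The quick brown fox jumps over the lazy cog"

# SHA-1 produces a 20-byte value, shown here as 40 hexadecimal digits.
h1 = hashlib.sha1(original).hexdigest()
h2 = hashlib.sha1(altered).hexdigest()

print(h1)
print(h2)

# Count how many of the 40 hex digits differ between the two fingerprints.
differing = sum(a != b for a, b in zip(h1, h2))
print(f"{differing} of {len(h1)} hex digits differ")
```

Nearly every digit should change, even though only one letter of the input did.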
An important use of hash functions is the maintenance of login
passwords. When you log onto a computer, the computer doesn't compare
your typed password to a previously stored password. Instead, it
compares the hash value of what you typed with the stored hash value of
your password, and this protects your password. If someone gains access
to the hashed password file on the computer, they still don't know your
password, since the hash is a "one-way function." Knowledge of the hash
value gives you no knowledge of the number and/or letter string that
generated it. That's why a system administrator is able to reset your
password to a new value, but he's not able to tell you your current
password.
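The login check can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are hypothetical, and I've swapped in a salted, deliberately slow hash (PBKDF2 from the standard library) rather than a bare hash, since that is the usual practice for password storage today.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 repeats an HMAC-SHA-256 many times to slow down guessing.
    # The iteration count here is an arbitrary illustrative value.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

# When the account is created, only the salt and the hash are stored.
salt = os.urandom(16)
stored_hash = hash_password("correct horse", salt)

def check_login(typed_password: str) -> bool:
    # Hash what was typed and compare it with the stored hash.
    # The original password itself is never kept anywhere.
    candidate = hash_password(typed_password, salt)
    return hmac.compare_digest(candidate, stored_hash)

print(check_login("correct horse"))
print(check_login("wrong guess"))
```

Note that nothing in the stored data lets the administrator recover the password; all the system can do is overwrite stored_hash with the hash of a new one.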
Why is the anti-collision property important? As an example, let's
consider a sales contract with a negotiated price of $10,000. The
seller decides to increase his profit by adding a single character,
another zero, to make the price $100,000. This change will be reflected
immediately in the hash value of the contract. Perhaps the seller
realizes this, so he decides to add a few extra spaces, some
non-printing characters, or perhaps a benign appendix that looks as if
it was part of the contract, to generate the same hash value.
Anti-collision makes this impossible.
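Here is a small Python sketch of the seller's predicament. The contract wording is invented for illustration, and padding with trailing spaces is just one of the tricks mentioned above; the point is only that a brute-force search of this kind gets nowhere.

```python
import hashlib

def fingerprint(text: str) -> str:
    # SHA-1 fingerprint of the text, as 40 hexadecimal digits.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Hypothetical contract text, before and after the extra zero.
contract = "Buyer agrees to pay the seller $10,000 for the goods."
tampered = "Buyer agrees to pay the seller $100,000 for the goods."

target = fingerprint(contract)

# Try padding the tampered contract with 0 to 1999 trailing spaces,
# looking for any version whose fingerprint matches the original.
matches = [n for n in range(2000) if fingerprint(tampered + " " * n) == target]

print(target)
print(fingerprint(tampered))
print(matches)
```

The list of matches should come back empty. A SHA-1 value has 160 bits, so a lucky hit from blind padding like this is astronomically unlikely.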
Hash functions are so important to electronic commerce that they are standardized by the U.S. National Institute of Standards and Technology (NIST).
As computer technology advances, especially the computation speed of
computers, there is a need for better and better hash functions. The
most used hash function, designed by the U.S. National Security Agency (NSA)
and published in 1995, is called SHA-1. SHA stands for "Secure Hash Algorithm." Now, twelve
years and many CPU cycles later, SHA-1 is starting to show its age.
There's always a trade-off between the speed of a hash function and its
security. SHA-1 was designed to give reasonable security at the
processing speed of computers in 1995. Since then,
cryptanalysts have discovered some weaknesses in the SHA-1 algorithm. A
concerted effort may now allow generation of files with the same hash
values, demolishing the anti-collision property.
Waiting in the wings is a similar, but more complex, hash function
called SHA-256. Still, both SHA-1 and SHA-256 are based on principles
established fifteen years ago, so it's time to rethink hash. In this
effort, NIST is enlisting the aid of academic cryptographers by
establishing a hash competition. This approach is not new. In 1997,
NIST decided that the then-current encryption algorithm, the Data Encryption Standard (DES),
was becoming vulnerable to attack, so it started a competition for a
replacement encryption algorithm. NIST chose a slightly modified
version of Rijndael, an algorithm submitted by two Belgian cryptographers,
for its Advanced Encryption Standard (AES).
There were fifteen submissions from ten countries in the AES
competition. NIST sponsored two workshops on the requirements for a new
hash function in 2005 and 2006. A draft of the minimum acceptability
requirements, submission requirements, and evaluation criteria was
published in the January 23, 2007, issue of the Federal Register.
Submissions are due by fall 2008, and the standard will be announced in
2011. By the way, there is no prize in this competition, other than the
envy of your peers, and hash immortality.
References:
1. Bruce Schneier, "An American Idol for Crypto Geeks" (Wired Magazine Online, Feb. 8, 2007).
2. Hash Function (Wikipedia).
3. Advanced Encryption Standard (Wikipedia).
4. Advanced Encryption Standard (PDF File).
5. NIST's Plan for New Cryptographic Hash Functions.
6. The SHA-1 hash value of this file (before I added this reference!)
was 5158509721f10d91f729c080aed2f4263521bac2. If you have access to a
Linux system, you can get the hash of any file using the command
openssl dgst -sha1 {filename}. For a bytewise listing
(51:58:50:97:21:f1:0d:91:f7:29:c0:80:ae:d2:f4:26:35:21:ba:c2), type
openssl dgst -c -sha1 {filename}.
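If you don't have openssl handy, roughly the same result can be had from Python's standard library. This is a sketch, not a replacement for the commands above; the function names are my own, and the file is read in chunks so that large files don't need to fit in memory.

```python
import hashlib

def sha1_of_file(path: str, chunk_size: int = 65536) -> str:
    # Feed the file to SHA-1 one chunk at a time and return 40 hex digits.
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def bytewise(hex_digest: str) -> str:
    # Split the hex string into two-digit bytes joined by colons.
    pairs = (hex_digest[i:i + 2] for i in range(0, len(hex_digest), 2))
    return ":".join(pairs)

# The hash value quoted in reference 6, in bytewise form.
print(bytewise("5158509721f10d91f729c080aed2f4263521bac2"))
```

Calling sha1_of_file with a path of your choosing and passing the result to bytewise gives the colon-separated listing.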