This page lists resources for those interested in the field of one-way cryptographic hash functions. It contains background material on current hash algorithms, papers on the strengths and weaknesses of hashes, and other material that is relevant to the current discussions of hash algorithms. This page will hopefully be useful both to experts and those new to the field of cryptographic hashes.
This site is a (rarely-updated) work-in-progress; if you have suggestions for additions or comments on the text here, by all means please send mail to Paul Hoffman <firstname.lastname@example.org>.
Table of contents
1. Introduction to hashes
2. Algorithm definitions
3. Attacks and constructions
4. Cryptography books
Many parts of cryptography make some sense when described briefly, and hash functions follow this rule. Most computer professionals know a bit about hash functions, mostly from using the MD5 or SHA-1 functions to generate fingerprints for files. Of course, there is much more to hashes than that. This section gives an overview to hash functions for the general public; experts should feel free to skip over it.
Much of this section is lifted from RFC 4270, Attacks on Cryptographic Hashes in Internet Protocols. Also, there is a list of good cryptography books near the end of this page.
A hash function converts an input of an arbitrary-length string of bits into a particular fixed-length output. In general, the output (which is often called "the hash value" or just "the hash") can be used as a fixed-size representation of the input.
Externally, the most obvious feature of a hash function is the length of its output (often called L). There are many well-known hash functions; some of them include MD5, SHA-1, and SHA-256. L for MD5 is 128 bits; L for SHA-1 is 160 bits; L for SHA-256 is 256 bits.
Good hash functions have some features in common:
(As a side note, finding an arbitrary set of three messages that have the same hash should require about 2^(L/1.5) guesses.)
A typical non-cryptographic use for hash functions is for validating that the contents of a file have not been changed in transit. The originator computes the hash of the contents of the file and transfers it to the recipient of the file. The recipient uses the same hash function and verifies that the two hash values are the same.
The property of good hash functions that there is even probability that each bit in the output is a 0 or 1 for an arbitrary input makes hashes particularly good for human-based content validation. If a single bit in the input is changed, or the length of the input is changed by a single bit, it is likely that half of the bits in the output will change, and thus the representation of the hash will change radically.
Hash values are usually expressed in hexadecimal notation.
Thus, the hash value of a file (using the SHA-1 hash function)
might look like:
Changing a single bit in the file will change the hash value to:
In cryptographic constructs, hashes are used in two distinct areas: as shorter representatives of typically-longer messages, and in message authentication codes (MACs) that use symmetric cryptography to authenticate that a message has not changed.
Working on shorter values in cryptographic operations is often faster than working on longer values. In particular, encrypting a short value is faster than encrypting a long value, so encrypting just the hash of a message (that is, the representation of a message) is faster than encrypting the entire message, assuming that the message is longer than the length of the hash function. In typical cryptographic uses, most inputs are longer than L, which is 16 to 32 bytes for common hash functions.
The basic idea of a MAC is quite simple: instead of the originator hashing just a message, he concatenates a secret that is only known to the recipient to the message, and then hashes the combined message. The recipient of such a MAC combines the secret with the message and verifies that the hash matches the MAC that came with the message. This both verifies the integrity of the message and assures the recipient that it was the originator who created the MAC. As long as the secret has enough entropy, an attacker cannot spoof the message, nor can the attacker figure out the secret.
Like every cryptographic function, hashes are susceptible to brute-force attacks. The longer L is, the more work an attacker has to do to mount an attack; however, hashes with longer L also are usually slower to computer.
There are three important attacks on hashes:
The difference between a collision attack and either of the two preimage attacks is crucial. At the time of this writing, there are no practical preimage attacks, meaning that if your use of hashes is only susceptible to preimage attacks, even MD5 is just fine because at attacker would have to make 2^128 guesses, which will be infeasable for many decades (if ever).
Systems susceptible to a collision attack are in much more danger for two reasons. For hash functions with a short L such as MD5, the attacker can possibly mount a serious collision attack today with lots of computer power. But, more seriously, researchers have recently discovered some weaknesses in the collision-resistance of MD5 and SHA-1 which make them even more susceptible. In fact, finding collisions in MD5 is now fairly easy, with some restrictions.
By far, the two most commonly-used hashes are MD5 and the SHA family. There are a host of other hash functions (and more on the way!); a few are listed here.
RFC 1321, The MD5 Message-Digest Algorithm
Secure Hash Standard is the official descriptions of the SHA family from NIST.
US Secure Hash Algorithms (SHA) is an internet draft with an alternate description of the SHA family, and includes C code for each of the functions.
A hash function modelled on AES encryption by Vincent Rijmen and Paulo S. L. M. Barreto; general description, documentation and sample implementations.
(Are there any web sites with proper descriptions of RIPEMD-160 that do not simply rip off the ISO standard?)
NIST is holding a competition to carefully select the next widely-used hash function. Dozens of cryptographers submitted entries that will eventually be whittled down. An alternate take on the competition can be found at the SHA-3 Zoo.
This section lists papers and articles that cover research and practice in attacking cryptographic hashes. The term "construction" is used in addition to "attack" because some researchers prefer "construction" for mechanisms that weaken a cryptographic function in a way that does not directly lead to a useful result.
RFC 4270, Attacks on Cryptographic Hashes in Internet Protocols by Paul Hoffman and Bruce Schneier describes the properties of collisions and preimages, as well as how Internet protocols use hash algorithms.
Professor Xiaoyun Wang and her colleagues have been prolific in recent years with papers that show methods for producing collisions in popular hash functions. Some of those papers include:
A paper by Arjen Lenstra and Benne de Weger that shows how to create two PKIX certificates with the same signature but that have different public keys.
A web site describing work by Magnus Daum and Stefan Lucks that shows how two PostScript files with the same MD5 hash can display two very different results.
A paper by Ondrej Mikle showing MD5 collisions in binary self-extracting programs.
A detailed paper on how to create a rogue CA if a trusted CA signs using MD5 and predictable values for the certificate serial number, start date, and end date.
A paper by John Kelsey and Bruce Schneier that shows how to greatly reduce the work needed to find second preimages for extremely huge messages. For example, for messages that are millions of terabytes, the work needed is 2^106 attempts instead of the expected 2^160 attempts when using SHA-1.
If you have only learned about cryptography from an article in a magazine or the web, it would behoove you to read more about the field in a good book before delving into hash functions. There are many good mid- and high-level books on cryptography that describe a wide variety of topics.
The four books listed here have a variety of writing styles and intended audiences. Bookstores with large computer sections may have one or two of the books, but cryptography is a terribly popular subject, so it is likely that most bookstores will have none of them in stock. Online bookstores will usually have them all.
Practical Cryptography, Niels Ferguson and Bruce Schneier, ISBN 0471223573. This is the most direct of the four books, with a good balance between the nitty-gritty technical descriptions and higher-level applications. The author's design philosophy keeps the book well-focused, and the pseudo-code examples make it the most accessible to programmers.
Applied Cryptography (Protocols, Algorithms, and Source Code in C, 2nd Edition), Bruce Schneier, ISBN 0471117090. This is by far still the most popular book for learning cryptography. Because it is now a decade old, its age makes it weak in some respects, but the tone and coverage are still applicable today. This book is often called "the Schneier book" even though Bruce is also co-author of Practical Cryptography.
Network Security (Private Communications in a Public World, 2nd Edition), Charlie Kaufman, Radia Perlman, and Mark Speciner, ISBN 0130460192. Internet users and developers may find this book more practical than the others, given how it intimately mixes networking and cryptography to show how the two are tied together in modern usage. The authors' tone is conversational and fun. It also works quite well as a textbook, with interesting homework questions at the end of each chapter.
Handbook of Applied Cryptography, Alfred Menzes, Paul van Oorshot, and Scott Vanstone, ISBN 0849385237. This is the most academic and technical of the four, and is about as old as Applied Cryptography. It may over the heads of many potential readers, but the book also has some of the best in-depth descriptions of how various parts of cryptography are related to each other. It is more exhaustive, and exhausting, than other general cyptography books. This book is variously referred to as "HAC" (after the title), "MOV" (after the authors), and "CRC" (after the publisher).
This web site is sponsored by the VPN Consortium.
VPNC members | VPN technologies | Mailing list | Join VPNC
Interoperability testing | Documentation profiles | SimpleCA | IPsec archives
VPN standards | IPsec features chart | SSL features chart | VPN white papers
VPN conferences | IPsec bakeoff | Definitions | HIPAA | VPNC home