Hash Length Extension Attacks

It seems that many penetration testers rarely test cryptographic vulnerabilities. I’ve always been interested in cryptography, so I’ve made it a goal to understand how web application developers misuse crypto, and how to exploit the flaws that the misuse of cryptography create.

In January, I did some independent research on how to perform hash length extension attacks against poorly implemented message authentication codes (MACs). I found several good research papers and blog posts that discussed how these attacks work in a very general sense. However, there was not much information that specifically explained the details of a length extension attack. In this post, I’ll be explaining exactly what does happen.

Message Authentication Codes 101

Message authentication codes (MACs) are a way to verify the authenticity of a message. In the more naive implementation of a MAC, the server has a secret key that it concatenates with a message, and then hashes the combination with an algorithm, such as MD5 or SHA1. For example, consider an application that is designed to give an authorized user the ability to download specific files. The site might create a MAC for the filename like this:

def create_mac(key, fileName)
   return Digest::SHA1.hexdigest(key + fileName)
end

The resulting URL might look something like:

http://example.com/download?file=report.pdf&mac=563162c9c71a17367d44c165b84b85ab59d036f9

When the user sends the request to download a file, the following function is executed:

def verify_mac(key, fileName, userMac)
    validMac = create_mac(key, filename)
    if (validMac == userMac) do
        initiateDownload()
    else
        displayError()
    end
end

With this code, the server should only call initiateDownload if the user has not tampered with the filename… or so the theory goes. In reality, this method of creating a MAC leaves the site vulnerable to an attack where attackers can append their own content to the end of the file parameter.

Length Extension Attacks, The Simple Explanation

Cryptographic hash functions, such as MD5, SHA1, SHA2, etc., are based on a construct known as Merkle–Damgård. An interesting issue arises with this type of hash function: If you have a message that is concatenated with a secret and the resulting hash of the concatenated value (the MAC) – and you know only the length of that secret – you can add your own data to the message and calculate a value that will pass the MAC check without knowing the secret itself.

Example: message + padding + extension

Continuing the example from above, an extension attack against the hypothetical file download MAC would look like this:

http://example.com/download?file=report.pdf%80%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00
%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00
%00%A8/../../../../../../../etc/passwd&mac=ee40aa8ec0cfafb7e2ec4de20943b673968857a5

Length Extensions In Depth

To understand why this attack works, you first must understand what happens inside a hash function.

How Hash Algorithms Work

Hash functions work on blocks of data. As an example, 512 bits is the block length for MD5, SHA1 and SHA256. Most messages that are hashed will have a length that is not evenly divisible by a hash function block length. Thus, the message must be padded to match a multiple of the block length. Using the file download MAC example above, the message after padding would look like this (the ‘x’s represent the secret key):

xxxxxxxxxxxreport.pdf\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xA8

In SHA1, which is the algorithm being used in this example, a hash consists of a series of five integers. When displayed, these integers are usually in hexadecimal format and concatenated together. The initial value − aka, the registers − are set to this value when the algorithm is run: 67452301, EFCDAB89, 98BADCFE, 10325476, C3D2E1F0. Then, once the message has been padded, it is broken up into 512 bit blocks. The algorithm runs through these blocks, performing a series of calculations with the blocks to update the registers. Once these calculations are completed, the contents of the registers are the resulting hash of the message.

Calculating An Extension

The first step in calculating an extension is to create a new MAC. To do this, the contents we are extending the message with must be hashed: ‘/../../../../../../../etc/passwd’ in our example. However, when performing this hash, the initial registers must be overridden with the MAC from the origional message. You can think of this as making the SHA1 function start off at the state where the server’s hash function left off.

Attacker's MAC = SHA1(extension + padding) <- but with overridden registers

For this attack to work, the extension must be in its own block when it goes into the server’s hash function. The second step is to calculate enough padding so that key + message + padding == some multiple of 512 bits. In this example, the key is 11 characters long. Therefore, the padded message would look like this:

report.pdf\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xA8

The padded and extended message is then sent to the server, with the new MAC:

http://example.com/download?file=report.pdf%80%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00
%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00
%00%00%A8/../../../../../../../etc/passwd&mac=ee40aa8ec0cfafb7e2ec4de20943b673968857a5

Here is what the server hashes when it hashes the attacker’s hacked message: secret + message + padding to the next block + extension + padding to the end of that block. The result of the server’s hash will be ee40aa8ec0cfafb7e2ec4de20943b673968857a5, which provides the same result as hashing the extension while overriding the registers with the original MAC. This occurs because the attacker’s hashing operation essentially started off at the same state the server’s hash operation is at when the server has hashed half of the attack.

How To Run The Attack

For simplicity, in this example I revealed that the key length was 11 characters. In a real-world attack, where the attacker will not know the length of the key, he will need to determine the key length.

Continuing the example, let’s say that the vulnerable website returns different errors (HTTP response codes, error messages in a response body, etc.) when a MAC validation failed versus when the validation succeeded, but the file was not found. An attacker can then calculate multiple extensions, one for each possible key length, and send each extension to the server. When the server responds with an error indicating that the file was not found, then, conversely, a length extension vulnerability has been found, and the attacker is free to calculate new extensions aimed at gaining unauthorized access to sensitive files on the server.

How To Defend Against This Attack

The solution to this vulnerability is to use an algorithm known as HMAC. Instead of just hashing the key concatenated with the message, HMAC does something like this:

MAC = hash(key + hash(key + message))

How HMAC actually works is a bit more complicated, but you get the general idea. The important part is that because it is hashed into the message twice, the key is not vulnerable to the extension attack described in this post. HMAC was first published in 1996, and has since been implemented in just about every programming language’s standard library.

Summary

Though there are still some crazy people who write their own cryptographic algorithms, most people have gradually figured out that writing their own crypto is a bad idea. However, it is essential to do more than merely use publicly vetted crypto algorithms: You’ve must use those algorithms in the right way. Unless you thoroughly understand how the algorithms you use work – and know how to use them correctly – it is always safer to rely on professionally vetted, high-level libraries that will take care of the low-level stuff for you.

This entry was posted in Web Application Security on by .

About Douglass Clem

Douglass Clem is a geek with an interest in information security and cryptography. Since late 2010, he has lived in the San Francisco Bay Area, where he works as an Application Security Engineer at WhiteHat Security. When he isn’t hacking or writing code, Douglass enjoys dabbling in photography, among other things.