Monday, January 7, 2013

Passwords Found in the Wild for December 2012

In the late 1990's when I started analyzing passwords it was much harder to find samples to review.  My password collection routine consisted mainly of begging colleagues to share data or volunteering to perform the cracking for their security assessments.  Occasionally I would get lucky and find a publicly readable password file on the Internet.  Then I could dedicate a computer for several months to cracking each password database because it would certainly take at least that long before another sample showed up.

Today I find that I am actually overwhelmed with the opportunities to gather passwords.  The raw number of Internet sites that register users and collect their passwords is huge.  Correspondingly, the number of these sites that are susceptible to SQL injection or other vulnerabilities that allow attackers to extract their user databases has also grown.  Hackers are regularly exploiting these flaws and publishing password dumps to embarrass companies, to attract attention to their causes, or to simply stroke their egos.

I decided to monitor password dumps in December 2012 to get a better idea of how widespread this practice has become.  I monitored several sources (though mainly the Pastebin.com web site) for announcements of dumps and analyzed the data posted.

Study Methodology

Snippet of Password Dump Tracking Data
Some data dumps contained user or customer information but not passwords. Others contained only the administrator password or the passwords of a very limited number of users.  I ignored these and focused only on sources that contained passwords (hashed or plaintext) of at least a dozen or more users.  I also attempted to eliminate duplicate dumps, a practice where one hacker copies a full or partial dump from someone else and reposts it as their own.

In some cases the dump poster also noted that they included only a subset of the available user passwords.  However, we should assume that the attacker had access to the complete user database, which would increase the actual number of passwords exposed.

When reviewing these figures keep in mind that they account only for the publicly posted data of which I was made aware.  Hackers certainly compromised the passwords of other sites and kept this activity secret, or shared the data over more private channels.  Brian Krebs covered the underground marketplace for the more valuable passwords in his recent blog post.

Password Dump Findings

In December I found 154 dumps which met my criteria for analysis.  A few of the dumps contained data from multiple sites.  They named more than 125 different organizations and domains as the source of the leaks.  Passwords belonged to users at businesses, governments, schools, industry groups, and discussion forums.  Some dumps didn't identify the source of their data, or were gathered from multiple personal computers instead of from a centralized web site.

From this collection, 66 dumps consisted primarily of plaintext passwords, exposing roughly 221,000 passwords.   Another 82 dumps primarily contained hashed passwords, adding approximately 222,000 passwords to the count. So while the number of hashed password dumps was greater than the plaintext dumps, the number of passwords exposed was nearly equal.  Six more dumps had a mixture of plaintext and hashed passwords, but only accounted for 6,000 passwords.

Altogether, I found that almost 450,000 passwords were publicly exposed during the month. There were 103 dumps containing less than 1,000 passwords, and 17 dumps containing more than 10,000 passwords.  About 184,000 passwords (41% of the total) came from several dumps simultaneously released as part of Team GhostShell's Project WhiteFox on December 10th.

Finding that half of the exposed passwords lack the security provided by basic password hashing is disappointing.  While some of the affected sites likely have low security requirements, storing only password hashes is a pretty standard security practice that should be followed by almost every site.

Without password hashing both the poorly and well constructed passwords are exposed during leaks like these.  A user may think their password is secure only to find that their account has been compromised due to insecure password storage that was beyond their control.

Even hashed passwords can only offer resistance against attacks once they have been stolen from an organization.  Password crackers have become faster and more proficient at trying common words, names, phrases, and other combinations of guesses that can disclose a password after it has been hashed.

Conclusion

My feelings are mixed when it comes to the results of this study.  On one hand I'm frustrated with the security vulnerabilities that continue to plague many Internet sites, and on the other hand I'm eager to see what wisdom is provided by examining these leaked passwords.  The wide variety of passwords from these different sources allows researchers like me to pick and choose the password samples that seem most interesting or likely to produce the information we seek.

So these days instead of begging for passwords I'm finding myself begging for help to sort through all the password data that is available to me.