Studying the passwords dumped on the Internet by hackers back in December provided a good opportunity for me to measure the scope of the problem. Following that experience I decided to collect and correlate some new information when analyzing password dumps from January.
Overview of Password Dumps
Last month I found 110 password dumps which met my criteria* for analysis, down from 154 in December. A few of the dumps contained data from multiple sites. There were 90 specific organizations or domains named as the source of the passwords. The remaining dumps didn't identify the source of their data, or were gathered from multiple personal computers instead of from a centralized web site.
From this collection, 40 dumps consisted primarily of plaintext passwords, exposing roughly 61,000 passwords (36% of the monthly total). Another 64 dumps primarily contained hashed passwords, adding approximately 101,000 passwords (59% of the total). Six more dumps had a mixture of plaintext and hashed passwords, accounting for 9,000 passwords (5% of the total).
Compared to 450,000 passwords dumped in December, this month's total of 170,000 passwords was significantly lower. A contributing factor to this was the smaller number of dumps, but maybe more importantly there also tended to be fewer large password dumps. This month only had two dumps containing more than 10,000 passwords, while last month had 17.
I wasn't really surprised to see this result. The number of sites that are compromised and the number of passwords they disclose will always change based on what is getting hacked that month and how large the vulnerable sites happen to be.
Over time a baseline average of 150,000 - 300,000 passwords dumped each month might emerge, but this number would skyrocket every time a large site was affected by a security breach. In June of 2012 we watched LinkedIn lose 6.5 million passwords and eHarmony lose around 1.5 million. The next month 450,000 passwords were leaked from Yahoo Voices. Just this past Friday Twitter announced a forced password change for around 250,000 users whose password hashes may have been accessed by hackers (although these have yet to be publicly dumped).
There were 40 different hackers or hacker groups claiming credit for January's password dumps, and more hackers that chose to dump their data anonymously. So even the retirement, capture, or poor motivation of any particular hacker seems unlikely to have a large impact in the flow of monthly password dumps.
The biggest deterrent to future password dumps is more likely to be improvements in the security of the code and development frameworks used by the vulnerable sites, or a widespread adoption of specific security countermeasures (e.g. web application firewalls or intrusion prevention systems). This brings us to the subject of what types of sites I found to be vulnerable today.
Sites Vulnerable to Password Dumps
In January I decided to visit all the sites named in the password dumps and gather information on the software coding language they used, the category of the function they served, and the country in which they were hosted. I hoped this might provide some further insight on whether these attacks were targeted in any way or simply opportunistic.
SQL injection attacks appear to be the primary supplier of the database dumps containing passwords. Many of the dumps include actual output from the tools (like Havij) that hackers can use to automate the extraction of database contents. This seemed pretty intuitive since SQL injection is a common web application vulnerability and one that may not require hackers to gain any other illicit access on the targeted site.
What did surprise me was that the majority of the 87 named sites targeted with password dumps were developed using the PHP programming language, as shown in the following chart.
Netcraft's Web Server Survey for this same month shows that 39% of all web sites (around 244 million) are running PHP. While that is a large portion of the Internet, the market share by itself doesn't seem to justify PHP sites making up 91% of the total. After all, sites using ASP and JSP can be just as vulnerable to SQL injection attacks as PHP. If I didn't know that SQL injection was the primary attack method I would suspect that hackers were exploiting some PHP-specific vulnerabilities.
A more likely explanation is that the popularity of the language has led to the rapid deployment of PHP sites and PHP-based content management systems (CMS) by people who lack an education in web application security. Even though the risk of SQL injection in PHP should be fairly well understood, some organizations still end up deploying code that doesn't implement proper security controls.
Interestingly enough, I found that one of the organizations suffering a January password dump had actually showed up previously in a hacker's report of sites vulnerable to SQL injection posted on Pastebin.com over 6 months ago. So either they never learned they were vulnerable in the subsequent months or were unable to completely fix the problem before it was exploited to dump their entire user database.
Another possible explanation is that some of these sites might be unintentionally allowing attackers to connect directly to the site database and download records that way. This should be prevented by host firewall segmentation and proper database authentication, but sites may have been deployed without these precautions. I see evidence within the posted password dump files that this is happening, but I believe it is secondary to the more popular SQL injection attacks.
While all software developers and web admins should learn about SQL Injection prevention and other secure site management practices, it appears that the PHP community needs the most help catching up.
Categories of Targeted Sites
When I visited these vulnerable sites I also assigned them a category based on the site's purpose. Most of these category labels should be self explanatory, but I'll provide a description for a few of them.
Education sites were mainly universities, although there was one primary school. Business sites were a corporate web presence that mainly published data and didn't offer online services to customers. Entertainment sites could be a discussion forum or a site sharing information on a specific topic (e.g. movies or sports).
An Online Store was a business site where customers could make purchases. This type of site would be particularly valuable to attackers looking to capture stored billing information or to order merchandise and charge it to customers. As mentioned last month, sites like this often don't make it into public password dumps because attackers can sell the account data in the underground marketplace.
Info Services are the sites that sell information as a single-use or subscription service. These sites could also be valuable to attackers, depending on what type of data they make available to customers. Finance sites are banks, credit unions, or other related institutions, but in this month's case it was a single investment firm.
The chart below shows how many sites matched each category in the January password dumps.
I didn't really have many preconceived ideas about the categories of sites I expected to see targeted. Government, Business, Medical, Education, News, and Political certainly make sense if the attacker hopes to gain media attention with their data dump.
The Online Stores, Info Services, and Financial sites make sense if the attacker hoped to make a financial gain from the attack. Although going after these sites could also certainly just be for bragging rights.
There is probably some ratio of opportunistic and targeted attacks in this mix, but it is difficult to detect unless the hackers specifically outline their motivations in the dump descriptions.
Countries Hosting the Targeted Sites
I was able to identify the country of origin for 96 of the January password dumps. There were 35 different countries represented in that total. The top 10 countries that experienced the most password dumps are shown in the chart below.
Seven other countries tied the Philippines with two vulnerable sites each. South Africa was given a bit more attention in January than normal due to one hacker group specifically targeting organizations in that country as part of a political statement.
Another observation was that half of the vulnerable sites in India were Education category sites. They were the only country that had such a large percentage of their sites in the same category. This might indicate greater web site security problems at universities in India, or it could just be chance.
Impacts on Users of Targeted Sites
At least some users of the hacked sites are likely to experience problems as a result of these password dumps. If passwords were stored in a plaintext format then any account is vulnerable to misuse by unauthorized individuals. Even if hashed, password cracking software can produce the plaintext passwords fairly quickly for all but the stronger passwords. Whether hackers care to use these stolen credentials will depend partially on what the account can be used for and partially on how motivated they are to annoy users.
Some web sites use an email address for the username, or record the email address during the account registration process. Email addresses of users were found in 96 (87%) of the analyzed password dumps in January.
The use of email addresses isn't necessarily a problem by itself, but the reality of user password practices can result in email address leaks endangering their identities on other Internet sites. If one organization leaks email addresses and passwords this allows attackers to try those credentials elsewhere. In fact, hackers have developed software tools that automate the process of trying discovered email and password combinations against a list of popular web sites.
Password reuse makes this a real problem, although mainly for the users and not necessarily for the site that was vulnerable to the password dump in the first place. Even if a user chose a stronger password to reuse, it only takes one site storing that password in plaintext to potentially expose all of their accounts.
My suggestion to users is that passwords should never be reused across any sites where you would care if your account gets hacked. Always choose a strong and unique password for each site and then store it in a password manager if you are concerned about forgetting it.
Web sites often can't do much to prevent password reuse other than enforcing good password selection controls that might eliminate the worst of the reused passwords. However, they can make sure that they use adequate password hashing and salting techniques that make the task of cracking user passwords much more difficult for hackers. Sites should also notify users when a database breach is detected and warn them to change their password anywhere else it was used, in addition to forcing a change on the hacked site itself.
I wasn't able to complete my analysis of all the information provided by the password dumps in January, but I'll continue to work on it and will post new results here on the blog. In the meantime, if you have any questions or comments leave them below, or contact me on Twitter @PwdRsch.
* Study Methodology
I monitored Twitter and other sites for notices that a data dump had been publicly posted. Some data dumps contained user or customer information but not passwords. Others contained only the administrator password or the passwords of a very limited number of users. I ignored these and focused only on sources that contained passwords (hashed or plaintext) of at least a dozen or more users.
I also attempted to eliminate duplicate dumps, a practice where one hacker copies a full or partial dump from someone else and reposts it as their own. Sometimes these dumps are from the same month and sometimes they are from previous months.
In a few cases the dump poster noted that they had included only a subset of the available user passwords. While I didn't count the unposted passwords, we should assume that the attacker had access to the complete user database, which would increase the total number of passwords exposed.
When reviewing these figures keep in mind that they account only for the publicly posted data which I was able to discover. Hackers certainly compromised the passwords of other sites and kept this activity secret, or shared the data over more private channels.