28 Aug 2015
In case you haven’t noticed, I recently migrated my blog to Jekyll. I did this as a response to the meltdowns I experienced the last time I was on the front page of HN. I could have upgraded my server of course, but I’m stubborn. Besides, Jekyll gave me an opportunity to learn something new.
Anyway, while going through the migration exercise, I became curious about what the current WordPress install base looks like. As it often does, general curiosity gave way to brainstorming a method of doing this kind of check. I figured that if I could identify a fingerprint for a WordPress install, I could again comb through the scans.io http dumps. So that’s what I did.
To better answer this question, I figured I’d also need to check some historical records and compare them to recent dumps.
I used the following process to collect this data:
- Download each http dump from 09/2013 until 08/2015
- I got a list of the dumps from the [scans.io site](https://scans.io/json) and parsed out each sonar.http entry.
- I downloaded each of them. Since disk space is severely limited on my VPS, I had to comb through the data as it streamed down. This added a lot of time; the overall process took about 10 days to get everything I was looking for. I parsed the data looking for three things:
- The date of the scan
- The IP of the server. I did this so I could track versions over time by source.
- The version.
- Parse each dump and look for a marker that identifies the page as a WordPress installation
- Since most unmodified WordPress installs contain the version in a meta tag, I got the value by looking for the regex `content="WordPress [0-9]\.[0-9]\.?[0-9]?"`
- Run reports on the data. `cat log | sort | uniq -c` works, but I wanted to try something new.
- Send the data to Splunk
- I decided on Splunk to get more exposure to it, since lots of companies use it. I set up a Splunk server for this using their free licensing model, but that is outside the scope of this post.
- Export the parsed data into a format that Splunk can easily ingest. I opted for JSON.
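The collection steps above can be sketched roughly like this. Note that the sonar.http record field names (`ip`, `data`) and the exact parsing logic are my assumptions, not the actual script from this project; the regex is the one above with the dots escaped.

```python
import json
import re

# Generator meta tag regex (dots escaped), matching e.g.
# <meta name="generator" content="WordPress 4.2.4" />
WP_VERSION_RE = re.compile(r'content="WordPress ([0-9]+\.[0-9]+(?:\.[0-9]+)?)"')

def extract_wp_version(html):
    """Return the WordPress version advertised in the generator meta tag,
    or None if the page does not look like a stock WordPress install."""
    m = WP_VERSION_RE.search(html)
    return m.group(1) if m else None

def parse_record(raw_line, scan_date):
    """Parse one line of a sonar.http dump (assumed JSON with 'ip' and
    'data' fields) into the (date, ip, version) triple tracked here."""
    rec = json.loads(raw_line)
    version = extract_wp_version(rec.get("data", ""))
    if version is None:
        return None
    return {"date": scan_date, "ip": rec["ip"], "version": version}
```

Each non-None result can then be appended to a JSON log, which Splunk ingests without extra configuration.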
Problems with my methods
- This process doesn’t include https. I realize that I am missing a large chunk of data, but there’s still a lot to be analyzed here, so I am good with this for now.
- This doesn’t include custom configs that have excluded the WordPress meta tags. I am making the assumption that this is the exception to the rule.
- Custom version values in the metatags can pollute the data. Another aSSumption…
Here are some of the reports I’ve run so far:
Count of WordPress installs on individual IPs as of August 2015
index=wp | dedup ip | stats count
There appear to be 386,357 installs which fit the criteria above.
Top 20 WordPress versions as of August 2015
index=wp| top limit=20 version
At the time of the last scan, 4.2.4 is the clear winner. However, it was surprising to me that there are some early version 3s in the top 20.
Top 20 WordPress versions over time (9/1/2013 to 8/1/2015)
index=wp| timechart count by version limit=20 useother=f
What’s interesting about this one is that you can see clear spikes when new versions are released. It also appears that there is generally about a 3 month overlap between a release and its major successor’s rise.
Or a pretty picture of the top 30 in the same date range just for fun–or execs…
index=wp| timechart count by version limit=30 useother=f (using the area visualization)
Other ideas for reports:
- Versions with the least amount of change by IP. This could indicate some kind of canned WordPress site– maybe.
- Exposure based on versions with known vulnerabilities.
The reports can go on and on, but you get the idea.
Overall I learned a bit more about Splunk and got a clearer picture of the state of WordPress installs.
If you have any questions, let me know.
17 Jul 2015
Hey all, this is a pretty simple post, so I’ll keep it quick.
Yesterday, someone released a dump containing several archives of Darknet black-market sites for research purposes. This looked interesting, so I took them and did a little research.
One of the suggested uses by gwern was:
“deanonymization and information leaks (eg GPS coordinates in metadata, usernames reused on the clearnet, valid emails in PGP public keys)”
Sounds like a good start to me.
- Some of these sites and forums were probably custom coded so they may not have sanitized exif data.
- Some people who posted images probably used their mobile devices.
- Some people were not aware that some devices record your location when taking a picture by default.
What I did:
For my target, I chose a random archive with a decent amount of data; I wanted something with potential. I also decided to only look at .jpg images so I could standardize how I collected the data.
I then hacked together a script that would extract all of the files I wanted from the tar.gz. The script would then get each file’s latitude and longitude if it existed within the metadata of each image.
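A minimal sketch of that script, assuming the archive is a tar.gz of site files and using Pillow for EXIF parsing (the GPS tag IDs are the standard EXIF ones; this is my reconstruction, not the original script):

```python
import tarfile

GPS_IFD_TAG = 34853  # standard EXIF GPSInfo tag

def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) EXIF rationals to signed
    decimal degrees; ref is 'N'/'S' for latitude, 'E'/'W' for longitude."""
    deg, minutes, seconds = (float(v) for v in dms)
    value = deg + minutes / 60.0 + seconds / 3600.0
    return -value if ref in ("S", "W") else value

def gps_from_archive(archive_path):
    """Yield (member_name, lat, lon) for every .jpg in a tar.gz whose
    EXIF data still carries GPS coordinates."""
    from PIL import Image  # Pillow; imported here so the helper above works standalone
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if not member.name.lower().endswith(".jpg"):
                continue
            f = tar.extractfile(member)
            if f is None:
                continue
            try:
                exif = Image.open(f)._getexif() or {}
            except Exception:
                continue  # truncated or non-image member
            gps = exif.get(GPS_IFD_TAG)
            # GPS IFD sub-tags: 1=LatRef, 2=Lat, 3=LonRef, 4=Lon
            if gps and all(k in gps for k in (1, 2, 3, 4)):
                lat = dms_to_decimal(gps[2], gps[1])
                lon = dms_to_decimal(gps[4], gps[3])
                yield member.name, lat, lon
```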
After parsing hundreds of thousands of images, I came across about 37 unique images that were not properly sanitized. This means the files contained exif data which may identify the latitude and longitude where the pictures were taken. (Keep in mind, this data could also be spoofed.) Overall, it appears as if these images came from just a handful of individuals.
For the curious, this is a sanitized montage of the images:
The takeaways:
- You cannot depend on Tor alone to render yourself truly anonymous. If you don’t understand why, it’s probably better if you don’t use it.
- Don’t do illegal things. You’ll get caught eventually.
So that’s it. Have a good weekend!
13 Jul 2015
I did it again.
So- hey again. Several weeks ago, I wrote about emailing over 97,000 people their own passwords and documented the results. The post was pretty popular. I got lots of feedback from HN, Reddit, via email, and through comments. People were mostly positive and many had very good suggestions. So- I took some of the feedback from folks and did a second experiment.
- Got banned from Mandrill
- Got prebanned from SendGrid
- Rolled a mail server
- Cleaned up password lists
- Rebuilt Email Templates
- Sent 281,317 emails
- Received $82 in donations
I realized that in order to make this more successful I would need to make some changes. I am relatively new to the mass email arena, so I tried to partner with some known vendors.
I chose Mandrill. Using their site, I found it very easy to convert my message into one that appeared much more professional and trustworthy. I have to admit, their interface is slick. So I began with a test run and queued up around two thousand emails. I decided to try to track open rates to see if messages were successful. I kept an eye on the GUI all day.
About 900 emails got out before I was banned. I couldn’t log in or recover any data - nothing. After discussing with their technical support teams, there was nothing they could do.
I then called SendGrid, wanting to see if I could partner with them before I went through all the work of setting things up again. I just got a no.
So I ended up rolling my own mail server. I cleaned up my email template. I tested and retested the spamminess of my message. I asked my wife for her opinion. I cleaned up the password lists.
So here are the numbers:
- Thanks: 68 (+59) or .02%
- Unsubscribes: 29 (-12) or .01%
- Requests when opting out: 4 (+3) or .001%
- stop sending spam
- F**K OFF!!!!!!!!!!!!!!!!!!!!!
- f**k off slime bag - do your best to hack me and i’ll find you and let my pet camel f**k you in the *ss.
- F**k off scammer
- Opens: 3,478+ (This one is difficult to measure due to image blocking and other tracking blocks)
- Donations: $82 (+$82)
I’m not sure. If I’m going to turn this into a service I will need to better optimize delivery and take care of some back office type stuff. If I do this, all of the donations will be reinvested back into the service.
I still consider this effort successful. I look forward to helping more people.
For those that are curious, the project itself is located here.
23 Jun 2015
I run across lots of passwords on the webs. Passwords to bank accounts, Netflix accounts, email accounts- you name it. Pastebin and its clones are very popular repositories for this kind of information.
Now, there are a couple of solutions a person can use to collect this password data. Not all of them are malicious.
Some of these scripts are used to alert a person when one of their own accounts is compromised, as a kind of canary. I’ve seen various services where a person can opt in to be notified if one of their accounts has been compromised. A “Canary as a Service,” if you will. I can see two issues with this:
- Most users have no idea these services exist.
- Many users are wary of sending the information they care most about to another online service.
I wondered what would happen if I just emailed this information to the people who owned it. Instead of asking people to opt-in — I could offer them the chance to opt-out.
I decided to do this as part of urhack.com and call it Robin, originally “canary” (the reasoning behind this change is explained there). I set up the email and a reply address to offer people a chance to unsubscribe. I even set up a PayPal donation button. I didn’t expect anything in return, but thought, “Why not?” Five dollars would cover the VPS time.
For 3 days, I scraped Pastebin looking for email address/password combinations. This seemed to be the easiest target since it was the most active. After removing the garbage, I was left with over 97,000 email:password combinations.
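The cleanup step isn’t described in detail; a loose sketch of how one might pull email:password combos out of a paste (the regex and dedup rules here are my guesses, not the actual cleanup used):

```python
import re

# Loose "email:password" line matcher; real pastes are messier than this.
COMBO_RE = re.compile(r'^([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}):(\S+)$')

def parse_combos(paste_text):
    """Return a dict of email -> password, keeping the first password
    seen for each address and skipping lines that don't match."""
    combos = {}
    for line in paste_text.splitlines():
        m = COMBO_RE.match(line.strip())
        if m:
            combos.setdefault(m.group(1).lower(), m.group(2))
    return combos
```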
On May 19th 2015, I sent out the emails. I could have waited for more, but this was only an experiment– and honestly I was getting impatient.
I tried to keep the message simple:
The results:
- 9 Thank Yous (0.009%)
- The thank you notes I got were sincere. One of them validated the entire effort when the person indicated that they use the same password for everything and wanted to know which account had been compromised.
- 100 Delivery Status Notification (Failure) (0.1%)
- Many of the addresses contacted were no longer in use for obvious reasons.
- 41 unsubscribes (0.041%)
- Including one request to F**k off. (0.001%) :)
- 29 Spam (0.029%)
- Some of these addresses were either compromised accounts which reply to emails with spam or were planted for this purpose.
- I received no donations. This was not unexpected– but since the campaign didn’t cost me much, it’s also absolutely fine.
Overall I consider this experiment a success. I hope that many people were helped and simply did not reply, rather than ignoring the email or losing it to spam filters.
My next list has been collecting since May 19th; it currently holds around 300k accounts.
I might just do this again.
30 Apr 2015
The Scans.io project is hosting a new dataset courtesy of Mr. Hanno Böck. This data contains the results of scans against the Alexa top 1 million domains, looking for DNS servers that allow unauthenticated zone transfer requests.
The purpose of DNS Zone Transfer (AXFR) is to replicate DNS data across DNS servers. Usually this information is protected with ACLs, but there are many DNS servers which allow unauthenticated requests and provide potentially sensitive information. This information is often used by hackers while conducting recon.
US-CERT even put out an alert in April 2015: https://www.us-cert.gov/ncas/alerts/TA15-103A
I parsed through the files very briefly using standard command-line tools and found some interesting things.
- There are 67,647 domains exposed or 6.7% of the Alexa top 1 million scanned.
- There are 47,025 unique DNS servers listed.
- 451 (or .95%) appear to contain records indicating the use of DNSSEC.
- 2,282 of the records contain the word intranet
- 285 of the records contain HINFO data
- There are 15,382 HINFO records
- 102 of the DNS servers use the .gov TLD.
- Of these, 8,166 records are exposed.
- 779 records contain the word password
- Of these, 58 contain both the words password and reset
- 39.5% are .com domain servers
- 26,083 records contain the word proxy.
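The keyword tallies above came from standard command-line tools; an equivalent sketch in Python, assuming one zone-file record per line (the record format is my assumption):

```python
from collections import Counter

def count_keywords(records, keywords=("intranet", "password", "proxy")):
    """Count how many AXFR records contain each keyword,
    case-insensitively, mirroring grep -ic style checks."""
    counts = Counter()
    for record in records:
        low = record.lower()
        for kw in keywords:
            if kw in low:
                counts[kw] += 1
    return counts
```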
Top 15 DNS Domains by Count of Exposed Domains
- xserver.jp – 11022
- secure.net – 2003
- mainnameserver.com – 1689
- pointhq.com – 1187
- linuxpl.com – 1010
- firstvds.ru – 920
- sedoparking.com – 902
- sixcore.ne.jp – 878
- wpx.ne.jp – 875
- dnsexit.com – 820
- a2hosting.com – 727
- parklogic.com – 722
- netsons.com – 662
- 1gb.ru – 597
- linode.com – 508
Number of DNS Servers by TLD
- .com – 18598
- .net – 5982
- .ru – 3380
- .org – 1384
- .pl – 1237
- .br – 1170
- .jp – 1168
- .ir – 970
- .de – 628
- .uk – 551
- .nl – 532
- .ua – 514
- .tw – 514
- .kr – 479
- .info – 444
Looks like there may be some nasty domains with enough traffic to be listed in the Alexa top 1 million too:
dig axfr loginj.com @ns1-king.vivawebhost.com.
This domain has about 1400 subdomains– all appearing to be phishing related.
According to Alexa, 91.5% of visitors to this site are from the US.
While the site itself is registered to someone in Australia:
Registrant City: Deer Park
Registrant State/Province: Victoria
Registrant Postal Code: 3023
Registrant Country: Australia
So now what?
I did reach out to US-CERT and to one of the VPS providers on the list.
The response from the provider was:
Thank you for bringing this to our attention. While our servers are configured to allow AXFR, the ability to perform them is disabled by default — it is the responsibility of the user to configure their ACL’s to allow access to the servers they wish to allow replication between.[…]
I did not receive anything from US-CERT. However, with the alert above, I’m assuming they are aware of the .gov exposure.
Anyway, the scans are located here: https://scans.io/study/hanno-axfr.
If you find anything interesting, let me know.
Earlier versions of this post credited Rapid7 for this data. The data was instead gathered by Mr. Hanno Böck and is hosted by scans.io.