Studying Bulk Twitter Account Creation

Kurt Thomas presented one of our recent research papers at USENIX Security 2013 in Washington, DC. The paper is available here, and USENIX will be putting the talk online; the rest of the conference content is available for free from the USENIX Security site.

Several news articles have been written about it; here are a few:

As far as our work goes, this continues our line of research looking at the abuse of social networks. We have documented abuse in several forms before (CCS’10, S&P’11, IMC’11, LEET’12), but the goal with this project was to develop an understanding of how accounts are created in bulk, as well as the market for these accounts.

We set out to perform this study by buying accounts from the underground market. Once we started buying accounts, we found that we could build a classifier to retroactively identify accounts that came from any of the merchants we had bought from. This let us examine several aspects of the automation infrastructure that enables the marketplace, such as IP address diversity and CAPTCHA-solving rates. In all, the classifier found millions of accounts that had been created by the 27 merchants we bought from (with the top few merchants responsible for the vast majority). Twitter then suspended the accounts we identified in several large batches.
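The paper describes the actual registration signals the classifier uses; purely as a toy illustration of one kind of signal such a classifier *could* use, here is a hypothetical check for template-generated usernames. The regex and every username below are made up for the example and are not from the paper:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class NamingPattern {
    // Hypothetical signature: a merchant's registration script often builds
    // usernames from a rigid template, e.g. a capitalized word plus digits.
    static final Pattern TEMPLATE = Pattern.compile("[A-Z][a-z]+\\d{2,5}");

    // Fraction of a batch of usernames that match the template; batches of
    // organically chosen handles rarely score near 1.0.
    static double templateScore(List<String> usernames) {
        long hits = usernames.stream()
                .filter(u -> TEMPLATE.matcher(u).matches())
                .count();
        return usernames.isEmpty() ? 0 : (double) hits / usernames.size();
    }

    public static void main(String[] args) {
        List<String> suspicious = Arrays.asList("Brenda83301", "Walter4425", "Sophia90218");
        List<String> organic = Arrays.asList("kt_sec", "n3td3v", "graceh");
        System.out.println(templateScore(suspicious)); // 1.0
        System.out.println(templateScore(organic));    // 0.0
    }
}
```

A real system would combine many such features (registration timing, IP ranges, browser agents, and so on) rather than rely on any single one.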

Posted in research | Leave a comment

censoring political speech on twitter

We have a paper this year at LEET — it’s an interesting look at politically motivated spam on Twitter. We caught this thanks to an article Brian Krebs wrote, and we followed up by investigating the attack. Not only did this type of attack happen again during the actual Russian elections, but it is a problem elsewhere in the world too.

It appears that the infrastructure used to perform the attack (accounts and computers) was drawn from general spam-as-a-service stores. You can read more in our paper, or come see Kurt Thomas give the talk on April 24th:

“Adapting Social Spam Infrastructure for Political Censorship.” Kurt Thomas, Chris Grier, and Vern Paxson. To Appear in the Proceedings of the USENIX Workshop on Large-Scale Exploits and Emergent Threats (LEET). April 2012. PDF BIB


paper at IEEE S&P 2012

I’m a co-author on a paper this year at IEEE S&P that looks at the methodology behind how malware experiments have been conducted in recent papers at top-tier venues.

“Prudent Practices for Designing Malware Experiments: Status Quo and Outlook,” Christian Rossow, Christian J. Dietrich, Christian Kreibich, Chris Grier, Vern Paxson, Norbert Pohlmann, Herbert Bos and Maarten van Steen.

The goal is to come up with a set of guidelines for malware experiments so that other members of the research community are able to better assess the proposed techniques. We found that malware researchers often fail to perform critical steps in their experiments (leading to misleading results) and omit important details from their papers, leaving the reader unable to understand the evaluation. There’s a website up for people to comment and suggest guidelines for better experimental methodology:

If you would like a copy of the paper, email me:


VMware vSphere Java examples

I had to automate some VMware tasks the other day, and with the latest ESXi it seems the best way is the VI Java API. Note: I typically do not code in Java!

We use linked clones heavily because they save disk space and make it quicker for us to roll out an entirely fresh set of VMs. A linked clone doesn’t have an entire copy of the parent VM’s disk; instead, it just includes a reference to the parent’s disk file and stores the differences. Create a linked clone:
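The original code block didn’t survive, so here is a sketch of how this looks with the VI Java (vijava) library. The vCenter URL, credentials, and VM names are placeholders, and it assumes the parent VM already has a snapshot (a linked clone must be made from one):

```java
import java.net.URL;

import com.vmware.vim25.VirtualMachineCloneSpec;
import com.vmware.vim25.VirtualMachineRelocateSpec;
import com.vmware.vim25.mo.Folder;
import com.vmware.vim25.mo.InventoryNavigator;
import com.vmware.vim25.mo.ServiceInstance;
import com.vmware.vim25.mo.Task;
import com.vmware.vim25.mo.VirtualMachine;

public class LinkedClone {
    public static void main(String[] args) throws Exception {
        // Connect to vCenter/ESXi (true = ignore SSL certificate validation).
        ServiceInstance si = new ServiceInstance(
                new URL("https://vcenter/sdk"), "user", "pass", true);
        VirtualMachine parent = (VirtualMachine) new InventoryNavigator(
                si.getRootFolder()).searchManagedEntity("VirtualMachine", "parent-vm");

        // A linked clone is made from a snapshot of the parent.
        VirtualMachineCloneSpec cloneSpec = new VirtualMachineCloneSpec();
        cloneSpec.setSnapshot(parent.getSnapshot().getCurrentSnapshot());

        // "createNewChildDiskBacking" creates a delta disk that references
        // the parent's disk instead of copying it -- this is the linked part.
        VirtualMachineRelocateSpec relocSpec = new VirtualMachineRelocateSpec();
        relocSpec.setDiskMoveType("createNewChildDiskBacking");
        cloneSpec.setLocation(relocSpec);
        cloneSpec.setPowerOn(false);
        cloneSpec.setTemplate(false);

        Task task = parent.cloneVM_Task(
                (Folder) parent.getParent(), "clone-01", cloneSpec);
        task.waitForTask();
        si.getServerConnection().logout();
    }
}
```

This requires the vijava jar on the classpath and a live vCenter or ESXi host to run against.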

When we clone a VM, we change the network settings and VNC info. To do that:
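Again, the original block is missing; as a sketch (vijava, with the port group name, VNC port, and password as placeholders), the network change is a device edit on the NIC and the VNC settings live in the VM’s extraConfig options:

```java
import com.vmware.vim25.OptionValue;
import com.vmware.vim25.VirtualDevice;
import com.vmware.vim25.VirtualDeviceConfigSpec;
import com.vmware.vim25.VirtualDeviceConfigSpecOperation;
import com.vmware.vim25.VirtualEthernetCard;
import com.vmware.vim25.VirtualEthernetCardNetworkBackingInfo;
import com.vmware.vim25.VirtualMachineConfigSpec;
import com.vmware.vim25.mo.VirtualMachine;

public class ReconfigureClone {
    static void reconfigure(VirtualMachine vm) throws Exception {
        VirtualMachineConfigSpec spec = new VirtualMachineConfigSpec();

        // Re-point the first virtual NIC at a different port group
        // ("Experiment Network" is a placeholder name).
        for (VirtualDevice dev : vm.getConfig().getHardware().getDevice()) {
            if (dev instanceof VirtualEthernetCard) {
                VirtualEthernetCardNetworkBackingInfo backing =
                        new VirtualEthernetCardNetworkBackingInfo();
                backing.setDeviceName("Experiment Network");
                dev.setBacking(backing);

                VirtualDeviceConfigSpec devChange = new VirtualDeviceConfigSpec();
                devChange.setOperation(VirtualDeviceConfigSpecOperation.edit);
                devChange.setDevice(dev);
                spec.setDeviceChange(new VirtualDeviceConfigSpec[] { devChange });
                break;
            }
        }

        // VNC settings are stored as extraConfig options on the VM.
        OptionValue enabled = new OptionValue();
        enabled.setKey("RemoteDisplay.vnc.enabled");
        enabled.setValue("true");
        OptionValue port = new OptionValue();
        port.setKey("RemoteDisplay.vnc.port");
        port.setValue("5901");
        OptionValue password = new OptionValue();
        password.setKey("RemoteDisplay.vnc.password");
        password.setValue("changeme");
        spec.setExtraConfig(new OptionValue[] { enabled, port, password });

        vm.reconfigVM_Task(spec).waitForTask();
    }
}
```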

Then we snapshot the VM:
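A snapshot is a single task call; the name and description below are placeholders, and it assumes `vm` was looked up as in the clone example:

```java
import com.vmware.vim25.mo.Task;
import com.vmware.vim25.mo.VirtualMachine;

public class SnapshotVm {
    static void snapshot(VirtualMachine vm) throws Exception {
        // Arguments: name, description, dump memory?, quiesce guest filesystem?
        Task task = vm.createSnapshot_Task(
                "baseline", "fresh clone, pre-experiment", false, false);
        task.waitForTask();
    }
}
```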

That’s it!


Chrome extensions and security

Adrienne wrote a blog post about some of her recent work analyzing Google Chrome extensions for security-related bugs. It’s a nice read and illuminates mistakes made by a surprisingly large number of extension developers (27 of 100 extensions leak private information!).

Although I don’t use Chrome on a regular basis, I had believed that its simple APIs and the (presumably) greater thought put into its security design would have made it more difficult for developers to write vulnerable extensions.

It’s not just extensions that are problematic, either: in a recent screenshot of a Blackhole Exploit Pack’s control panel, the exploits it served were far more successful against Google Chrome (as a percentage of visitors) than against all versions of Firefox and IE combined.


paper at IMC 2011

This year we have a paper studying the activity of suspended users on Twitter, which will appear at IMC in November. The title is “Suspended Accounts in Retrospect: An Analysis of Twitter Spam”, and the paper presents a unique perspective on spam compared to our previous papers (in CCS and S&P). We look back in time, collect the spam tweets sent by users who were eventually suspended, and then try to tie as much together as we can.

For example, one spam campaign, advertising a single landing page, can use well over 100,000 Twitter accounts and send millions of tweets. Each of the accounts involved was created for the purpose of sending spam and generally has never sent a non-spam tweet (there are some exceptions, of course!). The resources of the Twitter spammers are quite impressive: being able to throw away 100k accounts (they all get suspended eventually) after sending a few tweets demonstrates the account resources they have at hand.

Anyway, read more in a couple weeks when we post the PDF!


Anti-virus labels are not suitable for system evaluation

I won’t name names, but there are plenty of researchers out there who rely on anti-virus labeling in their research. While this could work in principle, without manual validation there is very little chance the AV labels can serve as any sort of ground truth.

Here are five reports:
1. fc39ce1593cfb6ca1eb0c289a2ca561c
2. c4d93b536f35b350a992a402dfd72e12
3. c77ba55255c1db38568ca3a73d4b8a72
4. e57d938e0754e4fbb3b87cf818a0fc69
5. e397696b7835ccdcfad9d768cf1a091c

Quick highlights in classification from each report:
1. Bredolab, Krap, Ursnif, Downloader, Generic, etc…
2. Krap, Kryptic, Generic packed, etc…
3. Bredolab, Oficla, Krap, Zbot, Ldpinch, etc…
4. Bredolab, Harnig, Krap, Ursnif, etc…
5. FakeAV, Bubnix, etc…

Based on those 5 reports, it’s certainly not obvious that these samples are all the exact same family of malware. In fact, if you run each one, they issue nearly identical HTTP requests. Report #3 seems to have the most diverse set of well-known names, almost a grab bag of popular malware.
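To put a rough number on the disagreement, here is a small sketch that computes pairwise Jaccard overlap between the label sets from the five report highlights above. The label sets are transcribed from the highlights and simplified (the “etc.” entries are necessarily omitted), so treat the scores as illustrative only:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LabelOverlap {
    // Family names pulled from the five report highlights ("etc." omitted).
    static Map<String, Set<String>> reports() {
        Map<String, Set<String>> r = new LinkedHashMap<>();
        r.put("1", new HashSet<>(Arrays.asList("Bredolab", "Krap", "Ursnif", "Downloader", "Generic")));
        r.put("2", new HashSet<>(Arrays.asList("Krap", "Kryptic", "Generic")));
        r.put("3", new HashSet<>(Arrays.asList("Bredolab", "Oficla", "Krap", "Zbot", "Ldpinch")));
        r.put("4", new HashSet<>(Arrays.asList("Bredolab", "Harnig", "Krap", "Ursnif")));
        r.put("5", new HashSet<>(Arrays.asList("FakeAV", "Bubnix")));
        return r;
    }

    // Jaccard similarity: |A intersect B| / |A union B|.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0 : (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Set<String>>> rs = new ArrayList<>(reports().entrySet());
        for (int i = 0; i < rs.size(); i++)
            for (int j = i + 1; j < rs.size(); j++)
                System.out.printf("reports %s vs %s: %.2f%n",
                        rs.get(i).getKey(), rs.get(j).getKey(),
                        jaccard(rs.get(i).getValue(), rs.get(j).getValue()));
    }
}
```

Report 5 shares no labels at all with the others, even though the samples behave nearly identically when run.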

There are a few things I can say for certain: it’s definitely malware. It’s not Bredolab. It’s also not Harnig, Zbot, Ldpinch, Oficla, or any sort of FakeAV. I’m not sure what a few of the names, like Krap and Ursnif, refer to, so I can’t definitively say it’s not those.

Based on these reports, suppose someone were to develop a malware classification technique and validate it against a set of malware (see lots of papers from IEEE S&P, USENIX, ACM CCS, and everywhere else!), using ground truth obtained from VirusTotal labels. Which AV should be trusted? Will that same AV perform well on another family of malware? Do any of the labels have more or less meaning than others?

If an AV says a binary is Bredolab (Report #1), what does that mean? Did engineers determine that a particular binary, with a specific MD5, is Bredolab? Did they find a few bytes in the binary that typically indicates Bredolab? Did the network traffic match Bredolab?

In summary, the labels that AV programs produce for malware are too noisy to be used with any confidence to evaluate a system unless each sample is manually validated.


Click Trajectories press!

The paper, “Click Trajectories: End-to-End Analysis of the Spam Value Chain”, got quite a bit of press recently, so there are a number of great articles that summarize the paper’s content and that have gone out to get quotes from banks and other security researchers and experts.

Newspaper, online news, and blogs:

You can even watch a video!


Papers at 2011 IEEE Symp. on S&P

We had two papers at Oakland this year, and I’ve put the PDFs up online. Kirill and Kurt presented on Tuesday afternoon (schedule).

NYT Article on the “Click Trajectories” work:

“Click Trajectories: End-to-End Analysis of the Spam Value Chain”, Kirill Levchenko, Andreas Pitsillidis, Neha Chachra, Brandon Enright, Mark Felegyhazi, Chris Grier, Tristan Halvorson, Chris Kanich, Christian Kreibich, He Liu, Damon McCoy, Nicholas Weaver, Vern Paxson, Geoffrey M. Voelker, and Stefan Savage. Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, May 2011. PDF


“Design and Evaluation of a Real-Time URL Spam Filtering Service”, Kurt Thomas, Chris Grier, Justin Ma, Vern Paxson, and Dawn Song. Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, May 2011. PDF


Naming some popular spambots

Part of what I’ve been doing lately is finding, running, and maintaining bots in a controlled environment. The first part, finding (which includes identifying the binaries I’m running), turns out to be surprisingly difficult.

Through a few “special” techniques, I come up with new binaries that produce spam. Take, for example, the binary with MD5 f03077adfdedc55b9ae906be897f2cc0. It runs, connects to a C&C, has an obfuscated C&C protocol, and ends up sending spam. So what is it? VirusTotal says: Screenshot 1

What does that mean? Well, in my opinion, it means that none of the AV signatures have a clue; they just say it’s probably bad stuff. This binary happens to be an installer for a newer version of Rustock, which I can verify by watching it run. I have several thousand binaries that I’ve acquired using the same technique as this one, most of which also have useless AV labels.

Why is that? Malware distribution is complicated. There are a lot of steps, intermediate binaries, packers, crypto, etc. What happens is that somewhere along the chain of installation, an AV signature matches, and then other things get labeled according to the same signature. I see this a lot with generic droppers: the bot binaries that get run end up labeled as Virut (an old-school generic dropper) or Harnig (another generic dropper), both of which can drop any number and type of malware binaries. In some experiments, I’ve seen over 15 different binaries downloaded and executed by a single dropper, and this behavior changes on subsequent executions.
