I won’t name names, but there are plenty of researchers out there who rely on anti-virus labeling in their research. While this could work, without manual validation there’s very little chance the AV labels can be used as any sort of ground truth.
Here are 5 reports:
1. fc39ce1593cfb6ca1eb0c289a2ca561c
2. c4d93b536f35b350a992a402dfd72e12
3. c77ba55255c1db38568ca3a73d4b8a72
4. e57d938e0754e4fbb3b87cf818a0fc69
5. e397696b7835ccdcfad9d768cf1a091c
Quick highlights of the classifications in each report:
1. Bredolab, Krap, Ursnif, Downloader, Generic, etc…
2. Krap, Kryptic, Generic packed, etc…
3. Bredolab, Oficla, Krap, Zbot, Ldpinch, etc…
4. Bredolab, Harnig, Krap, Ursnif, etc…
5. FakeAV, Bubnix, etc…
Based on those 5 reports, it’s certainly not obvious that these samples are all the exact same family of malware. But they are: if you run each one, they issue nearly identical HTTP requests. Report #3 seems to have the most diverse set of well-known names, almost a grab bag of popular malware.
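To put a rough number on that disagreement, here is a minimal sketch that compares the family names from the highlights above. It only uses the partial names listed in this post (each report contains more, hidden behind the "etc…"), and the `jaccard` helper and variable names are my own, so treat the output as illustrative rather than a real measurement.

```python
from itertools import combinations

# Partial family names copied from the highlights above.
# Each real report lists more labels than these.
reports = {
    1: {"Bredolab", "Krap", "Ursnif", "Downloader", "Generic"},
    2: {"Krap", "Kryptic", "Generic packed"},
    3: {"Bredolab", "Oficla", "Krap", "Zbot", "Ldpinch"},
    4: {"Bredolab", "Harnig", "Krap", "Ursnif"},
    5: {"FakeAV", "Bubnix"},
}


def jaccard(a, b):
    """Jaccard similarity: shared names divided by all names seen."""
    return len(a & b) / len(a | b)


# Names that appear in every one of the five reports.
common_to_all = set.intersection(*reports.values())

# Similarity for every pair of reports.
pairwise = {
    (i, j): jaccard(reports[i], reports[j])
    for i, j in combinations(sorted(reports), 2)
}

print("Names common to all reports:", common_to_all or "none")
for (i, j), score in sorted(pairwise.items()):
    print(f"report {i} vs report {j}: {score:.2f}")
```

With these partial lists, no name is shared by all five reports, and the best pairwise overlap (reports 1 and 4) is only 0.5, even though every sample behaves the same on the wire.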
There are a few things I can say for certain: It’s definitely malware. It’s not Bredolab. It’s also not Harnig, Zbot, Ldpinch, Oficla, or any sort of FakeAV. I’m not sure what a few of the names, like Krap and Ursnif, refer to, so I can’t definitively say it’s not those.
Based on these reports, suppose someone develops a malware classification technique and validates it against a set of malware (see lots of papers from IEEE S&P, USENIX, ACM CCS, and everywhere else!), using ground truth obtained from VirusTotal labels. Which AV should be trusted? Will that same AV perform well on another family of malware? Do any of the labels have more or less meaning than others?
If an AV says a binary is Bredolab (Report #1), what does that mean? Did engineers determine that a particular binary, with a specific MD5, is Bredolab? Did they find a few bytes in the binary that typically indicate Bredolab? Did the network traffic match Bredolab?
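One common way to sidestep the "which AV?" question is to take a plurality vote across vendors. The sketch below is my own illustration of that idea, not a method taken from any particular paper: the `plurality_label` function, the `min_agreement` threshold, and the vendor names and labels in `sample` are all hypothetical. It picks the most frequent label and flags the sample for manual validation when the vendors do not agree strongly enough.

```python
from collections import Counter


def plurality_label(vendor_labels, min_agreement=0.5):
    """Return (label, agreement, needs_manual_review) for one sample.

    vendor_labels maps a vendor name to the family name it reported.
    min_agreement is an arbitrary, assumed threshold.
    """
    if not vendor_labels:
        return None, 0.0, True
    counts = Counter(vendor_labels.values())
    top = max(counts.values())
    # Sort tied winners so the result is deterministic.
    winners = sorted(name for name, c in counts.items() if c == top)
    agreement = top / len(vendor_labels)
    needs_review = len(winners) > 1 or agreement < min_agreement
    return winners[0], agreement, needs_review


# Hypothetical vendors and labels, for illustration only.
sample = {
    "vendor_a": "Bredolab",
    "vendor_b": "Krap",
    "vendor_c": "Ursnif",
    "vendor_d": "Bredolab",
    "vendor_e": "Generic",
}

label, agreement, needs_review = plurality_label(sample)
print(label, round(agreement, 2), needs_review)
```

Note that even strong agreement proves nothing about correctness. Several of the reports above say Bredolab, and it’s not Bredolab, so a vote like this can only tell you which samples most obviously need a human to look at them.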
In summary, the labels that AV programs produce for malware are too noisy to be used with any confidence to evaluate a system unless each sample is manually validated.