Commit 84d3229c authored by Babak

initial commit of code

parent acdc14e7
# AVType
AVType is a tool to identify the type of malicious files. AVType attempts to group known malicious files into types: for each malicious file, it uses the labels assigned by multiple antivirus (AV) engines to derive the file's behavior type (e.g., fakeav, ransomware, dropper).
AVType takes as input the AV labels for a given file and outputs the malware type derived from those labels.
For more details about the purpose of this tool and how it works, please refer to our DSN 2017 paper titled: "Exploring the Long Tail of (Malicious) Software Downloads" (Babak Rahbarinia, Marco Balduzzi, Roberto Perdisci).
**Disclaimer**: Due to the high amount of noise and inconsistency in antivirus labels, this tool works on a *best-effort* basis and is not guaranteed to produce the expected results. We encourage manual analysis of the results.
## Type Extraction
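AVType expects an input file in which each line is a JSON object with 2 keys: i) "sha1" ("md5" and "sha256" are also accepted) and ii) "av_labels". The "av_labels" value is a dictionary that maps each AV name to the label that AV assigned to the file. If several hashes are present, AVType reports md5 first, then sha1, then sha256. Lines that are not valid JSON are skipped. The following is an example of one line of input: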
```
{"sha1": "3f2fee39787ca4c69882485390ffa89fef84b397", "av_labels": {"Kaspersky": "HEUR:Trojan-Downloader.Win32.Generic"}}
```
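If the labels come from another pipeline, the input file can be produced with the standard json module. The sketch below is a hypothetical helper, not part of AVType: validate_record() and to_avtype_lines() are illustrative names. The second record's label is made up to show a record without a hash key being dropped.

```python
import json

# Hash keys AVType looks for, in the order it checks them.
HASH_KEYS = ("md5", "sha1", "sha256")


def validate_record(rec):
    """True if rec has a hash key and a non-empty "av_labels" dictionary."""
    labels = rec.get("av_labels")
    has_hash = any(key in rec for key in HASH_KEYS)
    return has_hash and isinstance(labels, dict) and len(labels) > 0


def to_avtype_lines(records):
    """Serialize the valid records as one JSON object per line."""
    return [json.dumps(rec) + "\n" for rec in records if validate_record(rec)]


records = [
    {"sha1": "3f2fee39787ca4c69882485390ffa89fef84b397",
     "av_labels": {"Kaspersky": "HEUR:Trojan-Downloader.Win32.Generic"}},
    # Hypothetical record without a hash key: it is filtered out.
    {"av_labels": {"Microsoft": "Trojan:Win32/Example"}},
]
lines = to_avtype_lines(records)
```

Writing `lines` to a file (for example with `f.writelines(lines)`) gives a file that can be passed to AVType with `-i`.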
**Note**: We plan to add support for VirusTotal files in the near future.
AVType analyzes the AV labels and assigns a type to the file. In the example above, the type is "dropper", because the Kaspersky label contains the keyword "Downloader".
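The dropper decision can be illustrated with a reduced sketch. It reuses the keyword list from is_dropper() in avtype.py. However, kaspersky_type() is a hypothetical, heavily trimmed stand-in for the Kaspersky rules in mal_type(), and it models only the "heur:" prefix. In get_type_v1(), a dropper keyword anywhere in a label overrides the vendor-specific type. That is why the generic Kaspersky heuristic label ends up as dropper.

```python
# Keyword list copied from is_dropper() in avtype.py
# (includes an abbreviated and a misspelled variant).
DROPPER_KEYWORDS = ["dropper", "downloader", "downldr", "dowloader"]


def is_dropper(label):
    """True if any dropper keyword occurs anywhere in the label (case-insensitive)."""
    label = label.lower()
    return any(kw in label for kw in DROPPER_KEYWORDS)


def kaspersky_type(label):
    """Hypothetical, trimmed Kaspersky rule set: only 'heur:' -> generic is modeled."""
    base = "generic" if label.lower().startswith("heur:") else "unknown"
    # As in get_type_v1(), a dropper keyword wins over the vendor-specific type.
    return "dropper" if is_dropper(label) else base


print(kaspersky_type("HEUR:Trojan-Downloader.Win32.Generic"))  # dropper
print(kaspersky_type("HEUR:Trojan.Win32.Generic"))             # generic
```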
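Currently, AVType understands the labels of 5 AVs: Microsoft, Symantec, McAfee, Kaspersky, and Trend Micro. AV names are matched case-insensitively, and Trend Micro must be written as "TrendMicro" (without a space). For a few other AVs (Avast, AVG, Sophos, BitDefender, and ESET-NOD32), AVType also recognizes labels that contain obvious keywords such as "adware", "ransom", or "worm". It flags a label from any AV as a dropper if it contains "dropper" or "downloader".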
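The difference between the 5 fully supported AVs and the others comes from how labels are matched. The per-vendor rules in mal_type() use prefix matching, while the extra vendors use substring matching (the `sw` flag of label_checker()). The sketch below mirrors that helper; is_supported() and the example labels are illustrative assumptions.

```python
SUPPORTED_AVS = {"microsoft", "symantec", "mcafee", "kaspersky", "trendmicro"}


def is_supported(av_name):
    """Vendor names are lower-cased before comparison, but spaces are kept."""
    return av_name.lower() in SUPPORTED_AVS


def label_matches(label, signatures, prefix=True):
    """Mirror of label_checker(): prefix match by default, substring match otherwise."""
    label = label.lower()
    for sig in signatures:
        sig = sig.lower()
        matched = label.startswith(sig) if prefix else (sig in label)
        if matched:
            return True
    return False
```

With prefix matching, a label such as `Win32:Adware-gen` does not match the signature `adware`. The substring matching used for the extra vendors does match it.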
AVType can assign the following types: trojan, banker, bot, adware, pup, ransomware, spyware, fakeav, worm, dropper, and generic. If no type can be determined, it outputs unknown.
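Internally, get_type_v1() turns the per-label types into vote shares. It also groups them into four coarser families: trojan, pup, generic, and unknown. The sketch below reproduces that grouping with collections.Counter. The function names are hypothetical, and avtype.py expresses the mapping as if/elif branches rather than a dictionary.

```python
from collections import Counter

# Families used by get_type_v1() to consolidate per-label types.
FAMILY = {
    "trojan": "trojan", "worm": "trojan", "bot": "trojan", "fakeav": "trojan",
    "ransomware": "trojan", "banker": "trojan", "dropper": "trojan",
    "pup": "pup", "adware": "pup", "spyware": "pup",
    "generic": "generic",
}


def ranked_fractions(counts):
    """Turn {type: votes} into [(type, share), ...], largest share first."""
    total = float(sum(counts.values()))
    shares = [(t, n / total) for t, n in counts.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


def vote_fractions(per_label_types):
    """Return (per-type, per-family) vote shares for a list of per-label types."""
    per_type = Counter(per_label_types)
    per_family = Counter()
    for t, n in per_type.items():
        # Anything outside the table counts as 'unknown'.
        per_family[FAMILY.get(t, "unknown")] += n
    return ranked_fractions(per_type), ranked_fractions(per_family)
```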
**How To Run?**
AVType is written in Python 2.7. It accepts the following command-line arguments:
```
usage: avtype.py [-h] -i INPUT [-m MODE] [-v]

AVType

optional arguments:
  -h, --help  show this help message and exit
  -i INPUT    input AV labels file
  -m MODE     3 modes: unanimous, majority, aggressive
  -v          verbose output for debugging
```
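Without -v, AVType prints one tab-separated line per file, containing the hash and the assigned type. The sketch below calls AVType from another Python program. It assumes that avtype.py is in the working directory and that a Python 2.7 interpreter is available as python2; both are assumptions. run_avtype() and parse_avtype_output() are hypothetical helpers.

```python
import subprocess


def parse_avtype_output(text):
    """Parse non-verbose AVType output: one 'hash<TAB>type' pair per line."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        file_hash, mal_type = line.split("\t")[:2]
        result[file_hash] = mal_type
    return result


def run_avtype(input_path, mode="aggressive", python="python2", script="avtype.py"):
    """Run AVType in a subprocess and return {hash: type}."""
    out = subprocess.check_output([python, script, "-i", input_path, "-m", mode])
    return parse_avtype_output(out.decode("utf-8"))
```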
The MODE flag (-m) accepts 3 values: i) unanimous: assigns a type only if *all* AV labels agree with each other, ii) majority: assigns a type only if a *majority* (>50%) of the AV labels agree with each other, and iii) aggressive (the default): makes a best effort to assign a type, for example, by choosing the most *specific* type among the types extracted from the AV labels. In the unanimous and majority modes, files without enough agreement are reported as unknown. The aggressive mode might produce unexpected results. For example, suppose a file has two AV labels, one yielding spyware and the other adware. AVType cannot decide between the two, so it outputs trojan, since trojan covers both spyware and adware.
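The two strict modes and the tie rule from the spyware/adware example can be sketched as follows. decide() mirrors the unanimous and majority branches of get_type_v2(). break_specific_tie() mirrors only one aggressive-mode case: several types tie for the most votes and at least one of them is specific (not generic, unknown, or trojan). The rest of the aggressive logic is not modeled. Both names are hypothetical.

```python
def decide(shares, mode):
    """shares: [(type, share), ...] sorted by share, largest first."""
    if len(shares) == 1:
        return shares[0][0]  # all labels agree, so every mode returns it
    if mode == "unanimous":
        return "unknown"  # any disagreement means no type
    if mode == "majority":
        top_type, top_share = shares[0]
        return top_type if top_share > 0.5 else "unknown"
    raise ValueError("only unanimous and majority are modeled here")


GENERAL = ("generic", "unknown", "trojan")


def break_specific_tie(tied_types):
    """Aggressive mode: a tie that includes at least one specific type."""
    specific = [t for t in tied_types if t not in GENERAL]
    if not specific:
        raise ValueError("a tie without a specific type is not modeled here")
    if len(specific) == 1:
        return specific[0]  # e.g. trojan vs bot -> bot
    if set(specific) == {"adware", "pup"}:
        return "pup"
    return "trojan"  # e.g. spyware vs adware -> trojan
```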
We recommend always providing the verbose flag (-v). For each file, it additionally prints the fraction of AV labels that voted for each type, followed by the raw AV labels. This helps explain why AVType assigned a specific type to the file.
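In verbose mode, each record is printed as the following tab-separated fields, followed by a blank line:

- the hash
- the assigned type
- one type=share field per type
- an empty field
- one AV=label field per AV

A minimal parser for that layout is sketched below. parse_verbose_line() is a hypothetical helper, and callers should skip the blank lines.

```python
def parse_verbose_line(line):
    """Split one verbose AVType line into (hash, type, votes, labels)."""
    fields = line.rstrip("\n").split("\t")
    file_hash, final_type = fields[0], fields[1]
    rest = fields[2:]
    # The vote list ends with a tab, so an empty field separates it from the labels.
    sep = rest.index("") if "" in rest else len(rest)
    votes = {}
    for item in rest[:sep]:
        mal_type, share = item.split("=", 1)
        votes[mal_type] = float(share)
    labels = {}
    for item in rest[sep + 1:]:
        if item:
            av, label = item.split("=", 1)
            labels[av] = label
    return file_hash, final_type, votes, labels
```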
#! /usr/bin/python
import argparse
import json
import sys
from operator import itemgetter

parser = argparse.ArgumentParser(description="AVType")
parser.add_argument("-i", action="store", dest="input", help="input AV labels file", required=True)
parser.add_argument("-m", action="store", dest="mode", help="3 modes: unanimous, majority, aggressive", default="aggressive")
parser.add_argument("-v", action="store_true", dest="verbose", help="verbose output for debugging", default=False)
args = parser.parse_args()
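
def main():
    if args.mode not in ['unanimous', 'majority', 'aggressive']:
        print 'mode must be unanimous, majority, or aggressive'
        sys.exit(-1)
    for line in read_input():
        try:
            line = json.loads(line.strip())
        except ValueError:
            # skip lines that are not valid JSON
            continue
        # skip records without labels (this also avoids a division by zero later)
        if not isinstance(line, dict) or not line.get('av_labels'):
            continue
        # pick the hash used to identify the file (md5, then sha1, then sha256)
        if 'md5' in line:
            file_key = 'md5'
        elif 'sha1' in line:
            file_key = 'sha1'
        elif 'sha256' in line:
            file_key = 'sha256'
        else:
            # no recognized hash key
            continue
        if args.verbose:
            # vote share of each type extracted from the labels
            temp1 = ''
            for t in get_type_v1(line['av_labels'])['types']:
                temp1 += '%s=%.2f\t' % (t[0], t[1])
            # raw AV=label pairs
            temp2 = ''
            for av in line['av_labels']:
                temp2 += '%s=%s\t' % (av, line['av_labels'][av])
            print '%s\t%s\t%s\t%s\n' % (line[file_key], get_type_v3(line['av_labels']), temp1, temp2)
        else:
            print '%s\t%s' % (line[file_key], get_type_v3(line['av_labels']))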

def read_input():
    with open(args.input) as f:
        for line in f:
            if len(line) > 0:
                yield line

def mal_type(vnd, lbl):
    # Map one (AV vendor, label) pair to a malware type using the
    # vendor-specific label prefixes below.
    vnd = vnd.lower()
    lbl = lbl.lower()
    if vnd == 'trendmicro':
        if label_checker(lbl, ['adw_', 'adware_']):
            return 'adware'
        if label_checker(lbl, ['spyw_', 'spyware_']):
            return 'spyware'
        if label_checker(lbl, ['troj_fakeav']):
            return 'fakeav'
        if label_checker(lbl, ['Ransom', 'troj_ransom', 'tspy_ransom']):
            return 'ransomware'
        if label_checker(lbl, ['troj_', 'troj64_', 'brex_', 'vbs_', 'mal_', 'pe_sality', 'pe_expiro']) and not label_checker(lbl, ['troj_ge']):
            return 'trojan'
        if label_checker(lbl, ['TSPY_', 'BKDR_', 'hs_zbot.smla2', 'hktl_', 'pe_virux', 'pe_neshta', 'pe_ramnit']) and not label_checker(lbl, ['bkdr_generic']):
            return 'bot'
        if label_checker(lbl, ['tspy_']):  # can't get here! (tspy_ already returns 'bot' above)
            return 'banker'
        if label_checker(lbl, ['gray_', 'crck_', 'dialer_', 'joke_']):
            return 'pup'
        if label_checker(lbl, ['worm_']):
            return 'worm'
        if label_checker(lbl, ['heur_', 'heurspy_', 'possible_virus_', 'pe_generic', 'troj_ge',
                               'pak_generic', 'cryp_xed', 'possible_virus']):
            return 'generic'
    elif vnd == 'microsoft':
        if label_checker(lbl, ['adware', 'program:win32/mmsassist']):
            return 'adware'
        if label_checker(lbl, ['Ransom', 'trojan:win32/ransom', 'trojan:msil/ransom']):
            return 'ransomware'
        if label_checker(lbl, ['pws:', 'trojan:win32/banker', 'trojanspy:win32/banker']):  # trojanspy also considered banker
            return 'banker'
        if label_checker(lbl, ['monitoringtool', 'trojanspy', 'spyware']):
            return 'spyware'
        if label_checker(lbl, ['trojandownloader', 'trojandropper', 'trojan', 'trojanclicker', 'trojanproxy',
                               'virtool', 'virus', 'exploit', 'program:win32/pameseg.aa', 'hacktool',
                               'browsermodifier:win32/spacekito']):  # could only use trojan
            return 'trojan'
        if label_checker(lbl, ['Backdoor', 'ddos', 'flooder', 'remoteaccess', 'spammer', 'constructor:win32/bifrose',
                               'constructor:win32/eyestye']):
            return 'bot'
        if label_checker(lbl, ['worm']):
            return 'worm'
        if label_checker(lbl, ['rogue']):
            return 'fakeav'
        if label_checker(lbl, ['dialer', 'joke', 'softwarebundler:win32/costmin', 'softwarebundler:win32/chindo',
                               'softwarebundler:win32/squarenet', 'misleading:win32/optimizerelite',
                               'misleading:win32/perfectoptimizer', 'softwarebundler:win32/lolliport']):
            return 'pup'
    elif vnd == 'mcafee':
        if label_checker(lbl, ['adware']):
            return 'adware'
        if label_checker(lbl, ['spyware', 'spy-']):
            return 'spyware'
        if label_checker(lbl, ['trojan', 'exploit', 'htool', 'new malware', 'obfuscated', 'packed',
                               'w32/virut', 'keylog-']):
            return 'trojan'
        if label_checker(lbl, ['pws-', 'BackDoor', 'bot-', 'new backdoor', 'pwszbot', 'w32/ircbot',
                               'zeroaccess']):
            return 'bot'
        if label_checker(lbl, ['pws-']):  # can't get here! (pws- already returns 'bot' above)
            return 'banker'
        if label_checker(lbl, ['FakeAlert', 'Fake-SecTool', 'fakesectool-fch']):
            return 'fakeav'
        if label_checker(lbl, ['Ransom', 'RDN/Ransom']):
            return 'ransomware'
        if label_checker(lbl, ['potentially unwanted program', 'pup-', 'dialer-', 'joke', 'cryptinno',
                               'somoto-betterinstaller', 'cryptvittalia']):
            return 'pup'
        if label_checker(lbl, ['w32/autorun.worm', 'w32/worm', 'rdn/sdbot.worm']):
            return 'worm'
        if label_checker(lbl, ['artemis', 'generic', 'rdn/generic', 'rdn/suspicious']):
            return 'generic'
    elif vnd == 'symantec':
        if label_checker(lbl, ['adware', 'download.adware']):
            return 'adware'
        if label_checker(lbl, ['spyware', 'winspy']):
            return 'spyware'
        if label_checker(lbl, ['Trojan.FakeAV']):
            return 'fakeav'
        if label_checker(lbl, ['trojan.ransom']):
            return 'ransomware'
        if label_checker(lbl, ['infostealer', 'trojan.zbot']):
            return 'banker'
        if label_checker(lbl, ['trojan', 'packed', 'bo2k.trojan']) and not label_checker(lbl, ['trojan.gen']):
            return 'trojan'
        if label_checker(lbl, ['Backdoor', 'ddos.trojan', 'irc trojan', 'irc.backdoor.trojan', 'hacktool']):
            return 'bot'
        if label_checker(lbl, ['pua.', 'joke', 'dialer', 'softwareversionupdater', 'yontoo.c', 'regcleanpro',
                               'bitcoinminer', 'vopackage', 'advsystemprotector', 'anyprotect', 'optimizerpro',
                               'optimumpcboost', 'speedupmypc']):
            return 'pup'
        if label_checker(lbl, ['w32.spybot.worm', 'w32.cridex', 'w32.mabezat', 'als.kenilfe', 'w32.botou',
                               'w32.changeup!gen15', 'w32.colowned.a', 'w32.downadup.b', 'w32.gammima.ag',
                               'w32.hitapop', 'w32.hllw.electron', 'w32.looked.bk', 'w32.mibling', 'w32.pagipef.b',
                               'w32.phopifas', 'w32.pilleuz', 'w32.pykspa.d', 'w32.qakbot', 'w32.rotinom',
                               'w32.sillydc', 'w32.sillyfdc', 'w32.sillyim', 'w32.spyrat', 'w32.svich',
                               'w32.tapin', 'w32.wergimog.b', 'w32.winiga', 'w32.extrat', 'w32.harakit', 'w32.shadesrat',
                               'w32.ircbot', 'w32.inabot']):
            return 'worm'
        if label_checker(lbl, ['heuristic', 'ws.reputation', 'trojan.gen', 'suspicious']):
            return 'generic'
    elif vnd == 'kaspersky':
        if label_checker(lbl, ['not-a-virus:adware', 'not-a-virus:heur:adware']):
            return 'adware'
        if label_checker(lbl, ['not-a-virus:monitor']):
            return 'spyware'
        if label_checker(lbl, ['trojan-psw', 'trojan-spy', 'trojan-banker']):
            return 'banker'
        if label_checker(lbl, ['trojan-fakeav', 'Trojan.Win32.FakeAV']):
            return 'fakeav'
        if label_checker(lbl, ['Trojan-Ransom']):
            return 'ransomware'
        if label_checker(lbl, ['Backdoor', 'HackTool', 'Net-Worm', 'VirTool', 'dos.', 'trojan-ddos']):
            return 'bot'
        if label_checker(lbl, ['trojan', 'trojan-dropper', 'exploit', 'flooder', 'packed', 'rootkit',
                               'trojan-clicker', 'trojan-downloader', 'virus',
                               'uds:dangerousobject.multi.generic']):
            return 'trojan'
        if label_checker(lbl, ['email-worm', 'im-worm', 'p2p-worm', 'worm.']):
            return 'worm'
        if label_checker(lbl, ['heur:']):
            return 'generic'
        if label_checker(lbl, ['hoax:', 'not-a-virus:webtoolbar']):
            return 'pup'
    elif vnd in ['avast', 'avg', 'sophos', 'bitdefender', 'eset-nod32']:
        # other vendors: substring (not prefix) match on obvious keywords
        if label_checker(lbl, ['adware'], sw=False):
            return 'adware'
        if label_checker(lbl, ['banker'], sw=False):
            return 'banker'
        if label_checker(lbl, ['pup', 'pua'], sw=False):
            return 'pup'
        if label_checker(lbl, ['ransom'], sw=False):
            return 'ransomware'
        if label_checker(lbl, ['worm'], sw=False):
            return 'worm'
        if label_checker(lbl, ['spyware'], sw=False):
            return 'spyware'
        if label_checker(lbl, ['fakeav'], sw=False):
            return 'fakeav'
    # No rule matched: return the lower-cased label itself (McAfee labels are
    # cut at '!'), which get_type_v1() then treats as 'unknown'.
    if vnd == 'mcafee':
        lbl = lbl.split('!')[0]
    return lbl
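
def label_checker(lbl, sig, sw=True):
    # True if lbl starts with any signature in sig (sw=True) or contains one
    # anywhere (sw=False). Signatures are lower-cased; callers pass a
    # lower-cased label.
    for s in sig:
        if sw:
            if lbl.startswith(s.lower()):
                return True
        else:
            if lbl.find(s.lower()) != -1:
                return True
    return False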

def get_type_v1(av_labels):
    # Extract one type per AV label, then compute the vote share of each type
    # and of each consolidated family (trojan, pup, generic, unknown).
    sorted_av_types = []
    types = dict()
    dropper = True
    has_known_label_av = False
    for x in av_labels:
        if x.lower() in ['microsoft', 'trendmicro', 'symantec', 'kaspersky', 'mcafee']:
            has_known_label_av = True
            break
    for x in av_labels:
        avv = x
        lbl = av_labels[avv]
        current_mal_type = mal_type(avv, lbl)
        if current_mal_type not in ['trojan', 'worm', 'bot', 'ransomware', 'fakeav',
                                    'banker', 'pup', 'adware', 'spyware', 'generic']:
            current_mal_type = 'unknown'
        # dropper keywords override the vendor-specific type
        if dropper:
            if is_dropper(lbl):
                current_mal_type = 'dropper'
        else:
            if current_mal_type == 'unknown' and is_dropper(lbl):
                current_mal_type = 'dropper'
        # if any of the 5 supported AVs labeled the file, ignore
        # 'unknown' votes from the other AVs
        if current_mal_type == 'unknown':
            if has_known_label_av:
                if avv.lower() not in ['microsoft', 'trendmicro', 'symantec', 'kaspersky', 'mcafee']:
                    continue
        if current_mal_type not in types:
            types[current_mal_type] = 0
        types[current_mal_type] += 1
        sorted_av_types.append([avv, simplify_av_label(avv, lbl), current_mal_type])
    sorted_av_types = sorted(sorted_av_types, key=itemgetter(0))
    consolidated_types = dict()
    for t in types:
        if t in ['trojan', 'worm', 'bot', 'fakeav', 'ransomware', 'banker', 'dropper']:
            key = 'trojan'
        elif t in ['pup', 'adware', 'spyware']:
            key = 'pup'
        elif t in ['generic']:
            key = 'generic'
        else:
            key = 'unknown'
        if key not in consolidated_types:
            consolidated_types[key] = 0
        consolidated_types[key] += types[t]
    total_votes = sum(types.values())
    output = dict()
    for t in types:
        output[t] = types[t] * 1.0 / total_votes
    types = sorted(types.items(), key=itemgetter(1), reverse=True)
    output = sorted(output.items(), key=itemgetter(1), reverse=True)
    total_votes = sum(consolidated_types.values())
    consolidated_output = dict()
    for t in consolidated_types:
        consolidated_output[t] = consolidated_types[t] * 1.0 / total_votes
    consolidated_types = sorted(consolidated_types.items(), key=itemgetter(1), reverse=True)
    consolidated_output = sorted(consolidated_output.items(), key=itemgetter(1), reverse=True)
    return {'types': output, 'consolidated': consolidated_output,
            'final_type': majority_mal_type(output, consolidated_output), 'details': sorted_av_types}

def is_dropper(lbl):
    # substring match; 'downldr' and 'dowloader' catch abbreviated/misspelled labels
    lbl = lbl.lower()
    droppers = ['dropper', 'downloader', 'downldr', 'dowloader']
    for drp in droppers:
        if lbl.find(drp) != -1:
            return True
    return False

def simplify_av_label(av, lbl):
    # Drop variant suffixes (e.g. the last '.'-separated token) so that similar
    # labels look alike in the 'details' list returned by get_type_v1()
    av = av.lower()
    lbl = lbl.lower()
    if av == 'trendmicro':
        lbl = lbl.rsplit('.', 1)[0]
    elif av == 'microsoft':
        lbl = lbl.split('!')[0]
        lbl = lbl.rsplit('.', 1)[0]
    elif av == 'kaspersky':
        lbl = lbl.rsplit('.', 1)[0]
    elif av == 'symantec':
        if len(lbl.split('.')) > 2:
            lbl = lbl.rsplit('.', 1)[0]
    elif av == 'mcafee':
        lbl = lbl.split('!')[0]
        lbl = lbl.rsplit('.', 1)[0]
    else:
        # raw_input('ERROR: AV %s not supported' % av)
        pass
    return lbl

def get_type_v2(av_labels, unanimous=False, majority=False):
    # Resolve conflicting per-label types according to the selected mode
    type_info = get_type_v1(av_labels)
    types = type_info['types']
    final_type = type_info['final_type']
    if len(types) == 1:
        return {'final_type': final_type, 'rule': 'no_conflict', 'spc': False}
    if unanimous:
        return {'final_type': 'unknown', 'rule': '', 'spc': False}
    if majority:
        votes = __count_votes(types)
        if votes['max'] > 0.5:
            return {'final_type': votes['max_type'][0], 'rule': 'vote', 'spc': False}
        else:
            return {'final_type': 'unknown', 'rule': '', 'spc': False}
    # aggressive mode: repeatedly drop leading generic/unknown/trojan votes
    # until a more specific type can be chosen
    specificity = False
    while True:
        votes = __count_votes(types)
        if votes['max'] > 0.5:
            if votes['max_type'][0] in ['generic', 'unknown']:
                types = types[1:]
                if __subtract(types, ['generic', 'trojan', 'unknown']) == []:
                    return {'final_type': votes['max_type'][0], 'rule': 'vote', 'spc': specificity}
                else:
                    specificity = True
                    continue
            else:
                return {'final_type': votes['max_type'][0], 'rule': 'vote', 'spc': specificity}
        elif len(votes['max_type']) == 1:
            if votes['max_type'][0] in ['generic', 'unknown', 'trojan']:
                types = types[1:]
                if __subtract(types, ['generic', 'trojan', 'unknown']) == []:
                    return {'final_type': votes['max_type'][0], 'rule': 'vote', 'spc': specificity}
                else:
                    specificity = True
                    continue
            else:
                return {'final_type': votes['max_type'][0], 'rule': 'vote', 'spc': specificity}
        else:
            if len(votes['spc']) == 1:
                specificity = True
                return {'final_type': votes['spc'][0], 'rule': 'vote', 'spc': specificity}
            elif len(votes['spc']) > 1:
                if len(votes['spc']) == 2 and set(votes['spc']) == set(['adware', 'pup']):
                    return {'final_type': 'pup', 'rule': 'vote', 'spc': specificity}
                else:
                    return {'final_type': 'trojan', 'rule': 'mix', 'spc': specificity}
            else:  # if len(spc) == 0
                if 'trojan' in votes['max_type']:
                    return {'final_type': 'trojan', 'rule': 'vote', 'spc': specificity}
                types = types[len(votes['max_type']):]
                if len(types) == 0:
                    if 'trojan' in votes['max_type']:
                        return {'final_type': 'trojan', 'rule': 'vote', 'spc': specificity}
                    elif 'generic' in votes['max_type']:
                        return {'final_type': 'generic', 'rule': 'vote', 'spc': specificity}
                    else:
                        return {'final_type': 'unknown', 'rule': 'vote', 'spc': specificity}
                else:
                    if __subtract(types, ['generic', 'trojan', 'unknown']) == []:
                        if 'trojan' in votes['max_type']:
                            return {'final_type': 'trojan', 'rule': 'vote', 'spc': specificity}
                        elif 'generic' in votes['max_type']:
                            return {'final_type': 'generic', 'rule': 'vote', 'spc': specificity}
                        else:
                            return {'final_type': 'unknown', 'rule': 'vote', 'spc': specificity}
                    else:
                        specificity = True
                        continue

def get_type_v3(av_labels):
    # Map the -m flag onto get_type_v2(); both flags are False in aggressive mode
    unanimous = args.mode == 'unanimous'
    majority = args.mode == 'majority'
    return get_type_v2(av_labels, unanimous=unanimous, majority=majority)['final_type']

def majority_mal_type(sorted_types, sorted_consolidatd):
    # sorted_types / sorted_consolidatd: lists of (type, vote share) pairs,
    # largest share first
    if len(sorted_types) == 1:
        return sorted_types[0][0]
    if sorted_types[0][1] > sorted_types[1][1]:
        # the leading type has strictly the most votes
        if sorted_types[0][0] in ['banker', 'fakeav', 'worm', 'bot', 'ransomware', 'spyware',
                                  'adware', 'pup', 'dropper']:
            return sorted_types[0][0]
        if sorted_types[0][0] == 'trojan':
            real = []
            genunk = []
            for x in sorted_types:
                if x[0] in ['banker', 'fakeav', 'worm', 'bot', 'ransomware', 'spyware', 'dropper']:
                    real.append(x[0])
                if x[0] in ['unknown', 'generic', 'pup', 'adware']:
                    genunk.append(x[0])
            if len(genunk) == 0:
                if len(real) == 1:
                    return real[0]
                else:
                    return 'trojan'
            else:
                return 'trojan'
        if sorted_types[0][0] == 'generic':
            real = []
            genunk = []
            trojan = False
            for x in sorted_types:
                if x[0] in ['banker', 'fakeav', 'worm', 'bot', 'ransomware', 'spyware', 'dropper']:
                    real.append(x[0])
                if x[0] in ['unknown', 'pup', 'adware']:
                    genunk.append(x[0])
                if x[0] == 'trojan':
                    trojan = True
            if not trojan and len(genunk) == 0:
                if len(real) == 1:
                    return real[0]
                else:
                    return 'trojan'
            else:
                if len(real) > 0:
                    return 'trojan'
                else:
                    return 'generic'
        if sorted_types[0][0] == 'unknown':
            real = []
            genunk = []
            trojan = False
            for x in sorted_types:
                if x[0] in ['banker', 'fakeav', 'worm', 'bot', 'ransomware', 'spyware', 'dropper']:
                    real.append(x[0])
                if x[0] in ['generic', 'pup', 'adware']:
                    genunk.append(x[0])
                if x[0] == 'trojan':
                    trojan = True
            if not trojan and len(genunk) == 0:
                if len(real) == 1:
                    return real[0]
                elif len(real) > 1:
                    return 'trojan'
                else:
                    return 'unknown'
            else:
                if len(real) > 0:
                    return 'trojan'
                else:
                    if trojan:
                        return 'trojan'
                    else:
                        return 'unknown'
        # if sorted_types[0][0] in ['generic', 'unknown']:
        #     others_sum = 0
        #     for x in sorted_consolidatd:
        #         if x[0] not in ['generic', 'unknown', 'pup']:
        #             others_sum += x[1]
        #     if others_sum > 0:
        #         return 'trojan'
        #     if sorted_consolidatd[0][1] > sorted_consolidatd[1][1]:
        #         return sorted_consolidatd[0][0]
        #     else:
        #         return sorted_types[0][0]
        # else:
        #     return sorted_types[0][0]
    else:
        # two or more types tie for the most votes
        all_types = []
        for x in sorted_types:
            all_types.append(x[0])
        if len(all_types) == 2:
            if 'trojan' in all_types:
                if set(['banker', 'fakeav', 'worm', 'bot', 'ransomware', 'dropper']) & set(all_types) != set([]):
                    if all_types[0] == 'trojan':
                        return all_types[1]
                    else:
                        return all_types[0]
            if ('generic' in all_types or 'unknown' in all_types) and not ('generic' in all_types and 'unknown' in all_types):
                if set(['banker', 'fakeav', 'worm', 'bot', 'ransomware', 'dropper']) & set(all_types) != set([]):
                    if all_types[0] in ['generic', 'unknown']:
                        return all_types[1]
                    else:
                        return all_types[0]
        if len(sorted_consolidatd) == 1:
            return sorted_consolidatd[0][0]
        if sorted_consolidatd[0][1] > sorted_consolidatd[1][1]:
            return sorted_consolidatd[0][0]
        else:
            value = sorted_types[0][1]
            for x in sorted_types:
                if x[1] == value:
                    if x[0] in ['trojan', 'worm', 'bot', 'fakeav', 'ransomware', 'banker', 'dropper']:
                        return x[0]
            for x in sorted_types:
                if x[1] == value:
                    if x[0] in ['pup', 'adware', 'spyware']:
                        return x[0]
            value = sorted_consolidatd[0][1]
            for x in sorted_consolidatd:
                if x[1] == value:
                    if x[0] in ['trojan']:
                        return x[0]
            return sorted_types[0][0]

def __count_votes(types):
    # Find the highest vote share, every type that reaches it, and which of
    # those tied types are specific (not generic, unknown, or trojan)
    max_votes = 0
    max_type = []
    spc = []
    for x in types:
        if x[1] > max_votes:
            max_votes = x[1]
    for x in types:
        if x[1] == max_votes:
            max_type.append(x[0])
            if x[0] not in ['generic', 'unknown', 'trojan']:
                spc.append(x[0])
    return {'max': max_votes, 'max_type': max_type, 'spc': spc}

def __subtract(types, l):
    # keep the (type, share) pairs whose type is not in l
    output = []
    for x in types:
        if x[0] not in l:
            output.append(x)
    return output

if __name__ == "__main__":
    main()