Monday, May 27, 2013

Splunking Virustotal PoC

Doing malware analysis and research on a frequent basis, I'm all about making life easier and getting information faster. Bro, Splunk and VirusTotal are tools that I'm constantly interfacing with. I thought it would be awesome if I could use VirusTotal's API to look up MD5s gathered from Bro logs in Splunk. Each of these tools provides an amazing amount of useful information; with their powers combined, I hoped to connect the dots faster.



Requirements

To test this concept I'm using CentOS and the limited version of Splunk. Beyond that you will also need:

  • A VirusTotal API key (register with VirusTotal to get one)
  • Python development libraries
  • Splunk installed, with a log source containing MD5s (Bro!)

Splunk Configuration

To get started, we are going to create a generic Splunk app and copy over our Python scripts. Next, we configure the Splunk lookup and test it out.

To create a new Splunk app, choose "Manage apps..."
Click "Create app"

Add in the name and location of the app, then save.

Now would be a great time to import some logs containing MD5s, or to set up Bro and acquire them. You will also want to extract the md5 field from your logs, or you can use rex on the fly.
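For reference, an MD5 is 32 hexadecimal characters, so the extraction pattern is simple. The Python 3 sketch below shows the kind of pattern a field extraction or rex would rely on. The sample log line and the function name are made up for illustration and are not real Bro output.

```python
import re

# An MD5 digest is exactly 32 hex characters. The \b anchors keep the
# pattern from matching part of a longer hash such as a 40-character SHA-1.
MD5_RE = re.compile(r"\b[a-fA-F0-9]{32}\b")

def extract_md5s(line):
    """Return every MD5-looking token found in a log line, lowercased."""
    return [m.lower() for m in MD5_RE.findall(line)]

# Hypothetical log line, for illustration only.
sample = "1369670000.0 10.0.0.5 GET /file.exe D41D8CD98F00B204E9800998ECF8427E -"
print(extract_md5s(sample))
```

Lowercasing the hashes keeps the same file from showing up as two different values later in the search.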

Python Scripts

Since Splunk's bundled version of Python is bare bones, you'll need to create a wrapper that calls the actual script with the system Python. Searching Splunk's site, I found that someone had already created a script to do just this.

Save this to /opt/splunk/etc/apps/vtLookup/bin/wrapper.py
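import os, sys
# Drop Splunk's Python environment so the system interpreter uses its own libraries.
for envvar in ("PYTHONPATH", "LD_LIBRARY_PATH"):
    if envvar in os.environ:
        del os.environ[envvar]
python_executable = "/usr/bin/python"
# Must point at the lookup script saved in the next step.
real_script = "/opt/splunk/etc/apps/vtLookup/bin/vtLookup.py"
# Replace this process with the system Python, passing Splunk's arguments through.
os.execv(python_executable, [ python_executable, real_script ] + sys.argv[1:])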
Now we create the script that takes the MD5 from Splunk and does a lookup using VirusTotal's API.

Save this to /opt/splunk/etc/apps/vtLookup/bin/vtLookup.py and don't forget to enter your API key.
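import csv, sys, urllib, urllib2

API_KEY = 'Enter in your API key here'
VT_URL = 'https://www.virustotal.com/vtapi/v2/file/report'

def lookup(md5):
  # POST the hash to VirusTotal and return the raw response body ('' on any failure).
  try:
    params = urllib.urlencode({'apikey': API_KEY, 'resource': md5})
    response = urllib2.urlopen(VT_URL, params)
    return response.read()
  except Exception:
    return ''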

def main():
  if len(sys.argv) != 3:
    print "python vtLookup.py MD5 VT"
    sys.exit(0)

  md5f = sys.argv[1]
  vtf = sys.argv[2]
  r = csv.reader(sys.stdin)
  w = None
  header = []
  first = True

  for line in r:
    if first:
      header = line
      if vtf not in header or md5f not in header:
        print "missing vt or md5 field"
        sys.exit(0)
      csv.writer(sys.stdout).writerow(header)
      w = csv.DictWriter(sys.stdout, header)
      first = False
      continue

    result = {}
    i = 0
    while i < len(header):
      if i < len(line):
        result[header[i]] = line[i]
      else:
        result[header[i]] = ''
      i += 1

    if len(result[md5f]) and len(result[vtf]):
      w.writerow(result)
    elif len(result[md5f]):
      result[vtf] = lookup(result[md5f])
      if len(result[vtf]):
        w.writerow(result)

main()
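
To make the data flow easier to follow: Splunk hands an external lookup a CSV on stdin whose header names the fields, and expects the same CSV back on stdout with the empty columns filled in. The sketch below replays that exchange offline in Python 3 (the script above is Python 2). The names fill_lookup and fake_lookup are mine, and the stub stands in for the real VirusTotal call, so treat it as an illustration of the protocol rather than a drop-in replacement.

```python
import csv
import io

def fill_lookup(in_stream, out_stream, md5_field, vt_field, lookup):
    """Copy CSV rows from in_stream to out_stream, filling vt_field via lookup."""
    reader = csv.DictReader(in_stream, restval="")
    fields = reader.fieldnames or []
    if md5_field not in fields or vt_field not in fields:
        raise ValueError("missing vt or md5 field")
    writer = csv.DictWriter(out_stream, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Only call out when we have a hash and no answer yet.
        if row[md5_field] and not row[vt_field]:
            row[vt_field] = lookup(row[md5_field])
        # Like the script above, rows that end up without a result are dropped.
        if row[md5_field] and row[vt_field]:
            writer.writerow(row)

# Stub standing in for the VirusTotal call so the example runs offline.
def fake_lookup(md5):
    return "report-for-" + md5

request = io.StringIO("md5,vt\nd41d8cd98f00b204e9800998ecf8427e,\n")
reply = io.StringIO()
fill_lookup(request, reply, "md5", "vt", fake_lookup)
print(reply.getvalue())
```

Passing the lookup function in as a parameter makes it possible to test the CSV handling without touching the network or burning API requests.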

Next we tell Splunk the location of the scripts and create a lookup. In the Splunk manager select Lookups:



Then Lookup definitions: 

The 'Type' is external since we are calling an external script. The command is 'wrapper.py md5 vt' and the supported fields are 'md5, vt'. Once you have that entered, click Save.



Splunking

Now let's test it out and see if it works. The test query takes one known-good MD5 and passes it to the lookup script. The first part keeps only fields that are not "-", then sends them to top and returns just one result. The part we are concerned with is "lookup vtLookup md5".

Running the search, we see the new field "vt" with the response from VirusTotal. Great! But what I really want is to search all time and find some trends.
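
The vt field holds the raw response body, which for the v2 file/report endpoint is JSON. For trending, a short summary is easier to work with than the whole blob. Here is a minimal Python 3 sketch, assuming the response carries the response_code, positives and total keys; the function name and the sample body are made up for illustration.

```python
import json

def summarize_report(raw):
    """Turn a file/report JSON body into a short 'positives/total' string."""
    try:
        report = json.loads(raw)
    except ValueError:
        return ""  # empty or non-JSON body, e.g. when the request was refused
    # Assumption: response_code 1 means VirusTotal has a report for this hash.
    if not isinstance(report, dict) or report.get("response_code") != 1:
        return ""
    return "%s/%s" % (report.get("positives"), report.get("total"))

# Hypothetical, heavily trimmed response body.
sample = '{"response_code": 1, "positives": 3, "total": 46}'
print(summarize_report(sample))
```

Returning an empty string on any failure mirrors how the lookup script treats a failed request.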

When I bump up the search to return 10 results, we start seeing empty responses from VirusTotal because our API requests are rate limited. Boooo.
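
One way to soften the limit is to cache answers per MD5 and space out the calls that do go out. The Python 3 sketch below assumes a quota of four requests per minute, which is my understanding of the public API limit (check your own account), hence the 15 second default. The class name and the stub lookup are hypothetical.

```python
import time

class ThrottledLookup:
    """Cache results per MD5 and space out calls to an underlying lookup."""

    def __init__(self, lookup, min_interval=15.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.lookup = lookup
        self.min_interval = min_interval  # seconds between real requests
        self.clock = clock
        self.sleep = sleep
        self.cache = {}
        self.last_call = None

    def __call__(self, md5):
        if md5 in self.cache:
            return self.cache[md5]  # repeat hashes cost no request
        if self.last_call is not None:
            wait = self.min_interval - (self.clock() - self.last_call)
            if wait > 0:
                self.sleep(wait)
        self.last_call = self.clock()
        result = self.lookup(md5)
        if result:
            self.cache[md5] = result  # only remember successful answers
        return result

# Stub lookup and a zero interval so the example runs instantly.
calls = []
def fake_lookup(md5):
    calls.append(md5)
    return "report-for-" + md5

throttled = ThrottledLookup(fake_lookup, min_interval=0.0)
for md5 in ["aaa", "bbb", "aaa"]:
    throttled(md5)
print(len(calls))
```

This only trades errors for waiting: a search over many distinct hashes would still crawl along at the quota's pace.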




The limitations set by VirusTotal don't make this very practical in Splunk. It was fun to try, and maybe this will come in handy in the future.


Edit: Python scripts added to git repo https://code.google.com/p/splunk-virustotal/