Developing our first forensic script – usb_lookup.py
Now that we've gotten our feet wet writing our first Python script, let's write our first forensic script. During forensic investigations, it is not uncommon to see references to external devices by their vendor ID (VID) and product ID (PID) values; these values are represented by four hexadecimal characters. In cases where the vendor and product name are not identified, the examiner must look up this information. One such location for this information is the webpage http://linux-usb.org/usb.ids. For example, on this webpage, we can see that a Canon EOS Digital Rebel has a vendor ID of 0x04A9 and a product ID of 0x3084. We will use this data source when attempting to identify the vendor and product names, using the defined identifiers.
First, let's look at the data source we're going to be parsing. A hypothetical sample illustrating its structure follows this paragraph. The file lists USB vendors and, for each vendor, a set of USB products. Each vendor or product has a four-character hexadecimal ID and a name. Vendor and product lines are distinguished by tabs: products are tabbed over once under their parent vendor. As a forensic developer, you will come to love patterns and data structures, as it is a happy day when data follows a strict set of rules. Because of this, we will be able to preserve the relationship between the vendor and its products in a simple manner. Here is the hypothetical sample:
0001 Vendor Name
	0001 Product Name 1
	0002 Product Name 2
	...
	000N Product Name N
This script, named usb_lookup.py, takes a VID and PID supplied by the user and returns the appropriate vendor and product names. Our program uses the urllib2 module to download the usb.ids database to memory and create a dictionary of vendor IDs and their products. If a vendor and product combination is not found, error handling will inform the user of any partial results and exit the program gracefully.
The main() function contains the logic to download the usb.ids file, store it in memory, and create the USB dictionary using the getRecord() helper function. The structure of the USB dictionary is somewhat complex and involves mapping a vendor ID to a list containing the name of the vendor as the first element and a product dictionary as the second element. This product dictionary maps product IDs to their names. The following is an example of the USB dictionary containing two vendors, VendorId_1 and VendorId_2, each mapped to a list containing the vendor name and a dictionary for any product ID and name pairs:
usbs = {VendorId_1: [VendorName_1, {ProductId_1: ProductName_1, ProductId_2: ProductName_2, ProductId_N: ProductName_N}], VendorId_2: [VendorName_2, {}], etc.}
It may be tempting to just search for VID and PID in the lines and return the names rather than creating this dictionary that links vendors to their products. However, products can share the same ID across different vendors, which could result in mistakenly returning a product from a different vendor. With our method, we can be sure that the product belongs to the associated vendor.
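To make the point about shared product IDs concrete, here is a minimal sketch. The IDs, names, and the lookup() helper are hypothetical and are not part of usb_lookup.py; they only show two vendors that reuse the same product ID:

```python
# Hypothetical IDs and names, chosen only to show one product ID reused by two vendors.
usbs = {
    '0001': ['Vendor A', {'0001': 'Vendor A Keyboard'}],
    '0002': ['Vendor B', {'0001': 'Vendor B Camera'}],
}


def lookup(usbs, vid, pid):
    """Return (vendor_name, product_name) for a VID/PID pair."""
    vendor_name, products = usbs[vid]
    # The product is looked up only inside its own vendor's dictionary.
    return vendor_name, products[pid]


print(lookup(usbs, '0001', '0001'))
print(lookup(usbs, '0002', '0001'))
```

Because each product dictionary is nested under its vendor, the same product ID resolves to a different name depending on the vendor ID supplied. A flat search for the product ID alone could not tell the two apart.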
Once the USB dictionary has been created, the searchKey() function is responsible for querying the dictionary for a match. It first checks that the user supplied two arguments, VID and PID, before continuing execution of the script. Next, it searches for a VID match in the outermost dictionary. If VID is found, the innermost dictionary is searched for the responsive PID. If both are found, the resolved names are printed to the console. See the following code:
001 import urllib2
002 import sys
...
009 def main():
...
043 def getRecord():
...
057 def searchKey():
...
092 if __name__ == '__main__':
093     main()
For larger scripts, such as this, it is helpful to view a diagram that illustrates how these functions are connected together. Fortunately, a library named code2flow, available on GitHub (https://github.com/scottrogowski/code2flow.git), exists to automate this process for us. The following schematic illustrates the interactions between the main() function, which first calls and receives data from getRecord(), before calling the searchKey() function. There are other libraries that can create similar flow charts. However, this library does a great job of creating a simple and easy-to-understand flow chart:
Understanding the main() function
Let's start by examining the main() function, which is called on line 93 as seen in the previous code block. As discussed previously, the particular if statement on line 92 is a common and preferred way of calling our starting function. The if statement will only evaluate to True if the script is called by the user. If, for example, it were imported, the main function would not be called and the script would not run. However, the function could still be called explicitly, just like any other function of an imported module. The main() function begins as follows:
009 def main():
010     """
011     The main function opens the URL and parses the file before searching for the
012     user supplied vendor and product IDs.
013     :return: Nothing.
014     """
On lines 15 through 20, we create our initial variables. The url variable stores the URL containing the USB data source. We use the urlopen() function from the urllib2 module to open our online source as a file-like object, which we can iterate over line by line. We will use a lot of string operations, such as startswith(), isdigit(), islower(), and count(), to parse the file structure and store the parsed data in the usbs dictionary. The curr_id variable, defined as an empty string on line 20, will be used to keep track of which vendor is currently being processed by our script:
015     url = 'http://www.linux-usb.org/usb.ids'
016     usbs = {}
017     # The urlopen function creates a file-like object and can be treated similarly.
018     usb_file = urllib2.urlopen(url)
019     # The curr_id variable is used to keep track of the current vendor being processed into the dictionary.
020     curr_id = ''
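The urllib2 module exists only in Python 2. As a minimal sketch, assuming we wanted the same download step to also run under Python 3 (where urlopen() lives in urllib.request and yields bytes rather than strings), we could wrap it as follows. The iter_usb_lines() name and the choice to decode as UTF-8 with replacement characters are our own assumptions and are not part of usb_lookup.py:

```python
try:
    from urllib2 import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3


def iter_usb_lines(url='http://www.linux-usb.org/usb.ids'):
    """Yield the lines of the remote file as text, one at a time (hypothetical helper)."""
    response = urlopen(url)
    try:
        for raw_line in response:
            # Assumption: undecodable bytes are replaced rather than raising an error.
            yield raw_line.decode('utf-8', 'replace')
    finally:
        response.close()
```

Nothing is downloaded until the generator is iterated, for example with for line in iter_usb_lines():, so the parsing loop can stay largely as it is in the chapter.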
On line 22, we begin to iterate through the USB data we retrieved with urllib2. Starting with line 24, we create a conditional clause to identify vendor, product, and trivial lines. Trivial lines, which our script passes on, are those that are blank or that start with a pound symbol, which we test with startswith(). If a line is important, we check whether it is a vendor or product line as follows:
022     for line in usb_file:
023         # Any commented line or blank line should be skipped.
024         if line.startswith('#') or line == '\n':
025             pass
On line 28, we check whether the line does not start with a tab and that the first character is either a number or lower case. We perform this check because there are entries that do not follow the general structure seen in the first three quarters of the data source. Upon further inspection, the lines starting with an upper case character are inconsequential to our current task and so we disregard them:
026         else:
027             # Lines that are not tabbed are vendor lines.
028             if not(line.startswith('\t')) and (line[0].isdigit() or line[0].islower()):
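The checks described so far, together with the product-line test used later in main(), can be sketched as one small helper. The classify_line() name and its return labels are hypothetical, and unlike the script it treats any whitespace-only line as blank:

```python
def classify_line(line):
    """Label a usb.ids-style line as 'skip', 'vendor', 'product', or 'other'."""
    # Comments and blank lines carry no data. (Assumption: whitespace-only counts as blank.)
    if line.startswith('#') or line.strip() == '':
        return 'skip'
    # Vendor lines are not tabbed and begin with a digit or a lowercase character.
    if not line.startswith('\t') and (line[0].isdigit() or line[0].islower()):
        return 'vendor'
    # Product lines contain exactly one tab, at the start of the line.
    if line.startswith('\t') and line.count('\t') == 1:
        return 'product'
    # Everything else (extra tabs, uppercase prefixes) is ignored for this task.
    return 'other'


for sample in ['# comment\n', '0001 Vendor Name\n', '\t0001 Product Name 1\n']:
    print(classify_line(sample))
```

Keeping the classification in one place makes it easy to try individual lines in the interpreter before running the whole script.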
If the line is what we have defined as a vendor line, we call the getRecord() function on line 30 and pass the current line. The line is first stripped of any newline characters to avoid any potential errors in the getRecord() function. Note that we have assigned two variables, id and name, to capture the returned output from the getRecord() function. We do this because the function returns a tuple of two values, which Python unpacks into the two variables. Consider the following code:
029                 # Call the getRecord helper function to parse the Vendor Id and Name.
030                 id, name = getRecord(line.strip())
On line 31, we set the curr_id variable to the current vendor ID. We add our id key to the usbs dictionary and map the key to a list containing the name of the vendor and an empty dictionary, which we will populate with product information. This completes the logic necessary to handle vendor lines, as shown in the following code:
031                 curr_id = id
032                 usbs[id] = [name, {}]
A line is a product line if it starts with exactly one tab character. There are lines in the dataset that contain more than one tab, but again, they belong to the last quarter of the data source, which does not contribute to our current goal. If we do encounter a product line, we send it for processing on line 36 by passing it to the getRecord() function.
On line 37, we update the usbs dictionary by adding the id key to the embedded dictionary within the list mapped to the curr_id key. With so many brackets, words, and numbers following our usbs variable, you might ask, "Didn't you tout Python as highly readable?" Yep. You'll grow accustomed to accessing items in this manner for lists and dictionaries, but let's break it down one by one.
The first variable, curr_id, is our current vendor ID. By the time we access a product line, its vendor line will have already been encountered and therefore will have been added as a key to the usbs dictionary. At this point, we have access to the list, which contains the vendor name and the product dictionary in the zero and first index, respectively. We want to add our product ID and name to the product dictionary, which is accomplished by specifying the first index of the list. Then, it is a matter of setting the id key to our name value. See the following code:
033             # Lines with one tab are product lines
034             elif line.startswith('\t') and line.count('\t') == 1:
035                 # Call the getRecord helper function to parse the Product Id and Name.
036                 id, name = getRecord(line.strip())
037                 usbs[curr_id][1][id] = name
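To see the vendor and product branches working together, the following sketch runs the same logic over a few hard-coded lines instead of the downloaded file. The sample lines are hypothetical, and get_record() is a stand-in for the chapter's getRecord() helper:

```python
def get_record(record_line):
    """Stand-in for getRecord(): split 'ID name' at the first space."""
    split = record_line.find(' ')
    return record_line[:split], record_line[split + 1:]


# Hypothetical lines that follow the vendor / tabbed-product layout.
sample_lines = [
    '# hypothetical sample data\n',
    '0001 Vendor Name 1\n',
    '\t0001 Product Name 1\n',
    '\t0002 Product Name 2\n',
    '0002 Vendor Name 2\n',
]

usbs = {}
curr_id = ''
for line in sample_lines:
    if line.startswith('#') or line == '\n':
        continue
    if not line.startswith('\t') and (line[0].isdigit() or line[0].islower()):
        # Vendor line: remember the vendor and start an empty product dictionary.
        curr_id, name = get_record(line.strip())
        usbs[curr_id] = [name, {}]
    elif line.startswith('\t') and line.count('\t') == 1:
        # Product line: file it under the most recently seen vendor.
        product_id, name = get_record(line.strip())
        usbs[curr_id][1][product_id] = name

print(usbs)
```

Note that the second vendor ends up with an empty product dictionary, which matches the VendorId_2 entry in the structure described earlier.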
Finally, we call the searchKey() function to look for the user-supplied vendor and product IDs in our usbs dictionary. The line is indented in a manner so that it falls outside the logic of the for loop and will not be called until all lines have been processed as follows:
039     # Pass the usbs dictionary and search for the supplied vendor and product id match.
040     searchKey(usbs)
This takes care of the logic in the main() function. Now, let's take a look at the getRecord() function called on lines 30 and 36 to determine how we separate the id and name for vendor and product lines.
Exploring the getRecord() function
This helper function, defined on line 43, takes the vendor or product line and returns its ID and name. The following is an example of one of the vendor lines in the file and what our function outputs:
- Input from the usb.ids file: "04e8 Samsung Electronics Co., Ltd"
- getRecord() output: "04e8", "Samsung Electronics Co., Ltd"
We accomplish this by using string slicing. In the previous example, the vendor ID, 04e8, is separated from the name by one space. Specifically, it will be the first space on the left side of the string. We use the string .find() function, which searches from left to right in a string for a given character. We give this function a space as its input to find the index in the string of the first space. In the previous example, our split variable would contain a value of 4. Knowing this, we know that the ID is everything to the left of the space. Our name is everything to the right of the space. We add one to our split variable for the name in order to not capture the space before the name:
043 def getRecord(record_line):
044     """
045     The getRecord function splits the ID and name on each line.
046     :param record_line: The current line in the USB file.
047     :return: record_id and record_name from the USB file.
048     """
049     # Find the first instance of a space -- should be right after the id
050     split = record_line.find(' ')
051     # Use string slicing to split the string in two at the space.
052     record_id = record_line[:split]
053     record_name = record_line[split + 1:]
054     return record_id, record_name
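The slicing can be tried on its own in a short sketch. The get_record() function below mirrors the logic just described, while get_record_partition() is a hypothetical alternative of ours, not part of the script, that uses str.partition() and strips any extra whitespace around the name:

```python
def get_record(record_line):
    """Split 'ID name' at the first space, mirroring the getRecord() logic."""
    split = record_line.find(' ')
    return record_line[:split], record_line[split + 1:]


def get_record_partition(record_line):
    """Hypothetical alternative: partition at the first space, then strip the name."""
    record_id, _, record_name = record_line.partition(' ')
    return record_id, record_name.strip()


line = '04e8 Samsung Electronics Co., Ltd'
print(get_record(line))
print(get_record_partition(line))
```

The two functions agree when exactly one space separates the ID from the name. If a line happened to contain more than one space after the ID, the slicing version would keep the extra spaces at the start of the name, whereas the partition version would strip them.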
Interpreting the searchKey() function
The searchKey() function, originally called on line 40 of the main() function, is where we search for the user-supplied vendor and product IDs and display the results to the user. In addition, all of our error handling logic is contained within this function.
Let's practice accessing nested lists and dictionaries. We discussed this in the main() function; however, it pays to actually practice rather than take our word for it. Accessing nested structures requires us to use multiple indices rather than just one. For example, let's create a list and map it to key_1 in a dictionary. To access elements from the nested list, we will need to supply key_1 to access the list and then a numerical index to access elements of the list:
>>> inner_list = ['a', 'b', 'c', 'd']
>>> print inner_list[0]
a
>>> outer_dict = {'key_1': inner_list}
>>> print outer_dict['key_1']
['a', 'b', 'c', 'd']
>>> print outer_dict['key_1'][3]
d
Now, let's switch gears, back to the task at hand, and leverage our new skills to search our dictionary to find vendor and product IDs. The searchKey() function is defined on line 57 and takes our parsed USB data as its only input as follows:
057 def searchKey(usb_dict):
058     """
059     The searchKey function looks for the user supplied vendor and product IDs against
060     the USB dictionary that was parsed in the main and getRecord functions.
061     :param usb_dict: The dictionary containing our list of vendors and products to query against.
062     :return: Nothing.
063     """
On line 66, we use a try and except block to assign the first and second index of the sys.argv list to variables. If the user does not supply this input, we will receive an IndexError, in which case we print a message to the console requesting the missing input and exit our script with an error (traditionally, non-zero exits indicate an error) as follows:
064     # Accept user arguments for Vendor and Product Id. Error will be thrown if there are less
065     # than 2 supplied arguments. Any additional arguments will be ignored
066     try:
067         vendor_key = sys.argv[1]
068         product_key = sys.argv[2]
069     except IndexError:
070         print 'Please provide the vendor Id and product Id separated by spaces.'
071         sys.exit(1)
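The same IndexError handling can be exercised without running the script from the command line. In this sketch, parse_ids() is a hypothetical helper that takes an argv-style list as a parameter and returns None instead of exiting, purely so that it is easy to experiment with:

```python
def parse_ids(argv):
    """Return (vendor_key, product_key) from an argv-style list, or None if either is missing."""
    try:
        # Index 0 holds the script name, so the user input starts at index 1.
        return argv[1], argv[2]
    except IndexError:
        return None


# Hypothetical argument lists standing in for sys.argv.
print(parse_ids(['usb_lookup.py', '0951', '1643']))
print(parse_ids(['usb_lookup.py', '0951']))
```

In the actual script, the None case corresponds to printing the usage message and calling sys.exit(1). As in the script, any additional arguments are simply ignored.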
We use another try and except block on line 75 to find our vendor in the USB dictionary. When querying a dictionary by key, if that key is not present, Python will throw a KeyError. If this is the case, then the vendor ID does not exist in our USB dictionary, and we print this result to the user. Because this wasn't technically an error and just the case of our data source being incomplete, we perform a zero exit. We have the following code:
073     # Search for the Vendor Name by looking for the Vendor Id key in usb_dict. The zeroth index
074     # of the list is the Vendor Name. If Vendor Id is not found, exit the program.
075     try:
076         vendor = usb_dict[vendor_key][0]
077     except KeyError:
078         print 'Vendor Id not found.'
079         sys.exit(0)
In a similar manner, if the product key is not found under its vendor key, we print out this result to the user. In this scenario, we can at least print the vendor information to the console before exiting, as follows:
081     # Search for the Product Name by looking in the product dictionary in the first index of
082     # the list. If the Product Id is not found print the Vendor Name as a partial match and exit.
083     try:
084         product = usb_dict[vendor_key][1][product_key]
085     except KeyError:
086         print 'Vendor: {}\nProduct Id not found.'.format(vendor)
087         sys.exit(0)
If both our vendor and product IDs were found, we print this data to the user. On line 89, we use the string format() function to print the vendor and product names separated by a newline character as follows:
088     # If both the Vendor and Product Ids are found, print their respective names to the console.
089     print 'Vendor: {}\nProduct: {}'.format(vendor, product)
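The three outcomes handled by searchKey() (no vendor, vendor only, and vendor plus product) can also be expressed as a function that returns values rather than printing and exiting. This is a sketch under our own assumptions: resolve() is a hypothetical name, the sample dictionary is made up, and the input is lowercased on the assumption that IDs in the data source are lowercase hexadecimal, as the vendor-line check in main() suggests:

```python
def resolve(usb_dict, vendor_key, product_key):
    """Return (vendor_name, product_name); either value is None when not found."""
    try:
        # Assumption: keys in usb_dict are lowercase hexadecimal strings.
        vendor_name, products = usb_dict[vendor_key.lower()]
    except KeyError:
        return None, None
    # dict.get() returns None for a missing product, giving us the partial match.
    return vendor_name, products.get(product_key.lower())


# Hypothetical dictionary in the same nested layout as usbs.
usb_dict = {'00aa': ['Vendor Name 1', {'00b1': 'Product Name 1'}]}

print(resolve(usb_dict, '00aa', '00b1'))
print(resolve(usb_dict, '00AA', 'ffff'))
print(resolve(usb_dict, 'ffff', '00b1'))
```

Returning a tuple leaves the decision about what to print, and which exit code to use, to the caller, which can be convenient if the lookup is later reused from another script.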
Running our first forensic script
The usb_lookup script requires two arguments: the vendor and product IDs for the USB device of interest. We can find this information by looking at a suspect's HKLM\SYSTEM\%CurrentControlSet%\Enum\USB registry key. For example, supplying the vendor, 0951, and product, 1643, from the subkey VID_0951&PID_1643, results in our script identifying the device as a Kingston DataTraveler G3.
Our data source is not an all-inclusive list, and if you supply a vendor or a product ID that does not exist in the data source, our script will print that the ID was not found. The full code for this and all of our scripts can be downloaded from https://packtpub.com/books/content/support.