Python Digital Forensics Cookbook

How it works...

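First, we import the required libraries to handle argument parsing, creating counts of objects, and copying files, along with the jinja2 Template class used by the templates defined at the top of the script:

from __future__ import print_function
import argparse
from collections import Counter
import shutil
import os
import sys

# Needed for the DASH, TABLE, and DEMO template objects
from jinja2 import Template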

This recipe's command-line handler takes one positional argument, OUTPUT_DIR, which represents the desired output path for the HTML dashboard. The handler shown here passes that path straight to the main() function. The directory itself is created later by shutil.copytree(), which by default expects that the destination does not already exist:

if __name__ == "__main__":
    # Command-line Argument Parser
    parser = argparse.ArgumentParser(
        description=__description__,
        epilog="Developed by {} on {}".format(
            ", ".join(__authors__), __date__)
    )
    parser.add_argument("OUTPUT_DIR", help="Desired Output Path")
    args = parser.parse_args()

    main(args.OUTPUT_DIR)
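
As a quick way to see how this handler behaves, the following sketch builds a parser with the same single positional argument and feeds it an argument list directly rather than reading sys.argv. The build_parser() helper and the description string are hypothetical; they stand in for the script's own metadata variables.

```python
import argparse


def build_parser():
    # Hypothetical helper: mirrors the recipe's parser, with a
    # placeholder description instead of __description__.
    parser = argparse.ArgumentParser(description="HTML dashboard demo")
    parser.add_argument("OUTPUT_DIR", help="Desired Output Path")
    return parser


# Passing a list to parse_args() bypasses sys.argv, which is handy
# when experimenting in an interpreter or writing a test.
args = build_parser().parse_args(["dashboard_out"])
print(args.OUTPUT_DIR)
```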

Defined at the top of the script are a number of global variables: DASH, TABLE, and DEMO. These variables represent the various HTML and JavaScript files we create as a product of the script. This is a book about Python, so we will not get into the details of how these files are structured and how they work. However, let's look at an example to showcase how jinja2 bridges the gap between these types of files and Python.

A portion of the global variable DEMO is captured in the following snippet. Note that the string block is passed to the jinja2 Template() constructor. This gives us an object through which jinja2 can dynamically insert data into the JavaScript file. Specifically, the following code block shows two locations where jinja2 inserts data. These are denoted by double curly braces around the keywords we refer to them by in the Python code, pi_labels and pi_series, respectively:

DEMO = Template("""type = ['','info','success','warning','danger'];
[snip]
        Chartist.Pie('#chartPreferences', dataPreferences,
          optionsPreferences);

        Chartist.Pie('#chartPreferences', {
          labels: [{{pi_labels}}],
          series: [{{pi_series}}]
        });
[snip] """)
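
To see the substitution in isolation, here is a minimal sketch that renders a cut-down version of this template. It assumes jinja2 is installed; the SNIPPET name and the sample values are made up for illustration and are not part of the recipe.

```python
from jinja2 import Template

# Hypothetical, cut-down stand-in for the DEMO template.
SNIPPET = Template(
    "Chartist.Pie('#chartPreferences', {\n"
    "    labels: [{{pi_labels}}],\n"
    "    series: [{{pi_series}}]\n"
    "});"
)

# Keyword arguments to render() are matched to the placeholder names.
rendered = SNIPPET.render(pi_labels="'Mobile', 'External'", pi_series="4, 4")
print(rendered)
```

Because the placeholders sit inside JavaScript array brackets, the values passed in need to be pre-formatted strings, which is why the recipe builds comma-delimited label and series strings before rendering.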

Let's now turn our attention to the main() function. This function is really quite simple for reasons you will understand in the second recipe. This function creates a list of lists containing sample acquisition data, prints a status message to the console, and sends that data to the process_data() method:

def main(output_dir):
    acquisition_data = [
        ["001", "Debbie Downer", "Mobile", "08/05/2017 13:05:21", "32"],
        ["002", "Debbie Downer", "Mobile", "08/05/2017 13:11:24", "16"],
        ["003", "Debbie Downer", "External", "08/05/2017 13:34:16", "128"],
        ["004", "Debbie Downer", "Computer", "08/05/2017 14:23:43", "320"],
        ["005", "Debbie Downer", "Mobile", "08/05/2017 15:35:01", "16"],
        ["006", "Debbie Downer", "External", "08/05/2017 15:54:54", "8"],
        ["007", "Even Steven", "Computer", "08/07/2017 10:11:32", "256"],
        ["008", "Even Steven", "Mobile", "08/07/2017 10:40:32", "32"],
        ["009", "Debbie Downer", "External", "08/10/2017 12:03:42", "64"],
        ["010", "Debbie Downer", "External", "08/10/2017 12:43:27", "64"]
    ]
    print("[+] Processing acquisition data")
    process_data(acquisition_data, output_dir)

The purpose of the process_data() method is to get the sample acquisition data into an HTML or JavaScript format that we can drop in place within the jinja2 templates. This dashboard is going to have two components: a series of charts visualizing the data and a table of the raw data. The following code block deals with the latter. We accomplish this by iterating through the acquisition list and adding each element of the table to the html_table string with the appropriate HTML tags:

def process_data(data, output_dir):
    html_table = ""
    for acq in data:
        html_table += "<tr><td>{}</td><td>{}</td><td>{}</td><td>{}</td>" \
            "<td>{}</td></tr>\n".format(
                acq[0], acq[1], acq[2], acq[3], acq[4])
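
The loop above places each value into the markup as-is. If acquisition data could ever contain characters such as < or &, one option is to escape each value before wrapping it in tags. The sketch below does this with html.escape() from the Python 3 standard library; build_rows() is a hypothetical helper name and not part of the recipe.

```python
from html import escape


def build_rows(data):
    # Hypothetical helper: one <tr> per acquisition, every cell escaped.
    rows = []
    for acq in data:
        cells = "".join("<td>{}</td>".format(escape(str(v))) for v in acq)
        rows.append("<tr>{}</tr>\n".format(cells))
    return "".join(rows)


# Made-up row with angle brackets to show the escaping.
sample = [["001", "Debbie <Downer>", "Mobile", "08/05/2017 13:05:21", "32"]]
print(build_rows(sample))
```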

Next, we use the Counter class from the collections library to quickly generate a dictionary-like object of the number of occurrences of each item in the sample data. For example, the first Counter object, device_types, creates a dictionary-like object where each key is a different device type (for example, Mobile, External, and Computer) and the value represents the number of occurrences of each key. This allows us to quickly summarize data across the data set and cuts down on the legwork required before we can plot this information.
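
As a small illustration, the following sketch counts the device-type column taken from the sample acquisition data in main(). The device_column name is introduced here only for the example.

```python
from collections import Counter

# Device-type values of the ten sample acquisitions, in order.
device_column = ["Mobile", "Mobile", "External", "Computer", "Mobile",
                 "External", "Computer", "Mobile", "External", "External"]

device_types = Counter(device_column)
print(device_types["Mobile"])    # number of Mobile acquisitions
print(device_types["Tablet"])    # a missing key yields 0 instead of KeyError
print(dict(device_types))        # plain dictionary view of the counts
```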

Once we have created the Counter objects, we again iterate through each acquisition to perform a more manual summation of acquisition date information. This date_dict object maintains a key for each acquisition date and stores the total size of all acquisitions made on that day as the key's value. We specifically split on a space to isolate just the date value from the date-time string (for example, 08/05/2017). If the specific date is already in the dictionary, we add the acquisition size to that key's value. Otherwise, we create the key and assign its value to the acquisition size. Once we have created the various summarizing objects, we call the output_html() method to populate the HTML dashboard with this information:

    device_types = Counter([x[2] for x in data])
    custodian_devices = Counter([x[1] for x in data])

    date_dict = {}
    for acq in data:
        date = acq[3].split(" ")[0]
        if date in date_dict:
            date_dict[date] += int(acq[4])
        else:
            date_dict[date] = int(acq[4])
    output_html(output_dir, len(data), html_table,
                device_types, custodian_devices, date_dict)
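
The if/else accumulation can also be written with collections.defaultdict, which supplies a zero for dates that have not been seen yet. This is an alternative sketch rather than the recipe's code, and sum_sizes_by_date() is a hypothetical name.

```python
from collections import defaultdict


def sum_sizes_by_date(data):
    # Hypothetical helper: total acquisition size per calendar date.
    totals = defaultdict(int)
    for acq in data:
        date = acq[3].split(" ")[0]  # "08/05/2017 13:05:21" -> "08/05/2017"
        totals[date] += int(acq[4])
    return dict(totals)


# Three rows taken from the sample acquisition data.
sample = [
    ["001", "Debbie Downer", "Mobile", "08/05/2017 13:05:21", "32"],
    ["002", "Debbie Downer", "Mobile", "08/05/2017 13:11:24", "16"],
    ["007", "Even Steven", "Computer", "08/07/2017 10:11:32", "256"],
]
print(sum_sizes_by_date(sample))
```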

The output_html() method starts by printing a status message to the console and storing the current working directory to a variable. We append the folder path to light-bootstrap-dashboard and use shutil.copytree() to copy the bootstrap files to the output directory. Following that, we create three file paths representing the output locations and names of the three jinja2 templates:

def output_html(output, num_devices, table, devices, custodians, dates):
    print("[+] Rendering HTML and copy files to {}".format(output))
    cwd = os.getcwd()
    bootstrap = os.path.join(cwd, "light-bootstrap-dashboard")
    shutil.copytree(bootstrap, output)

    dashboard_output = os.path.join(output, "dashboard.html")
    table_output = os.path.join(output, "table.html")
    demo_output = os.path.join(output, "assets", "js", "demo.js")
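
One thing to keep in mind is that shutil.copytree() raises an error when the destination already exists, unless dirs_exist_ok=True is passed (available from Python 3.8). The sketch below demonstrates this with temporary directories; the directory and file names are placeholders and do not reflect the real light-bootstrap-dashboard layout.

```python
import os
import shutil
import tempfile

# Build a throwaway source tree to stand in for the bootstrap folder.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "src")
dst = os.path.join(workdir, "report")
os.makedirs(os.path.join(src, "assets", "js"))
with open(os.path.join(src, "assets", "js", "demo.js"), "w") as fh:
    fh.write("// placeholder\n")

shutil.copytree(src, dst)  # first copy creates dst

second_copy_failed = False
try:
    shutil.copytree(src, dst)  # dst now exists, so this fails
except FileExistsError:
    second_copy_failed = True

# Python 3.8+: merge into the existing destination instead of failing.
shutil.copytree(src, dst, dirs_exist_ok=True)

copied = os.path.isfile(os.path.join(dst, "assets", "js", "demo.js"))
print(second_copy_failed, copied)
shutil.rmtree(workdir)  # clean up the temporary tree
```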

Let's start by looking at the two HTML files, as these are relatively simple. After opening file objects for the two HTML files, we call the render() method of each jinja2 Template object and use keyword arguments to fill the placeholders in the curly brackets. With the template rendered with the Python data, we write the result to the file. Simple, right? The JavaScript file, thankfully, is not much more difficult:

    with open(dashboard_output, "w") as outfile:
        outfile.write(DASH.render(num_custodians=len(custodians.keys()),
                                  num_devices=num_devices,
                                  data=calculate_size(dates)))

    with open(table_output, "w") as outfile:
        outfile.write(TABLE.render(table_body=table))

While syntactically similar to the previous code block, when we render the data this time, we feed the data to the return_labels() and return_series() methods. These methods take the key and values from the Counter objects and format them appropriately to work with the JavaScript file. You may have also noticed a call to the calculate_size() method in the previous code block called on the dates dictionary. Let's explore these three supporting functions now:

    with open(demo_output, "w") as outfile:
        outfile.write(
            DEMO.render(bar_labels=return_labels(dates.keys()),
                        bar_series=return_series(dates.values()),
                        pi_labels=return_labels(devices.keys()),
                        pi_series=return_series(devices.values()),
                        pi_2_labels=return_labels(custodians.keys()),
                        pi_2_series=return_series(custodians.values())))

The calculate_size() method simply uses the built-in sum() function to return the total size collected across all dates. The return_labels() and return_series() methods use string methods to format the data appropriately. Essentially, the JavaScript file expects the labels to be within single quotes, which is accomplished with the format() method, and both labels and series must be comma-delimited:

def calculate_size(sizes):
    return sum(sizes.values())


def return_labels(list_object):
    return ", ".join("'{}'".format(x) for x in list_object)


def return_series(list_object):
    return ", ".join(str(x) for x in list_object)

When we run this script, we receive a copy of the report in the specified output directory along with the required assets for loading and rendering the page. We can zip up this folder and provide it to team members, as it is designed to be portable. Viewing this dashboard shows us the first page with the chart information:

And the second page as the table of acquisition information: