Python Digital Forensics Cookbook

How it works...

First, we import the required libraries to write spreadsheets. Later on in this recipe, we also import the unicodecsv module:

from __future__ import print_function
import csv
import os
import sys

This recipe does not use argparse as a command-line handler. Instead, we call the desired functions directly, based on the version of Python in use, which we determine with the sys.version_info attribute. If the user is running Python 2.X, we call both the csv_writer_py2() and unicode_csv_dict_writer_py2() functions. Each takes four arguments: the data to write, a list of headers, the desired output directory, and, optionally, the name of the output CSV spreadsheet. If Python 3.X is being used, we call the csv_writer_py3() function instead. Although the functions are similar, CSV writing is handled a little differently between the two versions of Python, and the unicodecsv module is applicable only to Python 2:

if sys.version_info < (3, 0):
    csv_writer_py2(TEST_DATA_LIST, ["Name", "Age", "Cool Factor"],
                   os.getcwd())
    unicode_csv_dict_writer_py2(
        TEST_DATA_DICT, ["Name", "Age", "Cool Factor"], os.getcwd(),
        "dict_output.csv")

elif sys.version_info >= (3, 0):
    csv_writer_py3(TEST_DATA_LIST, ["Name", "Age", "Cool Factor"],
                   os.getcwd())
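
The version check relies on ordinary tuple comparison. As a minimal sketch of that idea (pick_writer_name is a hypothetical helper used only for illustration, not part of the recipe), the same test can be isolated in a small function and exercised with hand-written version tuples:

```python
import sys


def pick_writer_name(version_info=sys.version_info):
    """Return the name of the writer function the recipe would call.

    Hypothetical helper for illustration; the recipe performs this
    check inline rather than through a function.
    """
    # Tuples compare element by element, so (2, 7, 18) < (3, 0) is True
    # and (3, 11, 0) < (3, 0) is False.
    if tuple(version_info[:2]) < (3, 0):
        return "csv_writer_py2"
    return "csv_writer_py3"


# With no argument, the interpreter running this script is inspected.
print(pick_writer_name())
```

Passing a tuple such as (2, 7, 18) makes it possible to check both branches without switching interpreters.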

This recipe has two global variables that represent sample data types. The first of these, TEST_DATA_LIST, is a nested list structure containing strings and integers. The second, TEST_DATA_DICT, is another representation of this data but stored as a list of dictionaries. Let's look at how the various functions write this sample data to the output CSV file:

TEST_DATA_LIST = [["Bill", 53, 0], ["Alice", 42, 5],
                  ["Zane", 33, -1], ["Theodore", 72, 9001]]

TEST_DATA_DICT = [{"Name": "Bill", "Age": 53, "Cool Factor": 0},
                  {"Name": "Alice", "Age": 42, "Cool Factor": 5},
                  {"Name": "Zane", "Age": 33, "Cool Factor": -1},
                  {"Name": "Theodore", "Age": 72, "Cool Factor": 9001}]
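
To make the relationship between the two structures concrete, the following sketch derives the list of dictionaries from the nested list by pairing each row with the header. The rows_to_dicts name and the HEADER constant are assumptions introduced for illustration; the recipe simply defines both variables by hand:

```python
HEADER = ["Name", "Age", "Cool Factor"]

TEST_DATA_LIST = [["Bill", 53, 0], ["Alice", 42, 5],
                  ["Zane", 33, -1], ["Theodore", 72, 9001]]


def rows_to_dicts(rows, header):
    """Pair every value in a row with the header at the same position."""
    return [dict(zip(header, row)) for row in rows]


# Hypothetical helper: produces the same content as TEST_DATA_DICT.
as_dicts = rows_to_dicts(TEST_DATA_LIST, HEADER)
print(as_dicts[0])
```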

The csv_writer_py2() method first checks whether the name input was provided. If it is still the default value of None, we simply assign the output name ourselves. Next, after printing a status message to the console, we open a File object in the "wb" mode in the desired output directory. Note that it is important to open CSV files in the "wb" mode in Python 2 to prevent intervening gaps between rows in the resulting spreadsheet. Once we have the File object, we use the csv.writer() method to convert this into a writer object. With this, we can use the writerow() and writerows() methods to write a single list of data and a nested list structure, respectively. Now, let's look at how unicodecsv works with lists of dictionaries:

def csv_writer_py2(data, header, output_directory, name=None):
    if name is None:
        name = "output.csv"

    print("[+] Writing {} to {}".format(name, output_directory))

    with open(os.path.join(output_directory, name), "wb") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(header)

        writer.writerows(data)
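
The difference between writerow() and writerows() is easiest to see on an in-memory buffer. This is a small sketch written for Python 3 (an assumption, since the function above targets Python 2) that uses io.StringIO in place of a file on disk:

```python
import csv
import io

# newline="" keeps the buffer from altering the line endings csv emits.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)

# writerow() takes one flat list and writes a single line.
writer.writerow(["Name", "Age", "Cool Factor"])

# writerows() takes a nested list and writes one line per inner list.
writer.writerows([["Bill", 53, 0], ["Alice", 42, 5]])

text = buffer.getvalue()
print(repr(text))
```

Each line ends with "\r\n" because that is the default line terminator of the csv module's "excel" dialect.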

The unicodecsv module is a drop-in replacement for the built-in csv module, and the two can be used interchangeably. The difference, and it's a big one, is that unicodecsv automatically handles Unicode strings in a way that the built-in csv module in Python 2 does not. This was addressed in Python 3.

First, we attempt to import the unicodecsv module; if the import fails, we print a status message to the console and exit the script. If we are able to import the library, we check whether the name input was supplied and create a name if it wasn't, before opening a File object. With this File object, we use the unicodecsv.DictWriter class and supply it with the list of headers. By default, this object expects the supplied fieldnames list to cover every key in each dictionary, and it raises a ValueError when a dictionary contains a key that is not listed. If this behavior is not desired, it can be turned off by setting the extrasaction keyword argument to the string "ignore". Doing so results in any dictionary keys not specified in the fieldnames list being ignored and not added to the CSV spreadsheet.

After the DictWriter object is set up, we use the writeheader() method to write the field names and, this time, writerows() to write the list of dictionaries to the CSV file. Another important thing to note is that the columns will be in the order of the elements in the supplied fieldnames list:

def unicode_csv_dict_writer_py2(data, header, output_directory, name=None):
    try:
        import unicodecsv
    except ImportError:
        print("[+] Install unicodecsv module before executing this"
              " function")
        sys.exit(1)

    if name is None:
        name = "output.csv"

    print("[+] Writing {} to {}".format(name, output_directory))
    with open(os.path.join(output_directory, name), "wb") as csvfile:
        writer = unicodecsv.DictWriter(csvfile, fieldnames=header)
        writer.writeheader()

        writer.writerows(data)
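
The effect of extrasaction and of the fieldnames order can be sketched with the built-in csv.DictWriter under Python 3, which exposes the same interface; using it in place of unicodecsv is an assumption made so that the example runs without a third-party install:

```python
import csv
import io

# "Cool Factor" is deliberately left out, and "Age" is listed first.
fieldnames = ["Age", "Name"]
row = {"Name": "Bill", "Age": 53, "Cool Factor": 0}

# Default behavior (extrasaction="raise"): the unlisted key is an error.
strict_writer = csv.DictWriter(io.StringIO(newline=""),
                               fieldnames=fieldnames)
try:
    strict_writer.writerow(row)
    raised = False
except ValueError:
    raised = True

# extrasaction="ignore": the unlisted key is silently dropped.
lenient_buffer = io.StringIO(newline="")
lenient_writer = csv.DictWriter(lenient_buffer, fieldnames=fieldnames,
                                extrasaction="ignore")
lenient_writer.writeheader()
lenient_writer.writerow(row)
lenient_text = lenient_buffer.getvalue()

print(raised)
print(repr(lenient_text))
```

The columns come out as Age and then Name, following the fieldnames list rather than the order of the keys in the dictionary.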

Lastly, the csv_writer_py3() method operates in mostly the same fashion. However, note the difference in how the File object is created. Rather than opening a file in the "wb" mode, with Python 3, we open the file in the "w" mode and set the newline keyword argument to an empty string. After doing that, the rest of the operations proceed in the same manner as previously described:

def csv_writer_py3(data, header, output_directory, name=None):
    if name is None:
        name = "output.csv"

    print("[+] Writing {} to {}".format(name, output_directory))

    with open(os.path.join(output_directory, name), "w", newline="") as \
            csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(header)

        writer.writerows(data)
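
To illustrate why the newline keyword argument matters in Python 3, the sketch below writes the same rows twice and compares the raw bytes on disk. Passing newline="\r\n" is an assumption used here to imitate, on any platform, the newline translation that Windows applies when the argument is omitted; write_rows is a hypothetical helper:

```python
import csv
import os
import tempfile


def write_rows(path, newline):
    """Write two sample rows in text mode and return the raw bytes."""
    with open(path, "w", newline=newline) as csvfile:
        csv.writer(csvfile).writerows([["Bill", 53], ["Alice", 42]])
    with open(path, "rb") as raw:
        return raw.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    # newline="": the "\r\n" emitted by the csv module is written as-is.
    good = write_rows(os.path.join(tmp_dir, "good.csv"), "")
    # newline="\r\n": every "\n" is translated again, giving "\r\r\n".
    bad = write_rows(os.path.join(tmp_dir, "bad.csv"), "\r\n")

print(good)
print(bad)
```

The doubled carriage return in the second result is what shows up as a blank row between records when the file is opened in a spreadsheet application.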

When we run this code, we can look at either of the two newly generated CSV files and see the same information, as in the following screenshot: