上QQ阅读APP看书，第一时间看更新

How it works...

First, we import the required libraries to handle argument parsing, creating counts of objects, parsing dates, writing XLSX spreadsheets, and our custom utilcsv module, which handles CSV reading and writing in this recipe:

from __future__ import print_function
import argparse
from collections import Counter
from datetime import datetime
import os
import sys
from utility import utilcsv

try:
    import xlsxwriter
except ImportError:
    print("[-] Install required third-party module xlsxwriter")
    sys.exit(1)

This recipe's command-line handler takes one positional argument: OUTPUT_DIR. This represents the desired output path for the XLSX file. Before calling the main() method, we check whether the output directory exists and create it if it does not:

if __name__ == "__main__":
    # Command-line Argument Parser
    parser = argparse.ArgumentParser(
        description=__description__,
        epilog="Developed by {} on {}".format(
            ", ".join(__authors__), __date__)
    )
    parser.add_argument("OUTPUT_DIR", help="Desired Output Path")
    args = parser.parse_args()

    if not os.path.exists(args.OUTPUT_DIR):
        os.makedirs(args.OUTPUT_DIR)

    main(args.OUTPUT_DIR)

The main() function is really quite simple; its job is to print a status message to the console, use the csv_reader() method, which is a slightly modified function from the previous recipe, and then write the resulting data to the output directory with the xlsx_writer() method:

def main(output_directory):
    print("[+] Reading in sample data set")
    # Skip first row of headers
    data = utilcsv.csv_reader("redacted_sample_event_log.csv")[1:]
    xlsx_writer(data, output_directory)

The xlsx_writer() starts by printing a status message and creating the workbook object in the output directory. Next, we create two worksheet objects for the dashboard and data worksheets. The dashboard worksheet will contain a graph summarizing the raw data on the data worksheet:

def xlsx_writer(data, output_directory):
    print("[+] Writing output.xlsx file to {}".format(output_directory))
    workbook = xlsxwriter.Workbook(
        os.path.join(output_directory, "output.xlsx"))
    dashboard = workbook.add_worksheet("Dashboard")
    data_sheet = workbook.add_worksheet("Data")

We use the add_format() method on the workbook object to create customized formats for the spreadsheet. These formats are dictionaries with key-value pairs configuring the format. Most of these keys are self-explanatory based on the key name. A description of the various format options and features can be found at http://xlsxwriter.readthedocs.io/format.html:

    title_format = workbook.add_format({
        'bold': True, 'font_color': 'white', 'bg_color': 'black',
        'font_size': 30, 'font_name': 'Calibri', 'align': 'center'
    })
    date_format = workbook.add_format(
        {'num_format': 'mm/dd/yy hh:mm:ss AM/PM'})

With the formats set, we can enumerate through the list of lists and write each using the write() method. This method takes a few inputs; the first and second arguments are the row and column followed by the value to write. Note that in addition to the write() method, we also use the write_number() and write_datetime() methods. These preserve the data type within the XLSX spreadsheet. Specifically, with the write_datetime() method, we supply it with the date_format variable to appropriately format the date object. After looping through all of the data, we have successfully stored the data within the spreadsheet and retained its value types. However, we can do much more than that with an XLSX spreadsheet.

We use the add_table() method to create a table of the data we just wrote. To accomplish this, we must supply the function using the Excel notation to denote the top-left and bottom-right columns of the table. Beyond that, we can also provide a dictionary of objects to further configure the table. In this case, the dictionary only contains the header names for each column of the table:

    for i, record in enumerate(data):
        data_sheet.write_number(i, 0, int(record[0]))
        data_sheet.write(i, 1, record[1])
        data_sheet.write(i, 2, record[2])
        dt = datetime.strptime(record[3], "%m/%d/%Y %H:%M:%S %p")
        data_sheet.write_datetime(i, 3, dt, date_format)
        data_sheet.write_number(i, 4, int(record[4]))
        data_sheet.write(i, 5, record[5])
        data_sheet.write_number(i, 6, int(record[6]))
        data_sheet.write(i, 7, record[7])

    data_length = len(data) + 1
    data_sheet.add_table(
        "A1:H{}".format(data_length),
        {"columns": [
            {"header": "Index"},
            {"header": "File Name"},
            {"header": "Computer Name"},
            {"header": "Written Date"},
            {"header": "Event Level"},
            {"header": "Event Source"},
            {"header": "Event ID"},
            {"header": "File Path"}
        ]}
    )

With the data worksheet complete, let's now turn our focus on the dashboard worksheet. We will create a graph on this dashboard, breaking down the event IDs by frequency. First, we calculate this frequency using a Counter object, as shown in the HTML dashboard recipe. Next, we set a title for this page by merging a number of columns and setting the title text and format.

Once that is complete, we iterate through the frequency of event IDs Counter object and write them to the worksheet. We write them starting at row 100 to make sure the data is out of the way and not at the forefront. Once this data is written, we convert it into a table using the same method discussed previously:

    event_ids = Counter([x[6] for x in data])
    dashboard.merge_range('A1:Q1', 'Event Log Dashboard', title_format)
    for i, record in enumerate(event_ids):
        dashboard.write(100 + i, 0, record)
        dashboard.write(100 + i, 1, event_ids[record])

    dashboard.add_table("A100:B{}".format(
        100 + len(event_ids)),
        {"columns": [{"header": "Event ID"}, {"header": "Occurrence"}]}
    )

Finally, we can plot this chart we keep talking about. We use the add_chart() method and specify the type as a bar chart. Next, we use the set_title() and set_size() methods to properly configure this graph. All that is left is to use the add_series() method to add the data to the chart. This method takes a dictionary with a category and values key. In a bar chart, the categories values represent the x axis and the values represent the y axis. Note the use of Excel notation to designate the range of cells that make up the categories and values keys. Once the data has been selected, we use the insert_chart() method on the worksheet object to display it before closing the workbook object:

    event_chart = workbook.add_chart({'type': 'bar'})
    event_chart.set_title({'name': 'Event ID Breakdown'})
    event_chart.set_size({'x_scale': 2, 'y_scale': 5})

    event_chart.add_series(
        {'categories': '=Dashboard!$A$101:$A${}'.format(
            100 + len(event_ids)),
         'values': '=Dashboard!$B$101:$B${}'.format(
             100 + len(event_ids))})
    dashboard.insert_chart('C5', event_chart)

    workbook.close()

When we run this script, we can review the data in an XLSX spreadsheet and the chart we created summarizing Event IDs: