Learning Python for Forensics

Standard data types

With our first script complete, it is now time to understand the basic data types of Python. These data types are similar to those found in other programming languages, but are invoked with a simple syntax described in the following table and sections. For a full list of standard data types available in Python, visit the official documentation at http://docs.python.org/2/library/stdtypes.html.

You will find that constructing most of our scripts can be accomplished using only the standard data types that Python offers. Before we take a look at one of the most common data types, strings, we will introduce comments.

Something that is always said, and can never be said enough, is to comment your code. In Python, a comment begins with the # symbol. When Python encounters this symbol, it skips the remainder of the line and proceeds to the next line. For comments that span multiple lines, we can use three single or double quotes to mark the beginning and end of the text rather than starting every line with a pound symbol. Strictly speaking, such a block is a string literal that Python evaluates and then discards because it is not assigned to anything, so it behaves like a comment. The following are examples of each type of comment in a file called comments.py. When running this script, we should only see 10 printed to the console, as all comments are ignored.

# This is a comment
print 5 + 5 # This is an inline comment. Everything to the right of the # symbol does not get executed
"""We can use three quotes to create
multi-line comments."""

Strings and Unicode

A string is a data type that can contain any character, including alphanumeric characters, symbols, and Unicode text in various encodings. With the vast amount of information that can be stored as a string, it is no surprise that strings are one of the most common data types. Examples of areas where strings are found include reading arguments at the command line, user input, data from files, and outputting data. To begin with, let us look at how we can define a string in Python.

There are three ways to create a string: single quotes, double-quotes, or the built-in str() constructor method. Note, there is no difference between single and double quoted strings. Having multiple ways to create a string is advantageous, as it gives us the ability to differentiate between intentional quotes within a string. For example, in the 'I hate when people use "air-quotes"!' string, we use the single quotes to demarcate the beginning and end of the main string. The double quotes inside the string will not cause any issue with the Python interpreter. Let's verify with the type() function that both single and double quotes create the same type of object.

>>> type('Hello World!')
<type 'str'>
>>> type("Foo Bar 1234")
<type 'str'>

As we saw in the case of comments, a block string can be defined by three single or double quotes to create multi-line strings.

>>> """This is also a string"""
'This is also a string'
>>> '''it
... can span
... several lines'''
'it\ncan span\nseveral lines'

The \n character in the returned line signifies a line feed or a new line. The output in the interpreter displays these new line characters as \n, although when it's fed into a file or console, a new line is created. The \n is one of the most common escape characters in Python. Escape characters are denoted by a backslash followed by a specific character. Other common escape characters include \t for horizontal tabs, \r for carriage returns, \', \", and \\ for literal single quotes, double quotes, and backslashes among others. Literal characters allow us to use these characters without unintentionally using their special meaning in Python's context.
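
To make the escape characters described above more concrete, the following is a minimal sketch. The variable names and the sample path are illustrative assumptions, and each print call takes a single argument so that the snippet runs under both Python 2 and Python 3.

```python
# Each escape sequence is stored as a single character in the string.
tabbed = 'name\tvalue'          # \t is one horizontal tab character
windows_path = 'C:\\Evidence'   # \\ is one literal backslash
quoted = 'It\'s a string'       # \' is a literal single quote

print(len(tabbed))      # 10: four letters, one tab, five letters
print(windows_path)     # displayed with a single backslash
print(quoted)           # displayed without the backslash
```

Printing a string shows the characters themselves, whereas the interactive prompt shows the escaped representation.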

We can also use the add (+) or multiply (*) operators with strings. The add operator is used to concatenate strings together and the multiply operator will repeat the provided string values.

>>> 'Hello' + ' ' + 'World'
'Hello World'
>>> "Are we there yet? " * 3
'Are we there yet? Are we there yet? Are we there yet? '

Let's look at some common methods that we use with strings. We can remove characters from the start or end of a string using the strip() method. The strip() method takes the characters that we want to remove as its input, or removes whitespace if we omit the argument. Similarly, the replace() method takes two inputs: the substring to replace and what to replace it with.

# This will remove the colon (`:`) from the start and/or end of the line
>>> ':HelloWorld:'.strip(':')
'HelloWorld'

# This will remove the colon (`:`) from the line and place a space (` `) in its place
>>> 'Hello:World'.replace(':', ' ')
'Hello World'
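
As a further sketch of these two methods, the following shows strip() with no argument and with more than one character supplied. The sample strings are made up for illustration.

```python
# With no argument, strip() removes leading and trailing whitespace,
# including spaces, tabs, and newlines.
padded = '   evidence.txt \n'
cleaned = padded.strip()
print(cleaned)

# The argument is treated as a set of characters to remove from both ends,
# not as an exact prefix or suffix.
trimmed = '::--HelloWorld--::'.strip(':-')
print(trimmed)

# replace() changes every occurrence, not only the first one.
spaced = 'a:b:c'.replace(':', ' ')
print(spaced)
```

Note that none of these methods modify the original string; each returns a new string.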

Using the in statement, we can check if a character or characters is in a string or not. We can also be more specific and check if a string startswith() or endswith() a specific character or characters (you know a language is easy to understand when you can create sensible sentences out of functions). These methods return True or False Boolean objects.

>>> 'a' in 'Chapter 2'
True
>>> 'Chapter 1'.startswith('Chapter')
True
>>> 'Chapter 1'.endswith('1')
True

We can quickly split a string into a list based on some delimiter. This can be helpful to quickly convert data separated by a delimiter into a list. For example, the CSV (comma separated values) data is separated by commas and can be split on that value.

>>> print "This string is really long! It should probably be on two lines.".split('!')
['This string is really long', ' It should probably be on two lines.']
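
The CSV case mentioned above can be sketched as follows. The record and its field layout are hypothetical, and this simple approach assumes that no field contains a comma of its own.

```python
# A made-up CSV record: file name, size in bytes, and a date.
record = 'evidence.txt,1024,2015-06-01'

# split() returns a list of strings, one per field.
fields = record.split(',')
print(fields)

# Every field is still a string, so numeric values need converting.
size = int(fields[1])
print(size + 1)

# join() reverses the operation and rebuilds the original line.
rebuilt = ','.join(fields)
print(rebuilt)
```

For real-world CSV files with quoted fields, the csv module in the standard library is usually a safer choice than splitting by hand.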

Strings can be used to capture Unicode or raw data by prepending either a u or r to the string prior to the opening quote.

>>> u'This is a unicode string'
u'This is a unicode string'
>>> r'This is a raw string, good to capture escape characters such as \ which can break strings'
'This is a raw string, good to capture escape characters such as \\ which can break strings'

Formatting parameters can be used on strings to manipulate and convert them depending on the provided values. With the .format() function, we can insert values into strings, pad numbers, and display patterns with simple formatting. This chapter will highlight a few examples of the .format() method; we will introduce its more complex features throughout this book. The .format() method replaces curly brackets with the provided values in order. This is the most basic operation for inserting values into a string dynamically.

>>> "{} {} {} {}".format("Formatted", "strings", "are", "easy!")
'Formatted strings are easy!'

Our second example displays some of the expressions that we can use to manipulate a string. Inside the curly brackets, we place a colon, which indicates that we are going to specify a format for interpretation. Following this colon, we specify that at least six characters should be printed. If the supplied input is shorter than six characters, zeroes are prepended to the beginning of the input.

>>> "{:06d}".format(42)
'000042'

Lastly, the d character specifies that the input is a base 10 integer. Our last example demonstrates how we can easily print a string of 20 equals signs by stating that our fill character is the equals symbol, followed by the caret (to center the input in the output), and the total width of the output. Because the supplied input is an empty string, the entire width is filled with equals signs. By providing this format string, we can quickly create visual separators in our outputs.

>>> "{:=^20}".format('')
'===================='
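
Building on the three examples above, the following sketch combines the same format specifiers to produce a small report header. The labels and values are illustrative assumptions; the < and > alignment characters are part of the same format syntax as the caret.

```python
# Center a title inside a 20-character line of equals signs.
header = '{:=^20}'.format(' Report ')
print(header)

# Positional indexes let us control which argument fills each field.
# '<' left-aligns within the given width and '>' right-aligns.
row = '{0:<10}{1:>6}'.format('files', 42)
print(row)

# Zero padding, as in the earlier example, applied to a smaller width.
case_id = 'case-{:04d}'.format(7)
print(case_id)
```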

Integers and floats

The integer is another valuable data type that is frequently used. An integer is any whole positive or negative number. The float data type is similar, but it allows us to use numbers requiring decimal level precision. With integers and floats we can use standard mathematical operations, such as: +, -, *, and /. These operations return slightly different results based on the object's type (for example, integer or float).

Integer operations use whole numbers; for example, dividing two integers results in another whole number, with any fractional part discarded (Python 2, the version used in these examples, rounds the result down). However, using one float in the equation, even one that has the same value as the integer, results in a float. For example, 3/2=1 and 3/2.0=1.5 in Python 2. The following are examples of integer and float operations:

>>> type(1010)
<type 'int'>
>>> 127*66
8382
>>> 66/10
6
>>> 10 * (10 - 8)
20

We can use ** to raise an integer to a power. For example, in the following example we raise 11 to the power of 2. In programming, it can be helpful to determine the remainder resulting from the division between two integers. For this, we use the modulo operator, written as the percent (%) symbol. With Python, negative numbers are those with a dash character (-) preceding the value. We can use the built-in abs() function to get the absolute value of any integer or float.

>>> 11**2
121
>>> 11 % 2 # 11 divided by 2 is 5 with a remainder of 1.
1
>>> abs(-3)
3
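
The relationship between division and the modulo operator can be sketched as follows. This snippet uses the // floor division operator and the built-in divmod() function, neither of which appears in the examples above; both behave the same way in Python 2 and Python 3, whereas the / operator only truncates integers in Python 2.

```python
# Floor division discards the fractional part of the result.
quotient = 11 // 2
# The modulo operator returns what is left over.
remainder = 11 % 2
print(quotient)
print(remainder)

# divmod() returns both values at once as a tuple.
both = divmod(11, 2)
print(both)

# The original number can always be rebuilt from the two parts.
print(quotient * 2 + remainder)

# A common use of modulo: a number is even when the remainder is zero.
is_even = (10 % 2 == 0)
print(is_even)
```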

A float is defined by any number with a decimal. Floats follow the same rules and operations as integers, with the exception of the division behavior described earlier:

>>> type(0.123)
<type 'float'>
>>> 1.23 * 5.23
6.4329
>>> 27/8.0
3.375

Booleans and None

The integers 1 and 0 can also represent Boolean values in Python. These values correspond to the Boolean True and False objects, respectively. To define a Boolean from another value, we can use the bool() constructor. These data types are used extensively in program logic to evaluate statements for conditionals, as covered later in this chapter.

Another built-in data type is the null type, which is defined by the keyword None. When used, it represents an empty object, and when evaluated in a Boolean context, it is treated as False. This is helpful when initializing a variable that may use several data types throughout the execution. By assigning a null value, the variable remains sanitized until reassigned:

>>> bool(0)
False
>>> bool(1)
True
>>> None
>>>
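
The initialization pattern described above might look like the following sketch. The variable name and the assigned value are hypothetical, and the is operator used to test for None is not covered in the text above.

```python
# Start with a placeholder until a real value is available.
result = None

# None is treated as False when evaluated as a Boolean.
print(bool(None))

# Comparing with "is None" checks for the null object specifically,
# which distinguishes it from other false values such as 0 or ''.
was_unset = result is None
print(was_unset)

if result is None:
    result = 'assigned later'
print(result)
```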

Structured data types

There are several data types that are more complex and allow us to create structures of raw data. These include lists, dictionaries, sets, and tuples. Most of these structures are composed of the previously mentioned data types. These structures are very useful in creating powerful units of values, thus allowing raw data to be stored in a manageable manner.

Lists

Lists are a series of ordered elements. A list supports any data type as an element and will maintain the order of data as they are appended to the list. Elements can be called by position or a loop can be used to step through each item. In Python, printing a list takes a single line, whereas in languages such as Java or C++ it can take several lines. Lists in Python can be as long as needed and can expand or contract on the fly, without declaring a size in advance.

We can create lists using brackets with elements separated by a comma, or we can use the list() class constructor with any iterable object. List elements can be accessed by index, where 0 is the first element. To access an element by position, we place the desired index in brackets following the list object. Rather than knowing how long a list is (which can be accomplished with the len() function) we can use negative index numbers to access the last elements in a list.

>>> type(['element1', 2, 6.0, True, None, 234])
<type 'list'>
>>> list('element')
['e', 'l', 'e', 'm', 'e', 'n', 't']
>>> len([0,1,2,3,4,5,6])
7
>>> ['hello_world', 'foo_bar'][0]
'hello_world'
>>> ['hello_world', 'foo_bar'][-1]
'foo_bar'

We can add, remove, or check if a value is in a list using a couple of different functions. First, let's create a list of animals using brackets and assigning it to the variable my_list. Variables are aliases referring to Python objects. We will discuss variables in much greater detail later in this chapter. The append() method adds data to the end of the list which we can verify by printing said list afterwards. Alternatively, the insert() method allows us to specify an index when adding data to the list. For example, we can add the string "mouse" to the beginning, or the zeroth index, of our list.

>>> my_list = ['cat', 'dog']
>>> my_list.append('fish')
>>> print my_list
['cat', 'dog', 'fish']
>>> my_list.insert(0, 'mouse')
>>> print my_list
['mouse', 'cat', 'dog', 'fish']

The pop() and remove() functions delete data from a list either by index or by a specific object, respectively. If an index is not supplied with the pop() function, the last element in the list is popped. This returns the last element in the list to the interactive prompt. We can then print the list to verify that the last element was indeed popped. Note that the remove() function gets rid of the first instance of the supplied object in the list and does not return the item removed to the interactive prompt.

>>> your_list = [0, 1, 2]
>>> your_list.pop()
2
>>> print your_list
[0, 1]
>>> our_list = [3, 4, 5]
>>> our_list.pop(1) 
4
>>> print our_list
[3, 5]
>>> everyones_list = [1, 1, 2, 3]
>>> everyones_list.remove(1)
>>> print everyones_list
[1, 2, 3]

We can use the in statement to check if an object is in the list. The count() function tells us how many instances of an object are in the list.

>>> 'cat' in ['mountain lion', 'ox', 'cat']
True
>>> ['fish', 920.5, 3, 5, 3].count(3)
2

If we want to access a subset of elements, we can use a list slice notation. Other objects, such as strings, also support this same slice notation to obtain a subset of data. Slice notation has the following format, where "a" is our list or string object:

a[x:y:z]

In the preceding example, x represents the start of the slice, y represents the end of the slice, and z represents the step of the slice. Note that each segment is separated by colons and enclosed in square brackets. A negative step is a quick way to reverse the contents of an object that supports the slice notation. Each of these arguments is optional. In the first example, our slice starts at index 2 (the third element) and continues up to, but not including, index 5. Supplying just one of these slice elements returns a list containing everything from index 2 forward or everything up to, but not including, index 5.

>>> [0,1,2,3,4,5,6][2:5]
[2, 3, 4]
>>> [0,1,2,3,4,5,6][2:]
[2, 3, 4, 5, 6]
>>> [0,1,2,3,4,5,6][:5]
[0, 1, 2, 3, 4]

Using the third slice element, we can skip every other element or simply reverse the list with a negative one. We can use a combination of these slice elements to specify how to carve a subset of data from the list.

>>> [0,1,2,3,4,5,6][::2]
[0, 2, 4, 6]
>>> [0,1,2,3,4,5,6][::-1]
[6, 5, 4, 3, 2, 1, 0]
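
Since strings support the same slice notation, the following sketch applies it to a made-up file name and also combines all three slice elements on a list.

```python
name = 'evidence.txt'

# Everything up to, but not including, index 8.
stem = name[:8]
print(stem)

# A negative start counts from the end of the string.
extension = name[-3:]
print(extension)

# A step of -1 reverses a string just as it reverses a list.
backwards = name[::-1]
print(backwards)

# Start, end, and step together: indexes 1, 3, and 5.
odds = [0, 1, 2, 3, 4, 5, 6][1:6:2]
print(odds)
```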

Dictionaries

Dictionaries, otherwise known as dict, are another common Python data container. Unlike lists, this object does not add data in a linear fashion. Instead, data is stored as key and value pairs, where you can create and name keys to act as an index for stored values. It is important to note that dictionaries in Python 2 do not preserve the order in which items are added to them (insertion order is only guaranteed from Python 3.7 onward). They are used heavily in forensic scripting, as they allow us to store data in a manner that provides a known key to recall a value without needing to assign a lot of new variables. By storing data in dictionaries, it is possible to have one variable contain structured data.

We can define a dictionary using curly braces, where each key (a string in these examples) is followed by a colon and its corresponding value. Additionally, we can use the dict() class constructor to instantiate dictionary objects. Calling a value from a dictionary is accomplished by specifying the key in brackets following the dictionary object. If we supply a key that does not exist, we will receive a KeyError. Notice that we have assigned our dictionary to a variable, a. Although we have not formally introduced variables at this point, one is needed here to highlight some of the functions specific to dictionaries.

>>> type({'Key Lime Pie': 1, 'Blueberry Pie': 2})
<type 'dict'>
>>> dict((['key_1', 'value_1'],['key_2', 'value_2']))
{'key_1': 'value_1', 'key_2': 'value_2'}
>>> a = {'key1': 123, 'key2': 456}
>>> a['key1']
123
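
The KeyError mentioned above can be seen, and avoided, with the following sketch. The try and except statements and the get() method are not introduced in the text above; they are used here only to illustrate what happens when a key is missing.

```python
a = {'key1': 123, 'key2': 456}

# Requesting a key that does not exist raises a KeyError.
try:
    a['missing']
    raised = False
except KeyError:
    raised = True
print(raised)

# The in statement checks for a key without raising an error.
print('key1' in a)
print('missing' in a)

# get() returns None, or a supplied default, when the key is absent.
print(a.get('missing'))
fallback = a.get('missing', 0)
print(fallback)
```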

We can add a new key, or modify the value of a pre-existing key, by specifying the key and setting it equal to another object. We can remove an item using the pop() function, which works like the list pop() function except that we specify a key instead of an index:

>>> a['key3'] = 789
>>> print a
{'key3': 789, 'key2': 456, 'key1': 123}
>>> a.pop('key1')
123
>>> print a
{'key3': 789, 'key2': 456}

The keys() and values() functions return a list of keys and values present in the dictionary. We can use the items() function to return a list of tuples containing each key and value pair. These three functions are often used for conditionals and loops as shown:

>>> a.keys()
['key3', 'key2']
>>> a.values()
[789, 456]
>>> a.items()
[('key3', 789), ('key2', 456)]
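
As a sketch of how these three functions are used with loops and conditionals, the following steps through the same dictionary. The for and if statements are covered later in this chapter, the threshold of 500 is arbitrary, and sorted() is used only to make the output order predictable.

```python
a = {'key3': 789, 'key2': 456}

# Loop over the keys in a predictable order and look up each value.
for key in sorted(a.keys()):
    print('{}: {}'.format(key, a[key]))

# items() yields each key and value together as a tuple.
large_keys = []
for key, value in a.items():
    if value > 500:
        large_keys.append(key)
print(large_keys)

# values() is handy for checks that do not need the keys.
has_456 = 456 in a.values()
print(has_456)
```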

Sets and tuples

Sets are similar to lists in that they contain a collection of elements, though each element must be unique. In addition, the elements must be immutable, meaning that their value must remain constant. For this reason, sets are best used with integers, strings, Booleans, floats, and tuples as elements. Sets do not index their elements and therefore we cannot access the elements by their location in the set. Instead, we can retrieve and remove an element with the pop() method, which, unlike the list method of the same name, takes no index and removes an arbitrary element. Tuples are also similar to lists, though they are immutable. They are built using parentheses in lieu of brackets, and their elements do not have to be unique and can be of any data type:

>>> type(set([1, 4, 'asd', True]))
<type 'set'>
>>> g = set(["element1", "element2"])
>>> print g
set(['element1', 'element2'])
>>> g.pop()
'element1'
>>> print g
set(['element2'])

# Defining a tuple
>>> tuple('foo')
('f', 'o', 'o')
>>> ('b', 'a', 'r')
('b', 'a', 'r')
# Calling an element from a tuple
>>> ('Chapter1', 22)[0]
'Chapter1'
>>> ('Foo', 'Bar')[-1]
'Bar'
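
The two properties described above, uniqueness for sets and immutability for tuples, can be checked with the following sketch. The file names are made up, and the try and except statements are used only to show the error without stopping the script.

```python
# Duplicate elements are dropped when a set is created.
unique_files = set(['a.txt', 'b.txt', 'a.txt'])
print(len(unique_files))

# Membership tests work even though sets have no indexes.
print('a.txt' in unique_files)

# Tuples cannot be changed after they are created.
chapter = ('Chapter1', 22)
try:
    chapter[0] = 'Chapter2'
    changed = True
except TypeError:
    changed = False
print(changed)
print(chapter)
```

Converting a list to a set is a quick way to remove duplicates, provided that the order of the elements does not matter.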