Python Algorithmic Trading Cookbook
上QQ阅读APP看书,第一时间看更新

Creating a pandas.DataFrame object

Now that we are done with handling date and time, let's move on to handling time series data. The pandas library has a pandas.DataFrame class, which is useful for handling and manipulating such data. This recipe starts by creating these objects.

How to do it...

Execute the following steps for this recipe:

  1. Import the necessary modules from the Python standard library:
>>> from datetime import datetime
>>> import pandas
  1. Create a sample time-series data as a list of dictionary objects. Assign it to time_series data:
>>> time_series_data = \
[{'date': datetime.datetime(2019, 11, 13, 9, 0),
'open': 71.8075, 'high': 71.845, 'low': 71.7775,
'close': 71.7925, 'volume': 219512},
{'date': datetime.datetime(2019, 11, 13, 9, 15),
'open': 71.7925, 'high': 71.8, 'low': 71.78,
'close': 71.7925, 'volume': 59252},
{'date': datetime.datetime(2019, 11, 13, 9, 30),
'open': 71.7925, 'high': 71.8125, 'low': 71.76,
'close': 71.7625, 'volume': 57187},
{'date': datetime.datetime(2019, 11, 13, 9, 45),
'open': 71.76, 'high': 71.765, 'low': 71.735,
'close': 71.7425, 'volume': 43048},
{'date': datetime.datetime(2019, 11, 13, 10, 0),
'open': 71.7425, 'high': 71.78, 'low': 71.7425,
'close': 71.7775, 'volume': 45863},
{'date': datetime.datetime(2019, 11, 13, 10, 15),
'open': 71.775, 'high': 71.8225, 'low': 71.77,
'close': 71.815, 'volume': 42460},
{'date': datetime.datetime(2019, 11, 13, 10, 30),
'open': 71.815, 'high': 71.83, 'low': 71.7775,
'close': 71.78, 'volume': 62403},
{'date': datetime.datetime(2019, 11, 13, 10, 45),
'open': 71.775, 'high': 71.7875, 'low': 71.7475,
'close': 71.7525, 'volume': 34090},
{'date': datetime.datetime(2019, 11, 13, 11, 0),
'open': 71.7525, 'high': 71.7825, 'low': 71.7475,
'close': 71.7625, 'volume': 39320},
{'date': datetime.datetime(2019, 11, 13, 11, 15),
'open': 71.7625, 'high': 71.7925, 'low': 71.76,
'close': 71.7875, 'volume': 20190}]
  1. Create a new DataFrame from time_series_data. Assign it to df and print it:
>>> df = pandas.DataFrame(time_series_data)
>>> df

We get the following output:

                 date    open    high     low   close volume
0 2019-11-13 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
1 2019-11-13 09:15:00 71.7925 71.8000 71.7800 71.7925 59252
2 2019-11-13 09:30:00 71.7925 71.8125 71.7600 71.7625 57187
3 2019-11-13 09:45:00 71.7600 71.7650 71.7350 71.7425 43048
4 2019-11-13 10:00:00 71.7425 71.7800 71.7425 71.7775 45863
5 2019-11-13 10:15:00 71.7750 71.8225 71.7700 71.8150 42460
6 2019-11-13 10:30:00 71.8150 71.8300 71.7775 71.7800 62403
7 2019-11-13 10:45:00 71.7750 71.7875 71.7475 71.7525 34090
8 2019-11-13 11:00:00 71.7525 71.7825 71.7475 71.7625 39320
9 2019-11-13 11:15:00 71.7625 71.7925 71.7600 71.7875 20190
  1. Get the list of columns in df:
>>> df.columns.tolist()

We get the following output:

['date', 'open', 'high', 'low', 'close', 'volume']
  1. Create a DataFrame object again using the time_series_data. This time, specify the columns in the order you want:
>>> pandas.DataFrame(time_series_data, 
columns=['close','date', 'open', 'high', 'low', 'volume'])

We get the following output:

    close                date    open    high     low volume
0 71.7925 2019-11-13 09:00:00 71.8075 71.8450 71.7775 219512
1 71.7925 2019-11-13 09:15:00 71.7925 71.8000 71.7800 59252
2 71.7625 2019-11-13 09:30:00 71.7925 71.8125 71.7600 57187
3 71.7425 2019-11-13 09:45:00 71.7600 71.7650 71.7350 43048
4 71.7775 2019-11-13 10:00:00 71.7425 71.7800 71.7425 45863
5 71.8150 2019-11-13 10:15:00 71.7750 71.8225 71.7700 42460
6 71.7800 2019-11-13 10:30:00 71.8150 71.8300 71.7775 62403
7 71.7525 2019-11-13 10:45:00 71.7750 71.7875 71.7475 34090
8 71.7625 2019-11-13 11:00:00 71.7525 71.7825 71.7475 39320
9 71.7875 2019-11-13 11:15:00 71.7625 71.7925 71.7600 20190

How it works...

In step 1, you import the datetime class from the datetime module and the pandas package. In step 2, you create a time-series data, which is typically returned by 3rd party APIs for historical data. This data is a list of dictionaries, and each dictionary has the same set of keys—date, open, high, low, close, and volume. Observe that the value for the date key is a datetime object and for the other keys are float objects.

In step 3, you create a pandas DataFrame object by directly calling the constructor with time_series_data as an argument and assign the return data to df. The keys of the dictionaries become the column names of df and values become the data. In step 4, you fetch the columns of df as a list using the columns attribute and the tolist() method. You can verify that the column names are the same as the keys of the dictionaries in time_series_data.

In step 5, you create a DataFrame with the columns in a specific order by passing a columns argument to the constructor with the required order as a list of strings.

There's more

When a DataFrame object is created, an index is assigned to it automatically, which is an address for all the rows. The leftmost column in the preceding example is the index column. By default, the index starts from 0. A custom index can be set by passing an index argument to the DataFrame constructor with the required indices as an iterator. This is shown as follows:

  1. Create a new DataFrame object from time_series_data, with a custom index:
>>> pandas.DataFrame(time_series_data, index=range(10, 20)) 

We get the following output:

                  date    open    high     low   close volume
10 2019-11-13 09:00:00 71.8075 71.8450 71.7775 71.7925 219512
11 2019-11-13 09:15:00 71.7925 71.8000 71.7800 71.7925 59252
12 2019-11-13 09:30:00 71.7925 71.8125 71.7600 71.7625 57187
13 2019-11-13 09:45:00 71.7600 71.7650 71.7350 71.7425 43048
14 2019-11-13 10:00:00 71.7425 71.7800 71.7425 71.7775 45863
15 2019-11-13 10:15:00 71.7750 71.8225 71.7700 71.8150 42460
16 2019-11-13 10:30:00 71.8150 71.8300 71.7775 71.7800 62403
17 2019-11-13 10:45:00 71.7750 71.7875 71.7475 71.7525 34090
18 2019-11-13 11:00:00 71.7525 71.7825 71.7475 71.7625 39320
19 2019-11-13 11:15:00 71.7625 71.7925 71.7600 71.7875 20190

Note the index in the output starts from 10 and goes up to 19. The default index values would have ranged from 0 to 9.