Designing and executing an ML-driven strategy
In this book, we demonstrate how ML fits into the overall process of designing, executing, and evaluating a trading strategy. To this end, we'll assume that an ML-based strategy is driven by data sources that contain predictive signals for the target universe and strategy, which, after suitable preprocessing and feature engineering, permit an ML model to predict asset returns or other strategy inputs. The model predictions, in turn, translate into buy or sell orders based on human discretion or automated rules; these rules may be encoded manually or learned by another ML algorithm in an end-to-end approach.
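As a minimal illustration of an automated rule, the sketch below maps a cross-section of predicted returns to buy (+1), sell (-1), or no-trade (0) decisions by going long the n assets with the highest predictions and short the n with the lowest. The function name predictions_to_orders, the parameter n, and the asset labels and prediction values are hypothetical; a real rule would also size positions and respect portfolio constraints.

```python
import pandas as pd


def predictions_to_orders(predictions: pd.Series, n: int = 2) -> pd.Series:
    """Hypothetical long/short rule: +1 for the top n predictions, -1 for the bottom n, else 0."""
    if 2 * n > len(predictions):
        raise ValueError("need at least 2 * n assets so long and short legs do not overlap")
    ranked = predictions.sort_values(ascending=False)
    orders = pd.Series(0, index=predictions.index, dtype=int)
    orders.loc[ranked.index[:n]] = 1    # buy the highest predicted returns
    orders.loc[ranked.index[-n:]] = -1  # sell the lowest predicted returns
    return orders


# Hypothetical model output: predicted next-period returns for five assets
preds = pd.Series({"A": 0.02, "B": -0.01, "C": 0.005, "D": -0.03, "E": 0.01})
orders = predictions_to_orders(preds, n=2)
```

Here A and E would be bought, B and D sold, and C left alone.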
Figure 1.1 depicts the key steps in this workflow, which also shapes the organization of this book:
Figure 1.1: The ML4T workflow
Part 1 introduces important skills and techniques that apply across different strategies and ML use cases. These include the following:
- How to source and manage important data sources
- How to engineer informative features or alpha factors that extract signal content
- How to manage a portfolio and track strategy performance
Moreover, Chapter 8, The ML4T Workflow – From Model to Strategy Backtesting, in Part 2, covers strategy backtesting. We will briefly outline each of these areas before turning to relevant ML use cases, which make up the bulk of the book in Parts 2, 3, and 4.
Sourcing and managing data
The dramatic evolution of data availability in terms of volume, variety, and velocity is a key complement to the application of ML to trading, which in turn has boosted industry spending on the acquisition of new data sources. However, the proliferating supply of data requires careful selection and management to uncover the potential value, including the following steps:
- Identify and evaluate market, fundamental, and alternative data sources containing alpha signals that do not decay too quickly.
- Deploy or access a cloud-based scalable data infrastructure and analytical tools like Hadoop or Spark to facilitate fast, flexible data access.
- Carefully manage and curate data to avoid look-ahead bias by adjusting it to the desired frequency on a point-in-time basis. This means that data should reflect only information available and known at the given time. ML algorithms trained on distorted historical data will almost certainly fail during live trading.
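To make the point-in-time requirement concrete, the following sketch attaches quarterly fundamentals to daily prices using the date each figure became publicly available rather than the fiscal period it describes. It relies on pandas.merge_asof; the column names (period_end, available_date, eps) and all values are hypothetical.

```python
import pandas as pd

# Hypothetical daily prices for one ticker
prices = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-28", "2020-04-30", "2020-05-04", "2020-05-06"]),
    "close": [100.0, 101.5, 99.8, 102.3],
})

# Hypothetical quarterly fundamentals: the quarter ending 2020-03-31
# is only published on 2020-05-01 (available_date).
fundamentals = pd.DataFrame({
    "period_end": pd.to_datetime(["2019-12-31", "2020-03-31"]),
    "available_date": pd.to_datetime(["2020-02-14", "2020-05-01"]),
    "eps": [1.10, 0.95],
})

# Join on when the data became known, not on the period it covers:
# each price row receives the latest filing published on or before that date.
pit = pd.merge_asof(
    prices.sort_values("date"),
    fundamentals.sort_values("available_date"),
    left_on="date",
    right_on="available_date",
    direction="backward",
)
```

Joining on period_end instead would expose the 0.95 EPS figure on April dates, before anyone could have known it, which is exactly the look-ahead bias described above.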
We will cover these aspects in practical detail in Chapter 2, Market and Fundamental Data – Sources and Techniques, and Chapter 3, Alternative Data for Finance – Categories and Use Cases.
From alpha factor research to portfolio management
Alpha factors are designed to extract signals from data to predict returns for a given investment universe over the trading horizon. A typical factor takes on a single value for each asset when evaluated at a given point in time, but it may combine one or several input variables or time periods. If you are already familiar with the ML workflow (see Chapter 6, The Machine Learning Process), you may view alpha factors as domain-specific features designed for a specific strategy. Working with alpha factors entails a research phase and an execution phase as outlined in Figure 1.2:
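As an example of a factor that combines several time periods into a single value per asset, the sketch below computes a price momentum factor: the return from month t-12 to month t-1. The function name momentum_factor, its defaults, and the synthetic prices are illustrative assumptions, not a prescribed definition.

```python
import numpy as np
import pandas as pd


def momentum_factor(monthly_prices: pd.DataFrame, lookback: int = 12, skip: int = 1) -> pd.DataFrame:
    """Return from month t-lookback to month t-skip for every asset (columns) and date (rows)."""
    # Skipping the most recent month is a common convention to sidestep short-term reversal.
    return monthly_prices.shift(skip) / monthly_prices.shift(lookback) - 1


# Hypothetical prices: asset A gains 1% per month, asset B loses 1% per month
months = pd.date_range("2020-01-01", periods=13, freq="MS")
prices = pd.DataFrame(
    {"A": 100 * 1.01 ** np.arange(13), "B": 100 * 0.99 ** np.arange(13)},
    index=months,
)

factor = momentum_factor(prices)
latest = factor.iloc[-1]  # one factor value per asset at the latest date
```

Evaluated at the latest date, the factor reduces a year of price history to a single number per asset.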
Figure 1.2: The alpha factor research process
The research phase
The research phase includes the design and evaluation of alpha factors. A predictive factor captures some aspect of a systematic relationship between a data source and an important strategy input like asset returns. Optimizing the predictive power requires creative feature engineering in the form of effective data transformations.
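One widely used family of transformations normalizes a raw signal across assets on each date so that values are comparable over time. The sketch below winsorizes each date's cross-section at chosen quantiles and then converts it to z-scores; the function name winsorized_zscore, the 5%/95% cutoffs, and the random input are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def winsorized_zscore(raw: pd.DataFrame, lower: float = 0.05, upper: float = 0.95) -> pd.DataFrame:
    """Cross-sectional transform; rows are dates, columns are assets."""
    # Clip each date's values to that date's quantiles to limit the influence of outliers
    lo = raw.quantile(lower, axis=1)
    hi = raw.quantile(upper, axis=1)
    clipped = raw.clip(lower=lo, upper=hi, axis=0)
    # Standardize across assets so the factor has mean 0 and std 1 on every date
    mean = clipped.mean(axis=1)
    std = clipped.std(axis=1)
    return clipped.sub(mean, axis=0).div(std, axis=0)


rng = np.random.default_rng(42)
raw = pd.DataFrame(rng.standard_t(df=3, size=(4, 50)))  # fat-tailed hypothetical signal
z = winsorized_zscore(raw)
```

Ranking each cross-section (raw.rank(axis=1, pct=True)) is a more robust alternative when the signal has extreme outliers.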
False discoveries due to data mining are a key risk that requires careful management. One way of reducing the risk is to focus the search process by following the guidance of decades of academic research that has produced several Nobel Prizes. Many investors still prefer factors that align with theories about financial markets and investor behavior. Laying out these theories is beyond the scope of this book, but the references highlight avenues to dive deeper into this important framing aspect.
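When many candidate factors are tested, some will look significant by chance alone. A standard way to control for this is to adjust the significance threshold for the number of tests. The sketch below implements the Benjamini-Hochberg procedure, which controls the false discovery rate; it is one common correction among several (Bonferroni is stricter), and the p-values shown are hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of hypotheses rejected while controlling the false discovery rate at alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the k-th smallest p-value with alpha * k / m
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        # Reject every hypothesis up to the largest k that passes its threshold
        k = np.nonzero(passed)[0].max()
        reject[order[: k + 1]] = True
    return reject


# Hypothetical p-values from testing four candidate factors
p_values = np.array([0.001, 0.02, 0.2, 0.5])
bh_reject = benjamini_hochberg(p_values)
bonferroni_reject = p_values <= 0.05 / len(p_values)
```

With these inputs, Benjamini-Hochberg keeps the first two factors, while the more conservative Bonferroni threshold keeps only the first.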
Validating the signal content of an alpha factor requires a robust estimate of its predictive power in a representative context. There are numerous methodological and practical pitfalls that undermine a reliable estimate. In addition to data mining and the failure to correct for multiple testing bias, these pitfalls include the use of data contaminated by survivorship or look-ahead bias, that is, data that does not reflect realistic point-in-time (PIT) information. Chapter 4, Financial Feature Engineering – How to Research Alpha Factors, discusses how to successfully manage this process.
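A common measure of a factor's predictive power is the information coefficient (IC): the Spearman rank correlation between factor values at date t and the returns realized after t. The sketch below computes it per date by ranking and then taking the Pearson correlation of the ranks, which equals the Spearman correlation. The function name information_coefficient, the min_assets cutoff, and the synthetic data are assumptions; the forward returns must be aligned so that row t holds returns earned after t (for example, prices.pct_change(h).shift(-h)), or the estimate itself will suffer from look-ahead bias.

```python
import numpy as np
import pandas as pd


def information_coefficient(factor: pd.DataFrame, forward_returns: pd.DataFrame,
                            min_assets: int = 5) -> pd.Series:
    """Per-date Spearman correlation between factor values and subsequent returns."""
    ic = {}
    for date in factor.index.intersection(forward_returns.index):
        pair = pd.concat([factor.loc[date], forward_returns.loc[date]],
                         axis=1, keys=["factor", "fwd"]).dropna()
        if len(pair) < min_assets:
            continue  # too few assets for a meaningful cross-sectional correlation
        ranks = pair.rank()  # Spearman correlation = Pearson correlation of ranks
        ic[date] = ranks["factor"].corr(ranks["fwd"])
    return pd.Series(ic, name="IC", dtype=float)


# Hypothetical data: 60 dates x 100 assets, forward returns weakly tied to the factor
rng = np.random.default_rng(1)
dates = pd.date_range("2021-01-01", periods=60, freq="B")
factor = pd.DataFrame(rng.normal(size=(60, 100)), index=dates)
fwd = 0.001 * factor + rng.normal(scale=0.02, size=factor.shape)

ic = information_coefficient(factor, fwd)
print(ic.mean(), ic.mean() / ic.std())  # average IC and its consistency over time
```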
The execution phase
During the execution phase, alpha factors emit signals that lead to buy or sell orders. The resulting portfolio holdings, in turn, have specific risk profiles that interact and contribute to the aggregate portfolio risk. Portfolio management involves optimizing position sizes to achieve a balance of return and risk of the portfolio that aligns with the investment objectives.
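The trade-off between return and risk can be written as an optimization problem. A minimal sketch, assuming the classic unconstrained mean-variance objective of maximizing expected return minus a risk penalty, mu'w - (gamma/2) w'Sigma w: setting the gradient to zero gives w = Sigma^{-1} mu / gamma. The function name and the risk_aversion default are hypothetical, and practical portfolios usually add budget, leverage, or long-only constraints that require a numerical optimizer.

```python
import numpy as np


def mean_variance_weights(expected_returns, covariance, risk_aversion: float = 5.0) -> np.ndarray:
    """Weights maximizing mu'w - (risk_aversion / 2) * w' Sigma w, without constraints."""
    mu = np.asarray(expected_returns, dtype=float)
    sigma = np.asarray(covariance, dtype=float)
    # First-order condition: mu - risk_aversion * Sigma w = 0, so w = Sigma^{-1} mu / risk_aversion.
    # Solving the linear system is more stable than inverting Sigma explicitly.
    return np.linalg.solve(sigma, mu) / risk_aversion


# Hypothetical inputs: expected returns from a model, covariance estimated from history
mu = np.array([0.05, 0.02])
sigma = np.diag([0.04, 0.01])
w = mean_variance_weights(mu, sigma)
```

With uncorrelated assets, each weight is simply the asset's expected return divided by its variance and scaled by the risk aversion, so here the lower-return but much less volatile second asset receives the larger position.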
Chapter 5, Portfolio Optimization and Performance Evaluation, introduces key techniques and tools applicable to this phase of the trading strategy workflow, from portfolio optimization to performance measurement.
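Two performance measures that appear in almost any strategy evaluation are the annualized Sharpe ratio and the maximum drawdown. The sketch below computes both from a series of periodic returns; the function names, the 252-trading-day default, and the simple division used to de-annualize the risk-free rate are assumptions.

```python
import numpy as np
import pandas as pd


def annualized_sharpe(returns: pd.Series, periods_per_year: int = 252,
                      risk_free_rate: float = 0.0) -> float:
    # Approximate per-period risk-free rate by simple division of the annual rate
    excess = returns - risk_free_rate / periods_per_year
    return float(np.sqrt(periods_per_year) * excess.mean() / excess.std())


def max_drawdown(returns: pd.Series) -> float:
    wealth = (1 + returns).cumprod()          # growth of $1 invested
    drawdown = wealth / wealth.cummax() - 1   # decline from the running peak
    return float(drawdown.min())


daily = pd.Series([0.1, -0.5, 0.2])  # hypothetical returns
print(annualized_sharpe(daily), max_drawdown(daily))
```

Note that pandas uses the sample standard deviation (ddof=1) by default, and the drawdown is reported as a negative fraction of the prior peak.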
Strategy backtesting
Turning an investment idea into a real-life algorithmic strategy carries significant risk and therefore calls for a scientific approach. Such an approach involves extensive empirical tests with the goal of rejecting the idea based on its performance in alternative out-of-sample market scenarios. Testing may involve simulated data to capture scenarios deemed possible but not reflected in historical data.
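One simple way to generate alternative scenarios is a moving block bootstrap: resample contiguous blocks of historical returns to build many new paths that preserve short-term dependence. This is a swapped-in illustrative technique, not the book's method, and it only recombines observed returns; scenarios genuinely absent from history require a parametric model or explicit stress assumptions. The function name block_bootstrap, the block size, and the stand-in return series are hypothetical.

```python
import numpy as np


def block_bootstrap(returns, n_paths: int = 1000, block_size: int = 20, seed=None) -> np.ndarray:
    """Resample contiguous blocks of returns into n_paths paths of the original length."""
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    if block_size > n:
        raise ValueError("block_size must not exceed the number of observations")
    rng = np.random.default_rng(seed)
    n_blocks = int(np.ceil(n / block_size))
    # Random start for each block; the highest start still leaves a full block
    starts = rng.integers(0, n - block_size + 1, size=(n_paths, n_blocks))
    # Expand each start into block_size consecutive indices, then trim to length n
    idx = (starts[..., None] + np.arange(block_size)).reshape(n_paths, -1)[:, :n]
    return returns[idx]


hist = np.random.default_rng(0).normal(0.0005, 0.01, size=500)  # stand-in for daily returns
paths = block_bootstrap(hist, n_paths=200, seed=1)
# Distribution of annualized Sharpe ratios across simulated paths
sharpes = np.sqrt(252) * paths.mean(axis=1) / paths.std(axis=1, ddof=1)
```

Evaluating a strategy on each path yields a distribution of outcomes rather than the single estimate a lone historical backtest provides.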
To obtain unbiased performance estimates for a candidate strategy, we need a backtesting engine that simulates its execution in a realistic manner. In addition to the potential biases introduced by the data or a flawed use of statistics, the backtesting engine needs to accurately represent the practical aspects of trade-signal evaluation, order placement, and execution in line with market conditions.
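Two of the most basic realism requirements can be shown in a few lines of a vectorized backtest: trades act only on the next bar, so a signal never earns the return that produced it, and each change in positions pays a transaction cost. This is a minimal sketch under assumed proportional costs (the cost_rate value and function name are hypothetical); it ignores slippage, partial fills, and market impact, which is where an event-driven engine becomes necessary.

```python
import numpy as np
import pandas as pd


def vectorized_backtest(target_weights: pd.DataFrame, returns: pd.DataFrame,
                        cost_rate: float = 0.001) -> pd.Series:
    """Net portfolio return per period.

    target_weights.loc[t] uses information up to the close of t;
    returns.loc[t] is each asset's return from t-1 to t.
    """
    # Trade on the next bar: weights chosen at t earn the return realized at t+1
    held = target_weights.shift(1).fillna(0.0)
    gross = (held * returns).sum(axis=1)
    # Proportional costs on the absolute change in held weights (turnover)
    turnover = held.diff().abs().sum(axis=1)
    return gross - cost_rate * turnover


# Hypothetical single-asset example
weights = pd.DataFrame({"A": [1.0, 1.0, 0.0]})
rets = pd.DataFrame({"A": [0.05, 0.02, -0.03]})
net = vectorized_backtest(weights, rets)
```

The 5% return on the first bar is not captured, because the position decided on that bar is held only from the second bar onward.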
Chapter 8, The ML4T Workflow – From Model to Strategy Backtesting, shows how to use backtrader and Zipline to navigate the many methodological challenges, and it completes the introduction to the end-to-end ML4T workflow.