AI Powered Investment Research
Posted with permission from Prattle.
Prattle is pleased to publish this interview with Erez Katz, CEO and co-founder of Lucena Research, focusing on how machine learning technologies can be used to automate the development and testing of novel investment strategies.
How do you approach investment strategy development? Where do you start?
There are two general methods that seed an investment research strategy.
The first is a human-driven hypothesis that can be automated and validated with machine intelligence. This is most commonly accomplished through inductive reasoning or supervised learning where a model is trained by generalizing inputs (observations) that drive a consequential outcome.
The second is generating investment ideas based on how the data is classified or clustered. Data can be grouped based on various correlation scores from which a human can infer investment ideas. These ideas are then further refined, automated, and validated by machine intelligence.
How do you evaluate strategies?
That’s an excellent question. Because it is easy for untrained evaluators to get an undue sense of confidence in an investment strategy, false positives are a common pitfall. We’ve built our evaluation process to avoid these hazards.
For us, it’s important to see consistent performance between in-sample and out-of-sample periods. We normally break our strategy research into three distinct periods:
The first is the training period. During this period the machine constructs and trains its models. The training period normally utilizes machine intelligence to build predictive models based on the data it gathers during that timeframe. An important rule of thumb is that a backtest should never overlap the training period as the models were already biased or informed by the data and are bound to show good results that will most likely not carry over to a new timeframe.
The validation period is second. The models obtained in the training period are tested and refined over many thousands of permutations. This is also known as a meta-parameters grid search. This research allows us to identify the best method to deploy the models identified in the first step. It is here that questions like “how often should the portfolio rebalance?” or “what are the proper allocation guidelines?” are answered.
Finally there is the hold-out period. This is the unseen timeframe that had zero impact on the model creation and the model refinement stages (steps one and two). It is critical to be disciplined and not to peek into the future during the first two periods.
During this step, the strategy is backtested over the hold-out period to evaluate its sustainability and strength…and to ensure overfitting (or curve fitting) isn’t at work. A backtest during the hold-out period should be limited to a single attempt to evaluate the best execution rules identified in the validation period. Multiple attempts to validate the research during the hold-out period introduce the possibility of “selection bias” on the part of the researcher. If enough backtests are conducted, it is only a matter of time before a successful run is generated, but that result is likely to hold little statistical significance.
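As an illustration of the three periods described above, here is a minimal sketch of a strictly chronological split. The function name, the cut-off dates, and the monthly date series are all assumptions made for the example; they are not taken from Lucena's process.

```python
from datetime import date

def split_periods(dates, train_end, validation_end):
    """Split dates into training, validation, and hold-out periods with no overlap."""
    if train_end >= validation_end:
        raise ValueError("train_end must precede validation_end")
    train = [d for d in dates if d <= train_end]
    validation = [d for d in dates if train_end < d <= validation_end]
    hold_out = [d for d in dates if d > validation_end]
    return train, validation, hold_out

# Hypothetical month-start dates covering 2010 through 2019.
dates = [date(y, m, 1) for y in range(2010, 2020) for m in range(1, 13)]

# Hypothetical cut-offs: train through 2015, validate 2016-2017, hold out 2018-2019.
train, validation, hold_out = split_periods(
    dates, date(2015, 12, 31), date(2017, 12, 31)
)
print(len(train), len(validation), len(hold_out))
```

Because the hold-out list is built only from dates after the validation cut-off, nothing from that window can leak into model construction or refinement as long as the first two steps read only from `train` and `validation`.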
Before we leave this point, I’d like to offer a few additional tips and process notes. We highly recommend validating the algorithms during multiple hold-out periods representing different market conditions. We also suggest testing for robustness and performance consistency by changing elements of the strategy that are not critical to the underlying models. For example, we change the day of the week on which a strategy is rebalanced and determine whether the results based on a Tuesday are within acceptable margins of error against Thursday. Finally, before we deploy capital, we simulate rolling the strategy forward by applying the same execution rules used in the backtest to a roll-forward paper trading account.
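The rebalance-day check mentioned above could be sketched roughly as follows. Everything here is a toy assumption: the returns are synthetic random numbers, the strategy is a plain equal-weight portfolio, and the helper names are hypothetical. The point is only the shape of the test, which reruns the same rules with one non-critical setting changed and compares the outcomes.

```python
import random
from datetime import date, timedelta

def business_days(start, n):
    """Return the first n weekdays (Mon-Fri) on or after start."""
    days, d = [], start
    while len(days) < n:
        if d.weekday() < 5:
            days.append(d)
        d += timedelta(days=1)
    return days

def backtest_equal_weight(days, returns, rebalance_weekday):
    """Total return of a portfolio reset to equal weights on one weekday."""
    n_assets = len(returns[0])
    holdings = [1.0 / n_assets] * n_assets  # value held in each asset
    for d, day_returns in zip(days, returns):
        # Rebalance before the day's returns are applied.
        if d.weekday() == rebalance_weekday:
            total = sum(holdings)
            holdings = [total / n_assets] * n_assets
        holdings = [h * (1.0 + r) for h, r in zip(holdings, day_returns)]
    return sum(holdings) - 1.0

# Synthetic daily returns for three hypothetical assets (seeded for repeatability).
rng = random.Random(0)
days = business_days(date(2018, 1, 1), 250)
returns = [[rng.gauss(0.0004, 0.01) for _ in range(3)] for _ in days]

# Run the same rules for each rebalance day, Monday (0) through Friday (4).
results = {wd: backtest_equal_weight(days, returns, wd) for wd in range(5)}
spread = max(results.values()) - min(results.values())
print(results, spread)
```

A large `spread` relative to the strategy's total return would suggest the result depends on an incidental choice rather than on the underlying models.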
What are some common weaknesses you see in automating investment strategies?
There are several.
The first is that automated investment strategies can often appear to be “black box” operations that could put asset managers in a bad position should returns go south. Professional managers are fearful of strategies that don’t allow them to clearly articulate why an investment decision is made. In their eyes, it is not a matter of “if” an investment strategy underperforms but “when,” and they want to be able to defend their decisions when questioned by their customers.
The second is an inability to dynamically adjust to regime changes. Many strategies are based on static models that work well in certain market conditions but lack the “self-adjustment wisdom”—the ability to adapt to market or idiosyncratic conditions. We normally favor strategies that get smarter over time as they gather more data and knowledge. These types of strategies are harder to design, develop, and come by.
The last is that automated strategies often have short life cycles. Novel strategies can work well until the market catches up and exploits their “secret sauce” to the point of obsolescence. Managers should never fall in love with a concept and should drop a strategy that falls outside its expected “behavior” thresholds.
What strategies/approaches do you feel are unappreciated or underappreciated?
The financial market has not been kind to new investment concepts, and building credibility takes time. I have seen how new fintech startups and innovative investment strategies are often evaluated on the pedigree of their creators before being considered on their true merit. This, unfortunately, makes it much harder for new innovative entrants to penetrate the market and get the appreciation they deserve.
What is automated investment research? What options are out there?
Automated investment research is a concept of applying machine research in lieu of human research. Many computationally intensive research tasks can be automated before they are ultimately refined by a human quantitative analyst (a Quant). For example, a genetic algorithm, also known as GA, can be used in an automated way to identify predictive multi-factor models. Assuming you have 100 factors to consider, how can you determine which combination is most predictive for a certain investment objective? Rather than examining every possible combination, we can dramatically cut down the evaluation time by applying a GA process in which we start with a random selection and iteratively mutate the most predictive factors.
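To put the 100-factor example in numbers: there are 2^100 - 1 non-empty factor combinations, roughly 1.27 × 10^30, which rules out exhaustive evaluation. The sketch below shows one simple way a mutation-driven GA can search that space. It is a toy under stated assumptions, not Lucena's implementation: the `fitness` function scores candidates against a made-up set of "truly predictive" factors, whereas a real system would score each candidate with a backtest or similar metric, and this version uses selection and mutation only, with no crossover.

```python
import random

N_FACTORS = 100
rng = random.Random(42)

# Hypothetical ground truth, used only to score candidates in this toy example.
PREDICTIVE = set(rng.sample(range(N_FACTORS), 8))

def fitness(mask):
    """Toy score: reward truly predictive factors, penalize every extra factor."""
    chosen = {i for i, bit in enumerate(mask) if bit}
    return len(chosen & PREDICTIVE) - 0.1 * len(chosen - PREDICTIVE)

def mutate(mask, rate=0.02):
    """Flip each factor's inclusion bit with a small probability."""
    return [bit ^ (rng.random() < rate) for bit in mask]

def evolve(pop_size=40, generations=200, keep=10):
    """Start from random selections, keep the best, and mutate them repeatedly."""
    population = [[rng.randint(0, 1) for _ in range(N_FACTORS)]
                  for _ in range(pop_size)]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[:keep]  # survivors carry over unchanged
        children = [mutate(rng.choice(survivors))
                    for _ in range(pop_size - keep)]
        population = survivors + children
    return max(population, key=fitness), history

n_subsets = 2 ** N_FACTORS - 1  # size of the exhaustive search space
best, history = evolve()
print(n_subsets, fitness(best))
```

With the default settings this scores about 8,000 candidates in total (40 per generation over 200 generations), a vanishing fraction of the full search space. Because the survivors are carried over unchanged, the best score can never get worse from one generation to the next.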
Another example of a useful automated process is a “grid search,” which we discussed earlier. Grid searches can be used to refine the meta parameters of an investment strategy. Meta parameters are the building blocks of a strategy and include allocation guidelines, rebalance frequency, exit conditions, etc.
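A grid search over meta parameters can be sketched in a few lines. The parameter names, the candidate values, and the `validation_score` function below are hypothetical placeholders; in practice the score would come from backtesting each combination over the validation period.

```python
from itertools import product

# Hypothetical meta-parameter grid: every combination will be evaluated.
grid = {
    "rebalance_days": [5, 10, 21],
    "max_weight": [0.05, 0.10, 0.20],
    "stop_loss": [0.05, 0.10],
}

def validation_score(params):
    """Stand-in for a validation-period backtest metric (higher is better)."""
    return (
        -abs(params["rebalance_days"] - 10)
        - 10 * abs(params["max_weight"] - 0.10)
        - 5 * abs(params["stop_loss"] - 0.05)
    )

def grid_search(grid, score):
    """Score every combination in the grid and return the best one."""
    names = list(grid)
    candidates = [dict(zip(names, values)) for values in product(*grid.values())]
    return max(candidates, key=score), len(candidates)

best_params, n_evaluated = grid_search(grid, validation_score)
print(best_params, n_evaluated)
```

The number of combinations is the product of the list lengths (3 × 3 × 2 = 18 here), so the grid grows quickly as parameters are added. That is one reason to run this step only on the validation period and to keep the hold-out period for a single final test.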
Should all investment professionals begin integrating automation into their workflow? Why/why not?
If you break down the spectrum of investment time horizons, you have HFT (millisecond decisions of high-frequency trading) on one end and the buy and hold Warren Buffett investment style on the other. The shorter the investment time horizon, the more dependent the strategy is on automation. Conversely, the longer the time horizon, the greater the emphasis on the human element. Our firm mainly focuses on the one-day to one-year time horizons in which there is a clear influence of a hybrid (human and machine) research and investment approach.
As AI technologies continue to work their way into the finance industry, how will the human component of the investing process evolve?
In the same way investment research combines machine and human input, so do investment decisions. Our advice has always been to integrate machine intelligence as an overlay to human research.
How are discretionary traders adapting to this changing landscape?
It’s rather remarkable to witness the insatiable appetite for predictive data and predictive algorithms by the slow-to-adopt discretionary managers. The data revolution is here and evolving at lightspeed, and even the most traditional deep value bottom-up researchers are starting to realize that an algorithmic approach to investment can be additive. Because they still hold the veto power to override the machine investment approach, discretionary traders are viewing machine intelligence as nothing more than another arrow in their research quiver.
What are the biggest challenges quantitative funds are facing now?
In my opinion, the biggest challenge is that there is so much misinformation in the market. Machine learning has become a buzzword, and many are not quite sure what it means. Bad actors are taking advantage of this situation to make a quick and unsustainable profit.
The quant funds that have been successful have come to realize that without ongoing research they will not be able to maintain their competitive edge. Good ideas somehow eventually make their way to the masses and consequently lose their potency.
Is there a particular type of data that you feel is underutilized? How about overrated?
The alternative data provider market is very fragmented. At the top there are very few multi-billion dollar market cap players like Thomson Reuters, Bloomberg, and FactSet. Further down there are thousands of smaller players who generate less than $10 million in revenue per year. Many of them are faced with a serious marketing conundrum. If these providers price their data too cheaply, many clients will jump on board. However, the greater the number of clients utilizing their data, the less exclusive the data is, undercutting the data’s alpha-generating potential. On the other hand, if the providers price their data too high, there is less of a chance that customers will buy from them. Finding the happy medium requires exhaustive market research that many are unable to afford. Those who are able to afford expensive, exclusive access to data will naturally hold such data to a higher level of scrutiny and expectation. Overrated data is short lived since ultimately it needs to justify its value, and portfolio managers are notoriously impatient.
How do you evaluate data?
We have a well-defined and regimented data evaluation process.
1. We initially test the data for completeness, consistency, survivorship bias, and signal correlation to information (also called information coefficient).
2. We then evaluate the data in the context of decile signals breakdown, universe coverage, signal distributions, outliers, and anomalies.
3. We conduct an initial signal strength analysis.
4. Upon identifying some form of information, we move to more advanced research.
5. We conduct feature engineering by which we create advanced machine learning signals based on the data’s raw values.
6. The process ends with a cross validation performance report in which we compare in-sample to out-of-sample performance.
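The interview does not say how the information coefficient in step 1 is computed. One common convention, assumed here, is the Spearman rank correlation between a signal and the returns that follow it. The sketch below implements that with the standard library only; the signal and return values are invented for illustration.

```python
def ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def information_coefficient(signal, forward_returns):
    """Spearman rank correlation between a signal and subsequent returns."""
    return pearson(ranks(signal), ranks(forward_returns))

# Invented cross-section: one signal value and one forward return per asset.
signal = [0.9, 0.1, 0.5, 0.7, 0.3]
forward_returns = [0.04, -0.02, 0.01, 0.00, -0.01]
ic = information_coefficient(signal, forward_returns)
print(round(ic, 2))  # 0.9
```

An IC near zero suggests the signal carries little information about subsequent returns, while values closer to 1 or -1 indicate a stronger monotonic relationship. Using ranks rather than raw values makes the measure less sensitive to outliers.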
It is important to distinguish between evaluating data and designing an investment strategy based on one or more data sources. Data evaluation mainly focuses on distilling information from noise and reporting on signal strength and its distribution within an asset universe. In contrast, when applying data towards an investment strategy, we consider building models from the data designed for a very specific investment style and objective.
How has the investment data market changed over the past decade? What are the most promising sources for data in this space?
As with many emerging technologies, the speed with which we introduce sophistication and complexity is commensurate with the speed with which the market catches up to new technologies.
The most recent trend in alternative data is the conversion of unstructured data into accessible and actionable structured data. We see great advancement in NLP (Natural Language Processing), a machine learning discipline that can be used, among other things, to derive sentiment towards a given subject matter. In addition, larger data players are emerging from traditional businesses that initially accumulated data for internal use but are now expanding their business model to become data providers.
For example, credit card providers and major retail chains that have tracked consumer consumption at the individual level for years are now anonymizing and aggregating their data to help others derive behavioral trends by demographics, locations, industries, sectors, and more.
Any final thoughts?
We are living in interesting times. AI and machine learning are at their embryonic stage and are quickly evolving to dominate our way of life. We have not yet begun to realize the profound impact this emerging technology will have on our world.
About Erez Katz
Erez Katz is an accomplished serial entrepreneur and a C-level executive with a proven track record of effective leadership, instilling operational efficiencies, and driving profitable growth.
Erez is the active CEO and co-founder of Lucena Research and was instrumental in architecting and designing Lucena’s flagship product QuantDesk®. Today, Erez shares his expertise in applying Lucena’s technology to help investment professionals generate alpha, minimize risk via portfolio optimization, and construct new portfolios using predictive analytics and machine learning technology. Prior to co-founding Lucena with Dr. Tucker Balch, Erez was the founder of Objectware Inc, a web technology company that was sold in 2007 to Bridgeline Digital. Erez was instrumental in growing Bridgeline and in its transition to a publicly traded company on Nasdaq. Erez also served as Bridgeline’s Chief Operating Officer through 2011.
About Lucena Research
Lucena Research specializes in predictive analytics and machine learning for the financial markets. The company was founded by renowned machine learning expert Tucker Balch, Ph.D., and serial technology entrepreneur Erez Katz. Lucena’s mission is to bridge the gap between data providers and investment professionals who seek to enhance their investment decisions with machine learning technology.