The "Establishing Best Practices" panel discussion held on 4 Aug 2016 featured:

  • Michael Beal - Co-Founder of JPMorgan Intelligent Solutions, CEO of Data Capital Management
  • Michael Recce - Former head of data science at a leading Connecticut-based hedge fund
  • Matei Zatreanu - Founder and former head of data science at a leading New York-based hedge fund

The panel spoke about establishing a data science effort, sourcing data sets, and building out proprietary analytics.


Establishing a team

The most established PMs and funds are generally happy to keep doing things the way they always have. When looking for internal sponsors, try to find people who are established enough to finance a data science effort but are still actively looking for a competitive advantage. If you can’t find such a sponsor, you may be best off either switching firms or starting something up on your own.

Even if you can gain buy-in for an alternative data effort within a large organisation, building a successful data science team can be a long-term and capital-intensive endeavour. Sustaining it until it bears fruit requires ongoing communication with the most senior decision makers in your business to keep their expectations realistic.

Another challenge is finding people who can bridge data science and traditional fundamental research. Alternative data teams have to possess both skillsets; if you can’t build a culture which allows Berkeley types to collaborate effectively with Harvard MBAs, you will fail.

Finding data

Generally speaking, there are two models:

  • Analyst driven
    Determine which KPIs a fundamental analyst would want to be able to track for a particular name, figure out what data exhaust exists for each of those KPIs, and then go out and talk to companies which might have that data.
  • Availability driven
    Assess all data available on companies within the fund’s investable universe, even if PMs have no immediate use for it. If you don’t have enough contacts at the outset to use this model, there are vendors selling lists of companies with notable data assets.

When assessing possible new data sources, avoid heavily processed data. As Michael Recce put it: "I don’t like my data raw – I like it rare." You want your data to retain as much granularity as possible.
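A minimal illustration of why granularity matters, using made-up transaction records (the record layout, store IDs and amounts are all hypothetical): granular data can be rolled up into whatever view an analyst needs later, whereas a vendor's pre-aggregated weekly total cannot be split back out by store.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction-level ("rare") records: (day, store_id, amount).
transactions = [
    (date(2016, 7, 4), "store_a", 120.0),
    (date(2016, 7, 5), "store_b", 80.0),
    (date(2016, 7, 12), "store_a", 60.0),
    (date(2016, 7, 13), "store_b", 200.0),
]

def rollup(records, key_fn):
    """Sum amounts by an arbitrary key derived from each record's day and store."""
    totals = defaultdict(float)
    for day, store, amount in records:
        totals[key_fn(day, store)] += amount
    return dict(totals)

# From granular data we can build any aggregate we like...
weekly = rollup(transactions, lambda day, store: day.isocalendar()[1])
by_store_week = rollup(transactions, lambda day, store: (store, day.isocalendar()[1]))

# ...but if a vendor had only delivered `weekly`, the per-store split
# in `by_store_week` would be unrecoverable.
```

Here `weekly` comes out as `{27: 200.0, 28: 260.0}` (ISO week numbers), while `by_store_week` shows that most of week 28's total came from `store_b`, which the weekly figure alone cannot reveal.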

Analysing data

Building ETL (Extract, Transform, Load) pipelines is an underappreciated aspect of alternative data work, and it often ends up being a major bottleneck.
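To make the three ETL stages concrete, here is a small sketch using only the Python standard library. The feed contents, column names (`date`, `ticker`, `panel_spend`) and table name are hypothetical; real feeds arrive through SFTP drops, APIs and so on, and the messy parts (inconsistent date formats, placeholder values like "n/a") are typically where the bottleneck lies.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw vendor feed with two date formats and a missing value.
RAW_FEED = """date,ticker,panel_spend
2016-08-01,XYZ,1200.50
2016-08-01,ABC,n/a
08/02/2016,XYZ,980.00
"""

def extract(text):
    """Extract: read raw CSV rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def parse_date(value):
    """Accept the two date formats seen in the feed; return ISO format or None."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None

def transform(rows):
    """Transform: normalise dates, tickers and numbers; drop rows that fail to parse."""
    clean = []
    for row in rows:
        day = parse_date(row["date"].strip())
        try:
            spend = float(row["panel_spend"])
        except ValueError:
            continue  # e.g. "n/a"; a production pipeline would log these rows
        if day is not None:
            clean.append((day, row["ticker"].strip().upper(), spend))
    return clean

def load(rows, conn):
    """Load: write cleaned rows into a queryable table."""
    conn.execute("CREATE TABLE IF NOT EXISTS panel_spend (day TEXT, ticker TEXT, spend REAL)")
    conn.executemany("INSERT INTO panel_spend VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_FEED)), conn)
```

After loading, the table holds two clean `XYZ` rows with ISO dates; the `ABC` row is dropped because its spend value is not numeric.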

Once data has been ingested and prepared for analysis, your data analyst has to extract key features, use them to build a model, and then decide how best to present the final output. All three steps require an intuitive understanding of the fundamental analysis in which the data will be used.
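The three steps can be sketched in a deliberately simplified form. Everything here is an assumption for illustration: the feature (quarterly spend observed in a data panel) is taken as already extracted, the numbers are invented, and a one-variable least-squares line stands in for whatever model a team would actually use. A real model would need many more quarters and out-of-sample testing.

```python
# Hypothetical data: one already-extracted feature per quarter (e.g. card spend
# observed in a data panel) and the revenue the company actually reported.
panel_spend = [100.0, 110.0, 120.0, 130.0]
reported_revenue = [1000.0, 1100.0, 1200.0, 1300.0]
latest_panel_spend = 140.0  # current quarter; revenue not yet reported

def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return mean_y - slope * mean_x, slope

# Model: relate the feature to the KPI the fundamental analyst cares about.
intercept, slope = fit_line(panel_spend, reported_revenue)
nowcast = intercept + slope * latest_panel_spend

# Output: present the result in terms the analyst already uses.
print(f"Revenue nowcast for current quarter: {nowcast:,.0f}")
```

With these toy numbers the fit is exact (slope 10, intercept 0), so the script prints a nowcast of 1,400. The point of the last step is that the output is expressed as a revenue estimate the PM can compare with consensus, not as a raw model coefficient.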

Be aware that alternative data often provides far richer insights than 10-Ks and 10-Qs, so try to think about the company you are analysing the same way its own operations department does. Instead of just predicting top-line revenue, think about customer cohorts, logistical issues, and so on.
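Customer cohort retention is one example of the operational view described above, and something periodic filings typically do not break out. The sketch below assumes hypothetical customer-level purchase records from a data panel (customer IDs and month indices are invented); it groups customers by the month of their first purchase and reports what share of each cohort is active in later months.

```python
from collections import defaultdict

# Hypothetical panel data: (customer_id, month index of purchase).
purchases = [
    ("c1", 0), ("c1", 1), ("c1", 2),
    ("c2", 0), ("c2", 2),
    ("c3", 1), ("c3", 2),
    ("c4", 1),
]

def cohort_retention(records):
    """Return {cohort_month: {months_since_first_purchase: share of cohort active}}."""
    # Each customer's cohort is the month of their first purchase.
    first = {}
    for customer, month in records:
        first[customer] = min(month, first.get(customer, month))

    cohort_sizes = defaultdict(int)
    for month in first.values():
        cohort_sizes[month] += 1

    # Collect the distinct customers active at each (cohort, offset) pair.
    active = defaultdict(set)
    for customer, month in records:
        cohort = first[customer]
        active[(cohort, month - cohort)].add(customer)

    table = defaultdict(dict)
    for (cohort, offset), customers in sorted(active.items()):
        table[cohort][offset] = len(customers) / cohort_sizes[cohort]
    return dict(table)

retention = cohort_retention(purchases)
```

For this toy data the month-0 cohort shows 50% activity one month after first purchase and 100% two months after, which is the kind of pattern (lapsed customers returning) that a single top-line number would hide.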

Further info

Book recommendations from Michael Recce:

  • "Superforecasting: The Art and Science of Prediction" by Philip Tetlock
  • "Thinking, Fast and Slow" by Daniel Kahneman

If you would like more information on anything the panelists discussed, feel free to contact us.