The idea that bookmakers function by simply balancing bets from recreational punters and taking a cut is a common misconception. The truth is far more interesting, with data science at the heart of modern sports betting innovation, epitomised by how Pinnacle is using R and advanced Machine Learning techniques to stay ahead of the pack.
For those who aren’t aware of R, it is an increasingly influential open source statistical programming language - think Excel on steroids, for simplicity. It is being used by some of the most disruptive and progressive tech companies (read about the approach Airbnb are taking toward data science immersion) and Pinnacle count themselves among them.
Sports betting is a perfect problem for data science. Betting Resources touched on this in an introductory article on the subject, but our ambitions for its use are growing all the time. Led by our Director of Trading, Marco Blume, we are aiming for company-wide immersion. To attract the smartest data scientists within the R community, Marco took that message to one of the biggest gatherings of R technicians - useR! 2017 - in Brussels on July 5th.
This article is intended both to provide a window into how R is transforming Pinnacle’s risk management and to offer a summary of Marco’s presentation, which you can watch below:
Standard vs Pinnacle Trading Model
Anyone familiar with sports betting, and Pinnacle’s position within the industry, will know that we employ a unique business model. We offer extremely low margins coupled with high limits, and welcome the sharpest players in the world to take advantage. Accepting winning players is extremely rare among recreational bookmakers, but we are happy to assume this cost in exchange for the insight we gain from the betting behaviour of those successful players.
On its own, however, that isn’t enough to enable us to function with such slim margins, which are several times lower than the industry average. This high-level representation of our Trading Model illustrates the complexity of information gathering that goes into building our lines and the increasingly important role of R.
The challenge is to combine machine learning and real-time market feedback with data from a range of other sources to drive a highly automated trading engine. R, along with a host of its packages, plays a key role in the data blending and wrangling needed to create data sets that can be used to analyse and improve our sports models and trading algorithms, and to seamlessly produce the end product for our customers - the low margin odds. With the margins being offered we have very little room for error, so it is imperative that we continually collect and analyse market data.
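To make the idea of a margin concrete, here is a minimal sketch of how it can be measured from a set of decimal odds. It is written in Python purely for illustration and is not Pinnacle’s trading code: the function names are hypothetical, the prices are invented, and proportional normalisation is only one of several ways to strip a margin out of a market.

```python
def implied_probabilities(decimal_odds):
    """Raw implied probability of each outcome: 1 / decimal odds."""
    return [1.0 / price for price in decimal_odds]


def margin(decimal_odds):
    """Bookmaker margin: the amount by which the implied probabilities exceed 1."""
    return sum(implied_probabilities(decimal_odds)) - 1.0


def fair_probabilities(decimal_odds):
    """Remove the margin by proportional normalisation (an assumed, simple method)."""
    raw = implied_probabilities(decimal_odds)
    total = sum(raw)
    return [p / total for p in raw]


if __name__ == "__main__":
    # Hypothetical prices for an evenly matched two-way market.
    low_margin_market = [1.95, 1.95]
    high_margin_market = [1.87, 1.87]

    print(f"Low margin market:  {margin(low_margin_market):.2%}")
    print(f"High margin market: {margin(high_margin_market):.2%}")
    print("Fair probabilities:", fair_probabilities(low_margin_market))
```

The smaller the margin, the closer the quoted odds sit to the underlying probabilities, which is why any error in the model is exposed more quickly when margins are slim.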
The challenge of live betting
It is clear that data science plays a key role in maintaining our reputation for unrivalled risk management and, in simple terms, in how we make money. The problems that solutions like R help us solve are magnified by the challenge of live betting, the most popular product among modern bettors.
The challenge of producing efficient live odds is that it must happen in a fluid data environment. In addition to the layers illustrated above, we have to factor in live data feeds from the events themselves, along with the huge number of live bets being received, and feed these into our Trading Algorithm, while paying attention to how other bookmakers are trading the event.
The wider application of R
Though data science plays a crucial role in our approach to trading, it has applications in other departments. It gives great flexibility for manipulating and visualising data sets, especially when used in conjunction with other tools. For example, look at the fascinating change in the percentage probability of winning for Trump and Clinton in the 2016 US Presidential Election, and the crucial events that annotate the changes in each candidate’s chances of becoming POTUS. The betting markets were aware of the likely results of the election hours before pundits on TV.
The data was generated through a few lines of R code and visualised using plot.ly - no Photoshop required.
Engaging with the R Community
Pinnacle understand and respect the collaborative nature of the R community, which is why we have three approved packages in the CRAN repository. odds.converter is a simple yet effective way of converting between a wide range of odds formats - Decimal, American, Hong Kong, Malay and Probability.
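The arithmetic behind converting between these formats is straightforward. The following is a minimal sketch in Python of the standard conversions from decimal odds. It is not the odds.converter package or its R interface, and the function names are hypothetical, chosen only for this illustration.

```python
def _check(decimal_odds):
    """Decimal odds must be greater than 1 for these conversions to make sense."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")


def decimal_to_probability(decimal_odds):
    """Implied probability (margin included): 1 / decimal odds."""
    _check(decimal_odds)
    return 1.0 / decimal_odds


def decimal_to_hong_kong(decimal_odds):
    """Hong Kong odds are the profit per unit staked."""
    _check(decimal_odds)
    return decimal_odds - 1.0


def decimal_to_american(decimal_odds):
    """Positive for odds of evens or longer, negative for odds-on prices."""
    _check(decimal_odds)
    if decimal_odds >= 2.0:
        return (decimal_odds - 1.0) * 100.0
    return -100.0 / (decimal_odds - 1.0)


def decimal_to_malay(decimal_odds):
    """Malay odds match Hong Kong odds up to evens, then become -1 / Hong Kong odds."""
    hong_kong = decimal_to_hong_kong(decimal_odds)
    if hong_kong <= 1.0:
        return hong_kong
    return -1.0 / hong_kong


if __name__ == "__main__":
    # Hypothetical prices, invented for illustration.
    for price in (1.5, 2.5, 3.0):
        print(
            price,
            decimal_to_probability(price),
            decimal_to_american(price),
            decimal_to_hong_kong(price),
            decimal_to_malay(price),
        )
```

Routing every conversion through decimal odds keeps the sketch small: any other pair of formats can be converted by going to decimal first and back out again.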
pinnacle.api will appeal to those wanting to go beyond just manipulating odds to actually creating strategies and placing bets. This package offers real-time access to our odds and, with a few lines of code, the ability to apply simple betting strategies and place real wagers.
Throwing down the gauntlet
Beyond our aim of immersing our organisation in R and data science, we want to attract new talent and problem solvers. This article began by exposing the myth of how bookmakers work. It is also a misconception that bookmakers only employ traders who are sports fanatics. What is more important is their ability to manipulate and interrogate data, and to help solve the problems that our unique trading model presents.
In order to find people who fit that bill, we have released a third package, pinnacle.data, featuring historical data from the 2016 MLB season and US Election, in the hope that data scientists out there will use this as a sandbox and show us patterns that we aren’t aware of. We want R enthusiasts to play with the data, build their models and test against the data, reference the known in-game events and share their findings.
We are willing to release for free what is valuable proprietary data, for which syndicates and competitors would pay good money, in order to throw down the gauntlet to the R community. We are excited to see what you can discover.