The old adage “put junk in, get junk out” is the very reason why, in 2019 (the year the original Blade Runner was set in, let’s not forget!), effective data management is still worthy of a news article.
Yes, of course, data management requires data governance. And humans are, well, not as predictable as 1s and 0s.
Travis Nadelhoffer, CEO at NorthGravity, has built his business around helping clients make better decisions with better (yes, you guessed it) data.
Ahead of his participation in ETOT this year, we had the chance to ask him a few questions about his take on good IT.
For companies embarking on an advanced analytics journey, what’s the biggest mistake you often see when working in the sector?
Input data is often an afterthought, and the “scalable platform” typically starts out as CSV files on John’s laptop. That is fine for a basic proof of concept (POC), but a shortage of data and the lack of a scalable platform to run the models will not empower a data-driven organization.
Generally, CEOs and senior executives are blinded by the sexiness of machine learning and overinvest in the data science side (often through internal promotions) before they invest in the platform or pipeline for this new field. There’s all the heavy lifting of getting the data, DataOps, model orchestration, ModelOps and model management, which is less sexy but just as important, if not more so. Imagine having a Ferrari without the ability to fill up the tank with petrol. Companies’ success in this area increases when they have a platform to manage, organise, store and process the data and models. If not, what often happens is that you hire very expensive resources like data scientists, and they end up spending 80% or more of their time on data collection or server admin tasks.
The other mistake is underestimating the cost and time, while overestimating the likelihood of success of a Frankenstein DIY project.
On the DIY side, cloud and SaaS products make everything look so easy. In reality, though, it still takes time and effort to make the solutions work together. You can’t walk down to your local Home Depot/B&Q/OBI, buy all the parts for a new house and hope it builds itself. This is where the “do it yourself” part comes in: it takes time and people to build the end result.
What software does your company use when handling data?
We are an AWS (Amazon Web Services) partner, but what we’ve done is stitch together leading services and build software on top of them to create an open, end-to-end platform that can help power clients’ analytics journeys, backed by domain expertise and resources to help with data collection and data science. Our platform can be single- or multi-tenant. It is serverless and cloud-native, which drives cost efficiency and delivers the scale needed for large analytics projects.
I was in London yesterday with a major oil company that has stitched solutions together, and they now face a fundamental decision: do we continue to do this? The time and fees are slowing down their analytics program. Do they want to keep this DIY project, or are there other domain areas where they can increase their advantage?
“…data practice not only allows clients to see the data but allows them to add value back.”
Is it possible for a modern large European energy trading firm to have flawless data practice?
I don’t think it’s possible to have flawless data. There are always going to be problems and issues, so I think what’s important is that the process is redundant, adaptable, and constantly improving. You want to avoid catastrophic issues. You don’t want to lose data. There will always be issues outside the company’s control, but as long as your process/platform is adaptable, it can improve. It is also important that the data practice not only allows clients to see the data but also allows them to add value back. I think that’s a key point.
Data programs fail when they are too rigid and not adaptable. Flexibility is important, given the growth in different types of data, and that growth is not going to slow down. The other extremely important part is controls and security. How do you know where this data came from? Who updated it? When was it changed? Who can see the data? Who is using the data? Can you detect errors, understand how they happened and fix them fast?
“Lower-quality input data to a model will translate into higher biases in the results.”
At ETOT you’ll be presenting a user journey using NorthGravity. Can you give us a preview of what will be covered?
We will walk through the challenges a large global commodity trading company faced in producing prescriptive results from images for pre-trade decision support. We will discuss how the client benefited from partnering with NorthGravity to build a production pipeline in weeks using the NG platform. The client’s Head of Research will be joining me.
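The control questions above (where data came from, who changed it and when, who can see it and who is using it) map naturally onto an append-only change log plus an access list. The sketch below is a minimal, hypothetical illustration of that idea in plain Python; the class names, fields and dataset names are assumptions made for this example and do not describe NorthGravity’s platform or any specific product.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeEvent:
    """One audited change to a dataset (hypothetical schema)."""
    dataset: str
    source: str           # where the data came from
    changed_by: str       # who updated it
    changed_at: datetime  # when it was changed
    note: str = ""


@dataclass
class DatasetLedger:
    """Append-only change log plus a simple read-access list (illustrative only)."""
    events: list[ChangeEvent] = field(default_factory=list)
    readers: dict[str, set[str]] = field(default_factory=dict)     # who can see each dataset
    access_log: list[tuple[str, str]] = field(default_factory=list)  # (dataset, user) reads

    def record_change(self, dataset: str, source: str, changed_by: str, note: str = "") -> ChangeEvent:
        # Events are only ever appended, so history can be replayed to trace an error.
        event = ChangeEvent(dataset, source, changed_by, datetime.now(timezone.utc), note)
        self.events.append(event)
        return event

    def grant(self, dataset: str, user: str) -> None:
        self.readers.setdefault(dataset, set()).add(user)

    def read(self, dataset: str, user: str) -> None:
        # Refuse reads from users who were never granted access; log the rest.
        if user not in self.readers.get(dataset, set()):
            raise PermissionError(f"{user} may not read {dataset}")
        self.access_log.append((dataset, user))

    def history(self, dataset: str) -> list[ChangeEvent]:
        """Answers: where did this data come from, who changed it, and when?"""
        return [e for e in self.events if e.dataset == dataset]

    def users_of(self, dataset: str) -> set[str]:
        """Answers: who is actually using this data?"""
        return {u for d, u in self.access_log if d == dataset}
```

For example, after recording a feed load and a manual correction, `history("prices")` lists both changes in order, which is the starting point for understanding how an error got in and fixing it quickly.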
The Problem: Prescriptive Results from Images for Pre-Trade Decisions
Indexing 9 petabytes of images from different sources, with metadata
Selecting images based on filters
Removing bias from the images
Transforming image pixels into time series data points
Applying a model to produce prescriptive results
Creating a repeatable pipeline that can scale in production
Storing all data inputs and calculated data
Storing all model versions
Optimizing the cost and time to production
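To make the steps above more concrete, here is a toy, hypothetical sketch of such a pipeline in plain Python. It is not the client’s method or the NG platform. Every name is illustrative, and several choices are assumptions: cloud cover stands in for the selection filters, mean pixel intensity stands in for the pixel-to-time-series transformation, subtracting each source’s average stands in for bias removal, and the “model” is any callable passed in. Each run stores its inputs, the derived series, the results and the model version.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class ImageRecord:
    """Index entry: metadata plus a small grayscale pixel grid (hypothetical)."""
    image_id: str
    source: str          # which provider/sensor captured the image
    captured: date
    cloud_cover: float   # fraction 0..1, used here as the selection filter
    pixels: tuple[tuple[float, ...], ...]


def select(index: list[ImageRecord], max_cloud_cover: float) -> list[ImageRecord]:
    """Keep only images whose metadata passes the filter."""
    return [r for r in index if r.cloud_cover <= max_cloud_cover]


def to_point(record: ImageRecord) -> float:
    """Collapse an image's pixels into one data point (mean intensity)."""
    return mean(v for row in record.pixels for v in row)


def remove_source_offset(records: list[ImageRecord]) -> list[tuple[date, float]]:
    """Crude bias removal: subtract each source's average so sources are comparable."""
    by_source: dict[str, list[float]] = {}
    for r in records:
        by_source.setdefault(r.source, []).append(to_point(r))
    offsets = {s: mean(vals) for s, vals in by_source.items()}
    return [(r.captured, to_point(r) - offsets[r.source]) for r in records]


def run_pipeline(index: list[ImageRecord], model: Callable[[float], float],
                 model_version: str, store: list[dict],
                 max_cloud_cover: float = 0.2) -> list[tuple[date, float]]:
    selected = select(index, max_cloud_cover)
    series = sorted(remove_source_offset(selected))  # time-ordered data points
    results = [(day, model(value)) for day, value in series]
    # Persist inputs, derived data, results and the model version for every run.
    store.append({"model_version": model_version,
                  "inputs": [r.image_id for r in selected],
                  "series": series,
                  "results": results})
    return results
```

Because each run is a pure function of the index, the filter and a versioned model, it can be repeated and audited; scaling it to petabytes would require distributed storage and compute that this sketch does not attempt.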
A big thank you for that! For those of you who want to know more about data from all perspectives, take a gander at the ETOT 2019 agenda and, of course, join us in London!