Sophisticated modelling can seem daunting for many organisations wanting to become data-centric. Do you have huge databases full of data with no duplication, no gaps and all in the same format? Is the data tagged, with clear ownership, access rules and update rules? Is it linked to other relevant sources and mapped to the core business domains and processes? If you have this, great. If you don’t, you are in the majority.
Leading-edge AI and machine learning models need quality data that is labeled and linked to business outcomes. This allows algorithms to “train” faster, then “learn” and “evolve”. So quality data drives quality insights, but it has to be tailored to the need. In other words, data needs to be quality controlled and given labels that link it to outcomes and business processes.
Labeling and linking create data assets that drive value. This means that your differentiation is your data. High volumes of good quality data are the foundation of analytics, insights and artificial intelligence. Your data can always be linked to third-party data, but in essence, the more quality proprietary data you can harvest, the larger the potential data advantage.
The analogy that “data is the oil of the 21st century” applies to data quality too. Low-grade data (like low-grade oil) is expensive to extract and process, and offers limited value relative to high-grade offerings.
So large volumes of data (a data lake) are great, but your data story needs to map to user journeys and business processes. This is where business transformation meets data science. Business transformation involves mapping out where you want to be (the future state) relative to where you are now (the current state) to determine what to do and where to begin. This could be an opportunity to save on processes or to develop new product ideas. The scoping and prioritising activity is a key phase that will inform what data is needed, when and what for.
So the recipe for success is to know what data is needed, why and when. Then ensure quality data is available with the correct labeling (“metadata”) and outcomes mapping. In summary, ensure data quality and robust operations are in place before letting the algorithms loose. After all, your differentiation is your data. Garbage in = garbage out.
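To make the idea of checking quality "before letting the algorithms loose" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `DatasetMetadata` fields, the `quality_report` and `ready_for_modelling` helpers, the sample customer records and the 5% threshold are hypothetical, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """Hypothetical metadata ("labels") attached to a dataset."""
    owner: str
    business_process: str
    outcome_field: str  # column holding the business outcome used for training


def quality_report(rows, key_field, required_fields):
    """Count duplicate keys and missing values in a list of dict records."""
    seen = set()
    duplicates = 0
    gaps = 0
    for row in rows:
        key = row.get(key_field)
        if key in seen:
            duplicates += 1
        seen.add(key)
        # A gap is a required field that is absent, None or an empty string.
        gaps += sum(1 for f in required_fields if row.get(f) in (None, ""))
    return {"rows": len(rows), "duplicates": duplicates, "gaps": gaps}


def ready_for_modelling(report, max_issue_rate=0.05):
    """Gate: pass only if issues per row stay under an illustrative threshold."""
    if report["rows"] == 0:
        return False
    issue_rate = (report["duplicates"] + report["gaps"]) / report["rows"]
    return issue_rate <= max_issue_rate


# Made-up sample records: one duplicate customer_id and one missing region.
rows = [
    {"customer_id": 1, "region": "UK", "churned": "no"},
    {"customer_id": 2, "region": "", "churned": "yes"},
    {"customer_id": 2, "region": "UK", "churned": "yes"},
]
meta = DatasetMetadata(
    owner="sales-ops",
    business_process="customer retention",
    outcome_field="churned",
)
report = quality_report(rows, "customer_id", ["region", meta.outcome_field])
print(report, "ready:", ready_for_modelling(report))
```

In this sketch the sample fails the gate, because two issues across three rows is well above the threshold. The point is the ordering: the metadata says what outcome the data maps to, and the quality check runs before any model sees the data.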
Steps to differentiating yourself with data:
Darren Wilkinson
darren@algospark.com