Reaktor Data Enablement Framework
We work with leading clients from retail to finance and consumer goods, helping them make sense of their data and enabling their AI and Data Science initiatives.
Using data to drive business actions
“Data is a new oil” – a common buzzphrase heard over and over.
“Data is the soil” – closer to reality, as you need to actually plant something before you can harvest the results.
But in reality, when a company first thoroughly analyzes the data it possesses, it turns out that the data is mostly ROT – Redundant, Obsolete & Trivial.
So, what can you do to get the most out of your data and bring incremental value to your data-driven projects?
First, we should agree that your data is an asset, just like your human resources, equipment and machinery.
And, like any other asset, data needs proper management.
Before management comes an understanding of what data exists in your organization and how it is really used.
What we see nowadays is that data exists in a vacuum between the “business people” and the “IT people”, who rarely communicate with each other even on traditional topics like IT architecture or Enterprise Bus (except for the help desk, of course :)), let alone vaguer topics like data-enabled solutions.
Here comes the role of a Chief Data Officer (CDO): a mediator who fills the existing vacuum with data-focused, value-adding activities.
- While the CDO role is a subject for a separate talk, we should name a few important things here, as typically (but not always) it is the CDO who actually enables the data in the organization.
- What skills should this role possess? What background is needed to be a successful CDO?
- While this is a topic for discussion, our view is that data enablement is a business headache.
- While IT usually has the tools, runs the servers, writes the SQL queries etc., data is, most of the time, generated by the business and, almost every time, consumed by the business.
- While at the beginning of data enablement it is more of a function than a real role, during the transformation, as the data landscape becomes more complex and the number of data applications increases, it becomes obvious that a proper C-level role needs to be in place.
The CDO can take different places in the enterprise hierarchy:
- CDO under CIO: good when data-centric transformation is not the target, but lacks flexibility as well as quick wins
- CDO at the same level as CIO: a separate function with more flexibility and faster results, but needs close collaboration with the CIO
- CDO under COO: good when data is one of the main resources to be leveraged in day-to-day operations, but the COO needs additional training
- CDO under CFO: also a possible case in banks and insurance companies
All in all, it is a hard choice that requires a good understanding of the business as well as alignment with the business strategy.
Data value pyramid
The role of data, as of any other asset, is to bring value.
Calculating the value of your data is a vague, yet important, step in managing this asset.
There are a few ways to calculate it, including top-down strategic and bottom-up use-case-based approaches.
The data value pyramid (largely based on the famous Maslow pyramid), created by Russell Jurney, who worked at LinkedIn at the time, depicts 5 main layers of successful data enablement:
Records – that’s where your raw data lives – just records in the database.
Charts – your basic reports, built upon the records of the previous layer
Reports or BI – more sophisticated dashboards, with different levels of possible deep-dive, created using BI tools like Qlik, Tableau etc
Predictions – more complex solutions, including machine learning capabilities, as well as more traditional data mining approaches
Actions – that’s where intelligent (“AI-driven”) decision support comes into place.
The most important thing here is to understand that each layer depends on the ones below. Many companies, following the AI hype, tend to start at the peak, only to discover that they need to climb down one or more layers to make their cool Data Science solutions scalable and really work.
It is such a pity when your sophisticated model predicts the wrong things not because the data scientist failed, but because the wrong fields were fed in at the input phase.
“Shit in – shit out” is the rule of thumb here.
Data management framework
So, the first step is understanding where we are and what exactly we should do next.
One of the handiest solutions here is, well, to conduct a sophisticated assessment.
Multiple tools and methodologies exist, like The DAMA Guide to the Data Management Body of Knowledge (DMBOK) or the Data Management Maturity (DMM) model, developed by Carnegie Mellon and Booz Allen Hamilton with the participation of industry and consulting leaders such as Lockheed Martin, Microsoft, EY, KPMG, Target and others.
Regardless of the methodology, a good assessment should cover the whole data landscape within the organization, in order to understand all the data-related pain points and prioritize the painkillers according to the business value they can potentially bring.
The DMM framework, for example, covers the 6 most important parts of successful Data Enablement at an enterprise of any level: Data Management Strategy, Data Governance, Data Quality, Data Operations, Platform & Architecture, and Supporting Processes. We like to add a 7th pillar, Data Applications, which mainly includes BI and Data Science.
For each pillar, the actual level of maturity needs to be assessed by answering the following questions:
- Is the process actually performed?
- How is it managed?
- Is it properly defined?
- How is it measured?
- Is it optimized?
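The five questions above can be read as rungs of a ladder, which makes a per-pillar score easy to compute. Below is a minimal sketch, assuming a hypothetical scoring rule where a pillar's level is the number of consecutive "yes" answers counted from the bottom up. The level names are derived from the questions; the function names and the sample answers are made up for illustration and are not part of any official assessment tool.

```python
# Level names follow the five questions in the text, lowest first.
LEVELS = ["performed", "managed", "defined", "measured", "optimized"]


def maturity_level(answers):
    """Count consecutive 'yes' answers from the lowest level upward.

    `answers` maps a level name to True/False. A gap stops the climb:
    a process does not count as "measured" if it is not yet "defined".
    """
    level = 0
    for name in LEVELS:
        if not answers.get(name, False):
            break
        level += 1
    return level


def weakest_pillars(assessment):
    """Return pillar names sorted from least to most mature."""
    return sorted(assessment, key=lambda pillar: maturity_level(assessment[pillar]))


# Made-up answers for two pillars, for illustration only.
example = {
    "Data Quality": {"performed": True, "managed": True, "defined": True, "measured": False},
    "Data Governance": {"performed": True, "managed": False},
}

for pillar in weakest_pillars(example):
    print(pillar, maturity_level(example[pillar]))
```

Sorting pillars from weakest to strongest is only one possible way to start the prioritization; in practice the ordering should also take the expected business value of each fix into account.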
It is also important to understand that, once the assessment has been conducted, there is no need to even try to tackle all the discrepancies at once. The goal is to get to your target state “gradually but surely”, which also involves activities such as Proofs of Concept (POCs), impact value (re-)evaluation and constant (re-)alignment with the overall organization strategy.
Let’s take a look at the most important elements of a good data strategy one by one.
Data management strategy
As we have previously agreed, data is an asset. And as with any asset management program, there should be a vision and goals. The elaboration and maintenance of the data management strategy should be aligned with the overall enterprise strategy and objectives.
To be on the same page, there is normally a need for 1-2 workshops for the C-suite of the organization, describing how the company plans to manage data and why.
A data management function should be in place, with goals and KPIs that are, again, in line with the overall company strategy.
Business cases should be calculated as well, meaning that at least a brief assessment of program funding should be done.
Data management strategy should first answer the essential questions: do you want to become a data-driven or data-centric organization, and if yes, to what degree?
Sometimes the value an organization can gain from becoming data-driven isn’t worth the effort. While many companies generate a lot of data, thus acquiring it basically for free, others have to spend a huge amount of time (and money) to get their hands on data sources which may or may not bring value.
Data governance
Data is an asset, OK. Who owns this asset? Who controls its consistency? Who aligns terms from the business glossary with database field names?
Establishing a proper Data Governance process answers these questions.
It is extremely important to assign a data source owner and a data source steward to each data source you have. The owner should be a business person who, in turn, can get help from IT with the day-to-day management of the data.
The next step may include a data governance council, at least a virtual one, which develops, updates and monitors data-related policies and compliance.
A data source without an owner, i.e. without a person who understands what’s in there and how it is generated and used, quickly turns into a swamp of useless rows and columns.
Or, without proper knowledge of what you already have, duplicates get created.
One of the leading retailers in the home appliance industry, for example, invested a lot of time and money in building Big Data analytics capabilities, but when it came to the actual projects, to their surprise, the amount of useful data shrank to a size that could be handled (processed and modelled, in this case) using a modern laptop and one server node.
Data quality
Data quality is the sometimes disregarded backbone of a successful data application. Data should be profiled and cleansed, and its quality should be assessed on a routine basis.
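To make "profiling" concrete, here is a minimal sketch of a routine check that could run against any table. It assumes the table is available as a list of dictionaries and treats `None` and blank strings as missing; the function names and the sample rows are hypothetical, and a real setup would more likely rely on a dedicated profiling tool.

```python
def is_missing(value):
    """Treat None and blank strings as missing values."""
    return value is None or str(value).strip() == ""


def profile_column(values):
    """Summarize one column: row count, share of missing values, distinct count."""
    total = len(values)
    present = [v for v in values if not is_missing(v)]
    return {
        "rows": total,
        "missing_ratio": (total - len(present)) / total if total else 0.0,
        "distinct": len(set(present)),
    }


def profile_table(rows):
    """Profile every column of a table given as a list of dicts."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Made-up rows, for illustration only.
rows = [
    {"customer_id": 1, "country": "FI"},
    {"customer_id": 2, "country": ""},
    {"customer_id": 2, "country": None},
    {"customer_id": 4, "country": "SE"},
]
report = profile_table(rows)
```

In this sample, a repeated `customer_id` and a half-empty `country` column are exactly the kind of findings that should reach the data source owner before any model is built.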
Many data science projects have been delayed or even cancelled when, during development, it became obvious that data inconsistency made it impossible to reach sufficient model quality without a huge data cleansing process involving business glossary recalibration and the creation of a Confluence platform.
For example, inconsistent date and time formats used by different departments of a huge airline delayed potentially profitable Data Science-powered dynamic pricing projects for almost a year, while the format was being agreed upon by all the counterparts.
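Once a common format has been agreed, the technical part of the fix is small. The sketch below assumes each department declares its own timestamp format and everything is normalized to ISO 8601. The department names and formats are hypothetical and not taken from the airline case, and time zones are deliberately left out.

```python
from datetime import datetime

# Hypothetical per-department formats; in practice these would come from
# the agreed business glossary.
SOURCE_FORMATS = {
    "sales": "%d.%m.%Y %H:%M",
    "operations": "%m/%d/%Y %H:%M",
    "finance": "%Y-%m-%dT%H:%M:%S",
}


def to_iso(raw, source):
    """Parse a timestamp with the format declared for its source.

    Returns an ISO 8601 string. Raises KeyError for an unknown source and
    ValueError when the value does not match the declared format.
    """
    fmt = SOURCE_FORMATS[source]
    return datetime.strptime(raw.strip(), fmt).isoformat()


# The same moment, written three different ways.
samples = [
    ("03.04.2019 14:30", "sales"),
    ("04/03/2019 14:30", "operations"),
    ("2019-04-03T14:30:00", "finance"),
]
normalized = {to_iso(raw, source) for raw, source in samples}
```

Declaring the format per source, rather than guessing it from the value, is a deliberate choice here: a string like `04/03/2019` is ambiguous on its own, which is why the agreement between departments has to come first.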
Data operations
Data operations include the day-to-day operations related to your data.
Data Lifecycle management is probably the key activity here.
As the “veins” of data management in general, DLM should align business needs with IT capabilities in a very exact manner, involving stakeholders from every major function.
Another important part is provider management. In this process, data and solutions providers are taken into consideration.
You can imagine the pain of having multiple external data sources that must be coupled with your internal data to create combined data sets.
The same can be said about managing different project partners, some of whom help you with cloud infrastructure, while others build data science models on those clouds. And the results delivered by those providers still need to be somehow incorporated into your business logic…
The key to success here is, first, to elaborate your own policies and frameworks and, second, to manage your providers according to those policies and frameworks. Otherwise chaotic operations will produce chaotic results, if any.
Platform & Architecture
Architecture is fundamental for successful data management.
It consists of infrastructure and logical layers.
Choosing the right infrastructure, going to the cloud, understanding what database to use: those are strategic questions which will affect your operations at every level.
This is especially true for larger enterprises with rather complex IT and data landscapes, where legacy systems sit next to Hadoop-based data lakes and security regulations try to control every byte of the data you possess.
A good example of a bad architectural decision is a mid-size bank in Eastern Europe which, in its early years, chose a public cloud to store and process some generic data. However, when a law came out requiring that all customer data of financial organizations, and not only sensitive data, be stored inside the country and on the organization’s own servers, the bank bore enormous costs to quickly migrate from public to private cloud, losing clients on the way.
Data applications
Data applications are the fun part of them all. Your complex and powerful BI tool will show you nonsense if it’s fed with inconsistent or irrelevant data. Your data science project will end without any result when you realize that some critical data is just missing, or it will be delayed when you understand that the final result is not needed by anyone in your company.
Data applications should be driven by business needs, and not vice versa.
Ideally, the stakeholders to whom you propose your data science project should agree not only to assess the result, but to actually sponsor it one way or another, thus confirming that they will use the final product.
Moreover, there should be a clear vision of the final goal to be reached: decrease operational costs, increase CTR for marketing, increase sales, etc. Otherwise you will find yourself in an eternal loop of demos and POCs which lead nowhere.
And last, but not least, the data should actually be accessible and of sufficient quality to really use it for complex modelling.
A lot of organizations lose money, in terms of man-hours, on generating reports which no one uses, or which bring no real insights for the decision makers.
Others tend to go in guns blazing on data science, paying for Docker containers or even IPython notebooks which, while all in all being data science work, don’t bring any value.
Operationalization of data science, or simply answering questions like “What do I want from data science?”, “How should I assess the results of the project?” and “How will I actually use them?”, is the key here.
A model ROC AUC of 86%, 90% or even 99% (rarely the case, though) is not the result you are looking for. What you really need is more like “Increasing cross-sell by increasing CTR using a recommender engine based on Data Science techniques”. This is what the initial goal of the Data Science project should be, or at least how it should be formulated.
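A goal formulated this way can be measured directly. The sketch below assumes a simple comparison between a control group without the recommender and a treatment group with it, and reports the relative CTR uplift instead of a model metric. The function names and all numbers are made up for illustration.

```python
def ctr(clicks, impressions):
    """Click-through rate as a fraction; 0.0 when there were no impressions."""
    return clicks / impressions if impressions else 0.0


def relative_uplift(control_ctr, treatment_ctr):
    """Relative change of the treatment CTR over the control CTR."""
    if control_ctr == 0:
        raise ValueError("control CTR is zero; relative uplift is undefined")
    return (treatment_ctr - control_ctr) / control_ctr


# Made-up numbers, for illustration only.
control = ctr(clicks=200, impressions=10_000)    # without the recommender
treatment = ctr(clicks=250, impressions=10_000)  # with the recommender
uplift = relative_uplift(control, treatment)

print(f"CTR uplift: {uplift:.0%}")
```

A real evaluation would also need a significance test and a link from CTR to cross-sell revenue, but even this small calculation gives the business sponsor a number they can act on.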
Supporting processes
Every initiative, including data-driven ones, should be supported by Process Quality Assurance, Risk Management and overall Process Management practices. In a business environment, apart from pure R&D, there is no real place for hoping that everything will settle down on its own.
While Agile and CRISP-DM methodologies dominate these kinds of projects, and the sandbox approach is also vital, there is one thing to remember: if you go Agile, you go Agile everywhere. If you use CRISP-DM for one data science project, you should not go Waterfall for another, because that hurts scalability, creates chaos for management and makes your life harder overall.
Data enablement is a program, not a project. You cannot “enable” your data for marketing and leave the call center alone, for example. Data is created and used by almost everyone in any organization and is a valuable asset which needs to be managed properly.
As stated by the authors of The Data Doctrine, Peter Aiken (Associate Professor of Information Systems at Virginia Commonwealth University (VCU), past President of the International Data Management Association (DAMA-I) and Associate Director of the MIT International Society of Chief Data Officers) and Todd Harbour (Chief Technical Officer (CTO) of Broad Creek, LLC and, prior to this, Chief Data Officer (CDO) for New York State):
- Data Programmes Precede Software Projects
- Stable Data Structures Precede Stable Code
- Shared Data Precede Completed Software
- Reusable Data Precede Reusable Code
Those are the main principles to keep in mind when starting your data-driven transformation.
To embark on the data enablement journey, your company should be “culturally fit” for it. There is no chance of reaching the other shore if your sailors don’t know how to use the sails or don’t trust the pilot. To be on the same page, at the very beginning of the transformation you need to align all the stakeholders, C-suite and specialists alike, through a series of workshops and trainings, on what data is, why it is important and what benefits it can bring.
Then, you should understand where you are now and where you want to be.
Thirdly, when you have established your vision, defined measurable goals and KPIs, and calculated the preliminary value of data applications, it is time to set the sails, plot the course and, thus fully prepared, start your travel to the new shores.