Data wranglers: the heroes of the cyberworld

Data scientist has been described as the sexiest job of the century. No surprise there, with machine learning and artificial intelligence having arrived in our midst with a bang. Machines drive our cars, recommend products for us to buy and TV programs to watch, and even recognize us in photos and speech.

However, the truly mundane and drawn-out data work is usually talked about in a somewhat embarrassed tone. Even though this work is vital to the functioning of artificial intelligence (AI) applications, it's not sexy.

Left out of the heroic tale of digitalization are the spelling out of database schemas, the transformation of data into the appropriate form, and tedious investigations into what fields mean, how they were created, and how their values are distributed. So why don't machine learning projects get off to a flying start? Surely AI should fix all our problems – there's so much data out there, after all!

Fashionable talk about digitalization overestimates the ability of algorithms to do humdrum work. A large part of the essential work in data and artificial intelligence projects completely lacks the science-fiction sparkle of algorithm development, and the working time that goes into data wrangling is easily underestimated. At the same time, we overlook the knowledge and wisdom accumulated through humble cleaning work in the basements of the digital world and through interactions between people.

Narrow and general artificial intelligence

Present-day systems certainly do perform impressively in image and speech recognition, in driving vehicles, and in playing the board game Go. These are all notable achievements, and they can easily lure us into thinking that the artificial intelligence behind them has human capabilities. Yet it is still very difficult to get artificial intelligence to autonomously tackle conceptual, poorly defined problems such as these: How does information from the real world get into this enterprise resource planning system? What does this record in the database mean? What needs to be predicted? How should procedures be carried out for this particular business to make more money?

Even the best artificial intelligences currently out there are narrow. They do not yet understand the human world – they don't even understand the data systems built by people. Both company data and company goals sit at a very high level of abstraction. And that's not all – these abstract goals are tied to specific situations, are open to multiple interpretations, and are subject to human complications. All data, even Big Data, gives only a very pale picture of a company's goals. Even the best deep neural network can't conjure up results from data alone. Data wrangling and integration require a real understanding of what's happening in the world and of how human affairs work.

Carrying out these tasks autonomously would require a powerful and enlightened artificial intelligence, and no such thing exists yet.

Data work is learning about organizations

Artificial intelligence is still narrow and, in this sense, weak. This is why we humans will continue to be needed for a long time. First, we are essential for determining the objectives of organizations; objectives are, obviously, founded on human intentions and goals. Second, data modelling and data wrangling are needed to create artificial intelligence – for the foreseeable future, at least. Thus, the not-so-sexy parts of AI production still rely on humans. This kind of data work is essential if an organization is to get any value from data. General artificial intelligence does not exist at this moment.

It's true that when a narrow artificial intelligence (or even a statistical model) is built for the right kind of work, it will wipe the floor with humans in tasks that demand consistency and capacity. Still, machine learning and deep learning solve problems only with heavy support from us.

Hands-on data work is nothing to be ashamed of. Data work is about learning: it gives a concrete grasp of how information accumulates and how it is modified within an organization. Who really knows the organizational 'stuff', and what, in general, should one learn about objectives and data? How is data collected, and how should it be collected? How is it transformed along the way? How can progress toward objectives be tested and measured in a digital organization – an organization that is beginning to find applications for analytics, machine learning, and artificial intelligence?

From the perspective of data-driven activities, these are the most fundamental questions about processes and organizations.

We already form cyberorganisms

Decisions are already being made through an amalgam of humans and data systems. Narrow artificial intelligence is here, but it doesn’t yet qualify as intelligence in the human sense – it is a bundle of important reflexes that analyze speech and photos as well as make our activities more effective and efficient. However, artificial intelligence must be planted in an organization with care. The tender buds of artificial intelligence are, for the time being, thoroughly unintelligent. They cannot formulate high-level goals and are as fickle as a Hollywood diva when faced with raw data.

Data is not an entity that stands apart from organizations. Then again, managers, or indeed humans in general, are no longer the only decision-makers. Artificial intelligence and analytics are the acid test for how well an organization works, since both people and machines must understand information to make better decisions. Everything has to fit together. Information flows through the combination of data system and human organization.

Indeed, an important question for the digital age is how to combine data, artificial intelligence, and human action, because we already form cyberorganisms.

This spring, we're sharing a range of opinions on, and perspectives about, the AI revolution here on our blog – stay tuned for more. Read this post in Finnish here.