How Data Lakes Unlock the Value of IoT

When technical capabilities and company culture combine, IoT-fed data lakes become a powerful brain at the heart of the business

Internet-enabled devices have led to an explosion in data volumes. On its own, this data has some value. However, the only way to unlock its full potential is to combine it with other data that businesses already hold.

Together, pre-existing data and newly minted IoT data can provide a full picture of a single consumer. It is paramount, however, that companies don’t prioritise innovation at the expense of ethics. Sourcing and analytics must be done correctly, in a context that respects consumer privacy and wishes around data usage.

The insights gained from successfully blending these two different data sources also unlock secondary benefits including new product development, possible upsells or the ability to build customer goodwill through advice-driven service delivery.

It’s a winning combination, but the challenge is how to actually merge device data with regular customer information.

No easy fit

The problem is that IoT device data is a different “shape” to the data in traditional customer records.

If you think of a customer record in a sales database as one long row of information, IoT-collected information is more like an entire column of time-series readings, with a supporting web of additional detail. Joining the two directly is nearly impossible, and valuable semantic information is likely to be lost in the process.
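To make the mismatch concrete, here is a minimal Python sketch. All of the names and values in it (the customer fields, the device ID, the temperature readings) are hypothetical and chosen purely for illustration. It shows the usual workaround of squashing a device's time series into a few summary figures so that it fits on the customer's row.

```python
from statistics import mean

# Hypothetical customer record: one "row" per customer.
customer = {"customer_id": "C001", "name": "A. Smith", "device_id": "D42"}

# Hypothetical IoT readings: many timestamped entries per device.
readings = [
    {"device_id": "D42", "ts": "2020-01-01T00:00", "temp_c": 19.5},
    {"device_id": "D42", "ts": "2020-01-01T01:00", "temp_c": 20.5},
    {"device_id": "D42", "ts": "2020-01-01T02:00", "temp_c": 23.0},
]


def summarise(readings, device_id):
    """Collapse one device's time series into a handful of summary columns."""
    values = [r["temp_c"] for r in readings if r["device_id"] == device_id]
    return {
        "n_readings": len(values),
        "mean_temp_c": mean(values),
        "max_temp_c": max(values),
    }


def flatten_join(customer, readings):
    """Attach the summary columns to the customer row."""
    return {**customer, **summarise(readings, customer["device_id"])}


joined = flatten_join(customer, readings)
```

The joined record fits neatly into a sales database, but the timestamps and the order of the readings are gone, so nobody can later ask when the temperature rose or how quickly. That is the kind of semantic information a forced join throws away.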

But if IoT information fundamentally resists structure, and existing business databases are built on rigid structures, how do you find an environment that works for both? The answer is a data lake.

Pooling insight

A data lake is a more “fluid” approach to storing and connecting data. It is a central repository where data can be stored in the form it’s generated, whether that is in a relational database format or entirely unstructured. Analytics can then be applied over the top to connect different pieces of information and derive useful business insights.

However, there is more complexity involved in setting up a data lake than just combining all of an organisation’s data and hoping for the best. If you do that, you’ll likely end up with a data swamp – a disorganised, underperforming mess of data that lacks the necessary context to make it useful.

This can be avoided by drawing on the expertise of dedicated data engineers. These are the masterminds who build the framework for a data lake and manage the process of extracting data from its source, transforming it into a usable format and then loading it into the data lake environment. Done properly, this preserves data provenance, with appropriate metadata to guide users on allowable use cases and analysis.
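As a rough illustration of that extract, transform and load process, the sketch below uses only the Python standard library. It is not a production pipeline: the file names, the "thermostat-feed" source label, the "service-advice" usage tag and the Fahrenheit-to-Celsius step are all assumptions made up for the example. The point is the metadata file written alongside the data, which records where the data came from and what it may be used for.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def extract(raw_lines):
    """Parse raw device messages (assumed here to be JSON strings)."""
    return [json.loads(line) for line in raw_lines]


def transform(records):
    """Convert to a usable format: Fahrenheit readings become Celsius."""
    return [
        {
            "device_id": r["device_id"],
            "ts": r["ts"],
            "temp_c": round((r["temp_f"] - 32) * 5 / 9, 2),
        }
        for r in records
    ]


def load(records, lake_dir, source, allowed_uses):
    """Write the data plus a metadata file describing its provenance."""
    lake_dir = Path(lake_dir)
    lake_dir.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(records, sort_keys=True)
    (lake_dir / "readings.json").write_text(payload, encoding="utf-8")
    metadata = {
        "source": source,                      # where the data came from
        "loaded_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "record_count": len(records),
        "allowed_uses": allowed_uses,          # guidance for later users
    }
    (lake_dir / "readings.meta.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
    return metadata


# Hypothetical input: a single raw message from a thermostat.
raw_lines = ['{"device_id": "D42", "ts": "2020-01-01T00:00", "temp_f": 68.0}']
lake_dir = Path(tempfile.mkdtemp()) / "puddle"
metadata = load(
    transform(extract(raw_lines)),
    lake_dir,
    source="thermostat-feed",
    allowed_uses=["service-advice"],
)
```

Anyone who later finds readings.json can check the accompanying metadata to see its origin, confirm via the checksum that it has not changed since loading, and read which uses are permitted. Data that arrives without this context is what tends to turn a lake into a swamp.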


This sounds like a significant undertaking, and there’s no getting around the fact that doing data lakes right takes time and effort. It is possible, though, to take a staged approach. Many organisations start with a data “puddle” – a small collection of computers hosting a limited amount of data – and then slowly add to it, increasing the number of computers over time to form the full data lake.

A question of culture

Technical considerations are just one side of the coin. The other is culture. Businesses will not succeed in commercialising their IoT data if users are either unaware of, or distrustful of, the data lake and its potential.

While investment in big data continues to grow, a recent NewVantage Partners survey on Big Data and AI found that just 31 percent of organisations consider themselves data driven – the second year in a row that the number has fallen. Data lake technology has been around for several years now and should be more than capable of helping organisations become data driven, but without the right culture in place, its benefits are seldom felt.

How do you create a culture that centres on being data-driven? As any management team knows, culture shifts are never easy, but a data-driven culture boils down to improving collaboration, communication and understanding between data professionals and business functions.

Once a data lake has been successfully implemented, you need data professionals to advocate its benefits and to liaise with business departments to understand which types of insight would be most useful in informing strategic decisions.

This then reinforces business confidence in the data function, and allows the data teams to expand their contributions to the business and be recognised for their hard work. When supported by senior buy-in, this positive feedback loop generates a growing culture of data savviness and data-driven approaches within the organisation.

Brain of the organisation

When technical capabilities and company culture combine, data lakes can become a powerful brain at the heart of the business. With the right analytics tools layered over the top, data lakes can reduce the time it takes to find insights and surface powerful information. These insights can serve business needs better and faster, and are an outright win for any organisation. In short, they are well worth the time and investment.

Author: Dean Wood, Principal Data Scientist
