More organizations are investing in big data and unstructured data environments, but many of these projects remain in the area of “innovation,” meaning that the intended ROI is not always clear. Even when organizations don’t have a specific goal or use case in mind, there’s a general understanding that the ability to process and analyze unstructured data will provide value over time. As with any large technology undertaking, it’s important to deliver some key benefits sooner rather than later in order to convince the business to fund the work.
Any big data effort intends to consume data, both structured and unstructured, into a new type of data environment. (In fact, some insurers have utilized big data technology to ingest mostly the structured data from their disparate core systems.) This idea of “ingesting data” isn’t simple (despite the message put forth by many vendors), and it can be disappointing when an insurer realizes that some of the complicated data mapping efforts familiar from relational databases are still required.
For an innovation-focused big data initiative to mature into a permanent, value-providing part of the infrastructure, the data must move through three phases.
Phase 1: Data storage
The first level of data ingestion is essentially a flat file storage of the data. This means that all data – whether unstructured files or dumps of relational databases – is migrated into the unstructured data environment with little or no adjustment. Assuming the systems are in place, this can typically be done quite quickly, especially for a limited set of data. However, there’s not a lot of value provided during this phase aside from distributed storage.
Some organizations with terabytes of data and high data archival costs have found dramatic cost reductions through the use of Hadoop for distributed data storage. In addition, as organizations experiment with telematics, IoT, drones, or other sources of high-volume data, a distributed data solution provides a way for the company to gather and store all of that data before fully understanding how to utilize it.
Phase 2: Data warehouse replacement
The second level of data maturity in a big data technology system is when the organization has an overlaid structure that allows querying the data across multiple data sets. This essentially recreates a traditional enterprise data warehouse, at least in terms of the value perceived by end users.
One of the advantages of working with an unstructured data environment is that full normalization across all loaded data isn’t necessary. (Many an EDW project has gone over budget or failed due to the complexities of defining a normalized data model that will support all of an insurer’s data across systems and lines of business.) However, this has led to a mistaken belief that no normalization is necessary. At some point, if an organization is to join across two different data sets that both have a notion of customer (for example), there will need to be some form of data model to allow this.
Unlike a relational database, however, that model can be defined after the data is already in place, and the definition can be limited to the key factors needed for the reporting. Essentially, the normalization is delayed and reduced, and it does not require any foreknowledge of how the data will be utilized.
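As a minimal sketch of this delayed, reduced normalization, the Python below joins two raw data sets on a shared notion of customer. Every field name, record, and function here is hypothetical and chosen only for illustration; the point is that the mapping covers the customer key alone and is written after the data has been loaded unchanged.

```python
# Raw records as they might land in the environment, loaded without adjustment.
# All field names and values are hypothetical.
policy_rows = [
    {"POL_NO": "P-100", "CUST_ID": "0042", "LOB": "auto"},
    {"POL_NO": "P-101", "CUST_ID": "0077", "LOB": "home"},
]
claim_rows = [
    {"claim_id": "C-1", "customerNumber": "42", "paid": 1200.0},
    {"claim_id": "C-2", "customerNumber": "42", "paid": 300.0},
    {"claim_id": "C-3", "customerNumber": "99", "paid": 50.0},
]

# The "data model", defined after loading: only the customer key is mapped.
# Each source keeps its own layout; we just say how to read a customer from it.
CUSTOMER_KEY = {
    "policy": lambda row: row["CUST_ID"].lstrip("0"),
    "claims": lambda row: row["customerNumber"].lstrip("0"),
}


def paid_by_customer(claims):
    """Total paid amount per normalized customer key."""
    totals = {}
    for row in claims:
        key = CUSTOMER_KEY["claims"](row)
        totals[key] = totals.get(key, 0.0) + row["paid"]
    return totals


def join_policies_to_claims(policies, claims):
    """Attach total paid claims to each policy via the customer key."""
    totals = paid_by_customer(claims)
    report = []
    for row in policies:
        customer = CUSTOMER_KEY["policy"](row)
        report.append({
            "policy": row["POL_NO"],
            "customer": customer,
            "total_paid": totals.get(customer, 0.0),
        })
    return report


report = join_policies_to_claims(policy_rows, claim_rows)
for line in report:
    print(line)
```

Nothing else in either data set (line of business, claim identifiers, and so on) had to be modeled to produce this report. If a later report needs another shared concept, a further mapping can be added at that point.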
Phase 3: New insights and new capabilities
The third level of data maturity is when there are new taxonomies and visualizations on top of the unstructured data environment, allowing insight that isn’t available with a traditional data warehouse. This can mean a model that layers a complex, unstructured data source (such as telematics reports or drone images) on top of the insurer’s claims experience, providing the opportunity to query against both data sets to discover new risks. It also means the ability to use new visualization tools across the full sets of data to identify anomalies and factors that weren’t visible in a traditional data warehouse, where so much of the analysis was predetermined by the initial normalized model.
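To make the idea of layering one source on top of another concrete, here is a small sketch that summarizes hypothetical telematics trip records per policy and then compares claims experience between high and low hard-braking groups. The record layouts, the hard-braking metric, and the threshold are all assumptions made for this example, not something a particular product or insurer prescribes.

```python
from collections import defaultdict

# Hypothetical telematics trip summaries and claims records.
telematics_events = [
    {"policy": "P-100", "trip_km": 12.0, "hard_brakes": 3},
    {"policy": "P-100", "trip_km": 8.0, "hard_brakes": 1},
    {"policy": "P-101", "trip_km": 20.0, "hard_brakes": 0},
    {"policy": "P-102", "trip_km": 10.0, "hard_brakes": 4},
]
claims = [
    {"policy": "P-100", "paid": 1500.0},
    {"policy": "P-102", "paid": 900.0},
]


def hard_brakes_per_100km(events):
    """Derive one driving-behavior factor per policy from raw trip records."""
    km = defaultdict(float)
    brakes = defaultdict(int)
    for event in events:
        km[event["policy"]] += event["trip_km"]
        brakes[event["policy"]] += event["hard_brakes"]
    return {p: 100.0 * brakes[p] / km[p] for p in km if km[p] > 0}


def average_paid_by_braking_group(events, claim_rows, threshold):
    """Layer the telematics factor on claims: average paid per policy by group."""
    rates = hard_brakes_per_100km(events)
    paid = defaultdict(float)
    for claim in claim_rows:
        paid[claim["policy"]] += claim["paid"]
    groups = {"high": [], "low": []}
    for policy, rate in rates.items():
        name = "high" if rate >= threshold else "low"
        groups[name].append(paid.get(policy, 0.0))
    return {
        name: (sum(values) / len(values) if values else 0.0)
        for name, values in groups.items()
    }


summary = average_paid_by_braking_group(telematics_events, claims, threshold=15.0)
print(summary)
```

With real data, a grouping like this would be a starting point for a visualization or further analysis rather than a conclusion in itself; the sample here is far too small to say anything about risk.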
Proving value early
One of the advantages of an unstructured data environment is that it’s not necessary to move all of an organization’s data into the system before moving to later phases. With an enterprise data warehouse, failing to consider some data during the initial creation of a normalized model can create serious problems later on. With unstructured data, that is not the case. This means an organization that is innovating with Hadoop or a distributed database can limit initial experiments to a few sources, building up expertise and layering more and more data into the environment later.
In fact, in order to secure additional funding from the organization to continue a big data project, it’s important that the team get at least some Phase 3 visualization on top of the data as early in the timeline as possible. This will allow business users not just to validate and QA the data, but also to see what kind of new value will be made available to them as the work continues.