Forbes

Big Data In Banking: How Citibank Delivers Real Business Benefits With Its Data-First Approach

By Bernard Marr

December 5, 2016

Citigroup is one of the world’s largest providers of financial services, operating in more than 160 countries and holding over 200 million customer accounts. The company, in recent years, has adopted a fully big data-driven approach to drive business growth and enhance the services it provides to customers.

Insurance providers may be making bold strides into the Internet of Things era, thanks to the abundance of unstructured behavioral data available from wearables, scanners and sensors. Other financial services, such as retail and investment banking and brokerage, have lagged behind, but they are now beginning to disrupt and innovate their analytics landscape, harnessing large volumes of data assets at a fast yet measured pace.

Part of the reason is the need to balance the sensitive nature of the data against delivering value to clients, while rigorously prioritizing the privacy and protection of information. But it is also thanks to big data's biggest opportunity that companies like Citi are able to see the big picture. At Citi, model testing allows for a holistic understanding of innovative use cases by deconstructing data at its most granular level and by synthesizing structured and unstructured data sources. Among business leaders at many of these venerable institutions, it comes down simply to asking, "what can data do for us?"

This is set to change as the benefits of a data-led, analytical approach to business become more apparent. I spoke to Michael Simone, the Managing Director of Data Platform Engineering at Citigroup, about the challenges – and opportunities – around implementing a data-first culture across an organization employing almost a quarter of a million people.

Simone heads the organization responsible for engineering the blueprint for using big data analytical tools across the company, helping drive data innovation strategy across Citi's businesses. The platform is primarily built on Hadoop, and datasets are sourced from different applications that ingest multi-structured data streams from transactional stores, customer feedback, and business process data sources.
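To make the ingestion pattern concrete, the sketch below shows a hypothetical normalizer that maps records from differently shaped feeds (transactions, customer feedback) onto one common schema before they land in the platform. Every feed name and field name here is an illustrative assumption, not Citi's actual schema, and the function stands in for what would be a distributed job at Citi-scale.

```python
# Minimal sketch of multi-structured ingestion: records arrive from
# different feeds in different shapes and are normalized into one
# shared schema. All source and field names are hypothetical.

def normalize(source: str, record: dict) -> dict:
    """Map a raw record from a named feed onto a common schema."""
    if source == "transactions":
        return {"source": source,
                "account": record["acct_id"],
                "payload": {"amount": record["amount"]},
                "ts": record["posted_at"]}
    if source == "feedback":
        return {"source": source,
                "account": record.get("customer_id"),
                "payload": {"text": record["comment"]},
                "ts": record["submitted_at"]}
    raise ValueError(f"unknown feed: {source}")

# Two records from different feeds end up queryable side by side.
lake = [
    normalize("transactions",
              {"acct_id": "A1", "amount": 42.5, "posted_at": "2016-12-01"}),
    normalize("feedback",
              {"customer_id": "A1", "comment": "great app",
               "submitted_at": "2016-12-02"}),
]
```

The point of the common schema is that downstream analytics can join transactional and behavioral records on shared keys without caring which application produced them.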

In addition to architecting and engineering the data technology platform, his data science team often acts to "jump-start" big data-driven analytical activity in whichever parts of the business it can be shown to offer benefits. Identifying where big data resources can be used most effectively means lining up business use cases with technological capabilities, and it is one of the biggest opportunities.

Simone tells me, “Since the inception of our Data Innovation program, we have executed hundreds of proof-of-concepts and use cases, all validated against meeting specific business requirements. We are focused on having actionable results that are balanced with very specific metric-based outcomes.”

Once a potential use for analytics is identified, it is assessed in terms of benefits and opportunity cost. “There are a variety of factors that are taken into consideration, which is why not all of them make it through the gate. Sometimes after going through all of the paper exercises of understanding it, we may realize that there are other ways to accomplish this and moving to big data just because it is big data may not be the right fit.”

One area of Citi's operations where big data analytics has been implemented successfully is customer retention and acquisition. This involves analyzing data and targeting promotional spending using machine learning algorithms. Citi is helping itself as well as its customers by providing functionality that keeps both the firm and its customers protected. From compliance and cybersecurity to customer service, fraud detection, marketing, and web analytics, many uses need the combined support of big data in order to operationalize a wide range of new and critical functionality.

Another is scanning transactional records to spot anomalies, which could identify or predict defects – and, in the case of customers, incorrect or unusual charges. The costs resulting from these anomalies are far easier to correct if they are spotted quickly – or even before they happen – through predictive modeling. Simone tells me that across the organization the cost of reworking these "errant data points" has been brought down, in some cases by double digits, thanks to new methods of big data analysis.
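The simplest form of such an anomaly scan is a statistical outlier test against an account's charge history. The sketch below assumes a basic z-score check; real systems would use far richer predictive models, and the threshold here is an arbitrary illustrative choice.

```python
import statistics

def flag_anomalies(amounts, threshold=2.5):
    """Flag charges whose distance from the account's mean charge,
    measured in standard deviations, exceeds the threshold."""
    mean = statistics.fmean(amounts)
    stdev = statistics.pstdev(amounts)
    if stdev == 0:
        return []  # all charges identical: nothing stands out
    return [a for a in amounts if abs(a - mean) / stdev > threshold]

# A run of ordinary charges with one unusual one.
history = [20, 22, 19, 21, 20, 22, 19, 21, 20, 500]
suspicious = flag_anomalies(history)
```

Run over the history above, only the 500 charge is flagged; the routine charges all sit well within the threshold.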

Platforming costs, too, have been driven down by the move towards a horizontal big data architecture. Citi also relies on commercial distributions of open source technology to underpin its strategy.

"We have been managing and analyzing data effectively for years to see how we can improve our own operations and provide better service," says Simone. "But big data also offers a price point where we can store as well as analyze the data at what I call Citi-scale. We are a global company with an incredible amount of assets that are valuable to our business, and we can now store them at an expense point that makes them analytically beneficial to us at their most granular level."

While the commercial open source community offers a great deal, Citi chooses technology and partners to work with based on how able they are to adapt to meet its business needs.

Simone tells me, "Our core platform is predicated on the value of open source with integration of very capable pure play big data solutions. As a result, we have a set of open source vendors with whom we work closely."

These vendors are picked for their ability to demonstrate “deep integration” rather than “bolt-on” capability with the technology they provide, but also “they need to be open to influence and ready to work with us to craft their products to meet our ongoing and rigorous requirements.”

As the foundation for its big data strategy, Citi has invested in its own integrated big data platform architecture, which Simone refers to as its Virtual Enterprise Data Lake.

As well as making all of its available data actionable wherever it is needed, with minimal latency, and ensuring that data maps accurately to common enterprise reference models, the Lake lowers data TCO by eliminating redundant or duplicated data and by reducing the cost of moving it around in a point-to-point fashion.

“We have a prescribed and thorough decision process around what is analytically valuable data, and then we make a data flow determination about where to store it. Currently, we are accelerating on a program to store it at a manageable price point, making it a win on multiple business, financial, and technology levels.”

Along with many other industries, banking and finance is set to be revolutionized by coming advances in real-time streaming analytics capabilities, Simone believes.

“In most of the use cases we have executed so far,” he says, “only a small percentage has been in real-time. As an industry, what we are seeing is an increasing number of creative use cases and thinking around streaming analytics in real-time.

“The challenge is going to be how to operationalize those insights fast enough to make them valuable to the business in a timely fashion. Big data and real-time business insights are becoming more closely coupled together, providing a higher degree of criticality in new and old areas. By enabling us to react faster for our customers, they are empowering us to move towards a proactive business model to deliver remarkable experiences.”
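Streaming versions of the anomaly-spotting idea illustrate what "operationalizing in a timely fashion" demands: statistics must be maintained incrementally per stream rather than recomputed over history. The sketch below uses Welford's online algorithm for a running mean and variance; it is a stand-in for the streaming analytics engines the industry is moving towards, not a description of Citi's implementation, and the threshold is again an illustrative assumption.

```python
class StreamingAnomalyDetector:
    """Maintains a running mean and variance with Welford's online
    algorithm and flags values that land far from the running mean."""

    def __init__(self, threshold=2.5):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean
        self.threshold = threshold

    def observe(self, x):
        """Return True if x looks anomalous against the history so far."""
        # Test against the stats *before* updating them, so an extreme
        # point cannot inflate the variance and mask itself.
        flagged = False
        if self.n >= 2:
            std = (self.m2 / self.n) ** 0.5
            flagged = std > 0 and abs(x - self.mean) / std > self.threshold
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return flagged

stream = [20, 22, 19, 21, 20, 22, 19, 21, 20, 500]
detector = StreamingAnomalyDetector()
flags = [detector.observe(x) for x in stream]
```

Each observation is a constant-time update, which is what makes the approach viable at the event rates real-time use cases imply.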

 

This article was written by Bernard Marr from Forbes and was legally licensed through the NewsCred publisher network.