Changing the Game with a Data Lake: An Interview with Centrica’s David Cooper and Daljit Rehal

By Jon Brock and Sesh Iyer

UK-based Centrica is a leading multinational energy and services company. Its brands include British Gas and Hive. BCG’s Jon Brock and Sesh Iyer recently spoke with Centrica’s David Cooper, group CIO, and Daljit Rehal, strategic systems director, about how the company established a data lake at British Gas as part of its efforts to transform Centrica’s data analytics capabilities. Edited excerpts from the discussion follow.

David and Daljit, can you briefly describe the circumstances that led to Centrica’s decision to establish a data lake at British Gas?

Cooper: When we began this project, about three years ago, British Gas’s information architecture and supporting infrastructure were largely patchwork in design. Essentially, British Gas had assembled a collection of technologies over time, with limited thought given along the way to how they might be forged into a coherent information architecture. Simultaneously, we were facing increasing demands internally. British Gas wanted to do more with the data it had, but our systems were struggling to provide the necessary access and functionality.

Rehal: Also, we were at a point where we needed to make decisions about British Gas’s data warehouses. They all needed hardware upgrades and new appliances. And we hadn’t yet begun to dip a toe in the world of big data; we were just trying to do the basic business reporting and analytics. We knew that we would have to find a way to accommodate the substantial growth in our data volume that would result from smart metering, connected homes, feeds from our website, sensors in various devices, and so forth. In addition, it occurred to David and me that neither of us had ever met anyone in an IT organization who said, “I’m really happy with my data warehouse.” Bottom line, we realized that we needed different technologies and a different approach.

What are some of the key advantages of the technology you’re using now over traditional data warehouse technologies?

Rehal: Some of the most visible advantages are on the cost front. A lot of the new technology can run on cheaper hardware. It can also be scaled in very small increments. So there can be material cost savings. 

Cooper: Another big advantage, cost-wise, is that a lot of the new technology is open source. So we can avoid such things as expensive vendor lock-in.

What are some of the critical functional advantages of the technology?

Cooper: The data lake holds raw, not summarized, data. It also accommodates more frequent updates than the traditional data warehouse. In the past, with a traditional data warehouse, it was routine to have delays of 48 hours, and potentially much longer, between the time something occurred and when the data was actually stored. With the data lake and its more than 200 servers, we’ve removed the traditional bottlenecks. We can take data from our source systems and perform analytics on it up to four times a day.

Rehal: The data lake is also inherently unstructured, so it accommodates a wider mix of data and data types—it’s essentially a dumping ground for data. You can populate it with whatever you want to, and there’s no need to spend months and months creating engineering structures to accommodate it. We’ve got everything in our data lake, from GPS data for the company’s vans to smart-meter readings for customers to data from our billing platforms. The data from most of British Gas’s systems is in there already, and we aim to add the rest as soon as possible. We’ve also designed a high level of data security into the lake; we’ve had that in place from the inception.

You’ve previously engaged in large-scale, complex transformation initiatives within British Gas, focused on such things as billing and customer relationship management systems. Did you approach this effort differently?

Cooper: Our approach toward this differed in several critical respects. First, we didn’t have the benefit of being able to study the examples of other major companies that had tried this—no FTSE companies had attempted it. So we started in much more of a proof-of-concept mode. We needed to determine if this was something that we could actually do, given the huge amount of complexity. Removing all of a major company’s historical information systems and replacing them with a data lake isn’t simple. If it were, everyone would be doing it. We ran our first big proof-of-concept test on half a dozen Raspberry Pi computers clustered together. From there, we graduated to PCs and then to servers.

Rehal: There were other significant differences from our normal approach. Most project management methodologies, whether agile or waterfall, start with a requirement-scheduling exercise and then proceed to developing a solution. Our approach here was radically different because we didn’t need to determine requirements: we were going to take all of the data, since doing so wouldn’t cost us that much more than just taking part of it. We’d take all of it and worry later about whether people found it useful. Hence, we didn’t need to schedule requirement-capture workshops and so forth. This was a radical change to our way of thinking and approach.

How do you manage data governance? That seems like it could be a sizable challenge, given the volume of data in the lake.

Cooper: The program itself acted like an ignition point for the business to start thinking about, or rethinking, data governance. Through acquisitions and considerable organic growth, the company had forgotten much of what it probably knew at one point. We had to start again, with a new set of data owners and the identification of dedicated data stewards. In terms of metadata management and processes, we developed our own solutions and are in the process of rolling them out.

What are some of the major business and IT cost benefits of the data lake project so far? 

Cooper: There have already been significant business benefits. We now have the foundation in place to enable the business to do essentially all of the things it wants and needs to do with the tremendous amounts of data, including smart data, that it’s accumulating daily. These are capabilities that didn’t exist before that can make a meaningful difference to the business. An example is the enhanced forecasting ability that the company now has. Even a modest improvement in, say, the ability to estimate likely power and gas usage across the customer base over a given period of time can translate into material economic benefits for British Gas.

A second example of a business benefit enabled by the data lake is the new functionality we’ve been able to deliver to our people in the field. In the past, when one of British Gas’s engineers was making a service call to a customer’s home, he or she would have access only to the information necessary to perform the specific task required. Now, through the apps that we have created for them, the engineers have access, via their handheld and mobile devices, to information such as which other British Gas products or services this particular customer has and the history of previous customer contacts. This greatly enhances the engineer’s ability to make real-time decisions that can enhance the company’s level of service.

There are numerous other benefits to our business users as well, including such things as a better ability to gauge and attribute performance and identify revenue leakage. And we’re at the tip of the iceberg, as the prospect of real-time pricing is right around the corner. So we expect that the business benefits will continue to unfold.

Rehal: In terms of IT cost benefits, many of those are realized early because we avoid having to buy bigger data warehouse appliances, as mentioned. The data lake has also helped us save money by enabling us to reduce our storage footprint and decommission a lot of redundant storage.

We’ve also benefited on the cost front by bringing processing to the lake—that is, migrating certain analytical platforms, such as SAS, to it—rather than extracting the data and taking it to processes outside the lake. This has had the added benefits of curbing data leakage and giving us greater opportunity to control the data and make it more secure.

Ultimately, because of the complexity of our estate, we were able to justify the entire data lake investment on the basis of IT cost savings. That was much easier than trying to do so on the basis of various business units’ individual business cases.

Have you encountered any significant cultural challenges as the initiative has proceeded?

Cooper: The resistance from both the business team and a lot of our IT people was enormous initially. People were comfortable with what they were familiar with. This was particularly true for some of our IT staff, especially those with the longest histories in data. These individuals had amassed certain qualifications and experiences over their careers and viewed the pending changes as a threat to their personal worth. Some also thought that the initiative would never work and would end in tears. But once the project started to succeed, they began to align.

Rehal: There was also strong resistance initially from the business units, especially some of the leadership, who were very concerned with the particular choice of technology solutions. To win them over, we had to get them thinking about outcomes. One of those outcomes, we pointed out, was potential independence from the IT unit. We asked them: Do you want the world of self-service? Do you want to be free of the huge waterfall-type governance process you’ve been dealing with?

This worked in many cases, though ultimately we encountered two different types of communities. There were people who said, “I just want the data. I don’t want anything else from you. I’m clever, I’ve got clever people on my team, I can do the analysis.” For those people, the notion of autonomy from IT was a very easy sell. There was another group within the business units, though, that was so hooked on the existing IT that it would not accept the idea until the finished product was delivered, with all the requirements captured and the designs finished.

Cooper: Make no mistake, the leap from the old environment to the new one is, in fact, substantial. When you look at some of our traditional reports for simple things like the number of customers who received products and services from British Gas today versus yesterday, the query, written in SQL, is huge. Today, that can be replaced with a small Java program that is a fraction of the length and is instantly reusable. This is a substantially different universe, and the transition was hard on some individuals. And again, there were two camps. There were some who said, “OK, this is an opportunity for me to learn something new.” There were others who said, “I don’t want to be part of this; I’m going to go somewhere else.”

So yes, there have certainly been cultural challenges. We’ve worked through them with single-mindedness and by building on our successes.

Are there any lessons you’ve taken from the journey so far that you think could be particularly valuable for other companies contemplating such a move?

Rehal: You will probably face considerable skepticism and opposition, so have a clear vision and be brave!

Where do you expect Centrica to be in several years as the journey continues?

Rehal: I’d expect that in, say, two or three years’ time, as we continue to expand what we’re doing with British Gas across the rest of Centrica, our information architecture will be relatively simple—simple, clean, and with one copy of the data rather than several. Our costs will be lower as a result. We’ll have modern, cutting-edge technology and people who love working on it. Our more effective use of our data will continue to unlock opportunities for the business.

Thanks, Daljit and David.