What Is a Data Lake vs. a Data Warehouse?

Written By: Eric Kimberling
Date: November 22, 2022

Data is one of the most important assets an organization manages, but data is only useful if you can actually put it to work. One of the key ways organizations can make better use of their data, and add value to it, is through data warehousing.

One of the key components of digital transformation today is the use of data. We've always had data within our organizations, and we've always had systems that collect, track and even report on that data. But now more than ever, we've accumulated so much data that we need to figure out ways to get real value out of it.

Emerging technologies like machine learning, artificial intelligence and predictive analytics only work if we have solid data and can actually make use of it. In this article, we'll talk about data warehouses and data lakes, what they mean for your organization and how they might affect your digital transformation.


Data Warehouse vs. Data Lake

To start, it helps to understand what a data warehouse is and what a data lake is. The data lake is a newer concept, while data warehousing has been around longer, so we'll start with data warehousing. A data warehouse is software that takes structured data from one or more systems and stores it in an organized fashion. That organization lets you report on the data and slice and dice it in ways that add meaning and value.

For example, suppose you're using multiple systems throughout your organization, such as an Enterprise Resource Planning (ERP) system, a CRM system and a Supply Chain Management system. A data warehouse is a place where you can take the data from those core operational systems and store it. You can then figure out how to manipulate and make use of that data.
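To make the idea concrete, here is a minimal sketch in Python. It assumes two hypothetical extracts, an ERP order list and a CRM customer list, and uses an in-memory SQLite database as a stand-in for a warehouse. The table names `dim_customer` and `fact_order`, the helper functions and all data are illustrative only, not any vendor's schema. The point is that once both systems' data sits in one structured store, a single query can "slice and dice" across them.

```python
import sqlite3

# Hypothetical extracts from two operational systems (illustrative data only).
erp_orders = [  # (order_id, customer_id, amount) from the ERP system
    (1, "C1", 120.0),
    (2, "C2", 80.0),
    (3, "C1", 50.0),
]
crm_customers = [  # (customer_id, name, region) from the CRM system
    ("C1", "Acme", "West"),
    ("C2", "Globex", "East"),
]

def build_warehouse(orders, customers):
    """Load both extracts into one structured store with a fixed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dim_customer (customer_id TEXT PRIMARY KEY, name TEXT, region TEXT)"
    )
    conn.execute(
        "CREATE TABLE fact_order (order_id INTEGER PRIMARY KEY, customer_id TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", customers)
    conn.executemany("INSERT INTO fact_order VALUES (?, ?, ?)", orders)
    return conn

def revenue_by_region(conn):
    """'Slice and dice': join ERP facts to CRM attributes and aggregate by region."""
    rows = conn.execute(
        """SELECT c.region, SUM(o.amount)
           FROM fact_order o JOIN dim_customer c USING (customer_id)
           GROUP BY c.region
           ORDER BY c.region"""
    ).fetchall()
    return dict(rows)

conn = build_warehouse(erp_orders, crm_customers)
print(revenue_by_region(conn))  # {'East': 80.0, 'West': 170.0}
```

Neither source system alone could answer "revenue by region." The ERP knows amounts and the CRM knows regions, and the combined store is what makes that question answerable.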

A data warehouse is focused on structured data. A data lake can also handle structured data, but it is better suited to unstructured data. One example of structured data suited to a data warehouse is a financial transaction with a clear numeric value behind it. Another is a customer order with a clear line item for what the customer is ordering.

Unstructured data, on the other hand, is where a data lake comes in. An example would be a customer complaint about the quality of your product. You might receive a return along with a qualitative description of what was wrong with the product. That isn't a numeric, predefined data set; it's free-form text. This is probably the best example of what unstructured data looks like, and it's what a data lake is used for. Data lakes capture this unstructured information, and this is where data science really comes into play. You now need to apply more advanced data science and analytics to find meaning and patterns within that data.
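The contrast can be sketched in a few lines of Python. This is a toy illustration under stated assumptions. The `OrderLine` record type and the `load_to_warehouse` and `load_to_lake` helpers are hypothetical, and plain Python lists stand in for real storage. The warehouse side only accepts records that match a predefined structure, while the lake side keeps raw items, such as complaint text, exactly as they arrive.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OrderLine:
    """Structured data: every field has a predefined name and meaning."""
    order_id: int
    sku: str
    quantity: int
    unit_price: float

def load_to_warehouse(table, record):
    """Warehouse-style load (hypothetical): reject anything not in the predefined shape."""
    if not isinstance(record, OrderLine):
        raise TypeError("warehouse table only accepts OrderLine records")
    table.append(record)

def load_to_lake(lake, raw_item):
    """Lake-style load (hypothetical): store the raw item as-is, whatever its form."""
    lake.append(raw_item)

warehouse, lake = [], []
load_to_warehouse(warehouse, OrderLine(1001, "SKU-42", 3, 19.99))
load_to_lake(lake, "The handle cracked after two days and the box arrived crushed.")
```

The warehouse record can be summed or grouped immediately because its fields are known in advance. The complaint text in the lake carries useful information too, but extracting it requires further analysis.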

The beauty of unstructured data is this: in years past we couldn't do much with it, because it wasn't something we could easily quantify or summarize. Now, with artificial intelligence and machine learning, we have tools that look for patterns in unstructured data, and data scientists can find meaning in the unstructured data sitting in a data lake. Those are the two main differences between data warehouses and data lakes. To summarize both, each is essentially a separate software platform, a place to store data from other systems, so that you can make use of that data and analyze it in a more meaningful way.
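Real pattern-finding on unstructured text relies on machine learning or natural language processing models. The basic idea of turning free-form text into something countable can still be sketched with simple term frequencies. This Python example is a deliberately simplified stand-in, not an AI technique. The complaint texts and the stopword list are made up for illustration.

```python
import re
from collections import Counter

# Illustrative complaint texts as they might sit in a data lake (made-up examples).
complaints = [
    "The handle cracked after two days.",
    "Box arrived crushed and the handle was cracked.",
    "Handle feels loose; also shipping was slow.",
]

# Common filler words to ignore (a tiny, hypothetical stopword list).
STOPWORDS = {"the", "and", "was", "a", "after", "also"}

def top_terms(texts, n=2):
    """Count non-stopword tokens across all texts and return the n most common."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

print(top_terms(complaints))  # [('handle', 3), ('cracked', 2)]
```

Even this crude count surfaces a recurring theme: handles that crack. Machine learning models extend the same idea to synonyms, sentiment and topics that simple word counts would miss.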

Benefits of Data Warehouses and Data Lakes

There are a number of benefits to deploying a data warehouse and/or a data lake. First and foremost, it allows you to collect and integrate data across multiple systems. If you're operating in a siloed, best-of-breed or multiple-system environment, a data lake or data warehouse gathers that data in one place. There, you can start to analyze and make sense of all those different data points.

Another advantage of these tools is that they let you get more value out of the data, processes and systems you already have in place, without a huge investment of time, cost and risk. In other words, you don't have to replace your entire operational technology stack. You can use a data lake, a data warehouse, or both to make better use of those systems and deliver more ROI from them. Deploying a data lake or data warehouse is typically much cheaper, less time-consuming and less risky than ripping out and replacing your core operational systems. That makes it a good way to deliver more immediate ROI at lower risk and lower cost.

Finally, it's not just about storing and integrating data. Data warehouses and data lakes provide a staging ground for better business intelligence, predictive analytics and artificial intelligence. These more advanced technologies let you gain deeper insights into your data in ways that core operational technologies typically and historically have not been able to.

These are a few reasons why organizations are so keen on deploying data lakes and data warehouses:

- They integrate data.
- They deliver more immediate business value.
- They let you do more with your data, which today is becoming a core competitive advantage and a really important asset.

Data lakes and data warehouses are tools that let you deliver on that value proposition now, so there is a clear upside to deploying one within your organization.

Downside Risks of Data Warehousing and Data Lakes

As with most things, data warehouses and data lakes also have downsides. First and foremost, they require a certain level of sophistication in data science and technology. In general, you have to know how to extract data from multiple systems, how to analyze it and how to report on it. That isn't something your average end user typically does. These roles require a more advanced skill set, such as a data scientist, technologist or system architect: someone who can figure out how to tie together multiple sets of data.

Another downside, or potential risk, is that as powerful as data lakes and data warehouses are, they're only as good as the data stored in them. Suppose your data is corrupt at the operational level. For example, you might have a financial and accounting system that hasn't been well maintained, or a poorly maintained customer master list. If you pull data from those systems, a data lake or data warehouse won't get you very far, because the root data is corrupt. To get the full value out of data lakes and warehouses, you need good data and good operational technology supporting the data that feeds into them.
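One practical safeguard is to check data quality before it is loaded. The Python sketch below assumes a hypothetical customer master extract held as a list of dictionaries. It applies a few illustrative rules: missing ID, duplicate ID and missing region. Real rules would depend on your own systems, and `validate_customers` is a made-up helper, not part of any product.

```python
def validate_customers(records):
    """Split customer master rows into clean rows and rejected (row, reason) pairs."""
    seen_ids = set()
    clean, rejected = [], []
    for rec in records:
        cid = (rec.get("customer_id") or "").strip()
        if not cid:
            rejected.append((rec, "missing customer_id"))
        elif cid in seen_ids:
            rejected.append((rec, "duplicate customer_id"))
        elif not rec.get("region"):
            rejected.append((rec, "missing region"))
        else:
            seen_ids.add(cid)  # only IDs of accepted rows count as "seen"
            clean.append(rec)
    return clean, rejected

# Hypothetical extract from a poorly maintained customer master list.
customers = [
    {"customer_id": "C1", "region": "West"},
    {"customer_id": "C1", "region": "West"},   # duplicate
    {"customer_id": "", "region": "East"},     # missing ID
    {"customer_id": "C2", "region": None},     # missing region
]
clean, rejected = validate_customers(customers)
```

Only one of the four rows is fit to load here. Routing the rejected rows back to the owners of the source system fixes the problem at the operational level rather than hiding it inside the warehouse.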

Lastly, there is a risk in deploying a data lake or data warehouse as a Band-Aid for a longer-term problem, such as core operational systems that need to be replaced. It may serve your purposes in the short term. In the longer term, though, a data warehouse or data lake won't fix the fact that your core operational systems are broken, outdated or obsolete. Data lakes and data warehouses can be deployed at lower cost and lower risk, with higher ROI. Even so, you may eventually need to revisit the core operational systems that feed into them.

Data Lake and Data Warehouse Vendors

There are a number of tools that provide data lakes and data warehouses. Some of the big ERP software vendors, such as Oracle, SAP and Microsoft, have their own data warehouse or data lake offerings, or their own equivalent tools. There are also third-party standalone systems that provide the same capability.

One emerging, up-and-coming example is Snowflake. Snowflake is a popular data lake and data warehousing tool that's used agnostically across different technologies to consolidate data into one place. It isn't the only option, though; many other systems provide the same data warehouse and data lake capabilities.


If you are planning an upcoming transformation or selecting an ERP system, we would love to give you some insights. Please contact me for more information at eric.kimberling@thirdstage-consulting.com.

Be sure to download the newly released 2023 Digital Transformation Report to garner additional industry insight and project best practices.

Eric Kimberling

Eric is known globally as a thought leader in the ERP consulting space. He has helped hundreds of high-profile enterprises worldwide with their technology initiatives, including Nucor Steel, Fisher and Paykel Healthcare, Kodak, Coors, Boeing, and Duke Energy. He has helped manage ERP implementations and reengineer global supply chains across the world.
