What is a Data Lake vs a Data Warehouse

Data is one of the most important assets an organization manages, but it is only useful if you can properly put it to work. One of the key ways organizations can get more use and value from their data is through data warehousing.

One of the key components of digital transformation today is the use of data. We’ve always had data within our organizations, along with systems that collect, track and even report on it. But now more than ever, we have accumulated so much data that we need to figure out ways to get value and use out of it.

Emerging technologies like machine learning, artificial intelligence and predictive analytics can only work if we have solid data and can actually make use of it. In this article, we are going to talk about data warehouses and data lakes, what they mean for your organization and how they might impact your digital transformation.

Data Warehouse vs A Data Lake

To start, it helps to understand what a data warehouse is and what a data lake is. The data lake is a newer concept, whereas data warehousing has been around longer, so we’ll start with data warehousing. A data warehouse is software that allows you to take structured data from one or more systems and store it in an organized fashion, so that you can report on it and slice and dice it in ways that add meaning and value to the data.

For example, you might be using multiple systems throughout your organization, say an Enterprise Resource Planning (ERP) system, a CRM system and a Supply Chain Management system. A data warehouse is a place where you can take the data from those core operational systems, store it, and then figure out how to manipulate and make use of it.
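As a rough illustration of that idea, the sketch below loads extracts from two source systems into one store and reports across them. It is a minimal example under stated assumptions: an in-memory SQLite database stands in for the warehouse, and the table names, column names and records are all hypothetical.

```python
import sqlite3

# Hypothetical extracts from two operational systems (illustrative data only).
erp_orders = [
    ("C001", "2023-01-15", 1200.00),
    ("C002", "2023-01-20", 450.50),
    ("C001", "2023-02-03", 300.00),
]
crm_customers = [
    ("C001", "Acme Corp", "Manufacturing"),
    ("C002", "Globex", "Retail"),
]

# An in-memory SQLite database stands in for the warehouse (an assumption).
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE orders (customer_id TEXT, order_date TEXT, amount REAL)"
)
warehouse.execute(
    "CREATE TABLE customers (customer_id TEXT, name TEXT, industry TEXT)"
)
warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", erp_orders)
warehouse.executemany("INSERT INTO customers VALUES (?, ?, ?)", crm_customers)

# "Slice and dice": total order value per industry, combining both sources.
revenue_by_industry = warehouse.execute(
    """
    SELECT c.industry, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.industry
    ORDER BY c.industry
    """
).fetchall()

for industry, total in revenue_by_industry:
    print(f"{industry}: {total:.2f}")
```

Neither source system could answer the revenue-by-industry question on its own here, since the order amounts live in one system and the industry classification in the other.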

A data warehouse is focused on structured data, whereas a data lake can handle structured data but is more suitable for unstructured data. An example of structured data that suits a data warehouse would be a financial transaction with a very clear numeric value behind it, or a customer order with a clear line item for what that customer is ordering.

Unstructured data is where a data lake comes in. An example would be a customer complaint about the quality of your product. You might receive a return along with a complaint containing a qualitative description of what was wrong with the product. It’s not a predefined numeric data set; it’s free-form text. That’s probably the best example of what unstructured data looks like, and it’s what a data lake is used for. Data lakes capture this unstructured information, and this is where data science really comes into play, because you now need to apply higher-caliber data science and analytics to find meaning and patterns within that data.

The beauty of this unstructured data is that, in years past, we couldn’t do a lot with it. It wasn’t something we could quantify or summarize easily. Now, with artificial intelligence and machine learning, we have tools that let us look for patterns in unstructured data, and data scientists can find meaning in what is stored in a data lake. That is the key difference between data warehouses and data lakes. To summarize both, a data warehouse or a data lake is essentially third-party software, a place to store the data from your other systems so that you can make use of it and analyze it in a more meaningful way.
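To make the idea of finding patterns in free-form text concrete, here is a deliberately simple sketch. It uses plain term counting rather than machine learning, so it only approximates the kind of analysis described above. The complaints, the stop-word list and the function name are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical free-form complaints, as they might land in a data lake.
complaints = [
    "The handle broke after two days of use.",
    "Arrived late and the box was damaged.",
    "Handle feels loose, and the paint is chipped.",
    "Box was damaged, product itself is fine.",
]

# Illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "and", "was", "is", "of", "after", "a", "itself", "two"}


def term_counts(texts):
    """Count how many complaints mention each non-stop-word term."""
    counts = Counter()
    for text in texts:
        # A set ensures each term counts once per complaint.
        words = set(re.findall(r"[a-z]+", text.lower()))
        counts.update(words - STOP_WORDS)
    return counts


counts = term_counts(complaints)

# Terms that appear in at least two complaints hint at recurring issues.
recurring = sorted(term for term, n in counts.items() if n >= 2)
print(recurring)
```

Even this crude count surfaces recurring themes in the sample text (handles and damaged boxes), which is the kind of signal a data scientist would then investigate with more capable tools.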

Benefits of Data Warehousing and A Data Lake

There are a number of benefits to deploying a data warehouse, a data lake or both. First and foremost, it allows you to collect and integrate data across multiple systems. If you’re operating in a siloed, best-of-breed or multiple-system environment, a data lake or data warehouse is a way to gather that data in one place where you can start to analyze and make sense of all those different data points. Another advantage of these data tool sets is that they let you get a lot of value out of the data, processes and systems you already have in place, without a huge investment in time, cost and risk. In other words, rather than having to replace your entire operational technology stack, you can use a data lake, a data warehouse or both to make better use of, and deliver more ROI from, those multiple systems. Typically, deploying a data lake or data warehouse is going to be a lot cheaper, less time consuming and less risky than ripping out and replacing your core operational systems. It can be a great way to deliver more immediate ROI at a lower risk and a lower cost.

Finally, it’s not just about storing and integrating data. Data warehouses and data lakes provide a staging area for better business intelligence, predictive analytics and artificial intelligence, so you can use these more advanced technologies to gain deeper insights into the data in ways that core technologies typically and historically have not been able to. These are a few reasons why organizations are so keen on deploying data lakes and data warehouses: they integrate data, they deliver more immediate business value and they allow you to do more with the data, which today is becoming a core competitive advantage and a really important asset to organizations. Data lakes and data warehouses are tools that enable you to deliver on that value proposition, and there is a clear upside to deploying one within your organization.

Downside Risks of Data Warehousing and Data Lakes

As with most things, there are also downsides to data warehouses and data lakes. First and foremost, they require a certain amount of sophistication when it comes to data science and technology. In general, you have to know how to get the data from multiple systems, how to analyze it and how to report on it, and that’s not something typically done by your average end user. These are roles that require a higher-caliber skill set, such as a data scientist, a technologist or a system architect: someone who can figure out how to tie together multiple sets of data.

Another downside or potential risk of data lakes and data warehouses is that, as powerful as they are, they’re only as good as the data that’s actually stored in them. If your data is corrupt at the operational level, say you have a financial and accounting system or a customer master list that hasn’t been maintained very well, and you try to take data from those systems, a data lake or data warehouse is not going to get you very far because the data is corrupt at its root. To get the full value out of data lakes and warehouses, it’s vitally important to ensure that you have good data and good operational technology supporting the data that feeds into them.
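One way to act on that point is to screen source data before it is loaded. The sketch below is a minimal example under stated assumptions: the customer master records, the required field names and the `find_quality_issues` helper are all hypothetical, and a real check would cover many more rules.

```python
# Hypothetical customer master extract (illustrative records only).
customer_master = [
    {"customer_id": "C001", "name": "Acme Corp", "email": "ap@acme.example"},
    {"customer_id": "C002", "name": "Globex", "email": ""},
    {"customer_id": "C001", "name": "ACME Corporation", "email": "ap@acme.example"},
    {"customer_id": "", "name": "Initech", "email": "billing@initech.example"},
]

# Fields assumed to be mandatory for this illustration.
REQUIRED_FIELDS = ("customer_id", "name", "email")


def find_quality_issues(records):
    """Return (row_index, description) pairs for missing fields and duplicate IDs."""
    issues = []
    seen_ids = {}  # customer_id -> index of the first row that used it
    for index, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            if not record.get(field, "").strip():
                issues.append((index, f"missing {field}"))
        customer_id = record.get("customer_id", "").strip()
        if customer_id:
            if customer_id in seen_ids:
                issues.append(
                    (index, f"duplicate customer_id of row {seen_ids[customer_id]}")
                )
            else:
                seen_ids[customer_id] = index
    return issues


issues = find_quality_issues(customer_master)
for row, problem in issues:
    print(f"row {row}: {problem}")
```

Rows flagged this way would ideally be corrected in the operational system itself, not just filtered out on the way into the warehouse or lake.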

Last, another downside risk arises if you are deploying a data lake or data warehouse as a sort of Band-Aid for a longer-term problem, which may be that your core operational systems need to be replaced. It may serve your purposes in the short term, but longer term a data warehouse or data lake is not going to fix the fact that your core operational systems are broken, outdated or obsolete. Just because you can deploy data lakes and data warehouses at lower cost, lower risk and higher ROI, you may still eventually need to revisit the core operational systems that feed into those data tool sets.

Data Lake and Data Warehouse Vendors

There are a number of tools out there that can provide data lakes and data warehouses. Certainly some of the big ERP software vendors like Oracle, SAP and Microsoft have their own versions of a data warehouse or data lake, or their own equivalent of one of those tools, but there are also third-party standalone systems that provide the same capability.

One up-and-coming example is Snowflake. Snowflake is a popular data lake and data warehousing tool that is used agnostically across different types of technologies to consolidate data into one place. Snowflake is not the only option; there are many other systems that provide the same capability.

If you are looking to strategize an upcoming transformation or are selecting an ERP system, we would love to give you some insights. Please contact me for more information: eric.kimberling@thirdstage-consulting.com

Be sure to download the newly released 2023 Digital Transformation Report to garner additional industry insight and project best practices.

Eric Kimberling

Eric is known globally as a thought leader in the ERP consulting space. He has helped hundreds of high-profile enterprises worldwide with their technology initiatives, including Nucor Steel, Fisher and Paykel Healthcare, Kodak, Coors, Boeing, and Duke Energy. He has helped manage ERP implementations and reengineer global supply chains across the world.
