Data Governance and Data Lake: Data Policy in Our Favor

shukla7789
Posts: 1196
Joined: Tue Dec 24, 2024 4:28 am

Post by shukla7789 »

What is data governance and how does it relate to data lakes? Learn how to have a better data policy in your company.
As we have seen more than once, and it never hurts to repeat, information is the greatest asset a company has today, and there is no good information without good data. Every organization has its own methodology, procedures and technologies for data management. The key to selecting the best data management system is a good data governance model.

It is in this context that we ask ourselves: Data Lake, Big Data, Data Warehouse? Where do we start?


The growth of data coming from different sources and in proprietary database formats, together with the need to process it properly and ensure its quality, adds difficulty on the path to the kind of analysis that boosts business performance. Let's look at some numbers:

Global data volume was projected to reach a staggering 40 zettabytes by 2020.

Structured data is growing at a rate of 40% each year.

Content, which includes all types of data, including structured and unstructured, is growing at a steady rate of approximately 80% annually.

Machine-generated data is expected to increase 15-fold this year.

Data Lake: Overcoming the Limitations of Data Warehouse

The Big Data phenomenon
Big Data is used to describe both the technological ecosystem and the industry that deals with data that is too large or complex to be stored and/or processed by traditional means.

A popular definition of Big Data rests on the so-called “4 Vs”: Volume, Variety, Velocity and Veracity.

Volume. Refers to the difficulty caused by the size of the data.

Variety. This speaks to the complexity of dealing with disparate data types; some of your data will be structured, semi-structured, or unstructured, and the technology to deal with this variety is not trivial.

Velocity. When collecting real-time events such as IoT data, web traffic, financial transactions, database changes, or anything else that occurs in real time, the “velocity” of data flowing into (and in many cases, out of) your systems can easily exceed the capabilities of traditional database technologies.

Veracity. This is the added complexity of dealing with data that is invalid, erroneous, malicious, malformed, or all of the above. This adds the need for data validation, quality control, normalization, and more.

We could add a fifth V: the value of the data, which depends on how we work with it.
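
To make the Veracity point concrete, here is a minimal sketch of the validation and normalization step mentioned above. The field names (`sensor_id`, `reading`) and the function names are hypothetical, chosen only for illustration; a real pipeline would use its own schema and rules.

```python
from typing import Any, Optional

# Hypothetical schema: the fields every record is assumed to need.
REQUIRED_FIELDS = ("sensor_id", "reading")


def normalize(record: dict) -> dict:
    """Lowercase and trim the keys, and trim string values."""
    return {
        key.strip().lower(): value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }


def validate(record: dict) -> Optional[str]:
    """Return a rejection reason, or None when the record passes."""
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            return f"missing field: {field}"
    try:
        float(record["reading"])
    except (TypeError, ValueError):
        return "reading is not numeric"
    return None


def split_by_quality(records: list) -> tuple:
    """Separate records into accepted rows and (raw record, reason) rejects."""
    accepted: list = []
    rejected: list = []
    for raw in records:
        record: dict[str, Any] = normalize(raw)
        reason = validate(record)
        if reason is None:
            record["reading"] = float(record["reading"])
            accepted.append(record)
        else:
            rejected.append((raw, reason))
    return accepted, rejected
```

Rejected records are kept together with the reason they failed rather than silently dropped, so that quality problems can be traced back to their source.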












What is a Data Lake?
A data lake is a repository for Big Data. It stores data of all kinds (structured, semi-structured and unstructured) in its rawest form, as generated by different sources. A data lake is different from a data warehouse, which stores data in a well-structured form. The data in a lake may or may not be used in the future, whereas the data in a data warehouse is meant to be used, because everything irrelevant has already been removed.
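
As a rough illustration of that difference, the sketch below keeps every payload in a toy "lake" exactly as it arrives, while a toy "warehouse" load keeps only the records that fit a fixed table. The `DataLake` class, the `load_warehouse` function and the column names are all hypothetical; this is not how any particular product works.

```python
import json
from typing import Any

# Hypothetical warehouse table: only these columns are kept.
WAREHOUSE_COLUMNS = ("order_id", "amount")


class DataLake:
    """Keeps every payload exactly as received, whatever its shape."""

    def __init__(self) -> None:
        self.objects = []  # list of (source, raw payload) pairs

    def ingest(self, source: str, payload: str) -> None:
        self.objects.append((source, payload))


def load_warehouse(lake: DataLake) -> list:
    """Keep only payloads that parse as JSON objects and fit the table."""
    rows = []
    for _source, payload in lake.objects:
        try:
            data: Any = json.loads(payload)
        except json.JSONDecodeError:
            continue  # unstructured content stays in the lake only
        if isinstance(data, dict) and all(col in data for col in WAREHOUSE_COLUMNS):
            # Drop everything the table does not need.
            rows.append({col: data[col] for col in WAREHOUSE_COLUMNS})
    return rows
```

For example, after ingesting an order record, a free-text support note and a web click event, the lake holds all three payloads, but only the order would reach the warehouse table.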

The business model of a Data Lake is evolutionary and, according to recommendations from Dell Technologies specialists, is carried out in three stages:

Familiarize yourself with the technologies.

Create an elastic data platform.

Create a collaborative value creation platform.

In short, Big Data is the data itself, and the Data Lake is the repository that holds it.



And finally, what is Data Governance?
Data governance is the set of management practices that ensure the integrity, ease of use, security and availability of a company's data. Whether or not you have data governance can determine whether your data-driven strategy is effective. At the very least, it is essential for Data Analysis and Business Intelligence.

Data Governance is, in fact, a turning point on the path to becoming a data-driven company. It is a change of mentality that requires focusing efforts on working with information and treating it in the most appropriate way.

A good Data Governance program must be based on three basic pillars:

A clear decision-making framework. Establish how decisions about investing in data are made, whether the goal is to solve data quality issues or simply to fund analytics.

Trust. For many analytics consumers, there is a great deal of opacity around the definitions behind the reports they are looking at. Unlike in a spreadsheet, where formulas are transparent right under the pointer, when we move to more modern reporting and analytics platforms, these become much more opaque. We can't necessarily see the definitions just by looking at the surface level of the report.