What is metadata and what is its importance in Big Data?
Posted: Tue Jan 21, 2025 10:49 am
Find out what metadata is and what it has to do with the supply chain, and take note of the value of this type of data in times of Big Data.
To define what metadata is, we will use an analogy with distribution logistics. It lets us explain clearly what metadata is and why it is critical to data management in a big data environment.
What is metadata and what does it have to do with the supply chain?
When you send a package to an international destination, you appreciate having information about where your goods are along the route. Logistics companies maintain information about all goods in transit so they can track the movement and successful delivery of packages throughout the entire shipping process.
Metadata provides this same type of visibility into the data-rich environment of big data. Data moves in and out of companies, and it moves within them as well. Tracking data changes and spotting the process that causes problems during data analysis is difficult without insight into the data and how it moves. Today, even a single column change in a source table can impact hundreds of reports that use that data, so it’s very important to know in advance which columns will be affected.
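To make the column-change point concrete, here is a minimal Python sketch of an impact check. It assumes you already keep lineage metadata recording which source columns each report reads. The report names, column names, and helper functions are all hypothetical and only serve as illustration.

```python
from collections import defaultdict

# Hypothetical lineage metadata: the source columns each report reads.
REPORT_COLUMNS = {
    "monthly_sales": ["orders.amount", "orders.order_date"],
    "customer_churn": ["customers.signup_date", "orders.order_date"],
    "inventory_status": ["products.stock_level"],
}


def build_column_index(report_columns):
    """Invert report -> columns into column -> set of reports."""
    index = defaultdict(set)
    for report, columns in report_columns.items():
        for column in columns:
            index[column].add(report)
    return index


def affected_reports(index, changed_column):
    """Return the sorted names of reports that read the changed column."""
    return sorted(index.get(changed_column, set()))


index = build_column_index(REPORT_COLUMNS)
print(affected_reports(index, "orders.order_date"))
# prints ['customer_churn', 'monthly_sales']
```

With an index like this, you can list the reports that depend on a column before you change it, rather than finding out after they break.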
So what is metadata?
Metadata provides information about each data set: for example, its size, database schema, format, last modification time, access control lists, and usage.
The use of metadata enables the management of a scalable data lake platform and architecture, as well as data governance.
Metadata is typically stored in a central catalog to provide users with information about available data sets.
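As a rough illustration of the attributes listed above and of the central catalog idea, here is a small in-memory Python sketch. The class names, field names, and sample values are assumptions made for this example. A real catalog would normally be a dedicated service or tool, not a dictionary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DatasetMetadata:
    """Descriptive information about one data set (hypothetical fields)."""
    name: str
    size_bytes: int
    schema: dict            # column name -> type name
    data_format: str
    last_modified: datetime
    allowed_readers: list = field(default_factory=list)  # simple access list


class MetadataCatalog:
    """A toy central catalog that maps data set names to their metadata."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Adding an entry under an existing name overwrites the old one.
        self._entries[entry.name] = entry

    def describe(self, name):
        return self._entries[name]

    def readable_by(self, user):
        """Names of the data sets the given user may read, sorted."""
        return sorted(
            name for name, entry in self._entries.items()
            if user in entry.allowed_readers
        )


catalog = MetadataCatalog()
catalog.register(DatasetMetadata(
    name="orders",
    size_bytes=52_428_800,
    schema={"order_id": "int", "amount": "decimal", "order_date": "date"},
    data_format="parquet",
    last_modified=datetime(2025, 1, 20, tzinfo=timezone.utc),
    allowed_readers=["analyst", "finance"],
))

print(catalog.describe("orders").data_format)  # prints parquet
print(catalog.readable_by("analyst"))          # prints ['orders']
```

Users query the catalog, not the raw storage, to find out which data sets exist, what they look like, and who may read them.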