Finding data is the first step to using it: if you cannot find the data, you cannot use it. Data maps, data catalogs, and data asset platforms all serve this goal. They display the data that the data platform has processed, so that people with data needs can complete that first step.

The data map is essentially the same as the metadata discussed in the data management section, but its presentation can be more flexible: the metadata view is aimed at R&D, while the data map is aimed at business users. In the metadata section, the interface is generally a tree organized by the data source each table belongs to. A data map usually has a home page with a search box, a list of search results, and a details page with several tabs.

Home page

The main function of the home page is search. Users enter what they are looking for, and a list of fuzzy matches is displayed. The results here are tables. In an enhanced version, the same search can also cover other data assets such as data service APIs, reports, large-screen dashboards, and even articles. This is covered under asset search.

Details page

Clicking a specific entry in the search results opens its details page. The details page describes the table along several dimensions, and the set of dimensions grows as the platform is used. Typical dimensions include basic information, fields, data preview, partition information, data audit, data lineage, update information, processing tasks, and evaluation.

Basic information

The basic information includes the table's English name, Chinese name, description, creation time, owner, and similar attributes, along with the data warehouse layer and business area the table belongs to. The layer and business area are set during the table-level planning covered in data management chapter 2.
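The home page search and the basic information described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TableInfo` schema, the 0.6 similarity threshold, and the use of the standard library's `difflib.SequenceMatcher` for fuzzy matching are all hypothetical choices, not something the article prescribes. A production data map would more likely delegate this to a search engine.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class TableInfo:
    """Basic information shown on a table's details page (hypothetical schema)."""
    name_en: str
    name_cn: str
    description: str
    owner: str
    warehouse_layer: str  # assumed layer names such as ODS / DWD / DWS / ADS
    business_area: str


def match_score(query: str, table: TableInfo) -> float:
    """Return 1.0 for a substring hit, otherwise the best similarity ratio."""
    q = query.strip().lower()
    if not q:
        return 0.0
    candidates = [
        table.name_en.lower(),
        table.name_cn.lower(),
        table.description.lower(),
    ]
    if any(q in text for text in candidates):
        return 1.0
    return max(SequenceMatcher(None, q, text).ratio() for text in candidates)


def search_tables(query: str, catalog: list[TableInfo],
                  threshold: float = 0.6) -> list[TableInfo]:
    """Return tables whose score meets the threshold, best matches first."""
    scored = [(match_score(query, table), table) for table in catalog]
    hits = [(score, table) for score, table in scored if score >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [table for _, table in hits]
```

Calling `search_tables("order", catalog)` returns every table whose English name, Chinese name, or description contains "order", and a misspelled query can still match through the similarity ratio.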
Fields

The table's fields, field types, and field descriptions are displayed as a list. How rich and complete the field descriptions are is also an important measure of how complete the data is.

Data preview

A data preview shows what the data in the table looks like without the user having to run a query, which gives data consumers a more intuitive experience. There is a trade-off to settle here. If the preview queries the data directly, you need to decide which resources the query uses. If the preview data is saved in advance, you need a plan for how much to save, which storage to use, and whether to refresh it.

Partition information

For big data storage such as Hive, if the table is partitioned, list the partition information: what the partition fields are, which partition is the latest, and when each partition was updated and written.

Data audit

This is really data exploration: the characteristics of some fields are summarized in advance so that users do not have to write SQL to summarize them by hand. For an enumeration field, it shows how many distinct values there are and the count of each value. For a numeric field, it shows the distribution of the values. This information is computed over the table, which raises several questions, including when to run the calculation and what resources to use for it. The feature can only be implemented well once these have been thought through.

Data lineage

Data lineage can be understood as a simplified version of the end-to-end task lineage in the task governance chapter. Only the upstream and downstream relationships between tables are displayed. Users rely on it for impact analysis and data traceability. It is still displayed as a graph.
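As a rough illustration of how table-level lineage supports impact analysis and traceability, the sketch below stores upstream and downstream edges and walks them breadth-first. The `TableLineage` class and its method names are hypothetical and only serve this example. It assumes the edges have already been extracted from the processing tasks.

```python
from collections import defaultdict, deque


class TableLineage:
    """Table-to-table lineage graph (illustrative, in-memory only)."""

    def __init__(self) -> None:
        self._downstream: dict[str, set[str]] = defaultdict(set)
        self._upstream: dict[str, set[str]] = defaultdict(set)

    def add_edge(self, source: str, target: str) -> None:
        """Record that `target` is produced from `source`."""
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _walk(start: str, graph: dict[str, set[str]]) -> set[str]:
        """Breadth-first traversal collecting every table reachable from start."""
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph.get(node, ()):
                if neighbor != start and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen

    def impacted_by(self, table: str) -> set[str]:
        """Impact analysis: all tables downstream of `table`."""
        return self._walk(table, self._downstream)

    def sources_of(self, table: str) -> set[str]:
        """Traceability: all tables upstream of `table`."""
        return self._walk(table, self._upstream)
```

With edges such as `ods_orders -> dwd_order_detail -> dws_order_daily`, `impacted_by("ods_orders")` lists every table that a change to the source table could affect, and `sources_of("dws_order_daily")` traces the summary table back to its origins.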
Update information

Every table changes over time: fields are added, field types are changed, fields are deleted, and so on. The full change history of the table can be recorded here.

Processing tasks

The corresponding processing tasks are displayed here, which shows directly which task generated the table.

Evaluation

The evaluation function is flexible. It can be an official evaluation, such as data popularity and data credibility. Credibility comes up again in the use of data indicators for OLAP: if an indicator is a unified indicator, it is guaranteed to be consistent, and an official label is added to mark it as such. Evaluation can also be user-facing, letting users give opinions on the table, such as which fields to add or how accurate the data is, which establishes a channel for collecting information and feedback.

Generated data services

If a data service API was generated from the table, that API is displayed directly. If the API is based on SQL, the page can also show which data service APIs reference this table in their logic.
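For SQL-based data services, one simple way to find the APIs that reference a table is to scan each API's SQL for table names. The sketch below is deliberately naive and rests on assumptions: the API paths and the `api_sql` mapping are hypothetical, and the regular expression only looks at identifiers after `FROM` or `JOIN`. It will therefore misread constructs such as `EXTRACT(year FROM dt)` or CTE names, and a real implementation would use a proper SQL parser.

```python
import re

# Identifier that directly follows FROM or JOIN (naive, not a SQL parser).
_TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def tables_in_sql(sql: str) -> set[str]:
    """Collect lower-cased table names that follow FROM or JOIN."""
    return {name.lower() for name in _TABLE_REF.findall(sql)}


def apis_using_table(table: str, api_sql: dict[str, str]) -> list[str]:
    """Return the APIs (sorted by name) whose SQL references the given table."""
    target = table.lower()
    return sorted(
        api for api, sql in api_sql.items() if target in tables_in_sql(sql)
    )
```

The result can be rendered on the details page as the list of data service APIs whose logic includes the table.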