Data Operations | The first step to using data—finding data

This article is an in-depth discussion on data operations, with a particular emphasis on the importance of "finding data" in the process of using data. The author details tools such as data maps, data catalogs, and data asset platforms, which aim to display data that has been processed by the data platform so that people with data needs can easily find and use the data.

Finding data is the first step to using data: if you cannot find the data, you cannot use it. Data maps, data catalogs, and data asset platforms all share one goal, which is to display the data that the data platform has processed so that people with data needs can complete that first step.

The data map here is essentially the same as the metadata we discussed in the data management section, but its presentation can be more flexible. Put differently, the metadata view is aimed at R&D, while the data map is aimed at business users.

In the metadata section, the interface is generally a tree organized by the data source each table belongs to.

A data map usually has a home page with a search box, a list of search results, and a details page with several tabs.

Home page

The main function of the home page is search. Users enter what they are looking for, and the entries that fuzzy-match the query are displayed as a list. Each entry in this list is a table.
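As a rough illustration of the search step, the sketch below uses case-insensitive substring matching as a simple stand-in for fuzzy matching. The `TableEntry` class, the `search_tables` function, and the catalog entries are all hypothetical names made up for this example; a real data map would more likely delegate this to a search engine.

```python
from dataclasses import dataclass


@dataclass
class TableEntry:
    name: str         # physical table name
    description: str  # human-readable description


def search_tables(query: str, catalog: list[TableEntry]) -> list[TableEntry]:
    """Return tables whose name or description contains the query (case-insensitive)."""
    needle = query.strip().lower()
    if not needle:
        return []
    hits = [
        t for t in catalog
        if needle in t.name.lower() or needle in t.description.lower()
    ]
    # Rank name matches ahead of description-only matches, then sort by name.
    return sorted(hits, key=lambda t: (needle not in t.name.lower(), t.name))


# Hypothetical catalog entries, for illustration only.
catalog = [
    TableEntry("dwd_order_detail", "Order line items"),
    TableEntry("dws_user_order_1d", "Daily order summary per user"),
    TableEntry("dim_user", "User dimension"),
]

results = search_tables("order", catalog)
print([t.name for t in results])
```

Ranking name matches first is only one possible design choice; popularity or official labels could equally be used as ranking signals.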

In an enhanced version, the same search can also cover other data assets, such as data service APIs, reports, large-screen dashboards, and even articles. This is covered under asset search.

Details page

After searching, click a specific entry in the result list to open its details page.

The details page describes the table along several dimensions, and these dimensions grow richer as the platform is used. Typical dimensions include basic information, fields, data preview, partition information, data audit, data lineage, update information, processing tasks, and evaluation.

Basic Information

The basic information includes the table's English name, Chinese name, description, creation time, and person in charge.

It also shows which data warehouse layer and business area the table belongs to. This information is set when table layers are planned, as covered in data management chapter 2.

Fields

The table's fields, field types, and field descriptions are displayed as a list. Whether the field descriptions are rich and complete is also an important indicator of how complete the table's metadata is.

Data Preview

A data preview shows what the data in the table looks like without the user having to run a query, which gives data consumers a more intuitive experience.

The trade-off lies in how the preview is produced. If it queries the data live, you need to decide which resources the query runs on. If it stores sample data in advance, you need to plan how much to store, which storage to use, and whether to refresh it.

Partition information

For partitioned tables in big data storage such as Hive, the page needs to list the partition information: what the partition fields are, which partition is the latest, and when each partition was updated or written.
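The partition summary can be sketched as follows. The sketch assumes partition specs in the `key=value/key=value` form that Hive's `SHOW PARTITIONS` prints, and it assumes the write times have already been collected somewhere. The table layout, the `dt` and `region` fields, and all values are hypothetical.

```python
from datetime import datetime


def parse_partition(spec: str) -> dict[str, str]:
    """Turn 'dt=2024-01-02/region=cn' into {'dt': '2024-01-02', 'region': 'cn'}."""
    return dict(part.split("=", 1) for part in spec.split("/"))


# Hypothetical (partition spec, last write time) pairs, for illustration only.
partitions = [
    ("dt=2024-01-01/region=cn", datetime(2024, 1, 2, 3, 0)),
    ("dt=2024-01-02/region=cn", datetime(2024, 1, 3, 3, 5)),
    ("dt=2024-01-02/region=us", datetime(2024, 1, 3, 4, 10)),
]

# Partition fields, taken from the first spec.
partition_fields = list(parse_partition(partitions[0][0]).keys())

# Latest partition by date; ISO-formatted dates compare correctly as strings.
latest_dt = max(parse_partition(spec)["dt"] for spec, _ in partitions)

# When each partition was last written.
last_written = {spec: written.isoformat() for spec, written in partitions}

print(partition_fields, latest_dt)
```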

Data Audit

This tab is really a form of data exploration: it summarizes the characteristics of certain fields in advance, so users do not have to write SQL by hand to get them. For an enumeration field, it shows how many distinct values there are and how many rows each value has. For a numeric field, it shows the distribution of the values, and so on.

These statistics are computed from the table itself, which raises a series of questions: when to run the computation and which resources to use for it. The feature works well only once these have been thought through.
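A minimal sketch of such a field summary is shown below, using only the Python standard library. The function names `profile_enum` and `profile_numeric` and the sample columns are hypothetical, and a real platform would typically compute these statistics inside the warehouse rather than in application code.

```python
from collections import Counter
from statistics import mean, median


def profile_enum(values):
    """Count how often each distinct value occurs, ignoring missing values."""
    return dict(Counter(v for v in values if v is not None))


def profile_numeric(values):
    """Summarize a numeric column; assumes at least one non-missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "nulls": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "median": median(present),
    }


# Hypothetical column samples, for illustration only.
order_status = ["paid", "paid", "refunded", None, "paid"]
order_amount = [10.0, 20.0, None, 30.0, 40.0]

status_profile = profile_enum(order_status)
amount_profile = profile_numeric(order_amount)
print(status_profile)
print(amount_profile)
```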

Data lineage

Data lineage can be understood as a simplified version of the end-to-end task lineage link in the task governance chapter. Only the upstream and downstream relationships between tables are displayed here. Users rely on it for impact analysis and data traceability. It is still displayed as a graph.
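Table-level lineage can be modeled as a directed graph. The sketch below walks the graph downstream for impact analysis and upstream for traceability. The table names and the adjacency-list representation are assumptions chosen for illustration, not a description of any particular platform.

```python
from collections import deque

# Hypothetical table-level lineage: each key feeds the tables listed as its value.
downstream = {
    "ods_orders": ["dwd_order_detail"],
    "dwd_order_detail": ["dws_user_order_1d", "dws_shop_order_1d"],
    "dws_user_order_1d": ["ads_user_report"],
    "dws_shop_order_1d": [],
    "ads_user_report": [],
}


def reachable(start, edges):
    """Breadth-first walk returning every table reachable from start, in visit order."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        table = queue.popleft()
        for nxt in edges.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def invert(edges):
    """Flip downstream edges into upstream edges for traceability queries."""
    upstream = {table: [] for table in edges}
    for src, targets in edges.items():
        for dst in targets:
            upstream.setdefault(dst, []).append(src)
    return upstream


# Impact analysis: which tables are affected if dwd_order_detail changes?
impacted = reachable("dwd_order_detail", downstream)

# Traceability: which tables does ads_user_report ultimately depend on?
sources = reachable("ads_user_report", invert(downstream))

print(impacted)
print(sources)
```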

Update information

Every table changes over time: it is updated, fields are added or deleted, field types are changed, and so on. This tab records the table's full change history.
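One way to produce such a change record is to compare two snapshots of the table's schema. The sketch below assumes each snapshot is a simple mapping from field name to type; the `diff_schema` function and both schema versions are hypothetical.

```python
def diff_schema(old, new):
    """Compare two {field: type} mappings and list what changed."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(f for f in set(old) & set(new) if old[f] != new[f]),
    }


# Hypothetical schema versions, for illustration only.
v1 = {"order_id": "bigint", "amount": "int", "remark": "string"}
v2 = {"order_id": "bigint", "amount": "decimal(18,2)", "channel": "string"}

change = diff_schema(v1, v2)
print(change)
```

Storing one such record per schema change, along with a timestamp and the person who made the change, would give the full history described above.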

Processing tasks

The processing tasks associated with the table are displayed here, which shows at a glance which task generated the table.

Evaluation

The evaluation function is more flexible. It can be an official evaluation, such as data popularity and data credibility. Credibility ties in with the use of data indicators for OLAP: if an indicator is a unified indicator, it is guaranteed to be consistent, and an official label is added to mark it as such.

It can also be user-facing: users leave feedback on the table, such as which fields should be added or how accurate the data is, which establishes a channel for collecting information and feedback.

Generated data services

If a data service API was generated directly from the table, that API is displayed here. If the API is defined by SQL, the page can also show which data service APIs reference this table in their logic.
