Five characteristics of data, three problems, and one kind of arrogance

Data is an objective existence and a factual description of things, obtained through measurement, recording, discovery and similar means. It has five characteristics: infinity, ease of replication, heterogeneity, perishability, and originality. Data and information are vitally important, but three problems hinder the healthy and orderly development of data: data rights confirmation, data transactions, and data elements.

"Data is the new oil" (Clive Humby, 2006). If data must be compared to one thing, oil is the closest match: both are important strategic resources and driving forces of the world. But data is just data; it is nothing else.

1. Five characteristics

Data exists objectively and describes things factually; it can be obtained through measurement, recording, discovery and similar means. It has five characteristics: infinity, ease of replication, heterogeneity, perishability and originality.

1. Infinity

Unlike physical objects, data is not used up by use. On the contrary, use generates it: data is constantly being created and keeps growing in volume. "Data will become the most basic objective product. No matter what we do, we are generating data" (Paul Sonderegger, 2017). According to DASA R&T's "Emerging Technology Trends 2016-2045", the amount of new data generated globally doubles approximately every two years. This can be called the Moore's Law of big data, and it makes a data explosion inevitable.
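
To get a feel for what "doubling approximately every two years" implies, here is a minimal sketch of the arithmetic. It assumes perfectly regular doubling, which the cited report does not claim, and the function name and parameters are illustrative only.

```python
def projected_volume(initial: float, years: float,
                     doubling_period_years: float = 2.0) -> float:
    """Return the volume after `years`, assuming it doubles every
    `doubling_period_years` (an idealized, perfectly regular growth)."""
    return initial * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Express growth as a multiple of today's rate of new data (initial = 1.0).
    for years in (2, 4, 10, 20):
        multiple = projected_volume(1.0, years)
        print(f"after {years:>2} years: {multiple:,.0f} times today's rate")
```

Under this assumption, ten years bring five doublings and twenty years bring ten, so a steady two-year doubling compounds into a thousandfold increase within two decades.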

2. Ease of replication

Data can be copied quickly at almost zero cost, used by many people at the same time, and reused many times; one person's use does not exclude or hinder another's, so different users of the same data have no direct conflict of interest. Ease of replication makes data non-rivalrous and non-excludable to a certain extent, but data is not therefore a public good: there is public data, corporate data and personal data.

3. Heterogeneity

In "The Hidden", Xie Ruolin says: "Now there are two gold bars here, tell me which one is noble and which one is dirty?" The line illustrates a truth: gold bars are homogeneous, and two gold bars have the same value. Homogeneity is everywhere, for example in goods leaving a factory and in energy sources such as oil and electricity. Data, however, is heterogeneous. The value contained in one piece of data is completely different from that contained in another, and the same data has different value to different people. As Wang Qinmin (2023) said: "The value of data varies depending on the user, the application scenario, and the professional data quality standards."

4. Perishability

Data is a perishable commodity that depreciates rapidly over time. According to IBM (2015), 60% of unstructured data loses its true value within milliseconds. The value of data therefore lies largely in its timeliness: more than half of all data has lost its value almost the moment it is generated. We can call this the "one-second law." Still less data is ever analyzed and processed to produce actual benefits: 90% of the world's data has never been analyzed or used (IBM, 2015; DASA R&T, 2016), and less than 2% of the data created or copied in 2020 was saved and retained into 2021 (IDC).

5. Originality

Data is raw and, in itself, meaningless. Only through processing and analysis can it be turned into information that is useful to people. If data is the new oil, then analysis is the internal combustion engine. Information is what is extracted from data; knowledge is formed when the human brain processes information, and it is subjective. Data, information and knowledge all concern the past, whereas wisdom concerns the future: it is the ability to use knowledge to make decisions and judgments.

Regarding the relationship between data, information, knowledge and wisdom, Professor Zeleny of Fordham University (1987) proposed the DIKW pyramid model (as shown below), from the bottom to the top:

  • Data: know-nothing;
  • Information: know-what;
  • Knowledge: know-how;
  • Wisdom: know-why.

Figure: DIKW pyramid model

2. Three difficult problems

The gap between one person and the people around them lies mainly in how well each can grasp, understand and use information. Data and information are crucial. However, three problems hinder the healthy and orderly development of data: data rights confirmation, data transactions, and data elements (data as a factor of production). We must face these challenges, dare to act, and resolve the difficulties with courage and wisdom.

1. Data rights confirmation

Guan Yu was in Cao's camp, but his heart was with Han. The ownership of a physical body is relatively easy to judge; the heart and soul, however, are ethereal, uncertain, hidden and varied, and sometimes belong to several subjects at once. Data is similar: it is hard to determine clearly whom it belongs to, and hard to divide it physically or allocate rights over it reasonably. The complexity of rights confirmation stems from the characteristics of data itself and from the diversity of rights holders. The data chain involves multiple participants, each of whom is indispensable yet none of whom can play a role alone, and their demands differ. In addition, the value density of data is low and the benefits it generates are hard to measure clearly, which makes the cost of confirming data rights extremely high.

2. Data Trading

Trading is mutually beneficial behavior and the most spontaneous and positive activity in human society; a trade happens only if both parties benefit from it. For data, however, trading is a hard problem. The United Nations Conference on Trade and Development (2019) pointed out: "Data has important use (or abuse) value, but it does not have exchange value like most economic goods." Ordinary transactions generally have clear prices and are repeatable and predictable. For example, a store repeatedly sells clearly priced milk tea to different consumers, and the utility consumers obtain is predictable: it quenches thirst, tastes good, and serves a social purpose. Data is different: it is heterogeneous, its value is hard to measure, pricing is difficult, expected utility is hard to manage, and there is a risk of "free riding". All of these are problems that data trading must face.

3. Data elements

Production factors are the basic resources people need to produce goods and services. They promote production, but they do not become part of the products and services, nor are they significantly changed by the production process. Alfred Marshall, the founder of the neoclassical school, proposed the four-factor theory of production in his famous book Principles of Economics (1890): land, labor, capital and entrepreneurship. Academician Mei Hong pointed out (2023): "It is China's first initiative to establish data as an important production factor." However, data is difficult to define as a production factor in economic terms, and no influential and convincing results have yet appeared. Economists urgently need to step up their research.

3. Avoid Big Data Arrogance

When classic cases of data mining come up, many people think of "beer and diapers" and Google Flu Trends. In fact, the former is a story that appeared as early as 1992 and never really happened; the latter did once predict the arrival of the flu in advance, but it was shut down long ago because of its low accuracy.

The importance of data is beyond doubt. People like to put "big" in front of "data" to stress how extraordinary it is, and they often fall into the trap of "big data hubris". Data can solve many problems, but it has limits, and sudden change is hard to predict from data. A pig that has lived a peaceful life cannot predict the black swan of the Spring Festival from its past data; the travel data of a carriage can help people get "a faster horse", but it cannot help them invent the car. Data is a competitive advantage for enterprises, but it is not omnipotent. A good app cannot sit back and relax just because it has historical data: it is always being challenged by innovators and can only "lead the trend for a few years". Entrepreneurs, meanwhile, can launch innovative products, win users and succeed even with no data and no accumulation. From this perspective, doing without data is not impossible.

In the era of big data, "correlation, not causality" is treated as a guiding principle. Yet "The key is human analysis and reasoning to find out why two things appear at the same time or successively. Only when the reasons are right can it be new knowledge or newly discovered laws. Correlation itself is not of much value" (Li Guojie, 2015). Believing "numbers" blindly is worse than having no "numbers" at all. We must combine scientific methods such as experimental observation, logical deduction, and induction and refinement to explore the relationships and laws between things, and so dig out valuable information and conclusions.
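
The point that correlation alone explains little can be illustrated with a small simulation. This is only a sketch on synthetic data: the variable names, the noise level, and the scenario of a hidden common cause are all assumptions made for illustration, not anything taken from the sources cited here.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=2000, seed=0):
    """Generate a hidden cause plus two series that each depend on it.

    Hypothetical setup: `a` and `b` never influence each other; both are
    the hidden cause plus independent noise.
    """
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(n)]
    a = [h + rng.gauss(0, 0.3) for h in hidden]
    b = [h + rng.gauss(0, 0.3) for h in hidden]
    return hidden, a, b


if __name__ == "__main__":
    hidden, a, b = simulate()
    # a and b look strongly related even though neither causes the other.
    print("correlation of a and b:", round(pearson(a, b), 2))
    # Subtracting the hidden cause leaves only independent noise.
    rest_a = [x - h for x, h in zip(a, hidden)]
    rest_b = [y - h for y, h in zip(b, hidden)]
    print("after removing the hidden cause:", round(pearson(rest_a, rest_b), 2))
```

The two series correlate strongly only because of the shared hidden cause. The data alone cannot reveal that; finding the cause takes the human analysis and reasoning that the quotation calls for.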

We value data not because the data itself is important, but because the spirit of seeking truth from facts and respecting the objective world and objective laws is important. Data is facts. As Academician Li Guojie (2015) said: "Attaching importance to data means emphasizing the scientific spirit of speaking with facts and rational thinking."

Author: Yan Deli; WeChat public account: "Tencent Research Institute (ID: cyberlawrc)"
