Talk about the application of "standard deviation" in mathematics

What many of us lack is not the theory of data analysis, but the ability to apply that theory in practical scenarios. This article uses plain language and examples from work and everyday life to show how to judge the stability and reliability of data by calculating the standard deviation, how to check whether the data contains outliers, and how to optimize a data sampling plan.

Have you had experiences like these? Have you been criticized by your boss for failing to deliver business results on time because you were chasing sophisticated algorithms and tools? Have you drawn sweeping conclusions and recommendations from the analysis of a single data point? Have you presented conclusions that reversed cause and effect or suffered from "survivor bias", sending the business on a detour?

Most of us run into these situations sooner or later. Why do we make these mistakes? Because we lack some basic data analysis thinking. What many of us lack is not the theory of data analysis, but the ability to apply it in real scenarios. Theory + real scenario = methodology. Bringing seemingly abstract theory into everyday work takes easy-to-understand cases and plain language. Whatever stage or level you are at, this series starts from the most common cases in life and work and uses the most straightforward words to explain the theory clearly, so that you can truly master the basic thinking and principles of data analysis. That is the original intention behind this series of articles.

Because it is written in plain language, this series avoids obscure formulas and complicated procedures. I simply hope to use plain language and examples from work and life to show you how to approach these problems from a data analysis perspective and to master some basic data analysis knowledge. When you look at the same thing again, you will think about it differently: you can interpret what is happening around you from the perspective of data and make judgments with data thinking.

I. Give an example

For data analysts, standard deviation is one of the most familiar concepts. It is one of the key indicators describing how data is distributed and how dispersed it is. In this article, I look at standard deviation from several angles, including its definition, role, and application scenarios, and illustrate its importance in data analysis with a practical case.

1. What is standard deviation?

The standard deviation is a statistic that measures how much a set of data varies. In essence, it describes the degree of dispersion of the data: the larger the standard deviation, the more dispersed the data; the smaller the standard deviation, the more concentrated the data. It is computed as the square root of the average squared distance between each value in a sample or population and the mean. In simple terms, the standard deviation measures how far a set of data typically lies from its mean.
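The definition above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original article: the `scores` list is hypothetical data chosen to keep the arithmetic easy, and `population_std` is a helper name introduced here. The result is compared with the standard library's `statistics.pstdev` (population) and `statistics.stdev` (sample, which divides by n - 1).

```python
import math
import statistics

def population_std(values):
    """Square root of the mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / len(values))

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

manual = population_std(scores)        # computed from the definition
library = statistics.pstdev(scores)    # population standard deviation
sample = statistics.stdev(scores)      # sample standard deviation (n - 1)

print(manual, library, sample)
```

The sample version is always slightly larger than the population version for the same data, and the gap shrinks as the number of values grows.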

2. The role of standard deviation

Standard deviation plays several important roles in data analysis:

2.1 Describing the shape of data distribution

The standard deviation helps us understand how data is distributed around the mean. When the standard deviation is small, the data is concentrated near the mean; when it is large, the data is spread out relative to the mean. It describes spread rather than features such as skewness, but it still gives a rough picture of the shape of the data and helps us choose an appropriate analysis method.

2.2 Measuring the dispersion of data

The standard deviation measures the degree of dispersion of a set of data and so indicates how stable the data is. The smaller the standard deviation, the less dispersed the data and the more stable its changes; the larger the standard deviation, the more dispersed the data and the less stable its changes. Based on the standard deviation, we can judge the stability of the data and choose a corresponding risk control strategy.
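To make the idea of stability concrete, here is a small sketch comparing two series that share the same mean but differ in spread. Both series are hypothetical daily order counts invented for illustration; they do not come from the article.

```python
import statistics

# Hypothetical daily order counts for two shops with the same average.
steady_shop = [98, 101, 100, 99, 102, 100, 100]
volatile_shop = [60, 140, 100, 75, 125, 90, 110]

steady_std = statistics.pstdev(steady_shop)
volatile_std = statistics.pstdev(volatile_shop)

for name, series, spread in [
    ("steady", steady_shop, steady_std),
    ("volatile", volatile_shop, volatile_std),
]:
    # Same mean, very different standard deviation.
    print(f"{name}: mean={statistics.mean(series):.1f}, std={spread:.1f}")
```

Looking at the mean alone, the two shops are indistinguishable; only the standard deviation shows that one of them is far less predictable.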

2.3 Relationship between standard deviation and mean

The standard deviation is closely related to the mean. When the data is concentrated, the standard deviation is small and the mean represents a typical value well; when the data is more dispersed, the standard deviation is large and the mean is less representative of any individual value. In data analysis, we need to consider the standard deviation together with the mean to judge the reliability and accuracy of the data.

II. Data Analysis Case

Case: Analysis of website user traffic

Suppose an Internet company wants to analyze the user traffic of its website in order to decide on an operation plan. The company first collects user traffic data for one month, 30 days in total. We can judge the stability of user traffic by calculating the standard deviation.

First, we sort the user visits by date and then calculate the average value, as shown in the following table:

Average = (500 + 550 + 480 + … + 520) / 30 = 510

Next, we calculate the difference between the number of visits per day and the average and square it. This is shown in the following table:

We then divide the sum of the squared differences by the total number of days and take the square root of the result to get the standard deviation. This is shown below:

Standard deviation = √[(100 + 1600 + 900 + … + 100) / 30] = 31.62

By calculating the standard deviation, we can determine the stability of user traffic. If the standard deviation is small, it means that user traffic is relatively stable, and we can adopt a more stable operation plan; if the standard deviation is large, it means that user traffic fluctuates greatly, and we need to consider a more flexible operation plan.
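The three steps of the case (mean, squared differences, square root of their average) can be sketched as follows. The full 30-day data set is not reproduced in the article, so the `visits` list below is a hypothetical five-day sample and `traffic_std` is a helper name introduced here. The last line only checks the article's final step: a mean squared difference of 1000 corresponds to the reported 31.62.

```python
import math

def traffic_std(daily_visits):
    """Follow the case's steps: mean, squared differences, square root."""
    n = len(daily_visits)
    mean = sum(daily_visits) / n
    squared_diffs = [(v - mean) ** 2 for v in daily_visits]
    return mean, math.sqrt(sum(squared_diffs) / n)

# Hypothetical five-day sample; NOT the article's 30-day data set.
visits = [500, 550, 480, 520, 500]
mean, std = traffic_std(visits)
print(f"mean={mean:.1f}, std={std:.2f}")

# Consistency check on the article's final step only:
# the square root of 1000 rounds to the reported 31.62.
print(round(math.sqrt(1000), 2))
```

Relative to a mean of about 510 visits, a standard deviation of about 32 is roughly 6% of the average, which is one way to decide whether the traffic counts as "stable" for planning purposes.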

III. Usage scenarios of standard deviation

1. Determine the reliability of the data

In the process of data analysis, we often need to judge the reliability of data. Standard deviation is one of the important indicators to judge whether the data is stable. If the standard deviation is small, it means that the data is relatively stable and we can use the data relatively safely. If the standard deviation is large, it means that the data fluctuates greatly and we need to consider the reliability of the data to avoid affecting the accuracy of the analysis results.

2. Determine whether the data is abnormal

In the process of data analysis, we also need to determine whether there are outliers in the data. If a value is far larger or smaller than the rest, it may be due to a data entry error or a problem with the data itself. We can use the standard deviation to check for this: if a value lies more than 2 to 3 standard deviations away from the mean, we can flag it as a possible outlier.
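A minimal sketch of this rule is shown below. The function name `find_outliers`, its `threshold` parameter, and the `visits` data are all assumptions made for illustration; the data includes one suspicious entry (5100, which could be a typo for 510).

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mean) > threshold * std]

# Hypothetical daily visits with one suspicious entry.
visits = [500, 550, 480, 520, 510, 495, 530, 5100, 505, 515]

flagged_at_2 = find_outliers(visits, threshold=2.0)
flagged_at_3 = find_outliers(visits, threshold=3.0)
print(flagged_at_2, flagged_at_3)
```

With only ten values, the extreme entry inflates the standard deviation so much that it is caught at a threshold of 2 but not at 3. In small samples the lower end of the 2 to 3 range is therefore the safer choice, and a flagged value should still be checked against its source before it is discarded.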

3. Optimize data sampling scheme

When performing data analysis, we often need to sample data in order to draw conclusions quickly. However, sampling itself introduces error, so we need to optimize the sampling plan to reduce it. The standard deviation helps us gauge the size of the sampling error: for a simple random sample, the standard error of the sample mean is roughly the standard deviation divided by the square root of the sample size. If the standard deviation is small, the sampling error is small, and a smaller sample can give a sufficiently accurate conclusion; if the standard deviation is large, the sampling error is large, and we need to collect more sample data to reduce it.
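The link between standard deviation and sample size can be sketched with the usual textbook approximation. This is an illustration under stated assumptions, not the article's own method: it assumes simple random sampling, a roughly normal sampling distribution, and a 95% confidence level (z = 1.96). The function names and the example numbers are hypothetical.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def required_sample_size(std, margin, z=1.96):
    """Smallest n whose approximate 95% margin of error is at most `margin`."""
    return math.ceil((z * std / margin) ** 2)

# Hypothetical pilot sample used to estimate the sampling error.
pilot = [2, 4, 4, 4, 5, 5, 7, 9]
print(f"standard error of the mean: {standard_error(pilot):.3f}")

# Same target margin of error, two different levels of spread.
small_spread_n = required_sample_size(std=10, margin=5)
large_spread_n = required_sample_size(std=40, margin=5)
print(small_spread_n, large_spread_n)
```

Because the required sample size grows with the square of the standard deviation, data that is four times as dispersed needs roughly sixteen times as many samples for the same precision.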

IV. Conclusion

Standard deviation is a very important indicator in data analysis. It can describe the distribution of data and measure the degree of data dispersion. It is closely related to the mean. In the process of data analysis, we can use the standard deviation to determine the stability and reliability of the data, determine whether there are outliers in the data, and optimize the data sampling plan. Therefore, we need to have a deep understanding of the concept and calculation method of standard deviation, and use it flexibly in practice to improve the accuracy and efficiency of data analysis.

Author: Data Analysis Planet

Source: WeChat public account "Data Analysis Planet" (ID: data-xingqiu)
