Causal inference is arguably one of the hardest problems in data analysis, and years of debate have not settled it. Data analysts are asked all the time: "What caused this problem?" Most find the question hard to answer, so today we will go through it systematically.

1. Decomposition method

The most common way to look for causal relationships is the decomposition method: break a result metric down from several angles and look for what is affecting it. For example: yesterday 4 promotion channels brought in 100 customers in total; today they brought in only 80. The question is why customer acquisition fell.

Decomposition (as shown above):

1. Break the total down by the four channels. Channel A acquired the fewest customers. Conclusion 1: the total fell because channel A acquired fewer customers.
2. Break channel A's acquisition down into the three steps of the acquisition process: display page, landing page, conversion. The conversion step turns out to be the weak one. Conclusion 2: the total fell because something went wrong in channel A's conversion step.
3. Summary: the problem in channel A's conversion step is the reason for the low customer acquisition.

The answer looks perfect: the cause has been found! But it cannot withstand the follow-up questions from the business department: Why did A convert poorly? We didn't change the copy, did we? The budget was not small, was it? The two days are only one day apart, so why is the difference so large? Why did only A get worse while the others stayed the same? The analyst cannot answer any of them...

In essence, the decomposition method only narrows down where the problem is by segmenting the data. It cannot find the culprit.
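The two breakdown steps in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source gives the totals (100 and 80) but not the per-channel or per-step figures, so every number below is hypothetical, as are the helper names `biggest_drop` and `step_rates`.

```python
# Hypothetical per-channel counts; only the totals (100 -> 80) come from the text.
yesterday = {"A": 40, "B": 25, "C": 20, "D": 15}
today = {"A": 20, "B": 25, "C": 20, "D": 15}


def biggest_drop(before, after):
    """Return the key whose value fell the most, and the size of the change."""
    deltas = {key: after[key] - before[key] for key in before}
    worst = min(deltas, key=deltas.get)
    return worst, deltas[worst]


# Step 1: break the total down by channel.
worst_channel, change = biggest_drop(yesterday, today)

# Step 2: break the worst channel down by funnel step (hypothetical counts).
funnel_yesterday = {"display": 10000, "landing": 1000, "conversion": 40}
funnel_today = {"display": 10000, "landing": 1000, "conversion": 20}


def step_rates(funnel):
    """Pass-through rate between each pair of consecutive funnel steps."""
    steps = list(funnel)
    return {f"{a}->{b}": funnel[b] / funnel[a] for a, b in zip(steps, steps[1:])}


rates_before = step_rates(funnel_yesterday)
rates_after = step_rates(funnel_today)
rate_changes = {step: rates_after[step] - rates_before[step] for step in rates_before}
worst_step = min(rate_changes, key=rate_changes.get)

print(f"channel {worst_channel} changed by {change}; weakest step: {worst_step}")
```

The output names a channel and a funnel step, which is a location, not a cause. Nothing in these numbers says why the conversion rate halved.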
The decomposition method is therefore better suited to discovering problems than to explaining them (as shown in the figure below).

2. Correlation coefficient method

Statistics offers a method called correlation analysis, and it comes with a formula that looks very complicated (as shown below). Many analysts get excited the moment they see it! They plug two metrics in, compute the correlation coefficient, and even chat with GPT about it at every turn.
Never mind the details: the correlation coefficient is large enough, so the two must be related! And this time a complicated formula backs it up, so it must be very scientific, right? This is exactly how the classic "Dragon Vein" of statistics gets produced.
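A small sketch shows how easily a large coefficient appears without any real relationship. It assumes the formula in question is the Pearson correlation coefficient (the source does not name it), and it uses two made-up series that merely grow over time. The names `pearson_r`, `metric_a`, and `metric_b` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Synthetic data: each metric has its own upward trend plus a small,
# unrelated wiggle. By construction, neither influences the other.
wiggle_a = [0, 3, 0, -3]
wiggle_b = [3, 0, -3, 0]
metric_a = [100 + 5 * t + wiggle_a[t % 4] for t in range(13)]
metric_b = [20 + 3 * t + wiggle_b[t % 4] for t in range(13)]

# Correlation of the raw levels is dominated by the shared upward trend.
r_levels = pearson_r(metric_a, metric_b)

# Correlation of the period-over-period changes removes that trend.
diff_a = [later - earlier for earlier, later in zip(metric_a, metric_a[1:])]
diff_b = [later - earlier for earlier, later in zip(metric_b, metric_b[1:])]
r_changes = pearson_r(diff_a, diff_b)

print(f"levels: r = {r_levels:.2f}, changes: r = {r_changes:.2f}")
```

The raw levels correlate strongly simply because both series rise, while their period-over-period changes do not correlate at all. The formula computes exactly what it is given and says nothing about why the numbers move together.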
Correlation analysis, regression analysis, and cluster analysis are in essence not "analysis" but calculation. The calculation yields a relationship between two or more columns of numbers; whether that relationship means anything is not something the formula is responsible for explaining. Applied to reality, it therefore often produces all kinds of strange results. All statistical methods share this limitation: they describe relationships within the data, not relationships in the real world.

More fundamentally: can every business action and external factor be quantified? Not at all. Consumers' trust in a brand, the quality of the product experience, and the impression a piece of copy makes are all hard to turn into a stable, reliable metric. Statistical methods can therefore screen and filter metrics at scale, but they struggle to establish true cause and effect.

3. Trend analysis method

If complicated methods don't work, is there a simple one? There is! Start from the most basic intuition: if A causes B, then B should appear when A occurs, and B should gradually fade (or disappear) once A ends. People have summarized this into four principles of causal inference. The four principles:
This kind of inference matches people's intuitive logic. More importantly, it needs very little data! A trend line for the metric is enough, and the chart speaks for itself, which makes the method very practical.

BUT this approach has a big weakness: it cannot rule out confounding factors. It can only pick up the factor with the greatest impact, and it certainly cannot reveal the deeper factors hidden behind it. Among external factors, only obvious ones such as weather and traffic restrictions are visible; among internal factors, only ones such as price cuts. Minor factors cannot be observed at all.

The method is therefore often used for elimination, to rule out unreasonable excuses. For example: "You say bad weather explains the bad performance, but why did other companies still perform well in the same weather?" As for which factors actually drive performance, this method cannot say, so other methods are needed.

4. Control variable method

The best way to eliminate confounding factors is group testing: isolate the samples from outside influence, then measure the effect on each group. For example, to test the user response rate to different copy, in theory I should keep the product, price, and conversion position the same, draw from the same group of people through the same channel, and only then start the test. But the testing method has problems of its own.
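The comparison at the core of such a group test can be sketched as follows. This is a minimal illustration, not the author's procedure: it assumes two groups that differ only in the copy they saw, uses made-up response counts, and applies a standard two-proportion z-test. The function name `two_proportion_z_test` and all figures are hypothetical.

```python
import math


def two_proportion_z_test(resp_a, n_a, resp_b, n_b):
    """Two-sided z-test for a difference between two response rates."""
    rate_a = resp_a / n_a
    rate_b = resp_b / n_b
    # Pooled rate under the null hypothesis that both copies perform the same.
    pooled = (resp_a + resp_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Hypothetical results: 1000 users per group, same product, price, and channel.
rate_a, rate_b, z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
print(f"copy A: {rate_a:.1%}, copy B: {rate_b:.1%}, z = {z:.2f}, p = {p_value:.3f}")
```

The arithmetic is the easy part. The result only means something if the two groups really are comparable and isolated from each other, and that is where the practical problems arise.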
As a result, this type of test suits scenarios that combine instant feedback, closed information channels, and personalized push. Yes, that means scenarios like ride-hailing apps and short-video apps. When feedback is a little slower, the test is easily exposed. Take e-commerce platforms that use big data to charge existing customers more: consumers only need to log in with different mobile numbers and compare prices to find out, and in the end they still buy at the cheaper price...

5. Why common methods don't work

In summary, almost no method in causal inference is completely reliable, and that includes many classic statistical methods and scientific experimental methods. Why?

Because business management is in essence a social science problem, not a natural science problem. Natural science rests on basic principles from physics, chemistry, and mathematics. These principles are stable and quantifiable, so data statistics plus scientific experiments can gradually uncover the natural laws behind them.

Social science problems are completely different! They are shaped by many factors at once, they are easily manipulated and changed by people, and people are emotional and impulsive. It is therefore difficult to apply natural science methods to them directly.

In addition, everyone working in a company has a position, an attitude, and an agenda. When someone asks "What caused this problem?" or "What produced this result?", the unspoken assumption is: the credit is mine, so I should get a share of it; the blame is someone else's, so I should push it away. So even when a reliable method exists, people may not be willing to use it, and even when a conclusion exists, people will find excuses to dodge it.
Therefore, when faced with causal inference, we must carefully distinguish between problem scenarios. In short, the way forward is to start from the business scenario and analyze each specific problem on its own terms.