The four models of causal inference really work!

Of the many challenges in data analysis, causal inference has always been among the most complex and delicate. How do you accurately identify, from massive amounts of data, the key factors that drive a result? I hope the four methods shared in this article help.

Causal inference is arguably one of the hardest problems in data analysis, and after many years of debate there is still no settled answer.

Students are often asked: "What is the cause of this problem?" Most find it hard to analyze, so today we will go through it systematically.

1. Decomposition Method

The most common way to look for causal relationships is the decomposition method.

Break a result indicator down from several angles and look for what is affecting it.

For example: yesterday four promotion channels brought in 100 customers in total; today they brought in only 80. The question is why customer acquisition fell.

The decomposition proceeds as follows:

1. Break the total down by the four channels. Channel A turns out to have brought in fewer customers than yesterday.

Conclusion 1: the total fell because channel A acquired fewer customers.

2. Break channel A's acquisition down by the three steps of the acquisition process: display page, landing page, conversion. The loss turns out to be at the conversion step.

Conclusion 2: the total fell because something went wrong at channel A's conversion step.

3. Summary: the conversion step of channel A is the problem, and that is why fewer customers were acquired. The answer looks perfect, and the reason has been found!

But this answer does not survive the follow-up questions from the business department: why did A convert poorly?

We didn't change the copy, did we?

The ad spend wasn't small, was it?

The two days are only one day apart, so why such a big difference?

Why did only A get worse while the others stayed the same?

I can't answer any of them...

The so-called decomposition method essentially just narrows down where the problem is through segmentation. It cannot find the culprit. It is therefore typically used to locate problems rather than to explain them.
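The two-level breakdown above can be sketched in a few lines of Python. This is a minimal illustration, not the author's tooling: only the totals (100 and 80) come from the example, while the per-channel counts, the funnel counts, and the function names are hypothetical.

```python
# Hypothetical per-channel counts; only the totals (100 and 80) come from the example.
yesterday = {"A": 40, "B": 25, "C": 20, "D": 15}
today = {"A": 20, "B": 25, "C": 20, "D": 15}


def channel_deltas(before, after):
    """Return (channel, change) pairs, largest drop first."""
    deltas = {ch: after[ch] - before[ch] for ch in before}
    return sorted(deltas.items(), key=lambda kv: kv[1])


def step_rates(funnel):
    """Rate of each funnel step relative to the step before it."""
    steps = list(funnel)
    return {
        steps[i]: funnel[steps[i]] / funnel[steps[i - 1]]
        for i in range(1, len(steps))
    }


# Hypothetical funnel counts for channel A: display page -> landing page -> conversion.
funnel_yesterday = {"display": 10000, "landing": 1000, "conversion": 40}
funnel_today = {"display": 10000, "landing": 1000, "conversion": 20}

# Level 1: which channel lost the most customers?
worst_channel, drop = channel_deltas(yesterday, today)[0]

# Level 2: within that channel, which step's rate shrank the most?
rates_before = step_rates(funnel_yesterday)
rates_after = step_rates(funnel_today)
worst_step = min(rates_before, key=lambda s: rates_after[s] / rates_before[s])

print(worst_channel, drop, worst_step)
```

The output names a channel and a step, which is exactly the limit described above: it says where the drop happened, not why.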

2. Correlation Coefficient Method

Statistics offers a method called correlation analysis, and it comes with a formula that looks very complicated.

Many students get excited as soon as they see it!

So they plug two indicators in, compute the correlation coefficient, and then ask around everywhere, GPT included:

  • Is the correlation coefficient of 0.99 considered high?
  • Is the correlation coefficient of 0.9 considered large?
  • Is the correlation coefficient of 0.8 considered large?
  • Is the correlation coefficient of 0.7 considered large?

Anyway, the coefficient is large enough, so the two must be related!

This time there is a complex formula behind it, so it must be very scientific, right? This is exactly how you arrive at the classic "dragon vein" of statistics (in feng shui, the dragon vein is the terrain said to carry a land's fortune):

  1. China's GDP rises every year
  2. The tree in front of my house grows taller every year
  3. Plug in the two series and the correlation coefficient comes out at 0.99
  4. So the tree in front of my house is the dragon vein of China!
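The "dragon vein" effect is easy to reproduce. The sketch below assumes the coefficient in question is the Pearson correlation coefficient (the article does not name it) and uses made-up numbers for both series; the only property that matters is that both rise every year.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up numbers for illustration only: both series simply rise every year.
gdp_index = [100, 108, 117, 125, 136, 146]    # hypothetical GDP index
tree_height = [2.0, 2.3, 2.7, 3.0, 3.2, 3.6]  # hypothetical tree height in metres

r = pearson(gdp_index, tree_height)
print(round(r, 3))
```

Any two series that trend in the same direction over time will score close to 1 here, whether or not they have anything to do with each other.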

Correlation analysis, regression analysis, and cluster analysis are in essence not "analysis" but calculation.

Calculation yields the relationship between two or more columns of numbers. Whether that relationship means anything is not something the formula takes responsibility for explaining.

So when these methods are applied to reality, all kinds of strange results appear. All statistical methods share this problem: they can describe relationships within the data, but not relationships in the real world.

Looking at it more fundamentally: Can all business behaviors and external factors be quantified?

Not at all.

For example, consumers' trust in a brand, the quality of the product experience, and how a piece of copy feels are all hard to turn into a stable, reliable indicator.

Statistical methods can therefore screen and filter indicators at scale, but they struggle to establish true cause and effect.

3. Trend Analysis Method

Since complicated methods don’t work, is there a simple way?

There is!

Start from the most basic intuition: if A causes B, then B should appear once A occurs, and B should gradually fade (or stop outright) once A ends. From this intuition people have distilled four principles of causal inference.

Four principles:

  1. The cause occurs before the effect
  2. After the cause occurs, the effect occurs
  3. The effect lasts while the cause lasts
  4. When the cause disappears, the effect disappears

This kind of inference matches people's intuition. More importantly, it needs little data! As long as you have the trend of an indicator, the chart speaks for itself. So it is very practical.

However, this approach has a big problem: it cannot rule out confounding factors, so it only reveals the factor with the largest impact. It is even less able to reveal deeper factors hidden behind that one.

For example, among external factors only obvious ones such as weather and traffic restrictions can be observed; among internal factors only ones such as price cuts can be observed. Minor factors cannot be observed at all.

This method is therefore often used for elimination, to rule out unreasonable excuses.

For example: "You say that bad weather means bad performance, but why do other companies still perform well when the weather is bad?" As for which factors actually drive performance, this method cannot say, so other methods are needed.
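The four principles can be turned into a rough checklist over two trend series. The sketch below is one possible encoding, not a standard test: it assumes each period has already been labelled by hand as "cause active" and "effect present", that the cause occurs in one continuous stretch, and that "the effect disappears" means it is gone by the last observed period. The function name and the example series are hypothetical.

```python
def check_four_principles(cause_active, effect_present):
    """Check the four principles on two equal-length lists of booleans, one per period."""
    if len(cause_active) != len(effect_present):
        raise ValueError("series must have the same length")
    if True not in cause_active:
        raise ValueError("the cause never occurs")
    start = cause_active.index(True)                              # first period with the cause
    end = len(cause_active) - 1 - cause_active[::-1].index(True)  # last period with the cause
    before = effect_present[:start]
    during = effect_present[start:end + 1]
    after = effect_present[end + 1:]
    onset = during.index(True) if True in during else None
    return {
        "cause_precedes_effect": not any(before),
        "effect_follows_cause": onset is not None,
        "effect_lasts_with_cause": onset is not None and all(during[onset:]),
        "effect_ends_after_cause": bool(after) and not after[-1],
    }


T, F = True, False

# Hypothetical price cut: the sales lift starts a day late and fades a day after it ends.
price_cut = [F, F, T, T, T, T, F, F, F]
sales_lift = [F, F, F, T, T, T, T, F, F]
price_cut_result = check_four_principles(price_cut, sales_lift)

# Hypothetical "bad weather" excuse: sales were already poor before the weather turned.
bad_weather = [F, F, F, T, T, T, F, F, F]
poor_sales = [T, T, T, T, T, T, T, T, T]
weather_result = check_four_principles(bad_weather, poor_sales)

print(price_cut_result)
print(weather_result)
```

A failed check is a reason to reject an explanation, as with the weather excuse here. A passed check is not proof, because another factor may have started and stopped at the same time.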

4. Control Variable Method

The best way to rule out confounding factors is a grouped test: isolate the samples from one another, as if in sealed boxes, and then measure the effect in each group.

For example, to test how users respond to different copy, in theory I should keep the product, price, and conversion position the same, pick the same kind of people and the same channel, and only then run the test.
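A copy test of this kind is usually run as a randomized two-group comparison. The sketch below is a minimal version under stated assumptions: users are split at random so that other factors balance out on average, and the two response rates are compared with a two-sided two-proportion z-test, which is one common choice rather than a method the article prescribes. The function names and the response counts are hypothetical.

```python
import math
import random
from statistics import NormalDist


def assign_groups(user_ids, seed=0):
    """Randomly split users into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in response rate between two groups."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Everything except the copy stays the same; group A sees copy A, group B sees copy B.
group_a, group_b = assign_groups(range(2000))

# Hypothetical response counts for the two groups.
z, p_value = two_proportion_z_test(conv_a=100, n_a=len(group_a),
                                   conv_b=130, n_b=len(group_b))
print(round(z, 2), round(p_value, 3))
```

Random assignment is what stands in for the "sealed boxes": it does not make the two groups identical, it only makes systematic differences between them unlikely.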

But there are also problems with the testing method:

  • It is difficult to find two groups of people that are exactly alike, so confounding factors are never completely ruled out.
  • It is difficult to cover every type of target user, so repeated tests may keep collecting the opinions of the same type of people.
  • It is difficult to seal off the test environment completely, especially when testing hot topics such as big promotions and new products.
  • It is difficult to run completely differentiated plans while staying legal and compliant. Doing so can amount to price discrimination and deceiving consumers, and the Anti-Monopoly Law and the Industry and Commerce Bureau are not there for show.
  • Consumers always chase the best deal. They will find ways around the test barriers and end up choosing the plan with the biggest discount.

As a result, this type of test suits scenarios that combine instant feedback, closed information channels, and personalized push.

Yes, that describes ride-hailing apps and short-video apps. Where feedback is a little slower, as with e-commerce platforms that use big data to charge existing customers more, consumers easily catch on by logging in with different phone numbers and comparing prices. In the end, they still buy at the cheaper price...

5. Why common methods don’t work

In summary, almost no method in causal inference is completely reliable, classic statistical methods and scientific experimental methods included. Why is this so?

Because business management is, in essence, a social science problem, not a natural science problem. Natural science rests on basic principles from physics, chemistry, and mathematics that are stable and quantifiable, so statistics plus scientific experiments can gradually uncover the natural laws behind the data. Social science problems are completely different! They are shaped by many factors at once, and those factors are easily manipulated and changed by people, who are emotional and impulsive. It is therefore difficult to apply natural science methods directly to social science problems.

In addition, everyone working in a company has a position, an attitude, and an agenda. When they ask "What caused this problem?" or "What caused this achievement?", the unspoken assumption is: the credit is mine, so I should claim a share, and the blame is someone else's, so I should push it away. So even when a reliable method exists, people may not be willing to use it, and even when a conclusion is reached, people will find other excuses to dodge it.

Therefore, when faced with causal inference, we must carefully distinguish the problem scenarios.

In short, the way to solve the problem is to start from the business scenario and analyze each specific problem on its own terms.
