It’s amazing! It turns out that the RFM model can be used in this way.

RFM is a classic data analysis model. How can we go beyond RFM and study user scenarios in depth?

Many students have asked me to cover RFM, so here it is. RFM is a classic data analysis model that almost every article on the subject mentions. Yet it is also widely misused and misapplied in practice, so today we will go through it systematically.

1. Basic Principles of RFM

RFM is an abbreviation of three terms. The first is Recency (R): the interval between the user's most recent purchase and the current time, for example 7, 30, or 90 days without visiting the store. Intuitively, if a user has not visited for a long time, something is probably wrong and action is needed. Many companies base their user reactivation (wake-up) mechanisms on this.

The second is Frequency (F): how often the user purchased within a given period, for example how many months in the past year they made a purchase, or how many days in a month they visited the store. Intuitively, the more often users buy, the more loyal they are. Many companies base their user incentive mechanisms on this, aiming to get people who bought once to buy a second time.

The third is Monetary (M): the total amount the user spent within a given period, for example over the past year. Intuitively, the more a user buys, the more valuable they are. Many companies' VIP tiers are based on this, such as a silver card once cumulative spending reaches 10,000 and a gold card once it reaches 20,000.
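To make the three definitions concrete, here is a minimal sketch that computes R, F, and M per user from a plain transaction list. It assumes a hypothetical `Order` record with a user ID, order date, and amount, a snapshot date `as_of`, and a trailing one-year window for F and M; none of these names come from any particular system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Order:
    # Hypothetical transaction record: one row per order.
    user_id: str
    order_date: date
    amount: float


def compute_rfm(orders, as_of, window_days=365):
    """Return {user_id: (recency_days, frequency, monetary)} as of `as_of`.

    R: days between the user's most recent order and `as_of`.
    F: number of orders placed in the trailing `window_days`.
    M: total amount spent in the same trailing window.
    """
    rfm = {}
    for o in orders:
        if o.order_date > as_of:
            continue  # ignore orders placed after the snapshot date
        days_ago = (as_of - o.order_date).days
        r, f, m = rfm.get(o.user_id, (days_ago, 0, 0.0))
        r = min(r, days_ago)  # keep the most recent purchase
        if days_ago < window_days:
            f += 1
            m += o.amount
        rfm[o.user_id] = (r, f, m)
    return rfm


orders = [
    Order("u1", date(2024, 5, 1), 120.0),
    Order("u1", date(2024, 6, 20), 80.0),
    Order("u2", date(2023, 3, 1), 500.0),
]
print(compute_rfm(orders, as_of=date(2024, 6, 30)))
# {'u1': (10, 2, 200.0), 'u2': (487, 0, 0.0)}
```

Note that `u2` still gets an R value even though F and M are zero inside the window: a long-lapsed user is exactly the case R is meant to surface.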

So even taken separately, each of the three dimensions is meaningful. We can also cross the three dimensions and look at them in combination (as shown in the figure below).
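Crossing the three dimensions is what produces the eight codes such as 111 and 000 discussed later. A minimal sketch follows, assuming each dimension is split at its average across users (a common convention, but an assumption here): R scores 1 when the user bought more recently than average, and F and M score 1 when they are above average.

```python
from statistics import mean


def rfm_codes(rfm):
    """Map {user: (r_days, f, m)} to 3-character codes such as "111".

    Assumption: each dimension is split at its average across users.
    R scores 1 when the user bought MORE recently than average (fewer days);
    F and M score 1 when they are above average.
    """
    r_avg = mean(v[0] for v in rfm.values())
    f_avg = mean(v[1] for v in rfm.values())
    m_avg = mean(v[2] for v in rfm.values())
    codes = {}
    for user, (r, f, m) in rfm.items():
        codes[user] = (
            ("1" if r < r_avg else "0")
            + ("1" if f > f_avg else "0")
            + ("1" if m > m_avg else "0")
        )
    return codes


sample = {
    "a": (5, 12, 3000.0),  # recent, frequent, big spender
    "b": (200, 1, 50.0),   # long gone, one small order
    "c": (10, 2, 2500.0),  # recent, few orders, large baskets
}
print(rfm_codes(sample))  # {'a': '111', 'b': '000', 'c': '101'}
```

Because the cut-off is an average over the whole population, a user's code can shift when other users change; fixed business thresholds avoid that at the cost of manual tuning.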

Because RFM depends on time, many students struggle with how to choose the time window when collecting data. Strictly speaking, the more everyday the business and the higher its natural purchase frequency, the shorter the window should be. The most typical example is fresh food: people eat every day, so a customer who has not come for 7 days may already be a problem. Ordinary fast-moving consumer goods retail might use 30 days, and apparel and department store retail 90 days.
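The category-dependent windows above can be expressed as a simple lookup. This is only a sketch: the category keys and the `needs_attention` helper are hypothetical, and the day counts are the illustrative values from the paragraph, not universal constants.

```python
# Hypothetical recency windows, using the examples from the text:
# fresh food ~7 days, ordinary FMCG ~30 days, apparel/department store ~90 days.
RECENCY_WINDOW_DAYS = {
    "fresh_food": 7,
    "fmcg": 30,
    "apparel": 90,
}


def needs_attention(days_since_last_purchase: int, category: str) -> bool:
    """Flag a user whose gap since the last purchase exceeds the category window."""
    return days_since_last_purchase > RECENCY_WINDOW_DAYS[category]


print(needs_attention(10, "fresh_food"))  # True: 10 days is long for fresh food
print(needs_attention(10, "apparel"))     # False: normal for clothing
```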

In practice, though, many companies simply work by month: R is measured in months, and F and M are calculated over the last year. This is done mainly because it is easier to understand.
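A month-based version of R and F might look like the sketch below. It assumes F is defined as the number of distinct calendar months with at least one purchase in the last 12 months (current month included), which matches the "how many months of purchases in a year" reading; the helper names are hypothetical.

```python
from datetime import date


def month_index(d: date) -> int:
    """Months since year 0, so month differences are plain subtraction."""
    return d.year * 12 + (d.month - 1)


def monthly_rf(order_dates, as_of: date):
    """R = whole months since the last purchase month.
    F = distinct purchase months within the last 12 months (current month included).
    Returns None if the user has no purchases up to `as_of`."""
    now = month_index(as_of)
    months = {month_index(d) for d in order_dates if d <= as_of}
    if not months:
        return None
    r_months = now - max(months)
    f_months = sum(1 for m in months if now - m < 12)
    return r_months, f_months


dates = [date(2023, 6, 15), date(2024, 1, 3), date(2024, 1, 28), date(2024, 4, 9)]
print(monthly_rf(dates, date(2024, 6, 30)))  # (2, 2)
```

Here June 2023 falls just outside the 12-month window, and the two January orders count as a single purchase month.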

At its core, RFM uses three classification dimensions to build a judgment standard. By combining the three, it determines whether a user is good or bad, and the business can then take corresponding measures.

The real significance of RFM is that it infers user value from transaction data alone, which makes it highly practical. The biggest bottleneck in data analysis is data collection, and any normally operating business has transaction data.

Therefore, as long as the company has a unified user ID authentication mechanism, it can link user IDs to transaction data and analyze users with RFM. This works even with no event tracking (embedded tracking points), no website, and no basic user profile data. It is a convenient, easy-to-use tool.

Of course, all convenient and easy-to-use tools have some shortcomings, and the RFM model is no exception.

2. RFM’s Biggest Shortcoming

RFM's biggest shortcoming is its dependence on unified user ID authentication. Don't underestimate this requirement: many companies find it very difficult to implement. For example, when you shop at a supermarket, chain store, or other store, the cashier will often mechanically ask whether you have a membership card; if you say no, she simply lets you go. As a result, 70%-90% of offline store orders cannot be linked to a user ID, leaving user data seriously incomplete. Applying RFM directly to such data easily leads to misjudging user behavior.
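Before trusting any RFM output, it is worth measuring how many orders can be attributed to a user at all. A minimal sketch, assuming orders are dicts with a hypothetical `member_id` field that is `None` for anonymous checkouts:

```python
def id_coverage(orders):
    """Share of orders that carry a member ID (None means anonymous checkout)."""
    if not orders:
        return 0.0
    linked = sum(1 for o in orders if o.get("member_id") is not None)
    return linked / len(orders)


orders = [
    {"order_id": 1, "member_id": "M001"},
    {"order_id": 2, "member_id": None},
    {"order_id": 3, "member_id": None},
    {"order_id": 4, "member_id": None},
]
print(f"{id_coverage(orders):.0%} of orders can be attributed to a user")
# 25% of orders can be attributed to a user
```

If coverage is as low as the 10%-30% implied above, RFM describes only the minority of customers who show a card, not the customer base as a whole.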

Then there are users who hold several membership cards and rotate them to collect discounts, several users sharing one VIP card to get the biggest discount, and store clerks putting anonymous orders on their relatives' cards to collect the discounts. Such cases are endless and common at both physical retailers and Internet companies.

So when you build an RFM model and find a lot of 111 users, don't celebrate too soon: something is probably wrong. Today, companies often sell on several platforms at once, such as Tmall, JD.com, their own WeChat malls, and Youzan, which makes unified authentication even harder. Without careful planning, you can easily fall into a bottomless pit of subsidies.

3. The Deeper Problem of RFM

Even if unified user ID authentication is done, RFM still has a deeper problem.

Let’s review the three basic assumptions of the RFM model:

R: The longer it has been since a user's last purchase, the greater the risk of churn.

F: The more often a user buys, the more loyal they are.

M: The more a user spends, the more valuable they are.

Let me ask you a question: are these three assumptions valid? In the abstract, without reference to a specific industry, product, or activity, they seem to be. But once we look at the details, we find that many scenarios do not satisfy them. Talking about RFM without considering products and activities therefore easily goes wrong.

R: The longer it has been since a user's last purchase, the greater the risk of churn.

For seasonal consumption such as clothing, it is normal for users to have a gap of 2-3 months.

For products driven by new releases, such as mobile phones and tablets, the purchase interval basically follows the product update cycle.

For large durable goods such as furniture, housing, or cars, R is meaningless, because a user may buy them only once or twice in a lifetime.

With a prepaid model, where users top up once and then swipe a card for each use, there is no repeat purchase for R to measure, so it has to be replaced by redemption (usage) data. A long R therefore does not necessarily signal churn risk, especially now that event-tracking data is available: users' interaction behavior explains the problem better.

F: The more often a user buys, the more loyal they are.

If consumption is occasion-driven, such as special events, holidays, birthdays, or weekends...

If consumption is promotion-driven, such as buying only when there is a discount...

If consumption follows a fixed cycle, for example a medicine purchase that lasts 30 days...

In all of these situations, F is not a stable property of the user: it may arise by chance or be manipulated. Many companies apply the RFM model rigidly and set a fixed F target, such as encouraging users to buy 4 times, because the data show that users who buy more than 4 times are very loyal. The result is that users split their orders: F goes up, but profit goes down.

M: The more a user spends, the more valuable they are.

What if users are looking for bargains and stock up when there are discounts?

What if the user has bought so much that they are tired of the product, or simply have enough of it?

What if the user bought a durable good and won't need another for ten or twenty years? What if the user's consumption has its own life cycle, as with maternal and infant products or games, and has already reached the end of it?

In many cases, having bought a lot in the past does not mean a user will buy more in the future; the two are not the same. So if you see customers coded 011, 001, or 101, don't rush to send coupons. The key is to figure out what the underlying problem is.

Beyond the problems of the individual dimensions, combining the three also causes trouble. The user structure of many companies is not a pyramid but an Eiffel Tower: a huge mass of inactive users gathers at the bottom, and most of them placed only one order, or logged in only a few times, before leaving. So if you actually split users into the eight RFM categories, the 000 group may be disproportionately large.

This means that today's active users may be a product of survivorship bias: the current 111 users do not show what the current 000 users will become. We need to analyze in more depth why so many inactive users have piled up, and possibly change the process at its root to solve the problem. Applying RFM mechanically may lead the business into a dead end.
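A quick way to see whether your user base looks like an Eiffel Tower is to check how users are spread across the RFM codes and what share of them placed exactly one order. A minimal sketch, with hypothetical input dicts standing in for whatever segmentation and order counts you already have:

```python
from collections import Counter


def segment_shares(codes):
    """Return (code, share of users) pairs, largest segment first."""
    counts = Counter(codes.values())
    total = sum(counts.values())
    return [(code, n / total) for code, n in counts.most_common()]


def one_order_share(order_counts):
    """Share of users whose entire history is a single order."""
    return sum(1 for n in order_counts.values() if n == 1) / len(order_counts)


codes = {"u1": "000", "u2": "000", "u3": "000", "u4": "111", "u5": "100"}
order_counts = {"u1": 1, "u2": 1, "u3": 1, "u4": 9, "u5": 2}
for code, share in segment_shares(codes):
    print(code, f"{share:.0%}")
print(f"one-order users: {one_order_share(order_counts):.0%}")
```

A dominant 000 segment made up largely of one-order users points to a problem in acquisition or first-purchase experience, which no amount of coupons aimed at the 111 group will fix.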

4. Typical Abuse of RFM

There is nothing wrong with RFM itself. When data is lacking (especially event-tracking data), using RFM is much better than not using it. Each of its three dimensions is useful on its own, and its overall structure is well suited to evaluating the overall quality of user operations. What is wrong is applying RFM mechanically without in-depth analysis: the thoughtless practice of fawning over customers who placed large orders and rushing to hand out coupons whenever users stop buying, just to make the RFM numbers look good. Blindly issuing coupons not only badly overdraws the marketing budget but also breeds more bargain hunters who game the system, disrupting normal operations.

In particular, what online articles and courses like to teach most is this: split each RFM dimension into 5 levels, giving 5 × 5 × 5 = 125 categories, and then use K-means clustering to group them into 5-8 clusters. This approach is completely wrong.

First, after K-means clustering, RFM loses its original advantage of clear meaning. Interpreting the resulting 8 clusters is very confusing.

Second, this ignores the fact that data rolls forward. After a week or a month, users' RFM values have changed. Are you going to re-cluster all users every day?

Third, K-means is not a stable classification method; unsupervised classification is better suited to exploratory analysis. A week later, the same user may fall into a completely different cluster, which drives the marketing and operations teams crazy when they try to implement policies: the segments change every day, so what are they supposed to promote?

Essentially, online courses and articles all work from a perfectly cleaned, static data table: there is no need to coordinate with other departments and no need to handle continuously changing scenarios, so they choose a model-plus-algorithm approach. Whether it is actually usable doesn't matter; what matters most is showing off how awesome you are!
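By contrast with re-clustering, one way to handle rolling updates while keeping labels stable and readable is to recompute the codes on a schedule against fixed, hand-set thresholds. The sketch below assumes hypothetical threshold values and a per-user history of `(order_date, amount)` pairs; it is an illustration, not the author's prescribed method.

```python
from datetime import date

# Hypothetical fixed business thresholds, set by hand rather than fitted to
# data, so a user's code changes only when that user's own behaviour changes.
R_MAX_DAYS = 30
F_MIN_ORDERS = 3
M_MIN_AMOUNT = 1000.0


def rfm_label(history, as_of, window_days=365):
    """history: list of (order_date, amount) for one user; returns e.g. "101"."""
    past = [(d, a) for d, a in history if d <= as_of]
    if not past:
        return "000"
    r = (as_of - max(d for d, _ in past)).days
    in_window = [a for d, a in past if (as_of - d).days < window_days]
    f, m = len(in_window), sum(in_window)
    return (
        ("1" if r <= R_MAX_DAYS else "0")
        + ("1" if f >= F_MIN_ORDERS else "0")
        + ("1" if m >= M_MIN_AMOUNT else "0")
    )


history = [(date(2024, 5, 1), 400.0), (date(2024, 5, 20), 300.0), (date(2024, 6, 10), 500.0)]
for snapshot in (date(2024, 6, 15), date(2024, 7, 15)):
    print(snapshot, rfm_label(history, snapshot))
# 2024-06-15 111
# 2024-07-15 011
```

Between the two snapshots the only change is that the user has not bought for 35 days, so only the R digit flips, and operations can read exactly why.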

5. How to Make RFM More Useful

Looking across the scenarios where RFM fails, we can see that seasonality, product characteristics, promotional activities, holiday events, and the user life cycle all affect user behavior. That is why it is critical to go beyond RFM and study user scenarios in depth.

Studying these five elements is not as difficult as you might think. Product characteristics, for example, are intrinsic to the products, and anyone familiar with the business can work them out. Seasonality and holiday events are essentially about time, so you can analyze them by tagging the times at which users log in and buy (as shown below). Promotional activities are similar: they can be identified directly from orders, so it is also easy to tag users as promotion-sensitive.
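Time and promotion tags can be derived from order data with very little code. A minimal sketch, assuming each order is a hypothetical `(order_date, used_discount)` pair and a 50% share as the cut-off for both tags; holiday tags would work the same way with a holiday calendar in place of the weekday check.

```python
from datetime import date


def behaviour_tags(orders, promo_threshold=0.5, weekend_threshold=0.5):
    """orders: list of (order_date, used_discount) for one user.

    Hypothetical tags: 'promotion_sensitive' if more than `promo_threshold`
    of orders used a discount; 'weekend_shopper' if more than
    `weekend_threshold` of orders fall on Saturday or Sunday.
    """
    if not orders:
        return set()
    n = len(orders)
    promo_share = sum(1 for _, discounted in orders if discounted) / n
    weekend_share = sum(1 for d, _ in orders if d.weekday() >= 5) / n  # 5=Sat, 6=Sun
    tags = set()
    if promo_share > promo_threshold:
        tags.add("promotion_sensitive")
    if weekend_share > weekend_threshold:
        tags.add("weekend_shopper")
    return tags


orders = [(date(2024, 6, 1), True), (date(2024, 6, 8), True), (date(2024, 6, 12), False)]
print(sorted(behaviour_tags(orders)))  # ['promotion_sensitive', 'weekend_shopper']
```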

The user life cycle does require data collection, but collecting the single most critical data point is enough. The most typical practice comes from the maternal and infant industry, where companies always collect the key figure: how many weeks pregnant the mother is. Fathers may not know it, but mothers know it very well. Once the starting point is known, every later stage can be calculated. Similar examples include drugstore chains doing chronic disease management, K12 education, and so on.
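Here is how one data point can drive the whole life cycle. This sketch assumes a conventional 40-week full-term pregnancy and a rough 30-day month; the function name and stage wording are hypothetical.

```python
from datetime import date, timedelta

FULL_TERM_WEEKS = 40  # conventional full-term pregnancy length


def life_stage(registered_on: date, weeks_pregnant: int, as_of: date) -> str:
    """Infer a hypothetical maternal/infant stage from the weeks of pregnancy
    recorded at registration."""
    due = registered_on + timedelta(weeks=FULL_TERM_WEEKS - weeks_pregnant)
    if as_of < due:
        weeks_now = weeks_pregnant + (as_of - registered_on).days // 7
        return f"pregnant, about week {weeks_now}"
    baby_months = (as_of - due).days // 30  # rough month count
    return f"baby about {baby_months} months old"


print(life_stage(date(2024, 1, 1), 20, date(2024, 3, 1)))    # pregnant, about week 28
print(life_stage(date(2024, 1, 1), 20, date(2024, 11, 20)))  # baby about 6 months old
```

From the estimated stage, the business can anticipate when a customer will move from maternity products to infant formula and later to toddler goods, instead of reading the resulting drop in purchases as churn.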

6. Summary

Every model has its historical background, data basis, and scope of use. Not every model aims for accuracy; simplicity, ease of use, and low effort are often what matters.

Author: Down-to-earth Teacher Chen

WeChat public account: Down-to-earth Teacher Chen
