Data analysis needs data to analyze. But what if the data is not true? What if it has been artificially distorted? And what if it has been artificially distorted and you are required to accept it anyway? That is today's topic. Below are the nine most common methods of manipulating data. Keep them in mind: you will run into them in year-end summaries, annual plans, activity evaluations, and so on, and knowing them in advance helps you deal with them early.

Rank 1: False data

The business side deliberately falsifies, misreports, or fails to report data, so the basic data is incomplete and full of errors. This was very common in the days of paper forms, and it has become less and less common as data systems have spread. Wherever paper forms are still used, such as paper application forms and questionnaires, the problem persists. The solution is simple: use WeChat card wallet! In this day and age, who still needs to fill in a paper form to register for membership?

Rank 2: Manually changing numbers

See also: "The system is dead, but people are alive." The only way to solve this problem is to strengthen assessment and severely punish those who break the rules. These operations follow very regular patterns and are closely tied to the behavior of specific people, so they can be identified through analysis.

Rank 3: Modifying the caliber

What if the numbers look bad? Just change the statistical caliber! In essence, data indicators are designed for the convenience of calculation, and the business side can change them as it wishes. However, changing the caliber makes the data before and after the change inconsistent, which is a big problem. Changing the caliber while keeping the indicator name unchanged is also a big problem. So changing the caliber is acceptable, but the previous data reports should be recalculated in one go under the new caliber.
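As a rough illustration of "recalculate the previous reports in one go", here is a minimal Python sketch. The order records, the field layout, and the two caliber functions (`gross_sales`, `net_sales`) are all hypothetical and only stand in for whatever old and new definitions a team actually uses. The point is that every historical period passes through the same caliber function, so the series stays comparable.

```python
from collections import defaultdict

# Hypothetical order records: (month, user_id, amount, refunded)
ORDERS = [
    ("2024-01", "u1", 100.0, False),
    ("2024-01", "u2", 50.0, True),
    ("2024-02", "u1", 80.0, False),
    ("2024-02", "u3", 120.0, True),
]

def gross_sales(order):
    """Assumed old caliber: every placed order counts in full."""
    return order[2]

def net_sales(order):
    """Assumed new caliber: refunded orders count as zero."""
    return 0.0 if order[3] else order[2]

def monthly_report(orders, caliber):
    """Recompute every month with one caliber so all periods are comparable."""
    totals = defaultdict(float)
    for order in orders:
        totals[order[0]] += caliber(order)
    return dict(sorted(totals.items()))

# Both series cover the full history; neither mixes the two definitions.
old_series = monthly_report(ORDERS, gross_sales)
new_series = monthly_report(ORDERS, net_sales)
```

Publishing `new_series` for all months, rather than switching definitions partway through, is what keeps the before and after figures consistent.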
Rank 4: Controlling the rhythm

Note that unlike Rank 2, which is the bad practice of falsifying data to deceive the company, Rank 4 does not falsify data in essence. Instead, it exploits the rules of sales, operations, and rewards to maximize personal interests. In fact, everyone does this; it is an unspoken rule of the business. As the saying goes, "water that is too clear has no fish": you cannot ask people not to think of themselves. If you manage too strictly, the front-line staff will simply quit and leave.

As data analysts, we need to be able to identify these specific problems and keep them within an acceptable range. If they become too widespread, we can then look at how to push for optimization and adjustment at the institutional level (as shown in the figure below).

Note that from this point on we enter intermediate difficulty, because the following problems demand more and more analytical ability from the data analyst. For example, telling reasonable unspoken rules apart from malicious changes takes a certain amount of analytical experience.

Rank 5: Random rhythm

When doing data analysis, you must have often heard this kind of question:
However, you work hard to dig up a pile of data and find that nothing is wrong with it? Congratulations, you have been fooled. The "decline", "bad", and "unsatisfactory" mentioned by the business are likely to be false propositions! Be careful: the business side can distort the reading of the data without even meaning to, and many new data analysts walk straight into the trap. Many newcomers do not first ask whether the claim is true; they go straight to studying why. They slice the data by user group, registration time, product type, and so on, and in the end cannot interpret any of it. Two days later, they look again and the problem no longer exists.

To deal with this kind of problem, remember:

When you hear judgments about "size, quantity, height, speed, quality", ask about the standard first.
When you hear a specific problem, ask first how the person knows about it.
When you hear people discussing data, ask about the original data source first.

The difficulty is that these three "ask first" habits go against human instinct. People are most used to forming opinions from hearsay, so these three seemingly simple questions take a lot of repeated, intensive practice to learn; otherwise you will often be misled.

Rank 6: Satisfaction

Satisfaction here stands for the indicators that business departments often mention but that are hard to record directly in a system. Examples include satisfaction, brand influence, product strength, industry status, and NPS. Because there is no direct record, many unexpected problems arise.

Rank 7: Natural growth rate

Finally, if you want to manipulate the data, just keep changing the "natural growth rate" figure. If that doesn't work, you can even change it to a negative number (as shown below). The best way to deal with this approach is to ignore it. The actual number of participants in the activity is easy to calculate.
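To make the lever concrete, here is a small sketch in Python. It assumes one common convention, which the article itself does not spell out: the credited uplift is the actual figure minus the baseline scaled by the natural growth rate. The function name and all numbers are hypothetical.

```python
def reported_uplift(baseline, actual, natural_growth_rate):
    """Uplift credited to the activity after removing assumed natural growth.

    Assumed convention: expected = baseline * (1 + natural_growth_rate).
    """
    expected_without_activity = baseline * (1 + natural_growth_rate)
    return actual - expected_without_activity

# Hypothetical figures: 1000 participants before, 1100 after the activity.
baseline, actual = 1000, 1100

# The observed numbers never change; only the assumed rate does.
for rate in (0.10, 0.05, 0.0, -0.05):
    print(f"natural growth {rate:+.0%}: uplift {reported_uplift(baseline, actual, rate):.0f}")
```

With identical observed data, the credited uplift grows as the assumed rate is lowered, and a negative rate inflates it further. That is why the rate is such a convenient thing to keep adjusting.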
If you do want to use a natural growth rate, agree on it in advance to avoid arguments afterwards.

Rank 8: Reference group

The reference group and the natural growth rate are brothers in misfortune. Both are easily replaced and modified at will under the guise of "scientific evaluation" until the business is satisfied. If the business side wants to argue, it will keep saying that the reference group you set is unscientific: the samples are all exceptions, not random enough, and not representative. In fact, anything short of full statistics can always be labeled "unscientific, non-random, and non-representative" (and if you really do use full statistics, they will say that natural growth has not been eliminated, and so on; there is always an argument available).

The best way to deal with it is: don't respond. As long as the grouping method is made clear in advance, accept the result as it comes out. There is no need to complain. A reference group can only be used when doing precise pushes through limited channels. Setting up a reference group is just one technique within an AB test, and the AB test itself is just one testing tool, not an authoritative rule. Does the business department have no judgment without an AB test? Where is your business ability? What use are you? So satisfying, I finally get to scold back!

Rank 9: Comprehensive assessment

When evaluating a problem, a single indicator is the clearest, but people prefer composite indicators in order to appear "comprehensive". Once there are many indicators, weights have to be assigned, and that is where the trick lies. Anyone who is not satisfied with the evaluation result will say, "This weighting is unreasonable and cannot reflect the actual business", and then force you to change it. The final result must of course satisfy people, at which point they will say your analysis is in-depth and reasonable. Otherwise, they will keep arguing about it.
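To see why the weights are such an easy lever, consider a minimal Python sketch. The branch names, indicator names, scores, and weights are all made up for illustration, and the composite is assumed to be a plain weighted sum.

```python
# Hypothetical per-indicator scores for two branches.
SCORES = {
    "Branch A": {"sales": 90, "satisfaction": 60},
    "Branch B": {"sales": 70, "satisfaction": 85},
}

def rank(scores, weights):
    """Rank branches by a weighted sum of their indicator scores (best first)."""
    composite = {
        name: sum(weights[key] * value for key, value in indicators.items())
        for name, indicators in scores.items()
    }
    return sorted(composite, key=composite.get, reverse=True)

# Same underlying scores, two different weightings.
sales_heavy = rank(SCORES, {"sales": 0.7, "satisfaction": 0.3})
satisfaction_heavy = rank(SCORES, {"sales": 0.3, "satisfaction": 0.7})

print(sales_heavy)
print(satisfaction_heavy)
```

The per-indicator scores are untouched in both runs. Only the weights move, and the ranking flips with them, so whoever controls the weights controls the result.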
The most outrageous thing I have seen is a business leader who handwrote a ranking of branch company scores and then told me: use big data and artificial intelligence methods to comprehensively calculate this ranking by combining various indicators, and once you have done it perfectly, we will sign a contract with you next year... What can I say at that point? Of course it's OK. Signing the contract matters; scientific rigor is shit. It's just a change of weights. It's as if graduate school never happened.

Solution: Score each indicator separately and let the leader decide the weights of the indicators. Abandon neural network methods, which have low business interpretability. If the business sides disagree, let them fight it out first and then have the leader decide how the data is to be determined.

The above three are the high-level methods of manipulating data. They are high-level because natural growth rate, reference group, and comprehensive evaluation are frequently discussed topics in data analysis. Many new data analysts who have never been burned love to tinker with these things, thinking that the more complicated they are, the more advanced they are. The end result is that the more complicated the work, the harder it is to explain its business meaning clearly, and the more doubts the business side raises. In the end, analysts are led by the nose and the rule becomes: "If the results favor the business, the analysis is objective and comprehensive; if they do not, the analysis lacks depth." This is asking for trouble.

Summary

We will find that different departments use different methods. For frontline departments such as sales, promotion, and supply chain, data is the direct product of their work, so the easiest thing for them is to tamper with the data source.
Operations, planning, product, and similar departments like to use indicators that are hard to quantify, like to talk about "far-reaching impact", and like to set up a pile of "natural growth rates" and "reference group users" and then exclude them. In other words, they tamper with the judgments made from the data.

Why don't sales, promotion, and supply chain go through the same contortions? Because they deal with concrete matters: selling and collecting money, recruiting through promotion, and shipping from warehouses. Who earned which cent is perfectly clear, and there is no room for quibbling. But when operations, planning, and product all work on the same thing, each wants to highlight its own contribution, and so the endless arguments begin. "Excluding natural growth, how much benefit did my activities bring?" "Excluding natural growth and activity-driven growth, how much benefit did my product revision bring?" "Excluding natural growth, activity-driven growth, and product revision, how much benefit did my copywriting bring?"...

If we must compare the two hazards, tampering with the data source is definitely more harmful. If the data is false, there is no point in analyzing it. Tampering with the data source indicates that the company's management is chaotic and its channel control is weak. Interestingly, every functional department at headquarters hates this kind of weakness, so on this issue the headquarters departments usually unite and point their guns outward. Tampering with data judgments, however, often comes from the top. The headquarters' operations, product, and planning staff arbitrarily change standards for their own selfish interests, which does great harm to achieving truly data-driven development.
When people do not dare to face the facts and use data to whitewash the situation, the business departments themselves end up losing their ability to judge. They fall back to making decisions on gut feeling and then walking away without a care in the world. This is not what we want to see. The ideal state is that data sources are real and rich, data judgments are simple and clear, and data analysis is in-depth and multi-dimensional. Focus more on finding causes, making predictions, and testing effects, so that better results can be produced.