Dissecting a classic paper: Facebook's ad ranking model!
The following article is from TechFlow, author Liang Tang.
Focused on big-data algorithms, building up from hard-core fundamentals.
Today we're dissecting a classic paper: Practical Lessons from Predicting Clicks on Ads at Facebook. As the title suggests, it comes from Facebook's advertising team. It describes the combined application of GBDT and LR models to ad click-through-rate prediction. Although several years have passed, the method is still not completely outdated, and some smaller companies still use it.
This paper is extremely classic: it is arguably a recommended, must-read article in the advertising field, and its ideas are now near-common knowledge in the industry. The quality is very high and the content relatively foundational, making it well suited as an entry-level paper.
This article is a little long; you may want to bookmark it before reading.
The paper opens with a brief overview of online advertising in the Internet industry at the time and of Facebook's scale: Facebook then had 750 million daily active users and more than one million active advertisers, so choosing the right, effective ads to serve to users was enormously important. On top of this, Facebook's innovative approach of combining GBDT with a logistic regression model yielded an improvement of more than 3% in real-world data scenarios.
In 2007, Google and Yahoo introduced online-auction ad-pricing mechanisms, but Facebook's scenario differs from a search engine's. In search, users have a clear search intent, and the engine filters ads based on that intent, so the candidate ad set is not very large. Facebook has no such strong intent signal, so its number of candidate ads is much larger, and the pressure on and requirements for the system are correspondingly higher.
But this article doesn't cover the system-side content; it focuses only on the final part, the ranking model. The paper doesn't say so explicitly, but we can see that ads on search engines like Google and Yahoo are search ads, while ads on Facebook are recommended ads. The biggest difference between the latter and the former is the logic for recalling candidate ads, somewhat like the difference between a recommendation system and a search system.
At the heart of this is user intent. While users have no strong intent when they log on to Facebook, we can extract weak intent from their past browsing behavior and habits: for example, which kinds of goods a user lingers on longest, what kind of content they click most, or user behavior abstracted into vectors in a manner similar to collaborative filtering. There is actually a great deal of valuable content here, which suggests Facebook held something back when writing the paper.
With the preamble out of the way, let's look at the concrete approach, which many students may have heard of: GBDT + LR. It sounds like it can be summed up in a single phrase, but there are many details. For example, why use GBDT, and why does GBDT work? What mechanisms are at play here? What the paper writes is only the surface; thinking through and analyzing these details is the key, because the specific practice in a paper is rarely universal, but the implications often are.
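The core of GBDT + LR is a feature transform: each boosted tree routes a sample to one leaf, the leaf index becomes a categorical feature, and the one-hot-encoded leaf indicators are fed to logistic regression. Below is a minimal toy sketch of that transform; the two "trees" are hand-written stubs with made-up feature names (`age`, `clicks_last_week`, `is_mobile`), not a trained GBDT and not Facebook's actual code.

```python
# Toy sketch of the GBDT -> LR feature transform.
# Each tree maps a sample to a leaf index; the indices are one-hot
# encoded and concatenated into the input vector for LR.

def tree_1(x):
    # hand-coded stand-in for a boosted tree; returns a leaf in {0, 1, 2}
    if x["age"] < 30:
        return 0
    return 1 if x["clicks_last_week"] > 5 else 2

def tree_2(x):
    # second stand-in tree; returns a leaf in {0, 1}
    return 0 if x["is_mobile"] else 1

TREES = [(tree_1, 3), (tree_2, 2)]  # (tree_fn, number_of_leaves)

def gbdt_features(x):
    """One-hot encode each tree's leaf index and concatenate."""
    feats = []
    for tree, n_leaves in TREES:
        one_hot = [0] * n_leaves
        one_hot[tree(x)] = 1
        feats.extend(one_hot)
    return feats

sample = {"age": 25, "clicks_last_week": 7, "is_mobile": True}
print(gbdt_features(sample))  # [1, 0, 0, 1, 0]
```

In practice the trees come from a trained gradient-boosting model (e.g. `GradientBoostingClassifier.apply` in scikit-learn returns exactly these leaf indices), and the LR is then trained on the transformed features.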
First, model evaluation: the paper provides two new methods of evaluating models. One is Normalized Entropy, the other is Calibration. Let's start with the former.
This metric is quite commonly used in real-world scenarios and often appears in the code and papers of experts. The direct translation is "normalized entropy", though the meaning has shifted slightly; it can be understood as normalized cross entropy. It is computed as the ratio of the sample-averaged cross entropy to the cross entropy of the background CTR.
Background CTR refers to the empirical CTR of the training sample set and can be understood as the average click-through rate. Note, however, that it is not simply the ratio of positive to negative samples, because we sample before training the model. For example, if we downsample to a positive-to-negative ratio of 1:3, the background CTR should still be set to the ratio before sampling. Assuming that ratio is p, we can write the formula for NE:

NE = (−(1/N) · Σᵢ [yᵢ·log(pᵢ) + (1−yᵢ)·log(1−pᵢ)]) / (−(p·log(p) + (1−p)·log(1−p)))

where yᵢ ∈ {0, 1} is the click label and pᵢ is the model's predicted click probability for sample i.
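The two metrics can be sketched in a few lines of code. This is my own toy implementation, not the paper's code: labels are in {0, 1}, `preds` are predicted click probabilities, and `background_ctr` is the pre-sampling average CTR p.

```python
import math

def normalized_entropy(labels, preds, background_ctr):
    """NE: average log loss of the model divided by the entropy of a
    baseline that always predicts the background CTR. Lower is better;
    a model that always predicts the background CTR scores exactly 1."""
    n = len(labels)
    avg_logloss = -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(labels, preds)
    ) / n
    p = background_ctr
    baseline = -(p * math.log(p) + (1 - p) * math.log(1 - p))
    return avg_logloss / baseline

def calibration(labels, preds):
    """Ratio of average predicted CTR to empirical CTR; ideal value is 1."""
    return (sum(preds) / len(preds)) / (sum(labels) / len(labels))

labels = [1, 0, 0, 1]
preds = [0.9, 0.2, 0.1, 0.7]
print(normalized_entropy(labels, preds, background_ctr=0.5))
print(calibration(labels, preds))  # 0.95
```

Dividing by the background-CTR entropy is what makes NE comparable across datasets: raw log loss looks artificially good whenever the base CTR is far from 0.5, and the normalization cancels that effect.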
4. Combining the models

Data freshness

1. The time window

1. Behavioral features vs. context features

2. Feature-importance analysis