Why is machine learning widely used in Google's advertising system but rarely in search ranking?
Translator: Wang Xiaokeko
(This article is a Quora translation, the original address can be found at the end of the text)
Many of my friends at Google have told me that its advertising systems are mostly based on machine learning, while search ranking relies on functions that people wrote by hand from intuition (although some modules do use machine learning).
What makes this difference?
Edmond Lau, former Google Search Quality Engineer
When I was at Google, it was my understanding that Amit Singhal, the head of Google's core ranking team, had a philosophical objection to using machine learning in search ranking. I think there were two main reasons behind it:
1. In a machine learning system, it is difficult to work out why one search result ranks higher than another. Many machine learning algorithms are black boxes: at best they expose weights and model parameters, which do little to explain why a specific ranking decision was made.
2. Even when a person has correctly identified the signals that should make one result rank higher than another, it is still difficult to get a machine learning system to weight those signals appropriately for that particular scenario. The signals and features fed into the system only indirectly affect the weights that produce the output, and this lack of direct control means that even if someone can clearly explain why one result is better than another, that human insight cannot be applied directly to the system.
A rule-based scoring system is still complex, but it lets engineers directly adjust the ranking weights for different scenarios. As Google's leadership in web search suggests, this choice preserved the explainability and controllability of results, which allowed search quality work to iterate quickly and deliver significant improvements. The team released 450 improvements in 2008, a number that appears to be growing.
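As a rough illustration of what "directly adjusting the ranking weights in different scenarios" could look like, here is a minimal Python sketch. The signal names (text_match, link_score, freshness), the scenarios, and every number are hypothetical and chosen only for illustration; nothing here reflects Google's actual signals or code.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text_match: float  # hypothetical signal in [0, 1]
    link_score: float  # hypothetical signal in [0, 1]
    freshness: float   # hypothetical signal in [0, 1]


# Hand-set weights per query scenario. An engineer can edit one entry
# without affecting how other scenarios are ranked.
WEIGHTS = {
    "default": {"text_match": 0.6, "link_score": 0.3, "freshness": 0.1},
    "news": {"text_match": 0.4, "link_score": 0.1, "freshness": 0.5},
}


def explain(page, scenario="default"):
    """Return each signal's contribution to the score, for debugging."""
    weights = WEIGHTS.get(scenario, WEIGHTS["default"])
    return {name: w * getattr(page, name) for name, w in weights.items()}


def score(page, scenario="default"):
    """The score is the sum of the per-signal contributions."""
    return sum(explain(page, scenario).values())


def rank(pages, scenario="default"):
    """Sort pages from highest to lowest score for the given scenario."""
    return sorted(pages, key=lambda p: score(p, scenario), reverse=True)
```

Because every contribution is visible through explain(), an engineer can see exactly why one page outranks another and change a single weight for a single scenario, which is the kind of control described above.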
Ad ranking tends to be more of an optimization problem. Compared with two specific search results, the quality of two ads is very hard to compare. The pages returned for a search query differ markedly from one another, and human raters can tell them apart. An ad in a search engine, with only three or four lines of text, often looks much like any other to a user. Users can usually spot a bad ad easily, but it is hard for them to tell which of two sensible ads is better.
Brand differences, small variations in wording, and tracked user behavior are hard for people to perceive but easy for a machine to pick up, and they matter more in advertising. In addition, different advertisers have different budgets and bids, which makes ad ranking more of a revenue optimization problem than a quality optimization problem. It is hard to work out from accumulated experience alone how to make an ad ranking system perform better. Explainability and controllability are important in page ranking but much less so in ad ranking, so machine learning is a very good fit there.
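To make the "revenue optimization" framing concrete, here is a small Python sketch. It assumes a common simplification that the answer itself does not state: each ad is ordered by bid multiplied by a predicted click probability. The Ad fields and the numbers are hypothetical, and predicted_ctr is simply taken as an input that a learned model would supply.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str
    bid: float            # amount paid per click (hypothetical units)
    predicted_ctr: float  # click probability, assumed to come from a learned model


def expected_revenue(ad):
    """Expected revenue per impression under the bid * pCTR assumption."""
    return ad.bid * ad.predicted_ctr


def rank_ads(ads):
    """Order ads by expected revenue, highest first."""
    return sorted(ads, key=expected_revenue, reverse=True)
```

Under this assumption there is a single numeric objective, so a better click predictor translates directly into a better ordering. That is part of why a learned model fits this problem more naturally than hand-written rules.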
Jackie Bavaro, Google PM for 3 years
Edmond Lau's answer was great, and I'd like to add some more important information.
I was in Google's search organization from 2008 to 2010, and during that time many teams switched from machine learning-based systems to rule-based systems. In other words, Google tried a lot of machine learning methods, but when teams realized that a rule-based approach would improve quality faster, they changed direction. This was by no means a bias; it was the conclusion many search teams reached after trying machine learning.
I was a product manager for image, video, and geographic search, and these three teams were dedicated to returning the best results when a query called for one of those types of content. I often looked at the search results for randomly selected queries and asked, "Do these contain the results the user wants? If not, how can we do better?" Asking that question often led us to think further about which signals could help us pick out good results. The image you would want to see is exactly what Google is trying to show you.
The other answers cover the classic points, but another important part is the difference between the two systems in goals, scale, and users.
The users of the ad system are advertisers and Google's sales department. If the machine learning system performs poorly, those advertisers will be upset and Google will make less money, but relatively speaking that is tolerable. The ad system also has an objective goal that a machine learning system can optimize directly, and the search space for ads is comparatively very small (it has less of a long tail).
The search ranking system has a very subjective goal: the user experience. Metrics such as CTR and search volume describe this goal poorly, especially for long-tail queries. Although some strategies can be discovered automatically, a large number still have to rest on people's subjective judgment.
Sameer Gupta, working on itSuggest Bio
Current machine learning algorithms perform well in the general case but are weak on exceptional cases, while search engine evaluation metrics such as accuracy, recall, and RMSE describe only the general case.
That is, machine learning methods fit the data you already have very easily, but they can cause catastrophic problems on data they have not seen.
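A toy Python example can show the gap between seen and unseen data. It is not a search ranking model: it fits a straight line by least squares to points from a hypothetical target function, x squared, sampled only between 0 and 1, then queries the fitted line far outside that range. All names and values are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def true_fn(x):
    """Hypothetical target the model is trying to learn."""
    return x * x


# Training data covers only the range [0, 1].
train_x = [i / 10 for i in range(11)]
train_y = [true_fn(x) for x in train_x]
slope, intercept = fit_line(train_x, train_y)


def predict(x):
    return slope * x + intercept


# Largest error on inputs the model was trained on.
max_train_error = max(abs(predict(x) - true_fn(x)) for x in train_x)
# Error on an input far outside the training range.
unseen_error = abs(predict(10.0) - true_fn(10.0))
```

On the training range the line stays close to the target, so an average-case metric computed there looks acceptable. At x = 10 the error is hundreds of times larger, which is the kind of exceptional-case failure that such metrics do not reveal.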
Google still tunes its search ranking by hand and has not fully switched to machine learning. Peter Norvig gives two reasons:
1. Human experts believe they can design better algorithms than machine learning models can learn;
2. The second reason is more interesting: the Google search team worried that machine learning models could produce disastrous results (very bad cases) on unseen data that differs from the training data, and they believed that manually built models would largely avoid these problems.