Query efficiency prediction enhances server capacity and improves user satisfaction in a commercial search engine.

The time taken by a search engine to respond to a query varies with its current workload and the nature of the query. UofG researchers, Professor Iadh Ounis and Dr Craig Macdonald, developed a technique called query efficiency prediction, which estimates how long an unseen search query will take to answer. Knowing this in advance allows a search engine to adjust its configuration appropriately to reduce response times. A major commercial search engine adopted UofG’s prediction technique to better direct resources to reduce query response times, increasing the querying capacity of its hundreds of thousands of servers by 50% (equivalent to an estimated saving of USD96 million in capital server costs). Since 2014, this has enabled the search engine to deploy more sophisticated search algorithms that enhance the quality of its search results. The outcome is improved user satisfaction for hundreds of millions of users worldwide and correspondingly increased advertising revenues.

Context and societal impact

Search engines enable access to information online, and with over 6 billion queries made each day worldwide, search is a high-value activity for companies through the display of paid ads to search users. Search engines deploy a number of search algorithms for each query, which aim to identify high-quality search results that will satisfy their users. The more satisfied users are with the search engine’s results, the more likely they are to continue using that search engine and to be exposed to more paid ads, resulting in enhanced advertising revenues for the search engine.

To handle this high volume of queries, a search engine’s infrastructure is designed to keep response times within a service level agreement (SLA), e.g. answering 99% of queries within 100ms. Search engines deploy a large number of servers to answer queries, but are limited by the capacity (i.e. the maximum processing capability) of those servers. The more complex the search ranking algorithm, typically the better the results returned by the system, but at the expense of longer query response times. Hence, the choice of deployable algorithms is often limited by the need to ensure that queries meet the response time SLA. Yet the time taken for any given query to execute can vary, depending on the search engine's workload and the nature of the query (e.g. long vs. short queries).
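An SLA of this form can be checked mechanically against measured latencies. The following Python sketch assumes a plain list of per-query response times in milliseconds and tests whether at least 99% of them fall within 100ms; the function name `meets_sla` is a hypothetical illustration, not part of any search engine's tooling.

```python
def meets_sla(latencies_ms, budget_ms=100.0, target_pct=99.0):
    """Return True if at least target_pct% of queries finished within budget_ms.

    Counts are compared as integers scaled by 100 rather than as floating-point
    fractions, so the boundary case (exactly 99 of 100 queries) is handled exactly.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency measurement")
    within = sum(1 for t in latencies_ms if t <= budget_ms)
    return within * 100 >= target_pct * len(latencies_ms)


# 99 fast queries and one slow one: exactly 99% within 100ms, so the SLA holds.
print(meets_sla([20.0] * 99 + [450.0]))      # True
# Two slow queries out of 100 push the tail over the limit.
print(meets_sla([20.0] * 98 + [450.0] * 2))  # False
```

The second case shows why tail latencies matter: only 2% of slow queries is enough to breach the agreement, however fast the rest are.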

Building on a long track record of publications on efficient search engines, UofG researchers Professor Iadh Ounis and Dr Craig Macdonald first proposed the notion of query efficiency prediction, which aims to accurately predict how long an unseen query will take to answer. Query efficiency prediction is useful because knowing in advance how long each query will take allows the search engine either to dedicate appropriate resources to the query, or to choose a search ranking strategy that fits within the available time budget. Query efficiency prediction combines signals, such as the length of the query, the popularity of its constituent terms, or how often those terms occur within the corpus, using machine learning (such as gradient boosted regression trees) to obtain accurate response time predictions.
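To make the idea concrete, here is a minimal, self-contained Python sketch of a query efficiency predictor. It is not UofG's or the search engine's implementation: the features (number of query terms plus summary statistics of each term's posting-list length), the toy training data and the names `query_features` and `BoostedRegressor` are all hypothetical. Gradient boosted regression trees are reduced here to depth-1 trees (stumps) with squared loss, so that the example runs without external libraries.

```python
from dataclasses import dataclass


def query_features(query, posting_lengths):
    """Hypothetical feature vector for one query.

    posting_lengths maps each term to the number of documents containing it
    (its posting-list length); unseen terms count as 0.
    """
    terms = query.split()
    lengths = [posting_lengths.get(t, 0) for t in terms] or [0]
    return [float(len(terms)), float(sum(lengths)), float(max(lengths)), float(min(lengths))]


@dataclass
class Stump:
    """Depth-1 regression tree: a single split on a single feature."""
    feature: int
    threshold: float
    left: float   # value when x[feature] <= threshold
    right: float  # value otherwise

    def predict(self, x):
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X, residuals):
    """Exhaustively pick the split that minimises squared error on the residuals."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [r for x, r in zip(X, residuals) if x[j] <= t]
            right = [r for x, r in zip(X, residuals) if x[j] > t]
            lm = sum(left) / len(left)  # never empty: t is taken from the data
            rm = sum(right) / len(right) if right else lm
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if err < best_err:
                best, best_err = Stump(j, t, lm, rm), err
    return best


class BoostedRegressor:
    """Gradient boosting with squared loss and stumps as weak learners."""

    def __init__(self, n_rounds=100, learning_rate=0.3):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.base = 0.0
        self.stumps = []

    def fit(self, X, y):
        self.base = sum(y) / len(y)
        preds = [self.base] * len(y)
        for _ in range(self.n_rounds):
            # For squared loss the negative gradient is simply the residual.
            residuals = [yi - pi for yi, pi in zip(y, preds)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            preds = [p + self.learning_rate * stump.predict(x) for p, x in zip(preds, X)]
        return self

    def predict(self, x):
        return self.base + sum(self.learning_rate * s.predict(x) for s in self.stumps)


# Toy, made-up data: posting-list lengths and observed response times (ms).
postings = {"the": 900_000, "university": 60_000, "weather": 40_000, "glasgow": 5_000, "zyzzyva": 12}
train = [("zyzzyva", 1.0), ("glasgow", 3.0), ("weather glasgow", 9.0),
         ("university glasgow", 12.0), ("the weather", 85.0), ("the university of glasgow", 140.0)]
X = [query_features(q, postings) for q, _ in train]
y = [t for _, t in train]
model = BoostedRegressor().fit(X, y)
print(round(model.predict(query_features("the glasgow", postings)), 1))
```

In practice a library implementation (for example scikit-learn's `GradientBoostingRegressor`), richer features and large query logs would be used; the sketch only shows how per-query signals are mapped to a response-time estimate before the query is run.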

UofG researchers have since investigated several ways in which query efficiency prediction can be leveraged to reduce the time taken for the search engine to process queries while enhancing the quality of its results. Their research has demonstrated that query efficiency prediction can accurately predict which query server will be available next to execute a query, minimising the time that a query spends queued rather than being processed. They have also used query efficiency prediction to automatically reconfigure the search engine, trading off some result quality for queries predicted to take too long to execute. More recently, they have used query efficiency prediction to decide which queries could be rewritten to be more effective without exceeding a response time deadline.
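Two of these uses can be sketched in a few lines of Python, assuming a predictor has already produced an estimated execution time for each query. `schedule` greedily routes each query to the server predicted to become free first (a simplification that assumes all queries are already waiting), and `choose_configuration` falls back to a cheaper ranking configuration when the full one is predicted to overrun the time budget. Both functions, their names and the example timings are hypothetical illustrations, not the published algorithms.

```python
import heapq


def schedule(queries, predicted_ms, n_servers):
    """Greedily send each query to the server predicted to become free first.

    predicted_ms maps query -> predicted execution time (hypothetical predictor
    output). All queries are assumed to be queued at time 0.
    Returns a list of (query, server, predicted start time) tuples.
    """
    free_at = [(0.0, s) for s in range(n_servers)]  # (time server is next free, server id)
    heapq.heapify(free_at)
    plan = []
    for q in queries:
        start, server = heapq.heappop(free_at)
        plan.append((q, server, start))
        heapq.heappush(free_at, (start + predicted_ms[q], server))
    return plan


def choose_configuration(predicted_full_ms, budget_ms):
    """Keep the full (slower, higher-quality) ranking only if it is predicted to
    fit the time budget; otherwise fall back to a reduced configuration."""
    return "full" if predicted_full_ms <= budget_ms else "reduced"


predicted = {"q1": 50.0, "q2": 10.0, "q3": 5.0}
for query, server, start in schedule(["q1", "q2", "q3"], predicted, n_servers=2):
    print(query, "-> server", server, "at", start, "ms")
print(choose_configuration(80.0, budget_ms=100.0), choose_configuration(150.0, budget_ms=100.0))
```

Here `q3` is sent to server 1, which is predicted to finish `q2` after 10ms, rather than waiting behind the 50ms query `q1` on server 0.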

“This selective approach allows the response time of queries to be kept under the agreed Service Level Agreement (SLA) of 99% of queries answered within 100ms”.
Quote from the company

Following Macdonald and Ounis’s presentation of query efficiency prediction at SIGIR 2012, a commercial search engine picked up this work and integrated it into its production system to identify particularly long-running (slow) queries, which are responsible for so-called tail latencies. This matters because if a search engine is slow to answer queries, users become dissatisfied and often switch to an alternative search engine, leading to a loss of users and advertising revenue. At the beneficiary commercial search engine, a two-second slowdown in search response times attributed to tail latencies was found to reduce revenue per user by 4.3%.

"[our research] would not have been possible without the research of the University of Glasgow into query efficiency prediction" .... and has "increase[d] the capacity of the servers in [their] data centre by 50%".
Quote from the company

The beneficiary search engine has increased its productivity by applying UofG’s query efficiency prediction to identify slow queries, which are then selected for parallel processing using multiple CPU cores. This selective parallelisation approach avoids wasting CPU cores on unnecessarily parallelising fast queries.
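A selective parallelisation policy of this kind can be sketched in Python as a function from a predicted execution time to a number of cores. The 50ms threshold, the cap of four cores and the rule of scaling the core count with the predicted time are all illustrative assumptions; the source does not say how the search engine chooses the degree of parallelism.

```python
import math


def cores_for_query(predicted_ms, slow_threshold_ms=50.0, max_cores=4):
    """Hypothetical selective-parallelisation policy.

    Queries predicted to be fast run on a single core; queries predicted to be
    slow get more cores, roughly in proportion to how far they exceed the
    threshold, capped at max_cores.
    """
    if predicted_ms <= slow_threshold_ms:
        return 1
    return min(max_cores, math.ceil(predicted_ms / slow_threshold_ms))


predicted = [5.0, 8.0, 12.0, 120.0, 400.0]  # made-up predicted times (ms)
selective = [cores_for_query(p) for p in predicted]
print(selective)  # [1, 1, 1, 3, 4]
print(sum(selective), "cores vs", 4 * len(predicted), "if every query used 4 cores")
```

In this toy batch the selective policy uses 10 cores instead of 20, because the three fast queries are left on a single core while the two slow ones still receive extra parallelism.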

The beneficiary commercial search engine has also grown its market share significantly, with high penetration in some geographical markets, notably the UK, Hong Kong, Taiwan and France. At global scale, this means that hundreds of millions of users have benefitted from improved results. Thus, overall, UofG research has enhanced the performance of the beneficiary search engine, with the commercial and economic benefits that follow from increased user satisfaction.

Integration of UofG research into the beneficiary search engine has also directly resulted in gains in server productivity, which have led to both economic and environmental impacts. The company has indicated that the efficiencies achieved by applying the query efficiency prediction techniques have allowed it to "potentially reduce the number of server resources by one third" at the scale of "a few hundred thousand servers" while still attaining its SLA (a 50% increase in per-server capacity means that the same load can be served by two-thirds of the servers). For example, assuming that a third of the company's ~200,000 servers need not be purchased, at a cost of USD1,450/unit, this would equate to a capital saving of ~USD96 million. Moreover, as data centres have significant power overheads, this would also lead to real energy savings. As data centres are often located near natural energy resources, such as hydroelectric power, the additional green energy made available by a one-third reduction of ~200,000 servers would be 151GWh per annum (including cooling and other overheads). Assuming a commercial energy cost of USD0.088 per kWh, this would equate to USD13.2 million per annum at September 2020 prices.
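The arithmetic in this paragraph can be reproduced with a short Python script. It uses only the figures stated in the text (~200,000 servers, USD1,450 per server, 151GWh and USD0.088 per kWh) and makes explicit the step from a 50% capacity increase to a one-third server reduction. The implied per-server power draw is a derived figure added for illustration, not one reported by the company.

```python
# Inputs as stated in the text; the ~200,000 server count is itself an assumption.
servers = 200_000
capacity_gain = 0.5          # 50% more queries per server
cost_per_server_usd = 1_450
energy_saved_kwh = 151e6     # 151 GWh per annum, including cooling overheads
price_usd_per_kwh = 0.088
hours_per_year = 24 * 365

# Serving the same load with 1.5x capacity per server needs 1/1.5 = 2/3 of the servers.
fraction_needed = 1 / (1 + capacity_gain)
servers_saved = servers * (1 - fraction_needed)
capital_saving_usd = servers_saved * cost_per_server_usd
energy_saving_usd = energy_saved_kwh * price_usd_per_kwh
# Average power per avoided server implied by the energy figure (W, incl. cooling).
implied_watts = energy_saved_kwh * 1000 / (servers_saved * hours_per_year)

print(f"servers saved:   {servers_saved:,.0f}")
print(f"capital saving:  USD{capital_saving_usd / 1e6:.1f} million")
print(f"energy saving:   USD{energy_saving_usd / 1e6:.1f} million per annum")
print(f"implied draw:    {implied_watts:.0f} W per server")
```

With these inputs the script gives about USD96.7 million of capital savings and about USD13.3 million per annum in energy costs, close to the rounded figures quoted; small differences depend on how the inputs were rounded.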