CS8803-AIA Proposal: A Greedy Algorithm for Web Advertising
Leo Singleton <lcs@cc.gatech.edu>
Spring, 2006

I. MOTIVATION

Internet advertising is a rapidly growing industry that is still in its infancy. Evolving from the link exchanges of the mid-1990s to the start-ups of the dot-com boom, Internet advertising is currently a $10 billion industry and is quickly becoming a significant part of the total $250 billion advertising market [3]. In the past two years, Google has gained significant market share--nearly one third of the industry's revenue--by introducing its AdSense program, which provides keyword-targeted advertisements. Keyword targeting dramatically increases the value of Internet ads to advertisers because it lets them reach a specific audience--just as, in traditional advertising, a car manufacturer might place its ads in an automotive magazine or Consumer Reports.

While this growth of Internet advertising has led to direct benefits for advertisers, the benefits for content publishers have not been as clear. Many web sites rely solely on advertising revenue, just as traditional content publishers such as magazines or newspapers do. These sites subscribe to an advertising network--such as Google AdSense, Fastclick, or Casale--and allow the network to place ads on their site in exchange for a share of the revenue. Unlike magazines or newspapers, however, web-based content publishers have little if any control over which ads are shown on their site. Ad-placement algorithms are considered to be the "secret sauce" of advertising networks, and content publishers are forced to show whatever ads their network selects. This can lead to large fluctuations in advertising revenue and to non-optimal ad placement. In addition, the rapid growth of Internet advertising has led to a corresponding growth in ad blockers, which cut off vital revenue for these web sites.

This project proposes a new model for Internet advertising--one that incorporates the content publishers into the advertising decision process. Like a greedy algorithm, this model pushes the final advertising decision out to the web servers that serve the actual content. Rather than the advertising network providing a single ad to show, the content publisher receives a selection of ads. Each publisher in the network then makes the locally optimal choice--collecting data on the effectiveness of ads previously shown, and using this data to select the ad expected to perform best. With each endpoint selecting a local optimum, the network of advertisers and content publishers as a whole should become more effective.

II. RELATED WORK

Existing advertising networks, such as Google AdSense, Fastclick, and Casale, use JavaScript to embed their advertisements into content publishers' pages. For example, the following code snippet shows the typical HTML code to display Google ads:

<script type="text/javascript"><!--
google_ad_client = "pub-1234567890123456";
google_ad_width = 120;
google_ad_height = 600;
google_ad_format = "120x600_as";
google_color_border = "FFFFFF";
google_color_bg = "FFFFFF";
google_color_link = "0000FF";
google_color_url = "008000";
google_color_text = "000000";
//--></script>
<script type="text/javascript"
        src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>

Thus, the advertisement itself is served directly from Google's server, pagead2.googlesyndication.com, and the web site has little control over the advertisement served. The logic of selecting an ad is handled entirely by the ad network--the web site publishing the content can only control the size, position, and colors, as shown in the example above. In addition, this approach is easily detectable by ad blockers, since the JavaScript points to a known advertising server.

While the advertising networks keep their ad placement algorithms highly confidential, the academic and research communities offer some insight into possible algorithms. It is generally agreed that a simple linear programming model is insufficient for ad placement. Consider the following example, discussed by Tomlin [6]:

Assume there are two advertisements, X and Y, where X earns 51 cents per thousand impressions and Y earns 49 cents per thousand impressions. A linear program,

Maximize 0.51 X + 0.49 Y
subject to X + Y = 100, X >= 0, Y >= 0

would yield the solution X = 100, Y = 0--thus, only ad X would be shown. However, the values 51 and 49 cents are only approximations of the ad revenues--since they may only be known with 10% accuracy, there is a significant probability that showing advertisement Y could yield more than advertisement X. Thus, it would be advantageous to rotate ads X and Y.
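
Tomlin's paper develops an entropy-based formulation; as a much simpler, hypothetical illustration of why rotation helps, the sketch below (in PHP, the language planned for the prototype in Section IV) allocates impressions in proportion to each ad's estimated revenue rather than winner-take-all. The function name and figures are invented for illustration.

<?php
// Minimal sketch of the rotation idea (not Tomlin's entropy model): instead of
// giving every impression to the ad with the highest point estimate, impressions
// are allocated with probability proportional to each ad's estimated revenue.
// The ad names and revenue figures come from the example above.
function pick_ad(array $estimatedRevenue) {
    $total = array_sum($estimatedRevenue);
    $draw = mt_rand() / mt_getrandmax() * $total;   // uniform draw in [0, $total]
    $last = null;
    foreach ($estimatedRevenue as $ad => $revenue) {
        $draw -= $revenue;
        $last = $ad;
        if ($draw <= 0) {
            return $ad;              // chosen with probability $revenue / $total
        }
    }
    return $last;                    // guard against floating-point rounding
}

$estimates = array('X' => 0.51, 'Y' => 0.49);   // dollars per 1,000 impressions
echo pick_ad($estimates), "\n";                 // prints X roughly 51% of the time
?>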

Chickering and Heckerman of Microsoft Research propose a solution to this problem using "buckets" [2]. They group web pages into clusters and use web server logs to calculate the probability of a click on each advertisement in each cluster. They then group these probabilities into buckets of similar values and choose ads from the buckets. Their algorithm, when used to serve advertisements on msnbc.com, showed a 20-30% increase in click-through rates.
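
As a loose, hypothetical sketch of the bucket idea (not Chickering and Heckerman's actual algorithm), the snippet below rounds the click probabilities estimated for one cluster of pages into coarse buckets and rotates among the ads in the best bucket. All ad names, probabilities, and the bucket width are invented.

<?php
// Loose sketch of the "bucket" idea: click probabilities estimated for one
// cluster of pages are rounded into coarse buckets, and ads within the top
// bucket are treated as interchangeable. All values here are made up.
function bucket_ads(array $clickProbabilities, $bucketWidth = 0.005) {
    $buckets = array();
    foreach ($clickProbabilities as $ad => $p) {
        $key = (int) floor($p / $bucketWidth);   // e.g. 0.012 and 0.013 share a bucket
        $buckets[$key][] = $ad;
    }
    krsort($buckets);                            // highest-probability bucket first
    return $buckets;
}

$probabilities = array('adA' => 0.012, 'adB' => 0.013, 'adC' => 0.004);
$buckets = bucket_ads($probabilities);
$topBucket = reset($buckets);                    // ads that look statistically similar
echo $topBucket[array_rand($topBucket)], "\n";   // rotate within the top bucket
?>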

The current advertising networks use two different pricing models: Cost Per Mille (CPM) and Cost Per Click (CPC). The CPM metric comes from print advertising and is the cost for 1,000 impressions, or showings of an ad. Initially, Internet advertising networks offered only CPM pricing, but CPC pricing offers advertisers a guaranteed amount of traffic to their web site and is therefore becoming increasingly popular. Google offers CPC pricing that allows advertisers to bid on individual keywords--keyword prices can range from as little as 1 cent per click to as much as $100 per click [4].
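
For comparison between the two models, a CPC offer can be converted into an expected revenue per thousand impressions by multiplying the price per click by the ad's click-through ratio. The figures in the sketch below are hypothetical.

<?php
// Hypothetical comparison of the two pricing models: a CPC offer is worth
// (price per click) x (click-through ratio) x 1000 per thousand impressions.
$cpcBid = 0.25;             // advertiser pays $0.25 per click (made-up figure)
$clickThroughRatio = 0.01;  // 1% of impressions result in a click (made-up figure)
$flatCpm = 1.50;            // competing flat CPM offer (made-up figure)

$effectiveCpm = $cpcBid * $clickThroughRatio * 1000;   // $2.50 per 1,000 impressions
echo $effectiveCpm > $flatCpm ? "CPC ad pays more\n" : "CPM ad pays more\n";
?>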

Advertising networks that offer CPC pricing must work to prevent click fraud (web publishers clicking on ads to generate revenue for themselves). Anupam et al. [1] demonstrated a possible attack on CPC networks that uses IP spoofing to artificially inflate click counts while making the clicks appear to come from legitimate visitors of the web site. Recent work by Metwally et al. [5] has demonstrated how unique ad identifiers, passed through cookies and checked against Bloom filters, can be used to detect such attacks.
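
The sketch below illustrates the general duplicate-detection idea in simplified form: each served impression carries a unique identifier, and only the first click on a given identifier is billed. Metwally et al. use Bloom filters for the membership test; the plain PHP array here is only a stand-in, and the identifiers are invented.

<?php
// Simplified sketch of duplicate-click filtering: a click is only billed the
// first time its impression identifier is seen. A real system would use a
// Bloom filter or database table rather than an in-memory array.
$seen = array();

function bill_click($impressionId, array &$seen) {
    if (isset($seen[$impressionId])) {
        return false;               // duplicate or replayed click: do not bill
    }
    $seen[$impressionId] = true;
    return true;                    // first click on this impression: bill it
}

var_dump(bill_click('imp-7f3a', $seen));   // true  (billed)
var_dump(bill_click('imp-7f3a', $seen));   // false (duplicate rejected)
?>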

III. PROPOSED WORK

This project aims to implement a prototype advertising network that works like a greedy algorithm. It will use the standard CPC pricing model, in which advertisers select a bid price per click when placing an ad. However, rather than the advertising network selecting ad placement and excluding the content publishers from the decision, the network will deliver a collection of advertisements--along with their bid prices and other statistics--to each content publisher in a standard XML format. The content publisher's web server can then run software to select which ads to display to a user and use server-side includes or scripting to integrate the ads into the site's content. Thus, the ads no longer rely on JavaScript running in the visitor's web browser. The two figures below illustrate the difference between current advertising systems and the proposed system.
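
The exact XML schema is part of the design work scheduled in Section IV; the fragment in the sketch below is only a hypothetical illustration of the kind of feed a publisher might receive, with invented field names, parsed on the publisher's server using PHP's SimpleXML.

<?php
// Hypothetical example of the proposed XML ad feed; the element and attribute
// names are invented for illustration and are not a final schema.
$feed = <<<XML
<ads>
  <ad id="1001">
    <title>Example Car Dealer</title>
    <url>http://www.example.com/cars</url>
    <bid currency="USD">0.50</bid>   <!-- advertiser's cost-per-click offer -->
  </ad>
  <ad id="1002">
    <title>Example Insurance</title>
    <url>http://www.example.com/insurance</url>
    <bid currency="USD">0.20</bid>
  </ad>
</ads>
XML;

// The publisher's web server parses the feed and hands the ads to its own
// selection logic, then emits plain HTML, with no client-side JavaScript.
$ads = simplexml_load_string($feed);
foreach ($ads->ad as $ad) {
    printf("ad %s: %s bids \$%.2f per click\n",
           $ad['id'], $ad->title, (float) $ad->bid);
}
?>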


An important aspect of this system is the pricing of ads. In this model, advertisements are bought and sold auction-style: advertisers attach a price per click to each of their ads, and the higher the bid, the more impressions their ads are likely to receive. Advertisers can also target their ads to certain web sites or categories of web sites by offering a higher price per click--in the example discussed earlier, the car manufacturer may be willing to pay 50 cents per click from automotive sites but only 20 cents per click from all other sites.

The advertising network forwards these offers to the content publishers via the XML data. (Typically, an advertising network would take a percentage of the price here--current advertising networks collect approximately 15-40% of the cost of each click.) Since publishers have full knowledge and control of the ads shown on their sites, they can collect statistics on each ad's click-through ratio (how many times visitors click on an ad, out of how many times it was shown) and choose to show the most profitable ads. A publisher wants to maximize revenue per impression--the product of the advertiser's cost-per-click offer and the ad's click-through ratio. Thus, publishers collect data and determine the local optimum for advertising on their sites.
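
A minimal sketch of this publisher-side greedy choice, assuming the publisher has already logged clicks and impressions per ad (all figures are hypothetical):

<?php
// Expected revenue per impression = (cost-per-click offer) x (observed
// click-through ratio); show the ad that maximizes it. A real implementation
// would read these statistics from the publisher's click/impression logs.
$ads = array(
    '1001' => array('bid' => 0.50, 'clicks' => 40,  'impressions' => 10000),
    '1002' => array('bid' => 0.20, 'clicks' => 150, 'impressions' => 10000),
);

$bestAd = null;
$bestRevenue = -1.0;
foreach ($ads as $id => $stats) {
    $ctr = $stats['clicks'] / $stats['impressions'];   // click-through ratio
    $revenuePerImpression = $stats['bid'] * $ctr;      // expected revenue per impression
    if ($revenuePerImpression > $bestRevenue) {
        $bestRevenue = $revenuePerImpression;
        $bestAd = $id;
    }
}
echo "show ad $bestAd (expected \$" . round($bestRevenue * 1000, 2) . " per 1,000 impressions)\n";
?>

Note that an ad with no history has an unknown click-through ratio, so in practice some rotation of new or untried ads (as in the Tomlin example of Section II) would still be required.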

Furthermore, an advertiser can collect data to maximize the profitability of their ad campaign. For instance, if statistics show that traffic coming from one web site is twice as likely to make a purchase as traffic coming from other web sites, the advertiser can raise their cost-per-click bid on the more profitable site and lower their bids on the others. By increasing their bid price, they will gain additional advertising on the profitable site, maximizing the profitability of their advertising campaign.
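
A hypothetical back-of-the-envelope calculation of how an advertiser might set those per-site bids from conversion data (all figures invented):

<?php
// If a click from one site converts to a sale twice as often, the advertiser
// can afford to bid roughly twice as much there for the same cost per sale.
$valuePerSale = 25.00;                                 // profit per sale (made up)
$conversionRate = array('autosite.example' => 0.04,    // 4% of clicks buy
                        'other sites'      => 0.02);   // 2% of clicks buy

foreach ($conversionRate as $source => $rate) {
    $breakEvenBid = $valuePerSale * $rate;             // maximum profitable cost per click
    printf("%s: break-even bid \$%.2f per click\n", $source, $breakEvenBid);
}
?>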

This approach offers numerous potential benefits over existing Internet advertising networks:

For the scope of this class project, the following assumptions will be made:

IV. PLAN OF ACTION

The prototype system will be implemented in PHP, using a SQL database as a backend. The only hardware resource required is a web server--either College of Computing or personal resources can be used. A workable demo will be created by the end of the semester. The table below summarizes the schedule for this project.

Feb. 20-24 Finalize Project Proposal and Requirements
Feb. 27-Mar. 3 Design of XML Protocols
Mar. 6-10 Design of Database Schema / Initial Implementation
Mar. 13-17 Implementation
Mar. 20-24 Spring Break
Mar. 27-31 Implementation
Apr. 3-7 Debugging / Experimentation
Apr. 10-14 Prepare Final Report / Presentation
Apr. 17-21 Final Project Presentation

V. EVALUATION

Constructing a working demo of the system will allow possible problems to be discovered and analyzed. In particular, some of the following obstacles will have to be addressed:

Time permitting, additional experimentation will be performed using the system. Two possibilities include:

VI. BIBLIOGRAPHY

[1] V. Anupam, A. Mayer, K. Nissim, B. Pinkas, and M. Reiter. "On the Security of Pay-Per-Click and Other Web Advertising Schemes." In Proceedings of the 8th International World Wide Web Conference (WWW8), 1999.
[2] D.M. Chickering and D. Heckerman. "Targeted Advertising with Inventory Management." Microsoft Research Technical Report MSR-TR-2000-49, August 2000.
[3] K.J. Delaney. "In Latest Deal, Google Steps Further Into World of Old Media." The Wall Street Journal, January 18, 2006, page A1.
[4] Google AdWords. https://adwords.google.com/support/.
[5] A. Metwally, D. Agrawal, and A. El Abbadi. "Duplicate Detection in Click Streams." In Proceedings of the 14th International World Wide Web Conference (WWW 2005), 2005.
[6] J.A. Tomlin. "An Entropy Approach to Unintrusive Targeted Advertising on the Web." In Proceedings of the 9th International World Wide Web Conference (WWW9), 2000.