Spam Grows Up
The increasing threat of internet abuse
by Dan Gillick

Consider the email that found its way into my Gmail inbox this morning: "Any women will jump into the abyss for a man that wears a Submariner SS watch. At any place of the world, you will know the right time. Hurry to Click." Though this sounds like a parody (mock-heroic advertising), the numbers convey a more serious narrative. In 2007, worldwide email spam increased by 100% to over 120 billion messages daily, accounting for 85-95% of all email. Some estimates say this number doubled again in 2008. That means, on average, more than 150 unwanted emails are delivered to each internet user every day (though Bill Gates is rumored to receive upwards of 10,000).

The scale of this absurdity appears to justify an existentialist philosophy on the part of Internet users, who tend to treat spam like Estragon treats his boot in Waiting for Godot: he struggles to remove it, gives up, and mutters "nothing to be done..." In the end, though, the tragicomedy—"killer softwares for the price of nuts"—has an economic interpretation: while most men have little interest in abyss jumpers, it just takes a few curious souls who think nut-priced software sounds good to keep such spam campaigns profitable. As long as average revenue per email is more than the cost to send the email, the spammer's logic—send as much as possible—makes good sense.

Until recently, the dollars and cents details of spam's economic proposition were mostly a matter of rampant speculation. IBM Internet Security Systems expert Joshua Corman is widely quoted as claiming that spam sent from the notorious Storm botnet—an army of hacked personal computers controlled by spammers—is generating "millions and millions of dollars every day." Last year, a team of researchers at UC Berkeley and UC San Diego set out to test this claim.


Spamalytics
Christian Kreibich, a staff member at the International Computer Science Institute's Networking group in Berkeley and coauthor of the research paper "Spamalytics: an empirical analysis of spam marketing conversion," presented at the Association for Computing Machinery Conference on Computer and Communications Security last year, begins with a plea: "A lot of people misinterpret what we're doing. We're not sending spam!" And already, the kinds of disadvantages the researchers face in the escalating struggle to secure the Internet are apparent. "The best way to measure spam is to be a spammer"—the paper's most quotable line, taken out of context, has provoked some misguided outrage.

The research, better understood as slipping tracking devices into outgoing spam emails, documents the lifespan of half a billion such messages sent by the Storm botnet. A botnet is a network of "bots," computers in offices, homes, coffee shops, and atop laps around the world that have been compromised by the latest incarnation of computer virus. Storm propagates in many ways, often through spam that tricks users into downloading and running it themselves. Once installed, the software broadcasts its availability to the Storm network. A master server organizes tasks and "proxy" bots distribute specific instructions to "worker" bots, which return status reports and request more work. Some estimate that Storm, at its peak in late 2007, controlled over 1 million computers.

Kreibich and his colleagues intentionally installed the software that runs Storm on their own servers, which began communicating with the vast network of compromised computers. Within Storm's command and control hierarchy, their machines served as renegade proxies, receiving instructions from the botnet operator that they relayed en masse to workers requesting tasks, though altered slightly to direct curious email readers to a mock pharmaceutical web site built by the researchers. The site allowed visitors to fill a shopping cart, but the checkout link returned an error message so neither personal information nor money was exchanged.

Over 26 days, they tracked some 350 million pharmaceutical emails. Of these, 76% were never delivered (blacklists, for example, block all email from addresses known to be operated by spammers), and 99% of what remained was probably blocked by inbox spam filters. In the end, 10,000 users visited the pharmacy page, and just 28 tried to make purchases, averaging $100 each. All but one involved male enhancement products. Estimating that they surveyed 1.5% of the Storm network gives an approximate Viagra-inspired revenue of $3.5 million in a year. "A bit less than 'millions of dollars every day,' but certainly a healthy enterprise," the paper explains.
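
The funnel described above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this article; the variable names and the naive extrapolation (observed daily revenue, scaled to the whole botnet and to a 365-day year) are assumptions of this example, not the paper's method. The naive figure comes out near $2.6 million a year, the same order of magnitude as the $3.5 million quoted, so the published estimate presumably rests on adjustments not described here.

```python
# Figures quoted in the article; everything else is derived arithmetic.
EMAILS_SENT = 350_000_000      # pharmaceutical emails tracked during the study
UNDELIVERED_FRACTION = 0.76    # never delivered (for example, blacklisted)
FILTERED_FRACTION = 0.99       # share of delivered mail probably caught by filters
PURCHASE_ATTEMPTS = 28
AVERAGE_ORDER_USD = 100
STUDY_DAYS = 26
NETWORK_SHARE = 0.015          # researchers saw an estimated 1.5% of Storm

delivered = EMAILS_SENT * (1 - UNDELIVERED_FRACTION)
reached_inbox = delivered * (1 - FILTERED_FRACTION)
emails_per_sale = EMAILS_SENT / PURCHASE_ATTEMPTS

observed_revenue = PURCHASE_ATTEMPTS * AVERAGE_ORDER_USD
# Naive extrapolation (an assumption of this sketch): scale the observed daily
# revenue up to the whole botnet, then to a full year.
naive_annual_revenue = observed_revenue / STUDY_DAYS / NETWORK_SHARE * 365

print(f"delivered to a mail server: {delivered:,.0f}")
print(f"likely reached an inbox:    {reached_inbox:,.0f}")
print(f"emails sent per sale:       {emails_per_sale:,.0f}")
print(f"naive annual revenue (USD): {naive_annual_revenue:,.0f}")
```

Read this way, the campaign needed roughly 12.5 million emails for every attempted purchase.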

But what about the cost of sending email—the other half of the equation? Are Storm's operators selling Viagra or are they selling the means to sell Viagra? The possibility of the latter is frightening because it suggests a maturing underground economy as opposed to a few isolated programmers causing problems. Anecdotal reports suggest that the retail price of spam delivery, the going rate on the black market, is nearly $80 per million emails. At this rate, however, Storm's "clients" would be losing money quickly: only $1 in revenue for every $10 spent, assuming they keep all the profit from the purchases. And yet, the spam keeps coming.
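
The "$1 in revenue for every $10 spent" claim follows directly from the numbers already given. The sketch below is a minimal check using the article's figures; the variable names and the break-even calculation are illustrative additions.

```python
# Figures quoted in the article.
EMAILS_SENT = 350_000_000
PRICE_PER_MILLION_USD = 80     # anecdotal black-market delivery rate
PURCHASE_ATTEMPTS = 28
AVERAGE_ORDER_USD = 100

millions_sent = EMAILS_SENT / 1_000_000
delivery_cost = millions_sent * PRICE_PER_MILLION_USD
revenue = PURCHASE_ATTEMPTS * AVERAGE_ORDER_USD

# Revenue earned for each dollar paid to the botnet operator.
revenue_per_dollar_spent = revenue / delivery_cost
# Delivery price at which a client would exactly break even.
break_even_price_per_million = revenue / millions_sent

print(f"delivery cost (USD):          {delivery_cost:,.0f}")
print(f"revenue (USD):                {revenue:,.0f}")
print(f"revenue per dollar spent:     {revenue_per_dollar_spent:.2f}")
print(f"break-even price per million: {break_even_price_per_million:.2f}")
```

Under these assumptions a client paying the retail rate would need delivery to be about ten times cheaper, or conversions about ten times better, just to break even.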


Abuse arrives
The world's first email spam (the term itself derives from a Monty Python skit involving singing Vikings at a spam-heavy restaurant: repetition ad nauseam) was sent on May 3, 1978 by Gary Thuerk, an aggressive marketer of DEC "minicomputers," who had an employee enter the electronic addresses of 600 West Coast customers, mostly computer scientists, by hand. The message, written in all capital letters, advertised product demonstrations in Los Angeles and San Mateo. Delivered via Internet predecessor Arpanet, it was greeted with widespread hostility and a notable promise from Arpanet Management Branch Chief, Major Raymond Czahor: "appropriate action is being taken to preclude its occurrence again."

In September of 1990, Vern Paxson, now Professor of Computer Science at UC Berkeley and Senior Scientist at the International Computer Science Institute, enrolled in a "special topics" course on networking as a graduate student at Berkeley. He began measuring network traffic, the amount of information flowing over the Internet. At the time, there were some 313,000 Internet hosts, or connected computers, passing 9.5 megabytes of data—the textual equivalent of the six longest works of Charles Dickens—through USENET bulletin boards (a newsgroup precursor to the World Wide Web) in a day. By the time his first paper on network measurement was published in May of 1994, the Internet had grown at least ten times larger, including 3 million hosts shuttling considerably more bulletin board data each day than Dickens wrote in his lifetime.

Looking back, the only thing more striking than the rate of growth is the consistency of the expansion. Between 1986 and 1994, the total volume of USENET traffic grew by 75% each year with startlingly little deviation. And then, everything changed. Between late 1994 and 1996, the average size of USENET postings, which had remained virtually constant since the mid 1980s, increased nearly tenfold. People were no longer just exchanging small written messages; they were uploading pornography and stolen software. "Abuse had arrived," Paxson declares.


Amateurs

"Mid- to late-90s network abuse was characterized by vandals and braggarts," says Paxson. "Hackers were energetic but imitative," their motivations petty rather than financial, often tagging, graffiti-style, the software tools they wrote for exploitation. He estimates that nearly 75% of junk postings involved stolen software during this period.

Meanwhile, as the number of connected computers continued to grow steadily into the new millennium, a far more nefarious trend was developing. Paxson shows a graph of automatic scan activity observed at Lawrence Berkeley National Lab (LBL)—programs looking purposefully or at random for computers to connect to easily—that shows a sharp increase in late 1999. The era of automated attacks was dawning.

The conceptual breakthrough that gave birth to the modern state of affairs is the self-replicating program. Abstractly, a computer program takes some input, does some computation, and returns some output (most software combines several of these units to produce an interactive effect). While the theory of a program that could output itself was developed before computers were a reality, practical implementations appeared in the 1980s. The term virus was coined in 1983—a program that infects other programs, modifying them to email everyone in your address book, for example, often with copies of itself. While a virus typically requires some action on the part of the user, like opening a misleading email attachment, a worm, by contrast, works independently, exploiting a flaw in the design of some common software like Microsoft Windows.
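
The idea of a program that outputs itself can be tried harmlessly. The sketch below is a textbook "quine" written in Python, not code taken from any virus or from the researchers: it only prints its own text and copies itself nowhere. The names `QUINE` and `run_and_capture` are illustrative.

```python
import contextlib
import io

# The quine itself: a two-line program whose printed output is its own text.
QUINE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)\n"


def run_and_capture(program: str) -> str:
    """Run a small program and return whatever it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})
    return buffer.getvalue()


output = run_and_capture(QUINE)
print(output, end="")
print("output equals source:", output == QUINE)
```

The trick is that the string `s` serves as both data and template: formatting it with its own representation reproduces the full program.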

"When an attacker compromises a host, they can instruct it to do whatever they want," Paxson says. In particular, "automatically instructing it to find more vulnerable hosts to repeat the process creates a worm—a program that self-replicates across a network." Since each new copy works on copying itself too, a worm can grow exponentially fast. The Code Red worm of July 2001 infected 360,000 computers in 10 hours. On January 25, 2003, Slammer infected 75,000 computers in under ten minutes. Airline flights were cancelled, election proceedings faltered, and ATMs failed. Paxson estimated that a well-written worm could cause upwards of $100 billion of damage in a day.


The invisible hand
While worms continue to plague the Internet—Conficker infected nearly 15 million Windows-based personal computers in January—the worm era, characterized by an intrepid anarchist playfulness, gave way to something with real staying power: markets. "A sophisticated underground economy has emerged to profit from Internet subversion," Paxson explains. Bots are herded together into botnets, computational armies with enormous collective bandwidth, and "dirt-cheap access to bots fuels monetization via relentless torrents of spam."

For the average emailer, this monetization has manifested itself mostly in email volume. But behind the scenes, all sorts of new markets are flourishing. ProAgent2.1 ("records all keystrokes... usernames, passwords... completely hidden!") is sold by Spy Instructors Software, which advertises its own customer support department. AllBots, Inc. offers "account creators" for MySpace ($140-$320), YouTube ($95), and Friendster ($95), and advertises "GOOD News!!! We have just integrated CAPTCHA Bypasser—software for automatically reading the squiggly-lined characters intended to confirm humanness."

Opportunities for monetization lead to specialization, competition, and increasing ingenuity. As Paxson points out, these trends are making technical security research much more difficult than "fending off ardent amateurs." Furthermore, the emergent economic ecosystem, often built on affiliate programs, makes litigation tricky. "Selling software to efficiently subvert a machine is probably not illegal," he notes.

Botnets like Storm seem to be growing, fooling naïve users into installing the program in a variety of ways. In an expansion campaign tracked by Kreibich, Paxson, and their colleagues, Storm computers emailed huge numbers of "AwesomePostcards" (complete with dancing banana) that invited users to download the malicious software themselves, infecting machines Trojan-horse style. Kreibich reports that this campaign was alarmingly successful: "One in ten people visiting an infection website downloaded the executable and ran it."

Once Storm is installed, it's virtually undetectable. It uses very little processing power and does its work quietly and discreetly when nobody is likely to mind. Orders come in gradually, perhaps a few thousand email addresses at a time, to email regarding Viagra, knock-off watches, stolen software, awesome postcards. Modern spam filters are clever, Kreibich explains, so "no two emails that Storm sends are exactly the same." The bots are instructed to create random permutations of the essential words and letters, specifically designed to slip past the defenses of Gmail, Hotmail, and SpamAssassin.


What are we up against?
While a few extra emails about sexual enhancement products are mostly an annoyance, the possibilities for an attacker who controls hundreds of thousands of computers are alarming. Shortly before the 2008 South Ossetia war, a Distributed Denial of Service (DDoS) attack took down the web sites of Georgian President Mikhail Saakashvili and the National Bank of Georgia. This was the work of a botnet, its bots instructed to connect to these sites simultaneously, overloading the servers that process incoming requests for data (it remains unclear whether this was coordinated by the Russian government).

DDoS attacks, described in a Wired article as "the digital equivalent of filling a fishtank with a firehose," have recently targeted the BBC, CNN, AlertPay (online payment), and GoDaddy (domain name registration), for example. Some of these, like a current attack on KidsInMind.com, a movie rating site for parents, appear to be acts of "hacktivism," but the monetary incentive—demanding ransom from companies unprepared to parry such attacks—is feeding a post-cause generation.

Unfortunately, spam and DDoS are just the tip of a formidable iceberg. Identity theft, by logging keystrokes on compromised computers or by coordinated password guessing, for example, appears to be on the rise, though it seems that this is not yet the focus of most botnet operators. More generally, any kind of private information with value, from internal business statistics to government secrets, creates incentives for theft.


Bitblazing
Opportunities for attackers create research projects for graduate students like those working with Dawn Song, a computer science professor at UC Berkeley. A deep understanding of what makes software exploitable and the ability to infer the logical structure of a program before running it are the keys to buttressing the user side against even the most novel attacks. Broadly speaking, these are Song's research goals in her group's BitBlaze project, which aims to screen software for potential danger and defend against malicious code—exploits that cede control to an attacker.

Prateek Saxena, a graduate student working with Song, outlines the most common exploit around, involving a "stack buffer overflow." By studying a piece of software, an attacker can often find a specialized input that causes the program to crash, and in the process, return control to the attacker rather than the user. While it is good programming practice to include checks for buffer overflows, Saxena says "missing a few is all but inevitable." Programmers, after all, are human. Microsoft's updates or "patches" often add an overflow check for some buffer buried deep in the Windows operating system. Unfortunately, according to Saxena, fewer than 5% of users update in a timely fashion. In addition, Song's group has shown that exploits can be generated automatically by using a Microsoft patch to find flawed code, a serious concern given how quickly worms can spread.

More generally, the group is interested in program analysis, a subfield of computer science concerned with inferring the behavior of a program from its source code. This is important for security because it is "a step up from antivirus signatures," Saxena explains, "which fails because malware encodes itself." (Traditional antivirus software looks for distinctive sequences of characters in files, like a fingerprint.)
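
Why a fixed fingerprint fails against self-encoding software can be shown with a toy example. In the sketch below, the signature, the sample bytes, and the single-byte XOR encoding are all invented for illustration; real scanners and real malware are far more elaborate.

```python
# Hypothetical fingerprint that a signature-based scanner looks for.
SIGNATURE = b"EVIL_PAYLOAD"


def signature_scan(data: bytes, signature: bytes = SIGNATURE) -> bool:
    """Flag the data if it contains the known byte sequence."""
    return signature in data


def xor_encode(data: bytes, key: int) -> bytes:
    """Re-encode every byte with a one-byte key; applying it twice decodes."""
    return bytes(byte ^ key for byte in data)


original = b"header..." + SIGNATURE + b"...trailer"
encoded = xor_encode(original, 0x5A)

print("original flagged:", signature_scan(original))
print("encoded flagged: ", signature_scan(encoded))
print("decodes back:    ", xor_encode(encoded, 0x5A) == original)
```

The encoded copy carries exactly the same content, yet the scanner no longer recognizes it, which is the motivation for analyzing what a program does rather than what its bytes look like.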

Consider the problem Kreibich and his colleagues faced in reverse engineering Storm. Following an AwesomePostcards link intentionally, they downloaded a 140 kilobyte executable file. That's around 300 pages filled with 0s and 1s. From this binary novel, they sought first to distill the command and control structure of Storm, and second, to create a subtly modified version of the file for their research purposes. Song's group would like to scan such binary files automatically, looking for suspicious instructions.

Their approach is two-pronged. The first, static analysis, is more theoretically appealing and more challenging. Each computer program, at heart, enacts some underlying flow chart. A student in an introductory computer science course might be asked to produce such a chart from a few lines of code—often a tricky exercise. The task of static analysis is to create this flow chart automatically, from arbitrary code. There is one other complicating factor: modern programming languages provide a layer of abstraction between the commands the machine can actually process and the way human programmers think. "Compiling" a program translates it from a human-readable language into a machine-readable language, written and stored in binary.
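
A small taste of the flow-chart idea is possible with Python's standard `ast` module, which parses source code without running it. The sketch below only lists the decision points where a flow chart would fork, and it works on readable source rather than on compiled binaries, which is the far harder problem described above. The sample function and the helper name are hypothetical.

```python
import ast

# A small sample program to analyze; it is parsed, never executed.
SOURCE = '''
def classify(n):
    if n < 0:
        return "negative"
    while n > 10:
        n -= 10
    return "small"
'''


def decision_points(source: str) -> list[tuple[int, str]]:
    """Return (line number, statement kind) for every branch or loop."""
    tree = ast.parse(source)
    found = [
        (node.lineno, type(node).__name__)
        for node in ast.walk(tree)
        if isinstance(node, (ast.If, ast.While, ast.For))
    ]
    return sorted(found)


for line_number, kind in decision_points(SOURCE):
    print(f"line {line_number}: {kind} (the flow chart forks here)")
```

A full static analyzer would go on to connect these points into a graph of every possible path through the program.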

The second prong involves dynamic analysis—taking a functional or behavioral approach to understanding a mysterious binary file. Saxena calls this strategy "taint tracking." The researchers create a "virtual machine," an operating system running inside the normal operating system, and run programs in this controlled environment, collecting information. What data is read from memory and written to memory? What files are accessed? Is any connection established with remote computers? What kind? By assembling such statistics for known safe programs and known malware, they can compute the probability that some unknown program may be dangerous based on its behavior.
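
The last step, turning behavior statistics into a probability, can be sketched with a simple naive Bayes calculation. This is one standard way to do it and an assumption of this example, not a description of the BitBlaze implementation. The behavior names and the training runs below are invented for illustration.

```python
from collections import Counter

# Hypothetical sandbox observations: one set of behaviors per program run.
BENIGN_RUNS = [
    {"reads_user_files"},
    {"reads_user_files", "opens_network_connection"},
    {"writes_temp_files"},
    {"opens_network_connection"},
]
MALWARE_RUNS = [
    {"opens_network_connection", "sends_bulk_email", "modifies_startup"},
    {"sends_bulk_email", "modifies_startup"},
    {"opens_network_connection", "logs_keystrokes"},
    {"modifies_startup", "logs_keystrokes", "sends_bulk_email"},
]
FEATURES = sorted(set().union(*BENIGN_RUNS, *MALWARE_RUNS))


def feature_rates(runs):
    """Estimate how often each behavior appears, with add-one smoothing."""
    counts = Counter(feature for run in runs for feature in run)
    return {f: (counts[f] + 1) / (len(runs) + 2) for f in FEATURES}


def malware_probability(observed, prior=0.5):
    """Naive Bayes estimate that a program showing `observed` is malware."""
    benign = feature_rates(BENIGN_RUNS)
    malware = feature_rates(MALWARE_RUNS)
    p_malware, p_benign = prior, 1 - prior
    for f in FEATURES:
        if f in observed:
            p_malware *= malware[f]
            p_benign *= benign[f]
        else:
            p_malware *= 1 - malware[f]
            p_benign *= 1 - benign[f]
    return p_malware / (p_malware + p_benign)


suspicious = malware_probability({"sends_bulk_email", "modifies_startup"})
ordinary = malware_probability({"reads_user_files"})
print(f"bulk email + startup change: {suspicious:.2f}")
print(f"reads user files only:       {ordinary:.2f}")
```

With so little training data the numbers mean little in themselves; the point is that behaviors common among known malware and rare among known safe programs push the estimate up.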

One goal of the BitBlaze project is to revolutionize antivirus software by combining elements of both static and dynamic analysis. By testing unknown software in a controlled analysis environment, Song hopes to dramatically improve protection for Internet users. "People download random programs all the time, like curious children will chew on just about anything," Song says. Internet users need something like watchful parents to keep them out of trouble.


Carrots and sticks

Professor John Chuang, at Berkeley's School of Information, shares Song's feelings about required supervision, but takes more of an economist's perspective. "How did botnets come about?" he asks rhetorically, referring not just to the technical achievement but to their centrality in the developing black-market economy. "The fundamental insight is that there is a misalignment of incentives at work." While the botnet operators profit, the perceived cost to each user is effectively zero. Individuals almost never know if they are infected and don't care. Even if the result is some kind of identity theft, the connection between a dodgy website visited six months ago and some mysterious credit card charges is tenuous at best, and practically, the credit card companies and banks almost always assume the financial burden in such cases.

Chuang likens the botnet phenomenon to a kind of "reverse free-riding." Whereas file sharing or public television benefit everyone regardless of their support, botnets are "a peer-to-peer network contributing to network insecurity—a public bad as opposed to a public good." To free-ride the Internet is to contribute to a public bad. "Botnet operators have stumbled upon or engineered technology based on this misalignment and exploited it to the fullest," says Chuang.

If the basic problem is a misalignment, then the solution may involve re-aligning. Chuang and his students are just beginning to explore what this might involve. The "stick" scenario makes users economically liable for security breaches originating at their computers; the "carrot" option ("typically more successful," says Chuang) rewards users who invest in security. Good behavior (keeping antivirus software up to date, installing operating system patches, avoiding suspicious websites and downloads) could result in a rebate from an Internet service provider like Comcast or Verizon. Otherwise, Chuang suggests, providers could raise premiums, much as car insurance rates increase after an accident.

The car insurance analogy is not perfect, but it reveals something about how new and unregulated the Internet industry is. "I've heard this line about how we call it the 'Information Superhighway'," says Chuang, "and yet there's no license to drive or driver's education." A reckless driver is dangerous for everyone on the road, so it's good for society to insist on training drivers and penalizing them when they go too fast. Perhaps Internet users should be treated similarly.

Missing data
Back at the International Computer Science Institute, Dr. Paxson produces a few photocopied pages from a precarious stack. They are transcripts of legal proceedings—a credit card fraud case. He flips through and points to a table indicating the stolen amounts, a few million dollars in total. "This guy's not a punk," he says, "but he's probably not a kingpin either." Such fragmented evidence is Paxson's response to a question about the size of the industry. "I wouldn't feel comfortable quoting a number. Not even an order of magnitude," he says.

To understand the discrepancy between the retail price of spam (how much a botnet operator charges to send junk email) and the conversion rate observed in the Spamalytics study requires much more information about the industry. The statistics suggest that Storm is not particularly decentralized—perhaps a few disgruntled expert programmers making a good living selling Viagra. Or maybe the one campaign the researchers tracked was just a side-project, an experiment much less fruitful than their primary activities.

Kreibich admits he knows little about economics, but his sense is that the market is far from mature. To study this question, the group, which recently hired a new postdoc with an economics background, is trying to measure diversification. Spam emails have links, and each link is associated with a domain, which somebody had to register. Unfortunately, databases with this information are neither centralized nor standardized, so automating these lookups is difficult. Worse, registrars offer "domain testing," allowing a potential client to see what kind of traffic a new domain name receives before finalizing their purchase. Spammers capitalize on this service, using a temporary domain for a few days and then moving on.

Another way to study diversification is through the appearance of the destination sites. While spam emails show impressive variety, the pharmacy sites they link to, for example, are quite consistent, allowing the researchers to make some inference about the major players in the field. At this point, Kreibich says they have a reasonable idea of "which botnets are sending which spam."

One interesting proposal, Paxson's idea, which has not yet developed into a research project, involves studying the unusual "mule" market. Some spam emails offer the opportunity to "make money from home" by receiving and re-shipping packages or transferring funds to international bank accounts. What differentiates these mule requests from typical spam is that there is often a person at the other end. "We responded to one of these emails," says Kreibich, "and sure enough, we were able to see from the reply: this guy in Moscow was using a MacBook, running a particular version of Microsoft Outlook." By automatically generating responses, spamming the spammers, the researchers could try to assemble a map of mule requests. Such data would be invaluable, Kreibich suggests, because "the economy may be bottlenecking on mule supply," a crucial method for distributing the money-laundering task.

As with each project, "we have no idea what we'll find," Kreibich says, since so much about the market and its operators is still unknown. In November, two Internet service providers decided to stop routing traffic from McColo, a web host suspected of housing computers involved in criminal activity. In one day, worldwide spam dropped by over 60%. The demise of Intercage, another host known to have operated a number of Storm's control servers, led to a similar but less dramatic reduction in September. Both McColo and Intercage are based in California. "We always suspected Storm was operated out of another country, likely Russia," Kreibich says, "and here it turns out that some of their most crucial infrastructure was located just down the street."

Dan Gillick is a graduate student in computer science.





The notorious Russian Business Network (RBN), referred to as "the baddest of the bad" in a report written by security company VeriSign, is "a for-hire service catering to large-scale criminal operations." Originally organized by computer science graduate students as a legal Internet service provider, illegal activity proved financially irresistible. Little is known about the network, but rumors abound: it has been blamed as the perpetrator of the Georgia cyber-attacks; its leader, known as "Flyman," is supposedly related to a powerful Russian politician; it is the alleged operator of Storm.

Many Internet providers host illegal material (online gambling sites, for example), but according to VeriSign, "the difference is that RBN is solely criminal." They also seem emboldened by immunity from Western law enforcement. When, in late 2006, National Australia Bank tried to fight the "Rock Phish" scheme that tricked users into revealing account numbers and passwords, the RBN took down the bank's website for three days.

"RBN feel they are strongly politically protected. They pay a huge amount of people. They know they are being watched. They cover their tracks," says VeriSign. According to the report, only strong political pressure on Russia will keep the RBN in check.















© 2009 Berkeley Science Review