Twitter's Big Data crunching 'BotMaker' muscles in on spam …

By Mike Wheatley

If you’re an avid Twitter user, you might have noticed a significant drop in the number of spam messages and tweets bugging you. That’s because Twitter has introduced a new anti-spam system called BotMaker, which has helped it achieve a 40 percent reduction in its key spam metrics.

Twitter’s Raghav Jeyaraman describes in a lengthy blog post why fighting Twitter spam is much more challenging than defending against traditional email spam. He also explains how Twitter’s developers went about creating BotMaker, and provides a simplified look at its architecture.

Why spam loves Twitter

There’s a good reason why Twitter is so vulnerable to spam: its wide-ranging APIs, which are designed to let developers easily interact with the site, mean that spammers “know (almost) everything” there is to know about how it functions. As a result, it has proven very easy to create and distribute spam, and very difficult to deploy countermeasures against it.

Twitter’s real-time nature presents another problem, because it means that any countermeasures Twitter deploys must not add to the latency of the user’s overall experience.

With these challenges in mind, Twitter’s spam fighters needed to design a system that would do three things: prevent spam from being created, reduce the amount of time spam is visible, and reduce the reaction time to new spam attacks. At the same time, Twitter had to ensure that no one could tamper with or bypass the system, and that it didn’t add latency.

BotMaker to the rescue!

Such a complex challenge requires an even more complex system, and BotMaker was devised in three parts. “Scarecrow” is a low-latency subsystem designed to check for spam in the write path of Twitter’s main processes (tweets, retweets, favorites, messages and so on) in real time. Meanwhile, “Sniper” is described as a “computationally-intense and learning sub-system” that checks the user and content event logs of Scarecrow in “near real-time”.

Finally, there’s BotMaker itself, which is constantly fed data from Scarecrow and Sniper. Its job is to issue one of three commands to the write path (accept, challenge or deny), and also to instruct the actioner (delete message, reset password, suspend), to cut out much of the spam. In addition to these efforts, Twitter runs periodic checks on all of the data BotMaker compiles to try to sniff out more spam and dodgy accounts.

Image credit: Twitter blog
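
To make the three write-path commands described above more concrete, here is a minimal Python sketch of a low-latency check that returns accept, challenge or deny. It is not Twitter’s code: the `Event` fields, the two example rules and the `spam.example` domain are all hypothetical, and BotMaker’s real rules are written in its own rule language rather than Python.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Sequence


class Verdict(Enum):
    ACCEPT = "accept"
    CHALLENGE = "challenge"
    DENY = "deny"


@dataclass(frozen=True)
class Event:
    """A write-path event such as a tweet or message (hypothetical fields)."""
    user_id: int
    text: str
    account_age_days: int


# A rule inspects an event and returns a verdict, or None if it has no opinion.
Rule = Callable[[Event], Optional[Verdict]]


def deny_known_bad_link(event: Event) -> Optional[Verdict]:
    # Hypothetical blocklist check; a real system would use far richer signals.
    return Verdict.DENY if "spam.example" in event.text else None


def challenge_new_account_links(event: Event) -> Optional[Verdict]:
    # Hypothetical heuristic: brand-new accounts posting links get challenged.
    if event.account_age_days < 1 and "http" in event.text:
        return Verdict.CHALLENGE
    return None


def check_write(event: Event, rules: Sequence[Rule]) -> Verdict:
    """Return the strictest verdict produced by any rule; accept by default."""
    severity = {Verdict.ACCEPT: 0, Verdict.CHALLENGE: 1, Verdict.DENY: 2}
    verdict = Verdict.ACCEPT
    for rule in rules:
        result = rule(event)
        if result is not None and severity[result] > severity[verdict]:
            verdict = result
    return verdict


RULES = (deny_known_bad_link, challenge_new_account_links)
```

Calling `check_write(Event(1, "hello world", 400), RULES)` returns `Verdict.ACCEPT`. Keeping each rule a small function over an immutable event is one way to keep such checks cheap enough to sit in a write path; heavier analysis would run elsewhere, as Sniper does.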

The end result is an anti-spam system that pairs a low-latency filter with higher-latency processes that clean up the spam the filter misses. It’s also capable of machine learning, which means it can adapt and improve over time.

BotMaker’s rule language and data structures were built in a way that allows for rapid development, testing and deployment of system-wide code changes. This allows Twitter’s engineers to quickly iterate on and refine BotMaker’s rules and models in the evolving fight against spam.

“Spam evolves constantly,” wrote Jeyaraman. “Spammers respond to the system defenses and the cycle never stops. In order to be effective, we have to be able to collect data, and evaluate and deploy rules and models quickly.”

Jeyaraman explained that this was achieved by making the BotMaker rule language type safe, all functions pure and all data structures immutable, while ensuring the runtime supports common functional programming idioms.
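
As a rough illustration of what pure functions and immutable data structures look like in practice, here is a short Python sketch. It is not BotMaker’s rule language: the `AccountStats` type, its fields and the thresholds are hypothetical, and Python’s type hints are only checked by external tools such as mypy, so the example approximates the type-safe part rather than demonstrating it.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)  # frozen=True makes instances immutable
class AccountStats:
    """Per-account counters (hypothetical fields for illustration)."""
    user_id: int
    tweets_last_hour: int
    flagged_links: Tuple[str, ...] = ()


def record_tweet(stats: AccountStats, flagged_link: str = "") -> AccountStats:
    """Pure function: builds a new AccountStats and leaves the input untouched."""
    links = stats.flagged_links + ((flagged_link,) if flagged_link else ())
    return replace(
        stats,
        tweets_last_hour=stats.tweets_last_hour + 1,
        flagged_links=links,
    )


def looks_spammy(stats: AccountStats, max_tweets_per_hour: int = 100) -> bool:
    """Pure predicate: the result depends only on the arguments."""
    return stats.tweets_last_hour > max_tweets_per_hour or len(stats.flagged_links) > 0
```

Because `record_tweet` never modifies its argument, a rule like `looks_spammy` can be tested against old and new snapshots side by side, and a rule under evaluation cannot corrupt the data that other rules read. That property is one plausible reason such a design makes rules quicker to test and deploy.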

photo credit: Tinkerbots via photopin cc

About Mike Wheatley

Mike loves to talk about Big Data, the Internet of Things, Hacktivists and hacking, but he also hates Google and can never resist having a quick dig at them should the opportunity arise 🙂 Got a REAL news story or tip? Email [email protected].
