Background

PatientsLikeMe.com noticed suspicious activity at 1 a.m. in one of its forums, where users share highly personal information about their experiences with medical conditions, including depression and other emotional disorders. A new user appeared to be copying every single entry on the forum.

Problem Being Addressed

Upon identifying the suspicious activity, PatientsLikeMe decided to block the user. Note: the company was not using ScrapeDefender at the time.

Action Taken

PatientsLikeMe was using confidential software that monitors for unusual activity and notifies site administrators of suspicious behavior. After the behavior was flagged, the chief marketing officer, David Williams, identified the suspicious user as a bot running an automated script. He blocked and then shut down the user’s account. However, it was unclear how long the suspicious behavior had been occurring.

Problems Experienced

PatientsLikeMe identified three other suspect accounts and blocked their access by the next afternoon. All suspect accounts were traced back to Nielsen Co., the media-research firm that provides monitoring services for various mass media, including the Internet, and collects data for its clients.

PatientsLikeMe sent a cease-and-desist letter to Nielsen. Ten days later, Nielsen agreed to stop scraping but then said it was unable to remove the scraped data from its database. A company spokesman later said Nielsen had found a way to quarantine the PatientsLikeMe data to prevent it from being included in its reports for clients.

PatientsLikeMe’s president, Ben Heywood, disclosed the break-in to the site’s 70,000 members in a blog post in which he reminded users that PatientsLikeMe sells its data in an anonymous form, without attaching users’ names to it. That set off a debate on the site about the legality of selling sensitive health-related information. The company says most of the 350 responses to the blog post were supportive. But PatientsLikeMe says 218 members subsequently quit, apparently concerned that their real names could be traced through usernames that may have been copied. “I felt totally violated,” says Bilal Ahmed, a 33-year-old resident of Sydney, Australia, who used PatientsLikeMe to connect with other people suffering from depression. He used a pseudonym on the message boards, but his PatientsLikeMe profile linked to his blog, which contains his real name.

Trends Identified

Data scraping is now an accepted practice among mainstream companies that have ample resources to collect and use Web sites’ confidential information.

The use of commercially reasonable security practices is the standard for protecting Web sites that publish confidential information. In this case, patients (site users) volunteered their personal information and related medical histories. While PatientsLikeMe clearly states in its Terms of Use that it may use and even resell user information, it was allegedly not fully prepared for a scraping attack of this magnitude, nor were its users fully protected. It may have been a surprise that the Web scraping activity originated from Nielsen Co., a large, respected, century-old U.S.-based firm, rather than from a smaller firm.

Recommendation

The use of a commercially accepted anti-scraping solution, such as ScrapeDefender, would have identified the attack immediately and blocked all four suspect accounts at the start of the attack rather than after it was too late.
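
To illustrate the general idea of catching an account that copies every entry on a forum, the following is a minimal Python sketch of one common approach: counting the distinct pages each account requests within a sliding time window and blocking the account once a threshold is exceeded. It is not ScrapeDefender's method or the monitoring software PatientsLikeMe used; the class name ScrapeDetector, its parameters, and the threshold values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


class ScrapeDetector:
    """Hypothetical sketch: block accounts that fetch too many distinct
    pages within a sliding time window."""

    def __init__(self, max_pages=100, window_seconds=600):
        # Assumed thresholds; a real site would tune these to its traffic.
        self.max_pages = max_pages
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # account -> deque of (timestamp, page)
        self.blocked = set()

    def record(self, account, page, timestamp):
        """Record one page view. Return True if the request is allowed,
        False if the account is (or has just been) blocked."""
        if account in self.blocked:
            return False

        events = self._events[account]
        events.append((timestamp, page))

        # Discard page views that have fallen out of the time window.
        cutoff = timestamp - self.window_seconds
        while events and events[0][0] <= cutoff:
            events.popleft()

        # Count distinct pages, so re-reading one thread is not penalized.
        distinct_pages = {viewed for _, viewed in events}
        if len(distinct_pages) > self.max_pages:
            self.blocked.add(account)
            return False
        return True
```

Under these assumptions, a site would call record() on every authenticated page view and refuse the request when it returns False. A member who reads a few threads stays under the threshold, while a script that walks through every entry in quick succession is blocked partway through the first pass.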

The Future

The scraping threat will only worsen in the future. According to the Wall Street Journal (Oct. 12, 2010), firms will double spending on data scraped from the Internet in 2012 ($840 million), and larger, blitzkrieg-like attacks will plague sites.

References

http://online.wsj.com/article/SB10001424052748703358504575544381288117888.html