What is scraping?

Scraping occurs when a user or bot copies large amounts of information from a website. Scraping can be benign or malicious. See Glossary for definitions.

What is the scraped data used for?

Legitimate uses of scraped information include weather data monitoring, Web data integration, online price comparison, research, and website change detection. Malicious uses include data theft, site copying, and misuse of personal information.

What are the impacts of scraping on my site/web business?

Scraping may overload your server and consume bandwidth, which can lead to denial of service. It can also dilute the value of your data, reduce revenue, and devalue your brand.

I don’t care about the content on my website. How can I benefit from a scrape prevention product?

Scraping puts more than content at risk: site layout, fonts, and site user information can all be copied. There is a good chance that users of your site do not want their information taken.

Don’t CAPTCHAs eliminate the scraping threat?

No. If scraping is done manually, a human can fill out a CAPTCHA without issue. Optical Character Recognition (OCR) systems can also read CAPTCHAs, so bots can solve them and continue scraping. Additionally, CAPTCHA farms employ humans to solve thousands of CAPTCHAs for a few dollars, which renders CAPTCHAs ineffective as a defense.

What is a bot?

Bots are software applications that run simple and repetitive automated tasks on the Internet.

Is scraping legal?

Though scraping may be against the terms of use of some sites, those terms can be difficult to enforce. U.S. courts have ruled that the duplication of facts is legal, though in other cases the courts have held scrapers liable for committing trespass to chattels (from Wikipedia: “Trespass to chattels is a tort whereby the infringing party has intentionally (or in Australia negligently) interfered with another person’s lawful possession of a chattel (movable personal property)”).

How can I know if I am being scraped?

A human may be able to detect scraping by checking a site’s log files for repetitive requests from an IP address that appear inconsistent with normal site use. That said, reviewing log files by hand is tedious and is best left to a scrape prevention product. Scraping is also evident if your site’s information appears elsewhere in the form of a copycat site.
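
As a rough illustration of the log check described above, the following Python sketch counts requests per IP address per calendar minute and reports the addresses that exceed a limit. It assumes access logs in the Common Log Format (IP address first, timestamp in square brackets). The function name flag_heavy_ips, the 60-requests-per-minute limit, and the sample addresses are assumptions made for this example, not part of any product. A high request rate alone does not prove scraping; it only marks traffic worth a closer look.

```python
import re
from collections import Counter
from datetime import datetime

# Start of a Common Log Format line: IP, identity, user, [timestamp]
# (assumed format; adjust the pattern for other log layouts).
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def flag_heavy_ips(lines, max_per_minute=60):
    """Return {ip: peak requests in one calendar minute} for IPs over the limit.

    max_per_minute is a hypothetical threshold; tune it to normal site use.
    """
    per_minute = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        ip, stamp = match.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        # Bucket each request by IP and by the minute it arrived in.
        per_minute[(ip, when.replace(second=0))] += 1

    peaks = {}
    for (ip, _minute), count in per_minute.items():
        if count > max_per_minute:
            peaks[ip] = max(count, peaks.get(ip, 0))
    return peaks


# Made-up sample: one address sends 50 requests in a minute, another sends 1.
sample = [
    '203.0.113.5 - - [10/Oct/2023:13:55:%02d +0000] "GET /item/%d HTTP/1.1" 200 512' % (s, s)
    for s in range(50)
] + ['198.51.100.7 - - [10/Oct/2023:13:55:10 +0000] "GET / HTTP/1.1" 200 1024']

print(flag_heavy_ips(sample, max_per_minute=30))
```

With the sample above, only the first address is reported. A scraper that rotates IP addresses or paces its requests would slip past a simple count like this, which is one reason a manual check is a weak substitute for a dedicated tool.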

Why should I protect my site against scraping?

See “What are the impacts of scraping on my site/web business?”