What is scraping?
Scraping occurs when a user or bot copies large amounts of information from a website. Scraping can be benevolent or malicious. See Glossary for definitions.
What is the scraped data used for?
Scraped information can be used for weather data monitoring, web data integration, online price comparison, research, and website change detection. Malicious uses include data theft, site copying, and misuse of personal information.
What are the impacts of scraping on my site/web business?
Scraping may overload your server and consume bandwidth, which can lead to denial of service. It can also dilute the value of your data, reduce revenue, and devalue your brand.
I don’t care about the content on my website. How can I benefit from a scrape prevention product?
Scraping threatens more than content. Your site’s layout, fonts, and user information are also at risk, and there is a good chance that users of your site do not want their information taken.
Don’t CAPTCHAs eliminate the scraping threat?
No. If scraping is done manually, a human can fill out a CAPTCHA without issue. Optical Character Recognition (OCR) systems can also read CAPTCHAs, so bots can solve them and continue scraping. Additionally, CAPTCHA farms employ humans to solve thousands of CAPTCHAs for a few dollars, rendering CAPTCHAs wholly ineffective.
What is a bot?
Bots are software applications that run simple and repetitive automated tasks on the Internet.
Is scraping legal?
How can I know if I am being scraped?
A human may be able to detect scraping by checking a site’s log files for repetitive requests from an IP address that appear inconsistent with normal site use. That said, combing through log files by hand is tedious, and the job is best handled by a scrape prevention product. Scraping is also evident if your site’s information appears elsewhere in the form of a copycat site.
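The log check described above can be roughly automated. The following is a minimal sketch, not a complete detection method: it assumes access logs in Common Log Format, and the names parse_line and find_suspect_ips, along with the threshold of 100 requests per minute, are hypothetical choices made for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Matches the start of a Common Log Format line: IP, identity, user, [timestamp].
# Assumption: your server writes logs in this format.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def parse_line(line):
    """Return (ip, timestamp) for a log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    ip, raw_time = match.groups()
    return ip, datetime.strptime(raw_time, "%d/%b/%Y:%H:%M:%S %z")


def find_suspect_ips(lines, max_requests=100, window=timedelta(minutes=1)):
    """Return the set of IPs with more than max_requests inside any window.

    The default threshold and window are illustrative assumptions; tune
    them to what normal use of your own site looks like.
    """
    hits = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            ip, when = parsed
            hits[ip].append(when)

    suspects = set()
    for ip, times in hits.items():
        times.sort()
        start = 0
        for end, current in enumerate(times):
            # Advance the left edge until the span fits inside the window.
            while current - times[start] > window:
                start += 1
            if end - start + 1 > max_requests:
                suspects.add(ip)
                break
    return suspects
```

Calling find_suspect_ips(open("access.log")) on a log file (the file name here is a placeholder) returns the addresses worth a closer look. A high request rate alone is not proof of scraping, since many users can share one IP address, so treat the result as a starting point for review.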
Why should I protect my site against scraping?
See “What are the impacts of scraping on my site/web business?”