Neha Mehta is a J.D. Candidate, 2021 at NYU School of Law.

Introduction to Data Scraping

It’s hard to imagine how the Internet would function without web scraping. Web scraping, often referred to as crawling or spidering, is the automated extraction and collection of large amounts of data from websites. Its primary function is to turn the collected data into meaningful insights on a variety of metrics, including real estate price comparison, competitor website monitoring, and stock market analysis. The practice is ubiquitous and has been employed by many sites, including common search engines such as Google and Yahoo. Similarly, many websites permit web scraping by third parties to provide real-time analytics. Despite these benefits, scraping can be problematic when web scrapers collect information from sites without explicit consent. In the last few years, the improper collection, retention, and use of third-party data has become an alarming phenomenon.

Computer Fraud and Abuse Act

Even though web scraping is pervasive, the legality of the practice has not been definitively settled. Given the novelty of web scraping and its highly technical nature, no comprehensive legal framework regulates it. Instead, plaintiffs who have sought to bring suit against third-party companies that engage in automated scraping of user data have turned to the Computer Fraud and Abuse Act (CFAA). The CFAA was passed in 1986 as an anti-hacking statute, and it imposes both criminal and civil liability. The CFAA broadly states that “whoever…intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains…information from any protected computer…shall be punished.” Because the CFAA does not explicitly define “without authorization,” courts have struggled to interpret its meaning, often reaching conflicting conclusions. For example, one issue courts have had to confront is whether “without authorization” should be interpreted by reference to a company’s or site’s terms of use.


hiQ Labs, Inc. v. LinkedIn Corp. represents the newest development in cases concerning third-party web scraping practices that violate a site’s terms of service. HiQ Labs, a data analytics company, scraped data from public LinkedIn profiles to develop competing analytics tools; hiQ Labs would routinely sell the data it had aggregated from LinkedIn users to employers. In response, LinkedIn issued a cease-and-desist letter claiming that hiQ Labs had violated the CFAA and LinkedIn’s User Agreement, and demanded that hiQ Labs stop accessing and retaining public user data. HiQ Labs preemptively filed suit seeking injunctive relief, which the district court granted. On appeal, the Ninth Circuit therefore had to determine whether any scraping that hiQ Labs did after receiving LinkedIn’s cease-and-desist letter was done “without authorization” within the meaning of the CFAA.

The Ninth Circuit found that automated scraping of publicly accessible data likely does not violate the CFAA, even if the site owner tries to revoke access through a cease-and-desist letter. The court reasoned that the “without authorization” provision does not apply to publicly accessible data and instead covers private data that web scrapers collect by circumventing “permissions, such as username and password requirements.” Here, LinkedIn’s data was open to the public. Moreover, the court pointed to the legislative history of the CFAA and noted that the prohibition on unauthorized access was not meant to police automated scraping of publicly available data.


While hiQ Labs, Inc. v. LinkedIn Corp. signifies a win for those who advocate that information on publicly available sites is free to use, the Ninth Circuit’s opinion was far from a slam-dunk ruling. Even though the court concluded that scraping public data from websites likely does not violate the CFAA, it observed that other laws could provide recourse to corporations or websites that seek to prevent wholesale copying of public information, including claims rooted in trespass to chattels, copyright infringement, conversion, and unjust enrichment. Thus, the Ninth Circuit’s holding may be narrow in scope, limited to the legality of web scraping under the CFAA.

Additionally, the court expressed concern about giving companies like LinkedIn sole discretion to decide who can collect and use data that the companies do not own and otherwise make publicly available. Giving companies the power to determine what information third-party sites may collect and use may lead to the creation of an information monopoly that would work contrary to the public’s interest in preserving an open and fair Internet. However, if authorization under the CFAA is interpreted to mean password or login protection, the court’s decision may in turn incentivize companies and websites to implement security measures that would transform publicly available data into private data. This is especially likely given that cease-and-desist letters, one of the few legal remedies available to companies, are not likely to deter third-party scrapers.

For the time being, the Ninth Circuit’s ruling provides relevant guidance on how the court will treat future web scraping cases. However, given the lack of nationwide uniformity in interpreting “authorization” and the perceived benefits of web scraping, certainty may come only if the Supreme Court resolves the lingering questions about the scope and application of the CFAA.