On May 9, 2024, the US District Court for the Northern District of California issued an opinion in X v. Bright Data on the topic of copyright preemption. At first blush, this opinion is important for what it says about the limited ability of social media sites to prevent data scraping via their terms of service. But it also provides some interesting commentary on the more general issue of copyright preemption.
In this case, X sued Bright Data for violating the X terms of service, which prohibit “Misuse of the Services,” and specifically, “scraping the Services in any form, for any purpose.” Such terms have long been common in online terms of service, and have become even more common after last year’s rush of machine learning model developers scraping public sites for training data.
The decision was issued by Judge Alsup, who famously ruled for Google on the issue of protectability (or lack thereof) of APIs under copyright law. On the strength of that opinion alone, Alsup is known for his sophistication about technology issues; he learned some Java to opine on that case.
Preemption: State Versus Federal Law
Preemption is a key concept in copyright law. Preemption dictates the interaction between US state law and federal law. Copyright is federal law only; the US Copyright Act of 1976 made this crystal clear under Section 301(a). But, as the Ninth Circuit has put it, state law claims “are not preempted if they fall outside the scope of 301(a)’s express preemption and are not otherwise in conflict with the Act.” Ryan v. Editions Ltd. W., Inc., 786 F.3d 754, 760 (9th Cir. 2015).
The policy reason behind this strong statement of copyright preemption was mainly to prevent individual states from making laws creating their own more restrictive, conflicting versions of copyright law. Copyright law is a balancing act: it allows authors exclusive control of certain activities, like copying and distribution of their works, but that power is balanced against the rights of others to use works of authorship in some ways.
Particularly for works like databases and software, copyright protection has many limitations. In the Oracle v. Google case, all of these doctrines came into play: idea/expression dichotomy, merger, short words and phrases, de minimis–and pivotally, fair use. All these limit the power of the author to control certain uses of their works. More restrictive state law–in the form of contract, unfair competition, and similar theories–threatens to rewrite the balance that federal copyright law represents.
Alsup notes in the decision that there are two clauses to Section 301(a)–scope and conflict. State law claims can survive preemption if they deal with something outside the scope of copyright law, such as overloading servers or use of name and likeness. But the statute also refers to conflict. “Although conflict preemption has played second fiddle to express preemption in the caselaw as of late, it is the more appropriate consideration when … enforcement of state law undermines federal copyright law.” Therefore, even if the state law claim is not within the scope of copyright law, conflict preemption can exist when enforcing the contract would be “an obstacle to the accomplishment and execution of the full purposes and objectives of Congress.” Crosby v. National Foreign Trade Council, 530 U.S. 363 (2000), at 373.
Alsup also emphasizes that conflict preemption is particularly important when the contract to be enforced is a standard form contract, as opposed to a contract negotiated between two parties. This kind of one-to-many relationship looks closer to what copyright law was intended to govern, whereas one-to-one contracts allow parties to negotiate a different balance of rights if they desire to do so.
The opinion goes on to list three ways that enforcing a contract prohibiting scraping data would undermine the policy of copyright.
- Copyright empowers copyright owners to exclude others from reproducing, adapting, distributing, and displaying their copyrighted works. But X did not own the copyright to the user-generated content (UGC) on its site; its terms of service, unsurprisingly, only grant X a non-exclusive license. “X Corp.’s state-law claims based on scraping and selling of data would empower X Corp., as a non-exclusive licensee, to exclude others from reproducing, adapting, distributing, and displaying X users’ copyrighted content”—even though X users licensed their copyrighted content to X to make it freely available. Enforcing the contract would take the power to enforce the copyright away from its true owners.
- Similarly, enforcing the contract would interfere with the copyright doctrine of fair use, which grants everyone the right to use copyrightable works in ways that encourage creativity and other policy benefits.
- Last, enforcing the terms would upset the balance of copyright law, which is a “scheme of carefully balanced property rights that give authors and their publishers sufficient inducements to produce and disseminate original creative works and, at the same time, allow others to draw on these works in their own creative and educational activities.” Goldstein on Copyright § 1.14 (3d ed. 2023).
Accordingly, the court ruled that X’s claims under its terms of service were preempted by copyright law, to the extent based on scraping of data.
Sauce for the ML is Sauce for the OSS
This case could have significant implications for the tech world. First, it could create opportunities for those training AI models to scrape content from websites, regardless of contrary prohibitions in the sites’ terms of service. Of course, site operators with treasure troves of data will still have an advantage over scrapers. Even if site owners cannot entirely prevent others from scraping UGC, they can sell preferential access to their APIs. Moreover, the reasoning of the decision might not hold for sites whose content is not primarily UGC.
But second, it could have implications for open source enforcement. For decades, enforcement of open source licenses has been prosecuted under copyright law. The pending SFC v. Vizio case is an attempt to avoid this avenue and bring an action under contract law. In that case, Software Freedom Conservancy brought an action for violation of the GPL based on a pure contract theory, seeking specific performance of the contract (i.e. an order to release source code) and not seeking any damages or other copyright remedies. Specific performance is primarily a contract remedy–a rare one at that–and is nearly unheard of under copyright law. The defendant removed the action to federal court based on copyright subject matter and preemption, but lost that battle. The claim was bounced back to state court, where it currently awaits trial.
The Vizio case, like the X case, is in the 9th Circuit. Like terms of service, open source licenses are one-to-many arrangements, and as in the X case, the plaintiff is not the author. Alsup’s shift of focus to conflict preemption could provide a basis for appeal, or otherwise influence the outcome of that case.
