PHP License Metamorphoses to BSD

The PHP project announced it is moving to a new license.

PHP is a scripting language used for web development. It can be embedded within HTML and used to create dynamic web pages. It is the “P” in the LAMP stack (Linux, Apache, MySQL and PHP)–although some people use the P to refer to Python or Perl. The Zend Engine is the core of PHP.

For years, PHP as a whole has been offered under the PHP License and the Zend Engine License–both permissive licenses. The PHP License is OSI approved, but the Zend Engine License is not. The Zend Engine License has specific naming restrictions related to “Zend” and “Zend Engine,” sometimes referred to as advertising clauses or attribution clauses. Such restrictions were common in early permissive licenses like Apache 1.0, but have since been deprecated by the open source community and do not appear in most recent permissive licenses.

License changes can be a challenge, because unless a project uses a contributor license agreement (CLA), it must get permission from all contributors to change the license for their contributions. In major projects, this has been done a few times, such as the Wikipedia migration and the OpenSSL change, but it’s a big undertaking that requires broad socialization and carries the risk that a contributor will object. These changes usually take place in popular projects whose licenses are outdated, ad hoc, and confusing.

But PHP has found a neat trick to avoid having to get permission from every contributor. Like many open source licenses, the PHP license allows the license steward to issue new versions.

  5. The PHP Group may publish revised and/or new versions of the
     license from time to time. Each version will be given a
     distinguishing version number.
     Once covered code has been published under a particular version
     of the license, you may always continue to use it under the terms
     of that version. You may also choose to use such covered code
     under the terms of any subsequent version of the license
     published by the PHP Group. No one other than the PHP Group has
     the right to modify the terms applicable to covered code created
     under this License.

Apparently, PHP, as license steward, is redefining its own license as the BSD license. The announcement says that the BSD License will be adopted as the PHP License v. 4 and as the Zend Engine License v. 3.

Meta Wins Partial Summary Judgment in AI Infringement Claim

On the heels of the landmark judgment in favor of Anthropic this week, a judge in another pending AI copyright case, Kadrey v. Meta, ruled for the defendants.

Thirteen authors, including most notably Sarah Silverman, sued Meta for using their copyrighted books, downloaded from “shadow libraries,” to train its large language model (Llama). The court explained, “A shadow library is an online repository that provides things like books, academic journal articles, music, or films for free download, regardless of whether that media is copyrighted.” The most notorious of these is LibGen (Library Genesis).

Even though Judge Chhabria ruled for the defendants, the language of his opinion was extremely favorable to the plaintiffs. The court said, for example: “[B]y training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way.” This statement points to the final and most important factor of fair use–effect on the market for the original work–and suggests that, if the case were argued correctly, this factor would weigh in favor of infringement.

The plaintiffs had argued that Llama could reproduce snippets of the text of their works, and that Meta’s unauthorized training diminished their ability to license works for AI training. However, the court stated that “Llama is not capable of generating enough text from the plaintiffs’ books to matter, and the plaintiffs are not entitled to the market for licensing their works as AI training data.”

Keep in mind that this same judge had stated in a previous hearing on this case, “I understand your core theory. Your remaining theories of liability I don’t understand even a little bit.” https://www.reuters.com/legal/litigation/us-judge-trims-ai-copyright-lawsuit-against-meta-2023-11-09/

The court implicitly lamented that the plaintiffs did not assert sufficient facts to withstand summary judgment, noting, “Because the issue of market dilution is so important in this context, had the plaintiffs presented any evidence that a jury could use to find in their favor on the issue, factor four would have needed to go to a jury.”

The court strongly hinted that similar cases could benefit from better advocacy. “As for the potentially winning argument—that Meta has copied their works to create a product that will likely flood the market with similar works, causing market dilution—the plaintiffs barely give this issue lip service, and they present no evidence about how the current or expected outputs from Meta’s models would dilute the market for their own works.” This is what one might call a playbook for bringing a more successful claim.

  Given the state of the record, the Court has no choice but to
  grant summary judgment to Meta on the plaintiffs’ claim that the
  company violated copyright law by training its models with their
  books. But in the grand scheme of things, the consequences of
  this ruling are limited.

This particular case is not quite over yet. But removing the infringement claims is a significant win for the defense.

It may be no coincidence that this case came on the heels of Judge Alsup’s opinion only days ago. The order in this Meta case referred specifically to Judge Alsup’s opinion, disagreeing with some of his fair use analysis.

AI Training Ruled Fair Use

This week, in Bartz v. Anthropic, Judge Alsup (Northern District of California) ruled that training AI large language models (LLMs) on lawfully acquired works of authorship is fair use.

This is a landmark ruling by the highly respected judge, who handled the Oracle v. Google case.

Infringement claims regarding AI come in two basic flavors: that the act of training is infringement, and that the AI producing output similar to the input is infringement. This ruling is only about the first flavor–the training stage.

Two Acts of Copying

In this case, the defendant purchased copyrighted books, tore off the bindings, scanned every page, and stored them in digitized, searchable files. (This is called destructive scanning, which is faster and easier to do than non-destructive scanning that preserves the original book.) It used selected portions of the resulting database to train various large language models. But Anthropic also downloaded many pirated copies of books, though it later decided not to use them for training. These copies were retained in a digital library for possible future use.

The plaintiffs are authors of some of the books.

Anthropic moved for summary judgment based on fair use, and Alsup found the act of training to be transformative, one of the key factors in modern fair use doctrine. Regarding transformation, Alsup cited the Google Books case, one of the key decisions on fair use in the digital age. (Authors Guild v. Google, Inc., 804 F.3d 202, 217 (2d Cir. 2015)).

The Fair Use Analysis

Fair use is analyzed according to four non-exclusive factors set out in 17 USC 107. On the first factor of fair use, the court distinguished between scanning and pirating activities. The court called the destructive scanning of the books a “mere format change,” which supported a finding of fair use. The purpose of the copy was to support searchability. Anthropic only ended up with the digital copies, not the books.

Before buying the physical books, Anthropic “downloaded over seven million pirated copies of books, paid nothing, and kept these pirated copies…even after deciding it would not use them to train its AI.” The court viewed this differently from the scanning: “Such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded.” The court was not convinced by Anthropic’s argument that the use would ultimately be transformative. Citing the recent Warhol case, the order says, “what a copyist says or thinks or feels matters only to the extent it shows what a copyist in fact does with the work.”

The last of the factors in a fair use analysis–usually considered the most important factor–is the effect of the otherwise infringing activity on the market for the original work. The court said, “The copies used to train specific LLMs did not and will not displace demand for copies of Authors’ works, or not in the way that counts under the Copyright Act.” But this was only for the purchased copies; the court reached the opposite conclusion for the pirated copies.

What’s Next?

The case can now proceed to trial only for the pirated copies. For the purchased books that were destructively scanned, the claims were dismissed.

This case is a class action, and the motion for class certification is still pending. If the class is not certified, plaintiffs often give up or settle for small amounts. Law firms that specialize in bringing class actions depend on a class certification of a large class to increase damages, and accordingly, their fees.

There are about 40 pending cases in the US on AI and copyright, and many of them may have suffered a setback with this opinion. Alsup’s opinion is in line with what many copyright commentators (including me) have proposed: that training is lawful if done with lawful access to the training material. The decision of a district court will not bind cases pending in other districts. However, because Alsup is a well-respected jurist, his analysis may persuade other courts to follow suit.

The court did not reach the second flavor of infringement claims regarding output, because it was not at issue here. But many commentators are skeptical that such claims will be successful for properly trained models. ML models typically do not produce “copies” in the sense intended by the copyright law. Claims regarding output may therefore be relegated to trademark, publicity and trade dress claims, which are outside of the ambit of copyright law.

Postmodern Art and Cannabis Law

I’m intrigued as to why an article about cannabis law cites to an article I wrote over thirty years ago about copyright fair use in postmodern art. But the Tulsa Law Review has paywalled their prestigious journal, forcing me to pay if I want to find out, and honestly, I’m not quite that intrigued.

By the way, you can download my article for free, and there is even an update here.

AI Could Be Your Next Team for Clean Room Development

Clean room developments are necessary when a developer wants to “cleanse” the intellectual property burden of third party software. The need arises when third party software is provided under unacceptable license terms, or not licensed at all. This is one of the trickiest tasks in software development, but it has a long history of best practices.

The canonical clean room development seeks to avoid trade secrets of proprietary software. But the rise of open source has resulted in the need to do a different kind of clean room project, meant to avoid the copyright in open source software–usually for GPL licensed packages. The two situations call for a slightly different approach. A clean room process for proprietary code seeks to avoid trade secrets and copyright burdens, whereas clean room development in open source is entirely about copyright–because there are no trade secrets in open source software. In either case, a team of developers seeks to write new implementing code from scratch, so that code will perform the same tasks, with the same inputs and outputs, as the original or “target” code.

A traditional clean room development process looks something like this:

  • Separate Development Teams: Create two teams of developers: a specification team that works on specification development for the target code, and an implementation team that writes the new implementing code. 
  • Create a Specification: The specification team, which has access to the target code, extracts the specifications for the software’s requirements and expected behavior. Software, at the end of the day, is a set of inputs and outputs, and its specifications state what outputs you should expect when certain inputs are used.
  • Reimplementation: The implementation team writes the new software according to the specification developed by the specification team. This must be done in an environment that is “cleansed” of the target code. Ideally, the implementation code has never read the target code.
  • Verification: The implementation team tests the newly implemented clean code. If there are bugs, the specification team can only confirm the accuracy of the specification. The specification team cannot suggest bug fixes, because that might result in inadvertent copying. Bug fixes are done by the implementation team.
  • Iterate: Repeat until the development is done.
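
The steps above can be sketched as a simple workflow. This is an illustrative sketch only–every function here is a hypothetical stand-in for careful human (or AI) work, not a real toolchain–but it shows the essential discipline: the implementation side sees only the specification, and verification runs the spec’s input/output cases against the new code.

```python
# Illustrative sketch of the clean room workflow described above.
# All functions are hypothetical stand-ins, not a real process.

def specification_team(target_code: str) -> dict:
    """Has access to the target code; emits only behavioral specs
    (expected outputs for given inputs), never the code itself."""
    # In practice this is careful analysis; here we hard-code one behavior.
    return {"add": {"inputs": [(2, 3)], "expected": [5]}}

def implementation_team(spec: dict) -> dict:
    """Never sees the target code; writes new code from the spec alone."""
    def add(a, b):
        return a + b
    return {"add": add}

def verify(spec: dict, impl: dict) -> bool:
    """Verification: run the spec's input/output cases against the new code."""
    for name, cases in spec.items():
        for args, expected in zip(cases["inputs"], cases["expected"]):
            if impl[name](*args) != expected:
                return False
    return True

spec = specification_team("<target code, visible only to the spec team>")
impl = implementation_team(spec)   # iterate here until verification passes
print(verify(spec, impl))          # True
```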

Of course, there are far more complex processes for clean room development. Some have three teams, and most have a lot more steps. I have seen guidelines so many pages long they have a table of contents. But the above is the essence–not to mention the most my clients have the patience to read.

Not Enough Humans

The problem most companies have when performing a clean room development is that they don’t have the resources to create two separate teams. Even if they do, they usually cannot create an implementation team that has never been exposed to the target software–and doing so is particularly difficult when the target software is open source, because there is no way to prove lack of access to publicly available materials. For an open source clean room process, we usually make do with developing implementing code in an environment that does not have local access to the target code.

But now, with the advent of AI, we have an alternative way to approach clean room development.

I pause here to note that while there are those who think that all generative AI is prima facie copyright infringement, I don’t agree. As long as the model has been trained on enough inputs, it should not parrot any one input. (More on that here.) So let’s set that issue aside, because if you disagree with me, you shouldn’t be using AI coding tools at all and you should just put this article aside.

An AI that writes code (like Claude or Copilot) has probably been exposed to almost all the open source code ever written. But via that training process, it is unlikely to focus on specific target code. So, companies struggling to staff a clean room development might consider replacing one or both of the teams with AI. As always, some human oversight is necessary to check that an AI generative process has been done correctly. But using it would still greatly reduce the headcount necessary to implement the clean room process.

  • Specification Team. AI is better at some tasks than others, but I have found that AI is quite good at summarizing text. If you ask it to write the specifications for target software, it will probably do a good job. You could use an AI for your specification team, and that would help avoid “contaminating” your implementation team with access to the target software.
  • Implementation Team. AI is quite good at writing code, though more human oversight would probably be necessary to use AI for this purpose. AI-assisted coding still requires human curation, and also usually requires human debugging. Debugging is a complex logical task, and the current flavor of AI–transformer-based models–is better at text generation than logic. But in a pinch, you might use AI as your implementation team and use the specification team for quality control.
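
As a sketch of how AI might slot into those roles–assuming a hypothetical `ask_llm` helper standing in for whatever model API you use–the key discipline is the same as in the human process: the implementation prompt receives only the specification, never the target code, and a log of prompts gives a human reviewer an audit trail showing the wall held.

```python
# Sketch of AI-assisted clean room roles. `ask_llm` is a hypothetical
# stub standing in for a real model API call; replace with your provider.

def ask_llm(prompt: str) -> str:
    return "<model output>"  # stub so the sketch runs end to end

prompt_log = []  # audit trail: record what each "team" was shown

def ai_spec_team(target_code: str) -> str:
    # The only place the target code appears.
    prompt = ("Write a behavioral specification (inputs, outputs, edge "
              "cases) for this code, without quoting it:\n" + target_code)
    prompt_log.append(("spec", prompt))
    return ask_llm(prompt)

def ai_implementation_team(spec: str) -> str:
    # Receives the specification only -- the "clean" side of the wall.
    prompt = "Implement software satisfying this specification:\n" + spec
    prompt_log.append(("impl", prompt))
    return ask_llm(prompt)

target = "def secret(): ..."
spec = ai_spec_team(target)
new_code = ai_implementation_team(spec)
# A reviewer can confirm the implementation prompt never saw the target:
assert target not in prompt_log[1][1]
```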

Neither of these suggestions should be surprising. AI code generation greatly reduces the human effort necessary to produce code, and clean room projects are human-intensive. For an open source target, I think the use of AI as a specification team is quite interesting. For proprietary code, using AI as the implementation team may be particularly interesting, because AIs are mostly not trained on proprietary code, making the cleansing more reliable.

Always remember: wash your hands before you code!

How to Get Lawyers to Use AI

Lawyers love to talk about using new technology to make their practice more efficient. All during my career, I’ve heard law firms touting the great benefits of knowledge management, AI (generative and non-generative) research and writing tools, and of course, getting lawyers to do their own administrative work. But unfortunately, they are better at talking about it than actually doing it. Perhaps it is the billable hour model, which rewards inefficiency, or just that most lawyers aren’t all that tech-savvy.

So of course, when ChatGPT and generative AI snatched the attention of the world a couple of years ago, lawyers started speculating about how sophisticated LLMs would change legal practice. Some opined we would all be made irrelevant and rendered unemployable. (Well, not yet.) Some announced grand partnerships with AI providers–though I would be skeptical that has reduced any client bills much. Some think we can use AI to do better, faster work. In that respect, faster is more likely than better. AI, it turns out, is not so good at quality control, and lawyers are all about quality control.

Technological breakthroughs can have profound changes in our world, but they tend to happen more gradually than click-bait articles would have us believe. I remember, once, wondering how people would ever accept self-driving cars. But we didn’t exactly. We accepted cruise control, then automatic parking, then lane-keeping assistance, and automatic braking. The autonomous features we accepted just keep expanding.

AI in legal practice is the same. Slowly, AI has crept into our practices. I doubt that I am much different from most lawyers in this way. Here are some of the tasks for which I have used AI in the past couple of years:

  • Creating first drafts of articles and memos. Even if I end up re-writing the entire thing, a first draft is useful and can be thought-provoking. I would equate ChatGPT, in this respect, with a first year associate. (But sometimes, I am sad to say, with better grammar.) But a first draft is about as far as it goes. The LLMs have no savvy, no creative ideas, and no experience. They can’t put two and two together and make five–well, they can, but they think that’s accuracy, rather than synergy.
  • Summarizing contracts or other documents. LLMs are useful for creating summaries of documents. In this respect, they are usually quite accurate. So, I can use them as a double-check for my own reviews or summaries, or as a basis to draft them. A word to the wise: if you do this, be sure not to feed confidential information into the model, unless you have a paid access account that keeps the input confidential and prohibits training on it. Disclosing confidential documents to an LLM would usually be a violation of client confidentiality.
  • Legal research. General purpose LLMs are not very good at most conventional legal research–finding court opinions, statutes, and similar sources. That is because they have no sense of what is authoritative, other than consensus of their training data. LLMs can be useful for factual research; these days I find myself using the Google AI-driven search quite a bit. I particularly appreciate that it cites to outside sources so I can check the legitimacy and reliability of the information.

Now…the Part I Didn’t Expect

Recently, a client sent me an agreement to review. It was a simple agreement, and not badly written. I asked where the agreement came from, and my client admitted they had used an LLM to write it.

But this did not bother me. Clients send their lawyers form agreements all the time, saying, can we use this? They usually seem to think this saves legal fees, even though it does not really do that. Until now, the documents they sent were usually something they copied from the web–often without even changing the names–and often wildly inappropriate for the situation.

In contrast, the LLM-generated agreement sent to me was on point, contained no typos, and had most of the relevant terms. It took me about half an hour to fix it up for use–probably about the same time it would have taken me to start from one of my own forms, and probably less time than fixing up some piece of dreck that was cadged from the web. Perhaps I am unusual in that I would rather not spend my hours, nor charge my clients for hours, fixing up badly written stuff from the web.

But it did make me realize how lawyers will finally be dragged into using AI, rather than just pontificating about it. It won’t happen of their own volition, or even due to pressure from clients to be more efficient. It will happen because their clients will start using auto-generated legal documents, and the lawyers will be expected to fix them, or explain why they are not good enough to bother fixing.

Welcome to 2025!

Thomson Reuters Wins a Victory Against Both AI and Public Access to Law

Recently, an opinion was published by Judge Stephanos Bibas, a Third Circuit judge sitting by designation in Delaware, in Thomson Reuters v. Ross Intelligence, a case filed in 2020 about AI training.

For most of us who have been following the spate of ongoing cases about generative AI, this case was not too high on the radar, because it’s about regular AI: substantive number crunching. Moreover, the case was filed in 2020, long before all the hoo-hah about ChatGPT started. Apparently, in the depths of a societal shutdown from COVID-19, Thomson Reuters decided it needed to shore up its putative rights in public domain legal texts.

I have to confess that it took quite a while to sit down and read this opinion, because I just didn’t want to. I think the legal reasoning in this opinion is not correct, but that’s not why it took me so long. The effort of the legal publishers to restrict access to primary law sources causes me great dismay. So, if you want objective legal analysis, you should probably read some of the other commentators who have written about this opinion.

Legal Opinions are Freely Available

In the US, legal opinions written by courts are in the public domain–they enjoy no copyright protection. That is as it should be. Everyone should be able to read the law and decide for themselves what it means. And while it’s true that access to primary information is usually not enough, and those with the money to hire the best-dressed lawyers often win legal battles–more about that below–free access to primary legal sources, like court opinions and statutes, is table stakes for political liberty.

I graduated law school in 1994, which was just on the cusp of widespread access to the Internet. So, I learned to research the law in books–tree-killing, heavy, expensive hardback books called case reporters. While the online legal research tools, Westlaw and LEXIS, were available to us then, they were viewed with skepticism. And they were rudimentary, expensive, and often more confusing than the books. So, a library, like the one at UC Berkeley, where I attended law school, merely had to buy the case reporters, and then anyone was free to read them.

Because legal opinions are usually long and tedious to read, case reporters include micro-summaries printed at the front of the case text, called headnotes. And with all due deference to whatever poor underemployed lawyers create the headnotes, they are not what you would call poetry. They are summaries of individual nuggets of law, intended to point a researcher to actual language in the case. They are like a very wordy index. No competent lawyer would use a headnote standing alone. Headnotes are merely entry points to the official legal text. And for the most part, they either quote or slightly rephrase the source material.

Here is a (made up) example from the opinion:

Headnote:

  Originality, for copyright purposes, means that the work was
  independently created and has some minimal degree of creativity.

Opinion:

  Original, as the term is used in copyright, means only that the
  work was independently created by the author (as opposed to
  copied from other works), and that it possesses at least some
  minimal degree of creativity.

Although this is a hypothetical example, it is fairly typical of what headnotes contain: a verbatim or slight rephrasing of what the court says in the opinion.

A Doomed Business

Since the mid-1990s, the legal publishers have struggled. After all, their business model is almost completely obsolete. Most legal text is available online, free of charge. Since that time, legal publishers have been on a campaign to squeeze the last remaining value out of their businesses, in part by suing people for infringing their so-called copyrights in what they create: headnotes and page numbers.

But the truth is this: no one needs headnotes anymore. With sophisticated pattern-matching search, not to mention AI-enhanced search, most research can be done on the web. If you want to find legal opinions on the web, for example, there is the Free Law Project, which even allows access via REST APIs. (I am a member for the princely sum of $10 a month.) And there is Google Scholar, which is free, and even recommended by the Library of Congress. Not to mention FindLaw and Justia. And today, search technology is good enough so that it’s pretty easy to find the law online free of charge.

The opinion in this case says, “The law is no longer a brooding omnipresence in the sky; it now dwells in legal research platforms.” This portends the opinion’s ill-advised conclusions. In truth, the law is indeed an omnipresence in the sky–it’s on the world wide web. And it is not, and never should be, confined to “legal research platforms.” The advent of the web should have freed us from the firewalls of legal research and democratized access to the law.

So, the legal publishing business, which once carved out a living by filling a practical gap–the difficulty of the body politic getting access to paper legal documents–is now simply unnecessary. And the entire business of paywalling access to primary legal sources should have died a graceful death.

If you are of a mind to lament businesses toppling due to the advance of technology, I refer you to this blacksmithing video. Blacksmithing is fascinating to watch, but I don’t want to fire up a forge every time I want a new kitchen utensil, and I don’t want to pay a thousand bucks for a hand-made skillet. Also, just imagine what an outcry there would be if a big, bad tech business like Google tried to firewall access to public domain legal documents. Sacrilege! Techno-oligarchy! But when a dying business like Westlaw tries to do this, it’s perfectly fine, because…authors’ rights!

…Nevermind

The opinion begins, “A smart man knows when he is right; a wise man knows when he is wrong.” The judge proceeds to overturn his own opinion on the exact same topics from 2023.

For those of you who are not lawyers, I want to convey how bizarre it is for a federal judge to reverse an existing opinion without one of the parties appealing the decision. It just doesn’t happen–and for good reason. The job of judges is to arbitrate disputes. Judges don’t just decide what they think we ought to hear. They answer motions brought by litigants. But in this case, the judge just literally changed his mind of his own accord, between 2023 and 2025. Otherwise known as the ChatGPT era.

This is the first signal that something is rotten in Denmark. But then, we get to the substance of the opinion.

Headnotes are Copyrightable Material?

If you have your doubts about headnotes being protectable under copyright when reading the above example, you are not alone.

This court, in an ominous foreshadowing of its misguided conclusion, says “A headnote is a short, key point of law chiseled out of a lengthy judicial opinion.” It proceeds to analogize the creation of headnotes with the work of a sculptor–like Michelangelo, who famously (and probably apocryphally) said that to carve an elephant, you just chip away the part of the marble that doesn’t look like an elephant.

  More than that, each headnote is an individual, copyrightable
  work. That became clear to me once I analogized the lawyer’s
  editorial judgment to that of a sculptor. A block of raw marble,
  like a judicial opinion, is not copyrightable. Yet a sculptor
  creates a sculpture by choosing what to cut away and what to
  leave in place. That sculpture is copyrightable. 17 U.S.C.
  §102(a)(5). So too, even a headnote taken verbatim from an
  opinion is a carefully chosen fraction of the whole. Identifying
  which words matter and chiseling away the surrounding mass
  expresses the editor’s idea about what the important point of law
  from the opinion is….So all headnotes, even any that quote
  judicial opinions verbatim, have original value as individual
  works. That belated insight explains my change of heart. In my
  2023 opinion, I wrongly viewed the degree of overlap between the
  headnote text and the case opinion text as dispositive of
  originality…. I no longer think that is so. [Emphasis added]

But creating a headnote is not like carving a work of art from a block of stone. It’s more like taking a core sample from stone to see if it has gold in it. Sure, there is skill in deciding where to drill. But pointing to the gold is not expression. And neither is the sample that results.

The 2023 opinion in this case said this issue should go to a jury, “to decide whether the headnotes … were original enough” for copyright protection. That was the right result. This new result takes this conclusion away from the fact-finder, and hands the plaintiff a potentially huge victory.

What Will Happen?

One thing that seems clear here is that AI may die an ignominious death if the decision is used as precedent for other cases. Don’t get me wrong–I am not a supporter of this particular defendant. I hope that machine learning models to analyze the law are developed and offered free of charge to everyone–not by a defendant that couldn’t figure out how to train an AI without using headnotes. I care more about the next defendant in the next case, who tries to train AI to analyze public domain material, and gets slapped down by an aging business sector that makes its living aggregating that material and is only trying to forestall its inevitable death at the hands of innovation.

Do you think the plaintiff has the resources to create great AI for legal research? That’s unlikely. They are an aging business, and AI training is expensive. But this opinion says that “it does not matter whether Thomson Reuters has used the data to train its own legal search tools; the effect on a potential market for AI training data is enough” to deny the defendant the right to bring a fair use defense to a jury.

Unfortunately, this particular defendant probably will not have the resources to keep defending this claim. Ross Intelligence closed down its product in 2021 because of this lawsuit. Contrast that with the fact that Thomson Reuters used Kirkland & Ellis, one of the top IP firms in the country–and one of the most expensive–on this case. So the outcome does not even benefit the plaintiff much here; it just threatens AI development everywhere.

Perhaps some other AI developers with deep pockets might step in to help fund a legal war chest to overturn the decision. This decision not only suggests that AI training is not fair use, but that the question of fair use is a matter for a judge, and would not even go to a jury. Fair use is likely to be a significant defense in many of the cases currently pending about AI. In Oracle v. Google, a fair use case that went on for over a decade and cost many millions, we saw how vulnerable fair use is to rights-holders with big war chests. In that case, similarly, the Court of Appeals for the Federal Circuit tried to take the decision-making on fair use away from the jury. Luckily, the Supreme Court thwarted that attempt, and we were all lucky that the defendant, Google, had the money and the gumption to fight that case for over ten years. Otherwise, most APIs would be paywalled now.

Do not celebrate this as a victory for authors’ rights. Headnotes are not fine art. This opinion is a mistake, and I hope it is corrected.

And All the Rest…

There is a lot more to say about this opinion, and I have not attempted to analyze it all here. For example, the copyright protection of the Key Number System is even more absurd than protection for headnotes, and yet this opinion seems to allow for it. The opinion glosses over countervailing law in the Oracle v. Google case, saying dismissively that “those cases are all about copying computer code. This case is not.” It also comments, with little analysis, that AI training is not transformative–which it probably is. (The name of the current method for machine learning training is in fact the transformer method, and the name is apt. Machine learning models look nothing like their inputs.) The transformation test usually wins the fair use question. Also, the opinion comments that each headnote is an individual copyrightable work, a scary piece of dictum that could set a precedent for eye-watering statutory damages, which a court can award at up to $150,000 per work.

Also, those trumpeting this opinion as a victory for authors will no doubt ignore that this case only involves one side of the question: the input side. For generative AI, the main issue is probably the output side. The opinion itself says, “Because the AI landscape is changing rapidly, I note for readers that only non-generative AI is before me today.” But that statement will be roundly ignored. By writing “for readers,” this judge knew very well that this decision would create an avalanche of conclusions that AI training is infringing. And the media wants a win, and so that’s what it will report.

9th Circuit Clarifies Derivative Works of Software: Oracle v. Rimini Street

A new opinion from the 9th Circuit, filed December 16, 2024, may change the way we think about copyleft licensing. It’s rare for the courts to opine on what constitutes a derivative work under copyright law, even less so for software. The interpretation of copyleft licenses–GPL and AGPL in particular–turns heavily on the notion that integrating code creates a derivative work. This idea is particularly bound up in the text of GPL version 2, which uses the term “derivative work” liberally (albeit inconsistently) as a basis for exerting control over the licensing of integrated libraries.

Two Software Giants

Rimini Street and Oracle are direct competitors engaged in a long and expensive legal war. Rimini Street offers support services for various Oracle software, notably PeopleSoft (which handles HR and similar tasks). Oracle wants to stop them.

Oracle first sued Rimini for copyright infringement in 2010. In a prior leg of this legal journey, the trial court found that Rimini infringed Oracle’s copyrights by engaging in “cross use.” (This term is not common, but it apparently means using software environments licensed to one customer to support other customers.) Rimini then updated its business practices and sought a declaratory judgment that its revised “Process 2.0” did not infringe Oracle’s copyrights. Oracle counterclaimed for copyright infringement, seeking more than one billion dollars in damages.

At summary judgment, the trial court held that Rimini had infringed Oracle’s PeopleSoft copyrights and that an update created for the City of Eugene’s PeopleSoft implementation was a “derivative work” of Oracle’s software. The district court then entered a permanent injunction against Rimini, ordering it to delete certain software files.

The trial court held that Rimini-written files and updates were infringing derivative works because they interact and are usable only with Oracle software. (Oracle Int’l Corp., 2023 WL 4706127, at *66.) In effect, the district court adopted an “interoperability” test for derivative works—if a product can only interoperate with a specific copyrightable work, then it must be derivative of that work.

However, the 9th Circuit disagreed, saying “Mere interoperability isn’t enough.” The court distinguished translations, abridgments, and other examples of derivative works, named in the copyright statute’s definition, from interoperable software that does not substantially incorporate the original work, noting, “Whether a work is interoperable with another work doesn’t tell us if it substantially incorporates the other work.” It recited that “Rimini claims that thousands of its files fall into this category—programs that are interoperative with Oracle’s PeopleSoft but do not contain Oracle’s copyrighted code. Without more, mere interoperability isn’t enough to make a work derivative….Instead, a derivative work must actually incorporate Oracle’s copyrighted work, either literally or nonliterally. …[S]imply being an extension or modification of a copyrighted work without any incorporation is not enough to create a derivative work.”

The court declined to reach a conclusion on two other interesting points of copyright law in software licensing: whether Oracle’s customers are ultimately owners or licensees of their copies of Oracle’s software, and whether Rimini could rely on the “essential step” defense under § 117(a)(1), which allows running of software without a copyright license. Both of these are key to enabling third parties to host and manage software for customers who have purchased licenses from software vendors.

Effect on Copyleft

This new opinion is potentially relevant to the interpretation of GPL, particularly GPL version 2. One premise of copyleft is that the license can only control activity that requires a copyright license. Otherwise, there is no need to adhere to the license terms. But when GPL 2 was written, exactly what copyright covers was unclear. So, downstream developers of GPL have always pondered this question: if I create a library that interoperates with the GPL program, but is a separate dynamically linked library, must it also be provided under GPL or a compatible license?

This question was first brought into focus by developers who wanted to create proprietary loadable kernel modules (LKMs) for the Linux kernel. LKMs run within the same executable process as the kernel, which is covered by GPL, but they are separate, dynamically linked files. (Similar questions exist for, say, libraries in user space that operate with GPL programs in user space, but the LKM question is the quintessential embodiment of this question, so I use it here as a proxy for the larger question.)

In particular, the practical question is whether proprietary LKM developers can distribute their LKMs, allowing customers to run them in a “meal kit” manner. The argument goes like this:

  • Customers all enjoy the rights of GPL, and therefore can use the GPL-licensed kernel freely.
  • Vendors enjoy the rights of GPL, and can distribute the kernel under GPL freely.
  • LKMs are separate files from the kernel, not built into the same executable except by reference at runtime.
  • Loading the LKM at runtime requires reference to a kernel API.
  • While the customer may create a derivative work by combining the kernel and LKM at runtime, creating a derivative work does not trigger source code sharing–only distribution does that.
  • The LKM’s use of the API does not create a derivative work.
  • Therefore, neither the vendor nor the customer is violating GPL.

The opposite position is that an LKM is a derivative work, regardless of whether it is distributed separately. That position turned on the idea that, in order to be interoperable, the LKM had to include a portion of the kernel–its relevant API–which made it derivative.

This new opinion puts paid to the idea that an LKM is necessarily a derivative work, at least in the 9th Circuit. But it leaves open the question of whether a particular LKM might contain substantial portions of the kernel and therefore still be derivative. Indeed, any LKM would contain some elements identical to the kernel code it interoperates with–such as interface definitions (APIs) or header files. No US court has created precedent on whether those are necessarily substantial.

However, we got close, once. Functional elements of code enjoy no copyright protection, and in the past, many have reasoned that an API is, by definition, functional. This question was at the forefront of the Oracle v. Google lawsuit, in which Google was accused of copying portions of the API for Java. The trial court (Judge Alsup, sitting in the 9th Circuit) held that APIs are not protectable, and therefore copying an API was not infringement. However, that ruling was later overturned on appeal in the Federal Circuit, sending the case in a different doctrinal direction.

It’s my view–and I doubt I will be alone–that it is functionally identical to say that (a) per Rimini Street, mere interoperability does not create a derivative work, and (b) per the Alsup opinion in Oracle v. Google, APIs standing alone are not protectable. The two opinions yield the same result via different doctrinal paths. So perhaps this new opinion restores the conclusion that many believed correct in the Alsup opinion.

Will it Matter?

Oracle v. Google was the last time a US court of appeals issued a meaningful opinion on the topic of software derivative works. That the Federal Circuit and 9th Circuit would come to different conclusions will surprise no keen follower of copyright law. The Federal Circuit, specially created to hear patent cases, is known to be exceptionally friendly to broad interpretations of intellectual property rights; the 9th Circuit, less so.

While this new opinion might put added wind into the sails of those making the time-honored LKM argument, it may not ultimately matter. This question has been a legal unknown for decades. In the meantime, most developers of LKMs have released them under GPL for practical reasons. That’s because, in all these years, the ambiguity of this legal question allowed GPL advocates, and community pressure, to settle the issue de facto. Most people stopped bothering with the LKM argument years ago.

Moreover, the Oracle v. Rimini Street case is not over. The opinion does not dispose of the whole case, and it is binding precedent only in the 9th Circuit. But opinions on topics like these are so rare that this one is likely to influence later courts that examine the definition of software derivative works.

(Note: This blog post quotes heavily from the court’s opinion. Items with quotation marks are from the opinion, but other portions of this post are not marked with quotation marks. I did this for ease of reading. Also, the opinion contains related discussion of Lanham Act claims and license interpretation, which I have omitted.)

Anthropic Settles a Small Part of an AI Copyright Claim

This week, there was some interesting movement in one of the many copyright claims relating to AI.

Anthropic, the maker of Claude (my personal favorite AI for coding!), reached an agreement to settle parts of an ongoing lawsuit, Concord Music Group, Inc. v. Anthropic, filed October 18, 2023. The case has a long list of music publishers as plaintiffs. The current docket is available in RECAP here.

The court approved a settlement, but it is limited in scope. The settlement resolves only one aspect of the injunctive remedies sought by the plaintiffs. Anthropic agreed to:

  • Maintain certain “already implemented Guardrails” in its current products that are designed to prevent the model from reproducing lyrics
  • Apply these Guardrails to its new releases
  • Implement a process allowing the plaintiffs to notify Anthropic when the Guardrails are not effectively preventing such output

The plaintiffs’ demand for a preliminary injunction to bar Anthropic from training future models on their song lyrics is still pending.

JavaScript Trademark Challenge

This week, as a Thanksgiving present to the tech industry, Deno Land petitioned the US Patent and Trademark Office to cancel Oracle’s JavaScript trademark.

JavaScript–which helpfully has little to do with Java–is a programming language for client-side web applications. Most of the interaction between you and websites, like filling in forms on e-commerce sites, uses JavaScript. It is a ubiquitous language. JavaScript was developed at Netscape Communications in the 1990s and announced jointly with Sun Microsystems, which held the trademark. The trademark was acquired by Oracle Corporation in its 2009 acquisition of Sun Microsystems.

Deno Land is the company behind the Deno project, a modern JavaScript runtime. A JavaScript runtime is the environment where JavaScript code is executed, including outside of a web browser. The most famous JavaScript runtime is Node.js, which was developed by Ryan Dahl, the founder of Deno Land. Node.js is also wildly popular.

Programming languages usually don’t enjoy much trademark protection, because they tend to be enabling technologies rather than commercial products. Most popular programming languages have accepted standards for their development that are set by standards organizations rather than single owners. After all, programming languages are much more useful if many developers use them, instead of a few. A trademark is the way to distinguish the official source of products, and for successful programming languages, this role typically falls to standards bodies rather than any one organization. JavaScript, for example, is governed by the ECMA-262 standard for ECMAScript. 

Brendan Eich, one of the key developers of JavaScript, quipped that the name “ECMAScript” was a compromise between the organizations involved in standardizing the language, especially Netscape and Microsoft, whose disputes dominated the early standards sessions: “ECMAScript was always an unwanted trade name that sounds like a skin disease.” If the mark is declared generic or abandoned and no longer owned by Oracle, there is hope that the standard could finally be named after JavaScript.

The petition alleges primarily that the trademark has been abandoned, and that JavaScript is now a generic term, and not associated with Oracle. Neither Sun nor Oracle ever really exploited JavaScript as a commercial product, but Oracle has, according to the petition, engaged in enforcement activity to prevent others from using the name–thus endearing itself to the tech community in its usual fashion.

The petition says:

An open letter to Oracle discussing the genericness of the phrase “JavaScript,” published at https://javascript.tm/, was signed by 14,000+ individuals at the time of this Petition to Cancel, including notable figures such as Brendan Eich, the creator of JavaScript, and the current editors of the JavaScript specification, Michael Ficarra and Shu-yu Guo. There is broad industry and public consensus that the term “JavaScript” is generic.

It’s time for Oracle to do the right thing and relinquish its hegemony here. JavaScript has enough trouble with its “this” keyword, without a trademark burden to make it worse.