AI Training Ruled Fair Use

This week, in Bartz v. Anthropic, Judge Alsup (Northern District of California) ruled that training AI large language models (LLMs) on lawfully acquired works of authorship is fair use.

This is a landmark ruling by a highly respected judge, who also handled the Oracle v. Google case.

Infringement claims regarding AI come in two basic flavors: that the act of training is infringement, and that AI output similar to the training inputs is infringement. This ruling is only about the first flavor–the training stage.

Two Acts of Copying

In this case, the defendant, Anthropic, purchased copyrighted books, tore off the bindings, scanned every page, and stored them in digitized, searchable files. (This is called destructive scanning, which is faster and easier than non-destructive scanning, which preserves the original book.) Anthropic used selected portions of the resulting database to train various large language models. But it also downloaded many pirated copies of books, though it later decided not to use them for training. These copies were retained in a digital library for possible future use.

The plaintiffs are authors of some of the books.

Anthropic moved for summary judgment on its fair use defense, and Alsup found the act of training to be transformative, one of the key factors in modern fair use doctrine. Regarding transformation, Alsup cited the Google Books case, one of the key decisions on fair use in the digital age. (Authors Guild v. Google, Inc., 804 F.3d 202, 217 (2d Cir. 2015)).

The Fair Use Analysis

Fair use is analyzed according to four non-exclusive factors set out in 17 USC 107. On the first factor of fair use, the court distinguished between scanning and pirating activities. The court called the destructive scanning of the books a “mere format change,” which supported a finding of fair use. The purpose of the copy was to support searchability. Anthropic only ended up with the digital copies, not the books.

Before buying the physical books, Anthropic “downloaded over seven million pirated copies of books, paid nothing, and kept these pirated copies…even after deciding it would not use them to train its AI.” The court viewed this differently from the scanning: “Such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded.” The court was not convinced by Anthropic’s argument that the use would ultimately be transformative. Citing the recent Warhol case, the order says, “what a copyist says or thinks or feels matters only to the extent it shows what a copyist in fact does with the work.”

The last of the factors in a fair use analysis–usually considered the most important factor–is the effect of the otherwise infringing activity on the market for the original work. The court said, “The copies used to train specific LLMs did not and will not displace demand for copies of Authors’ works, or not in the way that counts under the Copyright Act.” But this was only for the purchased copies; the court reached the opposite conclusion for the pirated copies.

What’s Next?

The case can now proceed to trial only for the pirated copies. For the purchased books that were destructively scanned, the claims were dismissed.

This case is a class action, and the motion for class certification is still pending. If the class is not certified, plaintiffs often give up or settle for small amounts. Law firms that specialize in bringing class actions depend on certification of a large class to increase damages and, accordingly, their fees.

There are about 40 pending cases in the US on AI and copyright, and many of them may have suffered a setback with this opinion. Alsup’s opinion is in line with what many copyright commentators (including me) have proposed: that training is lawful if done with lawful access to the training material. The decision of a district court will not bind cases pending in other districts. However, because Alsup is a well-respected jurist, his analysis may persuade other courts to follow suit.

The court did not reach the second flavor of infringement claims–those regarding output–because it was not at issue here. But many commentators are skeptical that such claims will succeed against properly trained models. ML models typically do not produce “copies” in the sense intended by copyright law. Claims regarding output may therefore be relegated to trademark, publicity, and trade dress claims, which are outside the ambit of copyright law.

Postmodern Art and Cannabis Law

I’m intrigued as to why an article about cannabis law cites to an article I wrote over thirty years ago about copyright fair use in postmodern art. But the Tulsa Law Review has paywalled their prestigious journal, forcing me to pay if I want to find out, and honestly, I’m not quite that intrigued.

By the way, you can download my article for free, and there is even an update here.

AI Could Be Your Next Team for Clean Room Development

Clean room developments are necessary when a developer wants to “cleanse” the intellectual property burden of third party software. The need arises when third party software is provided under unacceptable license terms, or not licensed at all. This is one of the trickiest tasks in software development, but it has a long history of best practices.

The canonical clean room development seeks to avoid the trade secrets of proprietary software. But the rise of open source has created the need for a different kind of clean room project, meant to avoid the copyright in open source software–usually GPL-licensed packages. The two situations call for slightly different approaches. A clean room process for proprietary code seeks to avoid trade secret and copyright burdens, whereas clean room development for open source is entirely about copyright–because there are no trade secrets in open source software. In either case, a team of developers seeks to write new implementing code from scratch, so that the new code will perform the same tasks, with the same inputs and outputs, as the original or “target” code.

A traditional clean room development process looks something like this:

  • Separate Development Teams: Create two teams of developers: a specification team that develops a specification for the target code, and an implementation team that writes the new implementing code.
  • Create a Specification: The specification team, which has access to the target code, extracts the specifications for the software’s requirements and expected behavior. Software, at the end of the day, is a set of inputs and outputs, and its specifications state what outputs you should expect when certain inputs are used.
  • Reimplementation: The implementation team writes the new software according to the specification developed by the specification team. This must be done in an environment that is “cleansed” of the target code. Ideally, the implementation team has never read the target code.
  • Verification: The implementation team tests the newly implemented clean code. If there are bugs, the specification team may only confirm the accuracy of the specification; it cannot suggest bug fixes, because that might result in inadvertent copying. Bug fixes are done by the implementation team.
  • Iterate: Repeat until the development is done.

Of course, there are far more complex processes for clean room development. Some have three teams, and most have a lot more steps. I have seen guidelines so many pages long they have a table of contents. But the above is the essence–not to mention the most my clients have the patience to read.
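The essence of the process can even be captured in code. Below is a minimal, hypothetical sketch in Python–the toy “target” behavior and all names are illustrative, not from any real project–showing the specification team’s deliverable as observable input/output pairs and a verification harness run against the implementation team’s code:

```python
# Hypothetical clean room sketch. The specification team records only
# observable behavior (inputs and expected outputs) of the target code;
# the implementation team writes new code from that spec alone.

# --- Specification team's deliverable: behavior, not source code ---
SPEC = [
    # (input, expected output) pairs extracted from the target program
    ("2,3", "5"),
    ("10,-4", "6"),
    ("0,0", "0"),
]

# --- Implementation team's code, written only from the spec ---
def reimplementation(line: str) -> str:
    a, b = (int(x) for x in line.split(","))
    return str(a + b)

# --- Verification: run the clean implementation against the spec ---
def verify(impl, spec):
    """Return a list of (input, actual, expected) mismatches; empty means pass."""
    return [(i, impl(i), o) for i, o in spec if impl(i) != o]

print(verify(reimplementation, SPEC))  # → []
```

The point of this structure is that the implementation side sees only the spec, never the target code; if verification fails, only the implementation team changes its code.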

Not Enough Humans

The problem most companies have when performing a clean room development is that they don’t have the resources to create two separate teams. Even if they do, they usually cannot create an implementation team that has never been exposed to the target software–and doing so is particularly difficult when the target software is open source, because there is no way to prove lack of access to publicly available materials. For an open source clean room process, we usually make do with developing implementing code in an environment that does not have local access to the target code.

But now, with the advent of AI, we have an alternative way to approach clean room development.

I pause here to note that while there are those who think that all generative AI is prima facie copyright infringement, I don’t agree. As long as the model has been trained on enough inputs, it should not parrot any one input. (More on that here.) So let’s set that issue aside, because if you disagree with me, you shouldn’t be using AI coding tools at all, and you can stop reading here.

An AI that writes code (like Claude or Copilot) has probably been exposed to almost all the open source code ever written. But through that training process, it is unlikely to have focused on any specific target code. So, companies struggling to staff a clean room development might consider replacing one or both of the teams with AI. As always, some human oversight is necessary to check that an AI generative process has been done correctly. But using AI would still greatly reduce the headcount necessary to implement the clean room process.

  • Specification Team. AI is better at some tasks than others, but I have found that AI is quite good at summarizing text. If you ask it to write the specifications for target software, it will probably do a good job. You could use an AI for your specification team, and that would help avoid “contaminating” your implementation team with access to the target software.
  • Implementation Team. AI is quite good at writing code, though more human oversight would probably be necessary to use AI for this purpose. AI-assisted coding still requires human curation, and also usually requires human debugging. Debugging is a complex logical task, and the current flavor of AI–transformer-based models–is better at text generation than logic. But in a pinch, you might use AI as your implementation team and use the specification team for quality control.

Neither of these suggestions should be surprising. AI code generation greatly reduces the human effort necessary to produce code, and clean room projects are human-intensive. For an open source target, I think the use of AI as a specification team is quite interesting. For proprietary code, using AI as the implementation team may be particularly interesting, because AIs are mostly not trained on proprietary code, making the cleansing more reliable.

Always remember: wash your hands before you code!

How to Get Lawyers to Use AI

Lawyers love to talk about using new technology to make their practice more efficient. All during my career, I’ve heard law firms touting the great benefits of knowledge management, AI (generative and non-generative) research and writing tools, and, of course, getting lawyers to do their own administrative work. But unfortunately, they are better at talking about it than actually doing it. Perhaps it is the billable hour model, which rewards inefficiency, or just that most lawyers aren’t all that tech-savvy.

So of course, when ChatGPT and generative AI snatched the attention of the world a couple of years ago, lawyers started speculating about how sophisticated LLMs would change legal practice. Some opined we would all be made irrelevant and rendered unemployable. (Well, not yet.) Some announced grand partnerships with AI providers–though I would be skeptical that has reduced any client bills much. Some think we can use AI to do better, faster work. In that respect, faster is more likely than better. AI, it turns out, is not so good at quality control, and lawyers are all about quality control.

Technological breakthroughs can bring profound changes to our world, but they tend to happen more gradually than click-bait articles would have us believe. I remember once wondering how people would ever accept self-driving cars. But we didn’t, exactly. We accepted cruise control, then automatic parking, then lane-keeping assistance, then automatic braking. The autonomous features we accept just keep expanding.

AI in legal practice is the same. Slowly, AI has crept into our practices. I doubt that I am much different from most lawyers in this way. Here are some of the tasks for which I have used AI in the past couple of years:

  • Creating first drafts of articles and memos. Even if I end up re-writing the entire thing, a first draft is useful and can be thought-provoking. I would equate ChatGPT, in this respect, with a first year associate. (But sometimes, I am sad to say, with better grammar.) But a first draft is about as far as it goes. The LLMs have no savvy, no creative ideas, and no experience. They can’t put two and two together and make five–well, they can, but they think that’s accuracy, rather than synergy.
  • Summarizing contracts or other documents. LLMs are useful for creating summaries of documents. In this respect, they are usually quite accurate. So, I can use them as a double-check for my own reviews or summaries, or as a basis to draft them. A word to the wise: if you do this, be sure not to feed confidential information into the model, unless you have a paid access account that keeps the input confidential and prohibits training on it. Disclosing confidential documents to an LLM would usually be a violation of client confidentiality.
  • Legal research. General purpose LLMs are not very good at most conventional legal research–finding court opinions, statutes, and similar sources. That is because they have no sense of what is authoritative, other than consensus of their training data. LLMs can be useful for factual research; these days I find myself using the Google AI-driven search quite a bit. I particularly appreciate that it cites to outside sources so I can check the legitimacy and reliability of the information.

Now…the Part I Didn’t Expect

Recently, a client sent me an agreement to review. It was a simple agreement, and not badly written. I asked where the agreement came from, and my client admitted they had used an LLM to write it.

But this did not bother me. Clients send their lawyers form agreements all the time, saying, can we use this? They usually seem to think this saves legal fees, even though it does not really do that. Until now, the documents they sent were usually something they copied from the web–often without even changing the names–and often wildly inappropriate for the situation.

In contrast, the LLM-generated agreement sent to me was on point, contained no typos, and had most of the relevant terms. It took me about half an hour to fix it up for use–probably about the same time it would have taken me to start from one of my own forms, and probably less time than fixing up some piece of dreck cadged from the web. Perhaps I am unusual in that I would rather not spend my hours, nor charge my clients for hours, fixing up badly written stuff from the web.

But it did make me realize how lawyers will finally be dragged into using AI, rather than just pontificating about it. It won’t happen of their own volition, or even due to pressure from clients to be more efficient. It will happen because their clients will start using auto-generated legal documents, and the lawyers will be expected to fix them, or explain why they are not worth fixing.

Welcome to 2025!

Thomson Reuters Wins a Victory Against Both AI and Public Access to Law

Recently, Judge Stephanos Bibas–a Third Circuit judge sitting by designation in the District of Delaware–published an opinion in Thomson Reuters v. Ross Intelligence, a case filed in 2020 about AI training.

For most of us who have been following the spate of ongoing cases about generative AI, this case was not too high on the radar, because it’s about regular AI: substantive number crunching. Moreover, the case was filed in 2020, long before all the hoo-hah about ChatGPT started. Apparently, in the depths of a societal shutdown from COVID-19, Thomson Reuters decided it needed to shore up its putative rights in public domain legal texts.

I have to confess that it took quite a while to sit down and read this opinion, because I just didn’t want to. I think the legal reasoning in this opinion is not correct, but that’s not why it took me so long. The effort of the legal publishers to restrict access to primary law sources causes me great dismay. So, if you want objective legal analysis, you should probably read some of the other commentators who have written about this opinion.

Legal Opinions are Freely Available

In the US, legal opinions written by courts are in the public domain–they enjoy no copyright protection. That is as it should be. Everyone should be able to read the law and decide for themselves what it means. And while it’s true that access to primary information is usually not enough, and those with the money to hire the best-dressed lawyers often win legal battles–more about that below–free access to primary legal sources, like court opinions and statutes, is table stakes for political liberty.

I graduated law school in 1994, which was just on the cusp of widespread access to the Internet. So, I learned to research the law in books–tree-killing, heavy, expensive hardback books called case reporters. While the online legal research tools, Westlaw and LEXIS, were available to us then, they were viewed with skepticism. And they were rudimentary, expensive, and often more confusing than the books. So, a library, like the one at UC Berkeley, where I attended law school, merely had to buy the case reporters, and then anyone was free to read them.

Because legal opinions are usually long and tedious to read, case reporters include micro-summaries printed at the front of the case text, called headnotes. And with all due deference to whatever poor underemployed lawyers create the headnotes, they are not what you would call poetry. They are summaries of individual nuggets of law, intended to point a researcher to actual language in the case. They are like a very wordy index. No competent lawyer would use a headnote standing alone. Headnotes are merely entry points to the official legal text. And for the most part, they either quote or slightly rephrase the source material.

Here is a (made up) example from the opinion:

Headnote:

Originality, for copyright purposes, means that the work was independently created and has some minimal degree of creativity.

Opinion:

Original, as the term is used in copyright, means only that the work was independently created by the author (as opposed to copied from other works), and that it possesses at least some minimal degree of creativity.

Although this is a hypothetical example, it is fairly typical of what headnotes contain: a verbatim or slight rephrasing of what the court says in the opinion.
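The 2023 opinion in this case treated the degree of textual overlap between headnote and opinion as central to originality. Using the hypothetical example above, here is a rough, illustrative Python measurement of that overlap–the tokenization and metric are my own choices, not anything from the opinion:

```python
# Rough, illustrative measure of how much of a headnote's wording comes
# straight from the opinion it summarizes, using the made-up example above.
import re

headnote = ("Originality, for copyright purposes, means that the work was "
            "independently created and has some minimal degree of creativity.")
opinion = ("Original, as the term is used in copyright, means only that the "
           "work was independently created by the author (as opposed to "
           "copied from other works), and that it possesses at least some "
           "minimal degree of creativity.")

def tokens(text):
    """Lowercase word set; punctuation is discarded."""
    return set(re.findall(r"[a-z]+", text.lower()))

h, o = tokens(headnote), tokens(opinion)
overlap = len(h & o) / min(len(h), len(o))  # share of headnote words found in the opinion
print(round(overlap, 2))  # → 0.78
```

By this crude measure, roughly four-fifths of the headnote’s distinct words appear verbatim in the opinion text.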

A Doomed Business

Since the mid-1990s, the legal publishers have struggled. After all, their business model is almost completely obsolete. Most legal text is available online, free of charge. Since that time, legal publishers have been on a campaign to squeeze the last remaining value out of their businesses, in part by suing people for infringing their so-called copyrights in what they create: headnotes and page numbers.

But the truth is this: no one needs headnotes anymore. With sophisticated pattern-matching search, not to mention AI-enhanced search, most research can be done on the web. If you want to find legal opinions on the web, for example, there is the Free Law Project, which even allows access via REST APIs. (I am a member for the princely sum of $10 a month.) And there is Google Scholar, which is free, and even recommended by the Library of Congress. Not to mention FindLaw and Justia. And today, search technology is good enough so that it’s pretty easy to find the law online free of charge.
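As one example, the Free Law Project’s CourtListener service exposes opinions through a REST API. Here is a minimal sketch of composing a search query in Python–the endpoint path and parameters are my assumptions about the public v3 search API, so check the current API documentation before relying on them:

```python
# Hypothetical sketch of building a query URL for the Free Law Project's
# CourtListener REST API. The endpoint path and parameters are assumptions
# about the public v3 search API; consult the current API docs before use.
from urllib.parse import urlencode

BASE = "https://www.courtlistener.com/api/rest/v3/search/"

def search_url(query: str, **params) -> str:
    """Compose a search URL; no network request is made here."""
    params["q"] = query
    return BASE + "?" + urlencode(params)

print(search_url("fair use"))
# To actually fetch results (network access, and possibly an API token, required):
# import urllib.request
# body = urllib.request.urlopen(search_url("fair use")).read()
```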

The opinion in this case says, “The law is no longer a brooding omnipresence in the sky; it now dwells in legal research platforms.” This portends the opinion’s ill-advised conclusions. In truth, the law is indeed an omnipresence in the sky–it’s on the world wide web. And it is not, and never should be, confined to “legal research platforms.” The advent of the web should have freed us from the firewalls of legal research and democratized access to the law.

So, the legal publishing business, which once carved out a living by filling a practical gap–the difficulty of the body politic getting access to paper legal documents–is now simply unnecessary. And the entire business of paywalling access to primary legal sources should have died a graceful death.

If you are of a mind to lament businesses toppling due to the advance of technology, I refer you to this blacksmithing video. Blacksmithing is fascinating to watch, but I don’t want to fire up a forge every time I want a new kitchen utensil, and I don’t want to pay a thousand bucks for a hand-made skillet. Also, just imagine what an outcry there would be if a big, bad tech business like Google tried to firewall access to public domain legal documents. Sacrilege! Techno-oligarchy! But when a dying business like Westlaw tries to do this, it’s perfectly fine, because…authors’ rights!

…Nevermind

The opinion begins, “A smart man knows when he is right; a wise man knows when he is wrong.” The judge proceeds to overturn his own opinion on the exact same topics from 2023.

For those of you who are not lawyers, I want to convey how bizarre it is for a federal judge to reverse an existing opinion without one of the parties appealing the decision. It just doesn’t happen–and for good reason. The job of judges is to arbitrate disputes. Judges don’t just decide what they think we ought to hear. They answer motions brought by litigants. But in this case, the judge just literally changed his mind of his own accord, between 2023 and 2025. Otherwise known as the ChatGPT era.

This is the first signal that something is rotten in Denmark. But then, we get to the substance of the opinion.

Headnotes are Copyrightable Material?

If you had your doubts about headnotes being protectable under copyright when reading the above example, you are not alone.

This court, in an ominous foreshadowing of its misguided conclusion, says “A headnote is a short, key point of law chiseled out of a lengthy judicial opinion.” It proceeds to analogize the creation of headnotes with the work of a sculptor–like Michelangelo, who famously (and probably apocryphally) said that to carve an elephant, you just chip away the part of the marble that doesn’t look like an elephant.

The opinion states: “More than that, each headnote is an individual, copyrightable work. That became clear to me once I analogized the lawyer’s editorial judgment to that of a sculptor. A block of raw marble, like a judicial opinion, is not copyrightable. Yet a sculptor creates a sculpture by choosing what to cut away and what to leave in place. That sculpture is copyrightable. 17 U.S.C. §102(a)(5). So too, even a headnote taken verbatim from an opinion is a carefully chosen fraction of the whole. Identifying which words matter and chiseling away the surrounding mass expresses the editor’s idea about what the important point of law from the opinion is….So all headnotes, even any that quote judicial opinions verbatim, have original value as individual works. That belated insight explains my change of heart. In my 2023 opinion, I wrongly viewed the degree of overlap between the headnote text and the case opinion text as dispositive of originality…. I no longer think that is so.” [Emphasis added]

But creating a headnote is not like carving a work of art from a block of stone. It’s more like taking a core sample from stone to see if it has gold in it. Sure, there is skill in deciding where to drill. But pointing to the gold is not expression. And neither is the sample that results.

The 2023 opinion in this case said this issue should go to a jury, “to decide whether the headnotes … were original enough” for copyright protection. That was the right result. This new result takes this conclusion away from the fact-finder, and hands the plaintiff a potentially huge victory.

What Will Happen?

One thing that seems clear here is that AI may die an ignominious death if the decision is used as precedent for other cases. Don’t get me wrong–I am not a supporter of this particular defendant. I hope that machine learning models to analyze the law are developed and offered free of charge to everyone–not by a defendant that couldn’t figure out how to train an AI without using headnotes. I care more about the next defendant in the next case, who tries to train AI to analyze public domain material, and gets slapped down by an aging business sector that makes its living aggregating that material and is only trying to forestall its inevitable death at the hands of innovation.

Do you think the plaintiff has the resources to create great AI for legal research? That’s unlikely. They are an aging business, and AI training is expensive. But this opinion says that “it does not matter whether Thomson Reuters has used the data to train its own legal search tools; the effect on a potential market for AI training data is enough” to deny the defendant the right to bring a fair use defense to a jury.

Unfortunately, this particular defendant probably will not have the resources to keep defending this claim. Ross Intelligence shut down its product in 2021 because of this lawsuit. Contrast that with Thomson Reuters, which used Kirkland & Ellis, one of the top IP firms in the country–and one of the most expensive–on this case. So the outcome does not even benefit the plaintiff much here; it just threatens AI development everywhere.

Perhaps some other AI developers with deep pockets might step in to help fund a legal war chest to overturn the decision. This decision not only suggests that AI training is not fair use, but also that the question of fair use is a matter for a judge, and would not even go to a jury. Fair use is likely to be a significant defense in many of the cases currently pending about AI. In Oracle v. Google, a fair use case that went on for over a decade and cost many millions, we saw how vulnerable fair use is to rights-holders with big war chests. In that case, similarly, the Court of Appeals for the Federal Circuit tried to take the decision-making on fair use away from the jury. Luckily, the Supreme Court thwarted that attempt, and we were all lucky that the defendant, Google, had the money and the gumption to fight that case for over ten years. Otherwise, most APIs would be paywalled now.

Do not celebrate this as a victory for authors’ rights. Headnotes are not fine art. This opinion is a mistake, and I hope it is corrected.

And All the Rest…

There is a lot more to say about this opinion, and I have not attempted to analyze it all here. For example, the copyright protection of the Key Number System is even more absurd than protection for headnotes, and yet this opinion seems to allow for it. The opinion glosses over countervailing law in the Oracle v. Google case, saying dismissively that “those cases are all about copying computer code. This case is not.” It also comments, with little analysis, that AI training is not transformative–which it probably is. (The name of the current method for machine learning training is in fact the transformer method, and the name is apt. Machine learning models look nothing like their inputs.) The transformation test usually wins the fair use question. Also, the opinion comments that each headnote is an individual copyrightable work, a scary piece of dictum that could set precedent for eye-watering statutory damages, for which a court can award up to $150,000 per work.

Also, those trumpeting this opinion as a victory for authors will no doubt ignore that this case only involves one side of the question: the input side. For generative AI, the main issue is probably the output side. The opinion itself says, “Because the AI landscape is changing rapidly, I note for readers that only non-generative AI is before me today.” But that statement will be roundly ignored. By writing “for readers,” this judge knew very well that this decision would create an avalanche of conclusions that AI training is infringing. And the media wants a win, and so that’s what it will report.

9th Circuit Clarifies Derivative Works of Software: Oracle v. Rimini Street

A new opinion from the 9th Circuit, filed December 16, 2024, may change the way we think about copyleft licensing. It’s rare for the courts to opine on what constitutes a derivative work under copyright law, even less so for software. The interpretation of copyleft licenses–GPL and AGPL in particular–turns heavily on the notion that integrating code creates a derivative work. This idea is particularly bound up in the text of GPL version 2, which uses the term derivative work liberally (albeit inconsistently) as a basis for exerting control over licensing of integrated libraries.

Two Software Giants

Rimini Street and Oracle are direct competitors engaged in a long and expensive legal war. Rimini Street offers support services for various Oracle software, notably PeopleSoft software (which handles HR and similar tasks). Oracle wants to stop them.

Oracle first sued Rimini for copyright infringement in 2010. In a prior leg of this legal journey, the trial court found that Rimini infringed on Oracle’s copyrights by engaging in “cross use.” (This term is not common, but it apparently means applying software updates to multiple users.) Rimini then updated its business practices and sought a declaratory judgment that its revised “Process 2.0” did not infringe Oracle’s copyrights. Oracle counterclaimed for copyright infringement, seeking more than one billion dollars in damages.

At summary judgment, the trial court held that Rimini had infringed Oracle’s PeopleSoft copyrights and that an update created for the City of Eugene’s PeopleSoft implementation was a “derivative work” of Oracle’s software. The district court then entered a permanent injunction against Rimini, ordering it to delete certain software files.

The trial court held that the files and updates Rimini wrote were infringing derivative works because they only interact with, and are only usable with, Oracle software. (Oracle Int’l Corp., 2023 WL 4706127, at *66.) In effect, the district court adopted an “interoperability” test for derivative works–if a product can only interoperate with a specific copyrightable work, then it must be derivative of that work.

However, the 9th Circuit disagreed, saying “Mere interoperability isn’t enough.” The court distinguished translations, abridgments, and other examples of derivative works, named in the copyright statute’s definition, from interoperable software that does not substantially incorporate the original work, noting, “Whether a work is interoperable with another work doesn’t tell us if it substantially incorporates the other work.” It recited that “Rimini claims that thousands of its files fall into this category—programs that are interoperative with Oracle’s PeopleSoft but do not contain Oracle’s copyrighted code. Without more, mere interoperability isn’t enough to make a work derivative….Instead, a derivative work must actually incorporate Oracle’s copyrighted work, either literally or nonliterally. …[S]imply being an extension or modification of a copyrighted work without any incorporation is not enough to create a derivative work.”

The court declined to reach a conclusion on two other interesting points of copyright law in software licensing: whether Oracle’s customers are ultimately owners or licensees of their copies of Oracle’s software, and whether Rimini could rely on the “essential step” defense under § 117(a)(1), which allows running of software without a copyright license. Both of these are key to enabling third parties to host and manage software for customers who have purchased licenses from software vendors.

Effect on Copyleft

This new opinion is potentially relevant to the interpretation of GPL, particularly GPL version 2. One premise of copyleft is that the license can only control activity that requires a copyright license. Otherwise, there is no need to adhere to the license terms. But when GPL 2 was written, exactly what copyright covered was unclear. So, developers working downstream of GPL code have always pondered this question: if I create a library that interoperates with the GPL program, but is a separate dynamically linked library, must it also be provided under GPL or a compatible license?

This question was first brought into focus by developers who wanted to create proprietary loadable kernel modules (LKMs) for the Linux kernel. LKMs run, by definition, within the same executable process as the kernel, which is covered by GPL, but they are separate, dynamically linked files. (Similar questions exist for, say, user-space libraries that operate with GPL programs in user space, but the LKM question is the quintessential embodiment of the issue, so I use it here as a proxy for the larger question.)

In particular, the practical question is whether proprietary LKM developers can distribute their LKMs, allowing customers to run them in a “meal kit” manner. The argument goes like this:

  • Customers all enjoy the rights of GPL, and therefore can use the GPL-licensed kernel freely.
  • Vendors enjoy the rights of GPL, and can distribute the kernel under GPL freely.
  • LKMs are separate files from the kernel, not built into the same executable except by reference at runtime.
  • That runtime combination requires reference to a kernel API.
  • While the customer may create a derivative work by combining the kernel and LKM at runtime, creating a derivative work does not trigger source code sharing–only distribution does that.
  • The LKM’s use of the API does not create a derivative work.
  • Therefore, neither the vendor nor the customer is violating GPL.
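The structural point in the argument above—a separate module that references a host’s API but incorporates none of the host’s code—can be sketched in miniature. The following is an illustrative JavaScript analogy, not kernel code; the names (`host`, `installPlugin`, `registerHandler`) are hypothetical stand-ins for a kernel and an LKM:

```javascript
// Hypothetical "host" program, standing in for the kernel.
// It exposes a small API surface for plugins to call.
const host = {
  handlers: {},
  registerHandler(name, fn) {
    this.handlers[name] = fn;
  },
  dispatch(name, arg) {
    return this.handlers[name](arg);
  },
};

// Hypothetical "plugin", standing in for an LKM. It is a separate
// unit that contains none of the host's code -- it only *references*
// the host's API by name.
function installPlugin(api) {
  api.registerHandler("double", (x) => x * 2);
}

// The combination happens here, at load time -- loosely analogous to
// insmod resolving an LKM's references against kernel symbols.
installPlugin(host);
console.log(host.dispatch("double", 21)); // prints 42
```

The plugin’s source incorporates nothing from the host; only at runtime are the two combined into one process—which is the factual pattern on which the derivative-work debate turns.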

The opposite position is that an LKM is a derivative work, regardless of whether it is distributed separately. That position turns on the idea that, in order to be interoperable, the LKM must include a portion of the kernel–its relevant API–which makes it derivative.

This new opinion puts paid to the idea that an LKM is necessarily a derivative work, at least in the 9th Circuit. But it leaves open the question of whether an LKM might contain substantial portions of the kernel and therefore still be derivative. Indeed, any LKM will contain some elements identical to the code it interoperates with–such as interface definitions (APIs) drawn from kernel header files. No US court has created precedent on whether those elements are necessarily substantial.

However, we got close, once. Functional elements of code enjoy no copyright protection. In the past, many have reasoned that an API is, by definition, functional. This question was at the forefront of the Oracle v. Google lawsuit, where Google was accused of copying portions of the API for Java. The trial court (Judge Alsup, also in the 9th Circuit) held that APIs are not protectable, and therefore copying an API was not infringement. However, that ruling was later overturned on appeal in the Federal Circuit, sending the case in a different doctrinal direction.

It’s my view–and I doubt I will be alone–that it is functionally identical to say that (a) per Rimini Street, mere interoperability does not create a derivative work, and (b) per the Alsup opinion in Oracle v. Google, APIs standing alone are not protectable. The two opinions yield the same result, via different doctrinal paths. So perhaps, this new opinion restores the conclusion that many believed correct in the Alsup opinion.

Will it Matter?

Oracle v. Google was the last time a US court of appeals issued a meaningful opinion on the topic of software derivative works. That the Federal Circuit and 9th Circuit would come to different conclusions will surprise no keen follower of copyright law. The Federal Circuit, specially created to hear patent cases, is known to be exceptionally friendly to broad interpretations of intellectual property rights; the 9th Circuit, less so.

While this new opinion might put added wind into the sails of those making the time-honored LKM argument, it may not ultimately matter. This question has been a legal unknown for decades. In the meantime, most developers of LKMs have released them under GPL for practical reasons. That’s because, in all these years, the ambiguity of this legal question allowed GPL advocates, and community pressure, to settle the issue de facto. Most people stopped bothering with the LKM argument years ago.

Moreover, the Oracle v. Rimini Street case is not over. The opinion does not dispose of the whole case, and is only precedent in the 9th Circuit. But opinions on topics like these are so rare that this one is likely to influence later courts that examine the definition of software derivative works.

(Note: This blog post quotes heavily from the court’s opinion. Items with quotation marks are from the opinion, but other portions of this post are not marked with quotation marks. I did this for ease of reading. Also, the opinion contains related discussion of Lanham Act claims and license interpretation, which I have omitted.)

Anthropic Settles a Small Part of an AI Copyright Claim

This week, there was some interesting movement in one of the many copyright claims relating to AI.

Anthropic, the maker of Claude (my personal favorite AI for coding!) reached an agreement to settle parts of an ongoing lawsuit, Concord Music Group, Inc. v. Anthropic, filed October 18, 2023. The case has a long list of music publishers as plaintiffs. The current docket is available in RECAP here.

The court approved a settlement, but it is limited in scope. The settlement resolves only one aspect of the injunctive remedies sought by the plaintiffs. Anthropic agreed to:

  • Maintain certain “already implemented Guardrails” in its current products that are designed to prevent the model from reproducing lyrics
  • Apply these Guardrails on its new releases
  • Establish a process allowing the plaintiffs to notify Anthropic when the Guardrails are not effectively preventing the reproduction of lyrics

The plaintiffs’ demand for a preliminary injunction to bar Anthropic from training future models on their song lyrics is still pending.

JavaScript Trademark Challenge

This week, as a Thanksgiving present to the tech industry, Deno Land petitioned the US Patent and Trademark Office to cancel Oracle’s JavaScript trademark.

JavaScript–which helpfully has little to do with Java–is a programming language for client-side web applications. Most of the interaction between you and websites, like filling in forms in e-commerce sites, uses JavaScript. It is a ubiquitous language. JavaScript was developed at Netscape Communications in the 1990s, with the name licensed from Sun Microsystems, which held the trademark. The trademark was acquired by Oracle Corporation in its 2009 acquisition of Sun Microsystems.

Deno Land is the company behind the Deno project, a modern JavaScript runtime. A JavaScript runtime is the environment where JavaScript code is executed, including outside of a web browser. The most famous JavaScript runtime is Node.js, which was developed by Ryan Dahl, the founder of Deno Land. Node.js is also wildly popular.

Programming languages usually don’t enjoy much trademark protection, because they tend to be enabling technologies rather than commercial products. Most popular programming languages have accepted standards for their development that are set by standards organizations rather than single owners. After all, programming languages are much more useful if many developers use them, instead of a few. A trademark is the way to distinguish the official source of products, and for successful programming languages, this role typically falls to standards bodies rather than any one organization. JavaScript, for example, is governed by the ECMA-262 standard for ECMAScript. 

Brendan Eich, one of the key developers of JavaScript, quipped that the name “ECMAScript” was a compromise between the organizations involved in standardizing the language, especially Netscape and Microsoft, whose disputes dominated the early standards sessions. “ECMAScript was always an unwanted trade name that sounds like a skin disease.” If the mark is declared generic or abandoned and no longer owned by Oracle, there is hope that the standard could finally be named after JavaScript.

The petition alleges primarily that the trademark has been abandoned, and that JavaScript is now a generic term, and not associated with Oracle. Neither Sun nor Oracle ever really exploited JavaScript as a commercial product, but Oracle has, according to the petition, engaged in enforcement activity to prevent others from using the name–thus endearing itself to the tech community in its usual fashion.

The petition says:

An open letter to Oracle discussing the genericness of the phrase “JavaScript,” published at https://javascript.tm/, was signed by 14,000+ individuals at the time of this Petition to Cancel, including notable figures such as Brendan Eich, the creator of JavaScript, and the current editors of the JavaScript specification, Michael Ficarra and Shu-yu Guo. There is broad industry and public consensus that the term “JavaScript” is generic.

It’s time for Oracle to do the right thing and relinquish its hegemony here. JavaScript has enough trouble with its “this” keyword, without a trademark burden to make it worse.

Puter, the Web-based OS, and My New LEDES Program

This post is mostly about Puter, a great new open source project that is building a web-based operating system.

I decided to try it out. Now, I used to be a software engineer long ago, but occasionally I like to try my hand at coding in the 21st century. Puter apps are written in HTML/JavaScript, the basic building blocks of all web pages. They are not hard languages to learn, and there are plenty of online tutorials, development environments, and resources to write programs in these languages. Puter also provides a nice little playground to try out your programs as you write and debug them. I also made use of Claude from Anthropic, which is a pretty amazing AI to write code, and has a free tier.

Using all this, and some help from nice friends at Puter, I cobbled together a very simple program for my law practice.

The LEDES Problem

If you are a lawyer, you may already know this, but many companies require–or at least prefer–legal invoices to be delivered in a format called LEDES (Legal Electronic Data Exchange Standard). LEDES is an outdated, clunky, deeply horrible standard. If you have used any of the biggest legal invoicing platforms, like Legal Tracker, CounselGo, Coupa, or Brightflag, you have probably used this standard, whether you know it or not.

LEDES is a pain because it is nearly impossible to create an invoice in this format by hand. It uses plain text with an odd delimiter (try to find | on your keyboard), requires complete invoice information on every line item, and has various unnecessary fields and other clunky features. So if you have to produce a LEDES invoice by hand, you are in for an unpleasant hour or two. Why would you need to do that? Well, some clients refuse to accept anything else, or use billing platforms that don’t allow you to do manual input.
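To give a flavor of the pain, here is a rough JavaScript sketch of emitting LEDES 1998B-style output: pipe-delimited fields, a `[]` end-of-record marker, and invoice-level information repeated on every line item. The field list below is abbreviated for illustration–the real standard mandates a much longer fixed field layout:

```javascript
// Abbreviated, illustrative field list -- the actual LEDES 1998B
// specification requires many more fields, in a fixed order.
const FIELDS = [
  "INVOICE_DATE", "INVOICE_NUMBER", "CLIENT_ID",
  "LINE_ITEM_NUMBER", "LINE_ITEM_COST",
];

// Every record is pipe-delimited and terminated with "[]".
function ledesLine(values) {
  return FIELDS.map((f) => values[f] ?? "") .join("|") + "[]";
}

function ledesInvoice(items) {
  const lines = ["LEDES1998B[]", FIELDS.join("|") + "[]"];
  for (const item of items) lines.push(ledesLine(item));
  return lines.join("\n");
}

// Note how the invoice-level data (date, number, client) must be
// repeated on every line item -- one of the format's clunky features.
const text = ledesInvoice([
  { INVOICE_DATE: "20240131", INVOICE_NUMBER: "INV-001",
    CLIENT_ID: "ACME", LINE_ITEM_NUMBER: "1", LINE_ITEM_COST: "450.00" },
  { INVOICE_DATE: "20240131", INVOICE_NUMBER: "INV-001",
    CLIENT_ID: "ACME", LINE_ITEM_NUMBER: "2", LINE_ITEM_COST: "125.00" },
]);
console.log(text);
```

Multiply that repetition across a real invoice with dozens of line items, each needing timekeeper IDs, task codes, and activity codes, and you can see why doing it by hand is an unpleasant hour or two.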

You can get a service to prepare a LEDES invoice for you, and it costs about $100 per invoice. Yes, that’s right. If you are a solo lawyer sending out invoices for a few thousand dollars, that’s a bite into your pocket money. I never want to pay such a fee again, and I never want anyone else to, either. So I decided to make my program available under an open source license. That means you can run it locally if you prefer, but the easiest way is to run it on Puter here.

Puter is Liberating

Deploying on Puter was easy and quick. It’s free for developers. And my little program does not even scratch the surface of what you can do. For example, you can use file storage, databases, GPT-4, DALL-E, and other features that would otherwise be very annoying to access with Javascript.

My little LEDES program is just the first version. It is functional but clunky. It surely has bugs. I plan to improve it in the near future. For example, I intend to improve the input screen, and help with useful defaults. But in the meantime, enjoy, and don’t let the billing platforms get you down!

(Note: I am an advisor to Puter, so please consider that I am not unbiased in my praise for the platform.)

Bungie Wins Against Cheat Providers for Destiny 2

Recently, Bungie, the developer of the Destiny video game series (and the original developer of Halo), won a judgment of just over $60,000 against AimJunkies, a cheat and mod site. The lawsuit arose because AimJunkies was selling cheats for Destiny 2, a free-to-play online first-person shooter with over 10 million players. In the complaint, Bungie explained the problem with the cheat codes:

Destiny 2 rewards players for their gameplay skills with items, seals, and titles, and these rewards are visible to other players. Cheaters earn the same rewards without the requisite gameplay skill. When cheating occurs, or when there is a perception that players are cheating, then non-cheating players become frustrated that cheaters obtain the same rewards and stop playing.

From the Complaint filed 06/15/21

The complaint asserted copyright and trademark infringement and breach of contract, as well as DMCA claims (circumvention of technological measures that control access to copyrighted works).

The copyright claim may sound like a bit of a stretch. A cheat code, standing alone, probably isn’t a copy of anything protectable. I would have expected the defendant to raise this issue, but it’s hard to tell what happened from the public documents. There was a motion for preliminary injunction in June of 2022. It explains that the defendant was actually creating copies of the game–or at least part of it:

to create cheat software that includes these features, Defendants necessarily copied the Destiny 2 software code that corresponds to key attributes in the Destiny 2 video game, such as the data structures for player and combatant positioning 

Well, perhaps. Data structures are at the bleeding edge of copyright law. But if you find the copyright claim unconvincing, the complaint had plenty of other ammunition as well.

The defendants had downloaded Destiny 2, and therefore agreed to the terms of its end user license, which expressly forbids users to:

hack or modify Destiny 2, or create, develop, modify, distribute, or use any unauthorized software programs to gain advantage in any online or multiplayer game modes.

Cheating might seem harmless on its face, but keep in mind that Destiny 2 is a multiplayer online game that is free to play, but charges for in-game purchases. In contrast, cheat codes in single player games don’t usually concern the game publishers as much. If you buy a mod that helps you grow cash crops faster in Stardew Valley, no one really suffers, other than perhaps your friends who envy your lush and beautiful farm. In fact, Stardew Valley, a hugely popular game, supports a wide library of cheats, which only seem to make the game more popular.

But cheats in multi-player online games like Destiny 2 can give you an advantage against other players, and that messes with the developer’s business model. In other words, the developer wants to be the only one able to sell advantages, and to control how much advantage they provide. But to be fair, for Destiny 2, there seems to be disagreement among players as to whether the legitimate in-game purchases provide you with any real advantage. Most of them are cosmetic only.

The damages awarded in this case represent the revenue AimJunkies earned from selling cheats for Destiny 2. In the world of copyright lawsuits, $60,000 is a very small award, and doubtless Bungie spent quite a bit more than that in legal fees to prosecute the lawsuit. 

But the purpose of the suit was clearly to prove a point, and establish a basis for suing other cheat providers. And it turns out that Bungie is not afraid of spending legal fees. There is an excellent article in Axios about Bungie’s various legal campaigns. 

The gaming world can be a dangerous place indeed. In addition to other lawsuits about cheating, Bungie has brought suits against player trolls who have made threats to the company’s employees. An article in the New York Post said that one player “allegedly threatened to burn down Bungie headquarters in Seattle. On July 4, 2022, in response to a tweet asking if anyone was willing to commit arson in Seattle in exchange for payment, [the player] replied that he was, and that the poster would receive a discount if the target was Bungie.”

And in a somewhat less disturbing, but also concerning, case, Bungie sued a fan who impersonated Bungie for the purpose of sending a spate of DMCA takedown notices–the kind that content creators send when someone is copying their content on a site like YouTube or TikTok. The notices were directed at fan-created content, but Bungie allows fan content. The fraudulent takedown notices were part of a soap opera of conflict between a disgruntled fan and the company. More info here.

Meanwhile, it’s getting more popular, and easier, to sue the cheat providers. There will probably be more lawsuits to come–from Bungie and others–as they try to shut down this kind of business.

Watch my video here.