The Chardet Controversy: Open Source and the AI Clean Room

In early March 2026, a dispute broke out over a Python library that most developers use every day. Chardet, a character encoding detection tool that powers a large portion of the Python ecosystem, became the unlikely center of a licensing debate. At its core, the controversy asks: can an AI rewrite of copyleft-licensed software be legally released under a more permissive license?

A Library, a License, and Lags

Chardet was created by Mark Pilgrim in 2006 and released under the GNU Lesser General Public License (LGPL). The project was a port of Mozilla's universal charset detection library. Then, in 2011, Pilgrim's various online projects began returning 410 errors. A 410 status code (Gone), unlike a 404 (Not Found), signals that a resource has been permanently removed. Pilgrim also deleted his social media accounts, apparently unplugging from the online world entirely. After this, various maintainers stepped in, most notably Dan Blanchard, who has been maintaining Chardet for over a decade.
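The 410/404 distinction is baked into the HTTP specification, and Python's standard library encodes both codes. A quick illustrative snippet (my own, not from the original reporting):

```python
from http import HTTPStatus

# 410 Gone: the resource existed but has been deliberately and
# permanently removed; clients should not expect it to come back.
print(int(HTTPStatus.GONE), HTTPStatus.GONE.phrase)

# 404 Not Found: the server has nothing at this URL, with no
# statement about whether anything ever existed there.
print(int(HTTPStatus.NOT_FOUND), HTTPStatus.NOT_FOUND.phrase)
```

Returning 410 rather than 404 was widely read as a deliberate statement: the pages were not lost, they were withdrawn.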

Chardet is a very popular, basic Python library, but due to licensing issues it is not part of the Python standard library. Being in the standard library means a library ships with every Python installation and is implicitly endorsed by the Python core team as the canonical solution to a problem. Under Python policy, inclusion in the standard library requires permissive licensing, in order to maximize integration possibilities.
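For readers who have never used it: Chardet's job is to guess the text encoding of arbitrary bytes. A stdlib-only sketch of the most trivial case it handles, byte-order-mark sniffing, looks like this (an illustration only; Chardet itself relies on statistical models of byte frequencies and handles BOM-less text, which this does not):

```python
import codecs

def sniff_bom(data: bytes):
    """Guess an encoding from a leading byte-order mark, if any.

    This is only the simplest slice of what a detector like Chardet
    does. (UTF-32 BOMs, omitted here, would have to be checked before
    the UTF-16 ones, since they share a two-byte prefix.)
    """
    for bom, name in [(codecs.BOM_UTF8, "utf-8-sig"),
                      (codecs.BOM_UTF16_LE, "utf-16-le"),
                      (codecs.BOM_UTF16_BE, "utf-16-be")]:
        if data.startswith(bom):
            return name
    return None  # no BOM; a real detector falls back to heuristics

print(sniff_bom(codecs.BOM_UTF8 + "héllo".encode("utf-8")))  # utf-8-sig
```

The hard part, and the reason Chardet exists, is everything this sketch punts on: most real-world text carries no BOM at all.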

If your code is going to end up in Python or the standard library, the PSF will require you to: License your code under an acceptable open source license. These currently include only the Academic Free License 2.1 … and the Apache License 2.0…. (https://wiki.python.org/moin/PythonSoftwareFoundationLicenseFaq)

Of the two choices, the Apache license is much more common, and the likely choice.

This made Chardet’s LGPL terms an obstacle to getting Chardet accepted into Python’s standard library. In 2015, Chardet was considered for inclusion, but the effort died because the LGPL license turned out to be a gating item.

Blanchard wanted to fix that, and also address some performance problems with the library. But his efforts to do so encountered some serious backlash.

The Rewrite and the Relicensing

On March 2, 2026, Blanchard released Chardet 7.0.0–a “ground-up, MIT-licensed rewrite” with the same package name and the same public API as prior versions of Chardet. The rewrite was apparently a success, with a huge increase in speed. Blanchard’s detailed notes on how he accomplished the rewrite are here; they are both thoughtful and transparent. To show that he had not copied the code, Blanchard provided the results of a plagiarism-detection analysis showing that version 7.0.0 shared less than 1.3% structural similarity with any prior release, suggesting that nothing protectable from the old codebase had been retained.
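We don't know which tool produced the 1.3% figure, but the idea behind such an analysis can be sketched with Python's difflib: normalize two source files and measure how much of their line structure matches. This is a crude proxy, not Blanchard's actual methodology; real plagiarism detectors work on tokens or ASTs and are far more robust to renaming and reordering.

```python
import difflib

def similarity(old_src: str, new_src: str) -> float:
    """Return a 0-1 similarity ratio between two source files.

    Strips blank lines and indentation, then compares the remaining
    line sequences. A rough stand-in for 'structural similarity'.
    """
    old_lines = [ln.strip() for ln in old_src.splitlines() if ln.strip()]
    new_lines = [ln.strip() for ln in new_src.splitlines() if ln.strip()]
    return difflib.SequenceMatcher(None, old_lines, new_lines).ratio()

# Hypothetical fragments, for illustration only.
old = "def detect(data):\n    return guess_encoding(data)\n"
new = "class Detector:\n    def feed(self, chunk):\n        self.buf += chunk\n"
print(f"{similarity(old, new):.2f}")  # low ratio: no shared structure
```

A ground-up rewrite that scores near zero on every prior release is strong (though not conclusive) evidence that no expression was carried over.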

Just two days after the release, however, Pilgrim suddenly resurfaced. He opened GitHub issue #327, titled “No right to relicense this project,” identifying himself as the original author and demanding that the project remain under the LGPL.

“Their claim that it is a ‘complete rewrite’ is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a ‘clean room’ implementation). Adding a fancy code generator into the mix does not somehow grant them any additional rights.” (GitHub)

Legal and Community Questions

The issue on GitHub quickly escalated into a controversy. In typical open source fashion, what followed was a war of words among many commentators, who–shall we say–did not engage in the tenor of analysis favored by those who actually know the law.

Best practice for clean-room development, as a way to manage copyright risk, requires a separation between two teams of developers: one that creates the specifications and another that writes the reimplementation. The reimplementation developers are blocked from access to the original source code, creating a “clean” environment.

Blanchard, however, had spent over a decade maintaining Chardet’s codebase. His knowledge of the library could not be cleansed. This is an issue endemic to all open source reimplementation projects: there is no way to completely insulate any developer from exposure to public code. In this case, the issue applied in spades.

Nevertheless, using AI to perform clean room development can help manage this issue by interposing a neutral actor–the AI–between the reimplementation developer and the specification developer.

Richard Fontana, an expert on open source licensing, commented that he saw no current basis for concluding that Chardet 7.0.0 was required to be released under the LGPL, noting that no one had identified copyrightable material from earlier versions in the new code. This is probably the right legal analysis with respect to copyright. Still, the controversy raged on, probably because it was only partly about copyright and also about open source community expectations.

Blanchard later switched the license from MIT to 0BSD (a permissive license with no notice requirement) in an attempt to deflect the question of whether AI-generated code is copyrightable in the first place. As a side note, neither MIT nor 0BSD seems to be expressly approved by the PSF for standard libraries in the FAQ quoted above, though it would make sense if such permissive licenses posed no hurdle to adoption. But all of this is a red herring for the infringement issue surfaced by Pilgrim.

Takeaways

The Malus Question. What makes this controversy significant is the larger question of AI-assisted clean room rewrites. The same technique could theoretically be applied to any open source project: feed a copyleft-licensed codebase to an LLM, ask it to rewrite the functionality, and publish under a permissive license. Open source advocates point out that if such a technique is legal, the entire copyleft model begins to disintegrate. That’s an interesting question, and if you want to read more about it, you should read my post about MALUS, the not-quite-April-Fool’s-joke posted recently about using clean room implementations to avoid copyleft. Spoiler: I agree that copyleft is moribund, but I think its demise has been a slow, lingering malaise rather than a sudden death.

Clean Room is Rarely a Real Solution. Moreover, in my view, a clean room development undertaken solely to avoid copyleft will usually create a lot of technical debt. Engineering around copyleft effectively creates a fork of the project. Whether that fork is free of the copyright burden of the original is one question. The bigger question is whether anyone will trust a fork enough to use it. And everyone knows that maintaining a fork is almost always more expensive and difficult than complying with open source licenses. However, compared to the average clean room effort, the reasons for the Chardet rewrite were more complex: avoiding a stringent licensing policy and improving performance.

Naming Issues. There is one aspect of the Chardet controversy that most people miss.

That’s the mistake – not the rewrite, not the AI. Not even the license change in isolation. The mistake is claiming your code is “an independent work, not a derivative” while simultaneously shipping it as the next version of the thing you say it’s independent from. (https://shiftmag.dev/license-laundering-and-the-death-of-clean-room-8528/)

A key issue here was that the clean room reimplementation used the same project name and API, which could cause confusion with the prior version. Although this may have been a strategic misstep, perhaps we should cut Blanchard some slack for the many years he put into maintaining the project. Naming conventions in open source are usually considered trademark issues. Trademark is, in turn, about the source, nature and quality of products. What constituted Chardet in 2026 may be as much Blanchard’s doing as anyone else’s, so perhaps he effectively controls the name. Other than that, this clean room effort is one of the more conscientious ones I have seen.

The Great Open Source Wealth Transfer. This controversy also raises the familiar (but largely unaddressed) issue of what happens when the original authors of open source code–or their heirs–start to exert what the law calls “dead hand control” over their intellectual property. There is a tension between the right to enforce licenses based on old copyright interests, on one hand, and the need to keep projects useful, secure, and up-to-date, on the other. The truth is that copyright duration–which is largely dictated by the lobbying of the media industry–is just too long for the world of software. As the first generation of open source developers goes to software heaven, goes “410” or just hands the baton to other maintainers, their interests diverge from the community of active users. And worst of all, as they die, their copyright interests will pass to their heirs–who might well have apathy, or even antipathy, for the open source community. Today, we stand at the brink of a “great wealth transfer” of copyright interests from this Boomer generation, and we have not done enough to address it. This dispute, between an ostensibly retired developer and a current maintainer, is a portent of things to come.

“Malus”: Is Copyleft Dead?

I can’t tell you how many times I have heard people predict the death of open source software over the years. Each time some new tech trend arrives–crypto, source available licensing, SaaS–people predict the death of open source. Lately, people have been predicting the death of open source because of AI code generation, and while some of the facts they cite are undeniable, their conclusions are not supportable.

This week, many people sent me the link to the Malus “Clean Room as a Service” announcement. I don’t think I’ve ever had so many people send me a single link. Now, this website is fake, but you have to wade into it a bit to figure that out. No real company would be foolish enough to represent anything as “legally distinct,” for example. The name “Malus” suggests a bad actor, and I guess it’s a pun on Manus, a popular AI agent. So, it seems the site is a troll.

(And to the trademark lawyers: Gentlemen, start your engines.)

The Malus post says:

Finally, liberation from open source license obligations. Our proprietary AI robots independently recreate any open source project from scratch. The result? Legally distinct code with corporate-friendly licensing. No attribution. No copyleft. No problems.

The implication is clear: AI coding agents are bad actors that are killing free software.

I wrote about the AI use case of clean room development nearly a year ago. I didn’t cast it as a catastrophic prediction of the death of open source, so no one got excited. But it’s interesting to look beyond the political theater and analyze what is really going on here.

Copyleft Has Died Many Deaths

There are two kinds of open source licenses–the permissive and the copyleft. Permissive licenses don’t require you to do much, so nobody would ever bother re-engineering software that is available under a permissive license. Ergo, the Malus website is about copyleft. Copyleft licenses require you to share source code when you share binaries. Copyleft licenses like GPL were the original free software licenses, and were extremely popular in the early days of the free software movement.

The death of copyleft has been predicted for a long time, and in fact it is already dying. Copyleft as a licensing paradigm has been experiencing a steady decline over the past two decades. This is because of licensor choice, not an international conspiracy to kill copyleft. Over time, more and more software has been released under permissive licenses instead. The software under permissive licenses has become more and more important in computing. All that happened long before the advent of AI coding.

Why did it happen? In a way, copyleft became a victim of its own success. Back in the late 1990s, copyleft was a catalyst. The only way to get large organizations to collaborate on developing software was by forcing them to do so with a legal mechanism. That mechanism was GPL. It compelled them to share their source code, under threat of copyright infringement lawsuits. Before the early 2000s, large technology organizations would never have voluntarily shared their improvements, rather than keeping their improvements secret and proprietary. But over time, technology companies learned the value of collaboration. As developers decided to voluntarily share their improvements over time, the legal threat of GPL became an artifact.

The truth, I think, is that we’ve been living in a post-copyleft world for quite a while. Copyleft licenses are expensive and difficult to enforce, and in fact are rarely enforced. Almost all enforcement in the open source world is done via moral suasion and education about the benefits of collaboration.

“We view legal action as a last resort, to be initiated only when other community efforts have failed to resolve the problem.”–From the Linux kernel enforcement statement, which was signed by over 100 major kernel contributors.

Overall, voluntary collaboration has worked extraordinarily well; open source software is ubiquitous. Proprietary software, in this sense, is like cigarettes. In the 1960s, smokers were everywhere and smoking was cool. Now, smoking is considered trashy. That was a huge shift in attitude. And yes, there were laws that supported that change: workplaces and public buildings became non-smoking. But that wasn’t what made the change so successful. It was a collective realization that smoking was a really bad idea.

Similarly, there was a flurry of enforcement of GPL in the 1990s and early 2000s, but then it tailed off. Simultaneously, the Linux Foundation grew into a $300 million-a-year organization, with all the world’s biggest companies supporting collaborative development. We are now so accustomed to that “new normal” that we have forgotten what a sea change it represents. That sea change didn’t happen because these companies feared the legal might of GPL. It happened because they figured out that it made more sense to collaborate on infrastructure than to individually re-invent the wheel.

Some free software advocates argue there’s a great deal of noncompliance for copyleft licenses. That’s true, but that doesn’t mean open source is not working. It just means it’s not working perfectly–and nothing works perfectly. In fact, today, compliance with open source licenses is better than it ever has been. It took many years for the culture of compliance to permeate the industry. It did so because open source devotees grew their ranks in private industry, and because license compliance goes hand-in-hand with security management. Meanwhile, the killer app for copyleft, the Linux kernel, is extraordinarily successful–to the point that it has eclipsed all other operating systems. And open source now runs everything.

Don’t Embarrass Yourself by Celebrating the Opportunity to Fork

The Malus hoax imagines developers celebrating their freedom from open source license requirements. Let’s take an extreme case for illustrative purposes. Imagine a fictional developer who is faced with the task of developing an operating system for an embedded system. Most developers would start with the Linux kernel. It’s freely available, it’s high-quality, and there are lots of engineers in the talent pool who are familiar with it. That reduces the developer’s cost in important ways. Easier recruiting. Less debugging. Community support. Better security.

But it’s licensed under GPL, and GPL has requirements.

Now, suppose this developer decides that it needs to keep its customizations proprietary–which would violate GPL. So it does a clean room implementation of the Linux kernel using an AI coding tool. Would that be a good idea?

Seriously?

That would be a terrible idea. Once you reimplement in order to avoid the license, you have essentially created your own fork of the project. Everyone knows it’s a terrible idea to create custom forks of open source projects. Your maintenance costs increase. Your security costs increase. Your talent recruitment and training costs increase. You have earned yourself a mountain of technical debt. The Linux Foundation recently released a timely report estimating the technical debt for maintaining private forks at “an average of 5,160 labor hours, or $258,000, per release cycle.” That’s a lot of expense.

Only ignorant and sub-optimizing tech managers–or maybe tech investors–would think it’s a good idea to incur those kinds of costs in order to “protect” their IP. These are the same people who sit in board meetings parroting fears about IP value instead of figuring out how to build good products. They have the mindset of patent trolls, not innovators. No self-respecting developer would fall for this thinking error.

Copyleft is not dead–at least, no deader than it already was. Forking projects is too costly, and complying with open source licenses is too easy. Yes, there will be scofflaws and there will be those who want to re-invent the wheel in order to hold onto their perceived IP advantage, but they can travel their own path, and see their businesses struggle as a result. The rest of us will be whistling as we keep traveling on the open source road.

What Truly Dies Without Copyleft?

There is one thing that truly is threatened by the putative death of copyleft, and that is the pure dual licensing business model.

Dual licensing, which was pioneered by MySQL in the 1990s, is a business model where you release something under GPL, then sell alternative licenses to those who can’t comply. In the open source world, this is sometimes referred to as selling exceptions. It only works when the project steward has the right to choose alternative licenses, either because it owns all the copyright in the project, or uses contribution licenses to clear rights in outside contributions.

In fact, true dual licensing has become quite rare in the last decade. Almost all businesses today that use the threat of GPL enforcement are owned by private equity, and the others who build businesses around open source have morphed into open core models. Open core provides a core of open source software for free, then sells enterprise features like compliance certifications, collaboration features, managed SaaS, or SSO. And almost all these businesses use permissive licenses, not copyleft licenses. So, the death of copyleft is not even a blip to them.

A Post Copyright/left World

Taking the long view, AI coding might mean that we are in a post-copyright world. But copyright was always an awkward fit for software. Copyright protects expression, not function, and the value of software is that it is functional, not beautiful.

It was always possible to clean-room software to avoid its copyright. It just used to be labor-intensive, and now it’s not. But I invite you to consider that this is not such a bad thing. Software now needs to prove its value proposition the way other products do, rather than via threats of IP infringement.

And the threats have been a drag on software development. The last few decades have seen developers spend millions of dollars on open source compliance, down to the level of trying to ferret out 5-line snippets from Stack Overflow in their code. Who really benefitted from that, other than Black Duck? The Malus site says that companies doing development “spend millions on software composition analysis tools like Snyk and Black Duck”–and that is entirely true. This is what the legal threat of copyleft bought us: decades of unnecessary remediation and negotiating draconian open source terms in deals to allocate the risk for a specter of copyleft enforcement.

This waste of energy came from over-emphasis on threats of enforcement, which, over the years, came to be a goal itself, instead of a way to educate developers on the benefits of collaboration. Over time, the free software community seemed to convince itself that GPL as a legal weapon is the goal of free software, not just a tool to support open development models. Proprietary software developers and free software advocates alike have sown Fear, Uncertainty and Doubt (FUD), and derailed what should have been the lesson of the open source movement: that collaboration produces better code.

People love open source software not because it’s free, but because it is open. That openness brings all sorts of benefits. Fear of lawsuits has skewed the discussion toward the cost of acquisition: Is it better to buy proprietary software or incur the cost of GPL compliance? But the much more interesting conversation is about the benefits, and the benefits of open source software have proven themselves, so repeatedly and decisively that they hardly bear explaining anymore.

Death, or Afterlife?

I find it curious that people seem to enjoy constantly predicting the death of all good things. Anyone who reads the news today can’t help but be appalled by the predictions of apocalypse that bombard us at every turn, all to garner clicks and views and waste our day with doomscrolling. It’s a shame that so much of our society today is about generating fear via misinformation. It’s also a danger, because focusing on doom derails the conversation from the right focus–like how can AI help generate and refine software so we can do more and better things with it. We will accomplish nothing with visions of HAL 9000 dancing in our heads.

It is not so easy to kill something like open source. Free and open paradigms are self-actuating. The invisible hand of the market makes them so. Too many people benefit from these paradigms to let them die, and humans are very clever at protecting what works for them.

Copyleft will survive, to the extent it deserves to survive, and open source is stronger than ever.

Actors Trademarking Themselves

An article (sorry for the paywall) appeared recently in the Wall Street Journal under the title “Matthew McConaughey Trademarks Himself to Fight AI Misuse–Actor plans to use trademarks of himself saying ‘Alright, alright, alright’ and staring at a camera to combat AI fakes in court.”

The WSJ article said, “McConaughey’s lawyers believe that the threat of a lawsuit in federal courts would help deter misuse more broadly, including for AI video that isn’t explicitly selling anything.” It also quoted the lawyer as saying: “I don’t know what a court will say in the end. But we have to at least test this.”

I think the more accurate statement is: Mr. McConaughey’s lawyers are generating fees (or possibly appeasing a demanding client with speculative legal work) by doing trademark filings on something that is not properly the basis for a trademark. But I guess a client who is a wealthy, successful actor can easily fund speculative trademark registrations, so…everybody wins. Except maybe the PTO. Perhaps I need more clients like that.

To me, trademark law covering a man saying “alright” sounds much more speculative than publicity rights–which are already designed to protect a personal likeness and image. On the plaintiff’s side, the problem is that trademarks are intended to cover the source or origin of goods and services, and a guy–real or fake–saying something in a video clip is neither of those. On the defendant’s side, if the AI truly “isn’t selling anything” then a trademark claim is weak. Trademark infringement is a commercial tort. Publicity rights, in contrast, are designed to redress claims about personal, rather than commercial, reputation. Both are also vulnerable to First Amendment limitations. Mr. McConaughey is, after all, a public figure who has voluntarily placed himself in public view.

The better legal policy would be to advocate for consistent publicity rights via federal law, instead of the crazy-quilt of state law that currently covers it in the US.

I suspect also that one element of the fitting-a-square-publicity-rights-peg-into-a-round-trademark-hole strategy is to leverage international treaties about trademark that might not extend to publicity rights.

Looking for the Registration

I took a look on the USPTO site, because I was curious to see how the claimed goods and services would be described, and found this:

All of the above were filed for products, but have been abandoned, though it’s not clear whether the abandonment may have been “suggested” by the actor’s lawyers.

Now, there are many registrations at the PTO by J.K. Living Brands, which is apparently McConaughey’s trademark holding company. But I could not find any application of the kind described in the WSJ article.

A couple of the live registrations are for “entertainment services” in category IC 041.

The goods listed for this last one are “Entertainment services, namely, personal appearances by an actor and celebrity; entertainment services, namely, acting services in the nature of live performances and personal appearances by an actor and celebrity; entertainment services, namely, acting services in the nature of live visual and audio performances by a professional entertainer; entertainment services, namely, film and television show production service.”

But a meme of someone saying “Alright” would not be in this goods description.

So, is this just another example of the press drumming up a headline without any particular regard for how IP law works, or a brilliant new legal tactic that IP lawyers need to learn? That’s unclear.

I will update if I find the registration or more explanations of why this tactic should work.

Is AI the re-Democratization of the Web?

For a few years now, the news has been full of prognosticators screeching about the dangers of AI. And while some of it is potentially concerning, we all know that the news tends to lean into the catastrophic. So, I’ve been thinking about one aspect of the advent of AI that might actually be great – at least for the time being.

Once upon a time, the web was a level playing field. I remember my delight in being able to use algorithmic search results. In those results, even small webpages sometimes came up before big ones.

Then the commercialization of search started–and never stopped.

Don’t get me wrong, there were some things about the commercialization of search that were great. The theory was that people who were willing to pay to show search results typically had more resources and therefore offered better products or more interesting information. And those who complain about targeted ads have surely forgotten the early days where every ad was for Viagra.

Once Upon a Query

For a while, search engines like Google clearly separated algorithmic and paid search results–whereas some search engines leaned more heavily into paid results without identifying them as paid. And each of us used the search engine that fit our needs best. I was an Altavista fan until it got acquired by Yahoo and mothballed. Altavista was the algorithmic search engine beloved by nerds everywhere.

But eventually, paid search took over the web experience. These days, you can’t even search for information about hotels without getting an entire page of results from aggregators–so much so that the official sites of the hoteliers are actually hard to find. And don’t get me started about trying to file government documents; the actual government sites are buried in a slew of ads by charlatans who want to charge you money to file something that is usually just as easy to file yourself.

For Now, AI is Better

Now, recently, we’ve seen some hue and cry in the press about AI taking over search. Let me remind you that, a few years ago, the same hue and cry was about videos taking over search. All these articles seemed to imply that anything taking over search was a danger, because (reading between the lines) search yielded up purer, more factual, or less brain-rot results. These articles bemoan that the golden days of search were over, and possibly that Google’s ad-related business model is doomed–though given the Google-hating so common in media, it wasn’t clear why that was supposed to be a cause for alarm.

Recently, OpenAI announced a browser called Atlas. Again, the alarm bells sounded for the death of search.

Then I started thinking, is that really a bad thing? When I ask AI a question, the AI answers based on what it knows. And mostly, it knows facts, not the potential for ad revenue. I also get web links as references in the answer. Those references seem to be more like the old days of search, where information took precedence over advertising.

Here’s an example: I searched for a flight to Samarkand. With Google Search, the entire first page was paid results. It found Turkish Air, which was good, but the first hit was Delta.

Now, Delta and Lufthansa are not the best airlines for getting anywhere, in my experience, but guess what? Delta–the top result–apparently doesn’t even go there.

Meanwhile, Claude gave me a lot of useful information. But even AI is at the mercy of what is on the web, so it pointed me to an aggregator instead of an airline.

And so, exactly who is surprised that AI is replacing search? I mean, AI is helpful, but the problem is that search is broken. 

Waiting for the Other Shoe to Drop

Now the question is: where will the search ads go? What will be the next business initiative to divert my attention from what I want to see, to what advertisers want me to see? Ads aren’t in AI results yet, because the AI providers are getting paid for using their models. In that sense, Google search is more like the old over-the-airwaves TV model: the service is free, but the ads pay for it. Now, for AI, we seem to be in the equivalent of the early streaming days: pay for the service, but no ads. But we all know what happens next: pay for the service, and see ads, as well.

Meanwhile, let’s enjoy this time, which we might later look back on as a golden age of ad-free AI search results.