Copyleft Currents

AI Could Be Your Next Team for Clean Room Development

A clean room development is necessary when a developer wants to “cleanse” the intellectual property burden of third-party software. The need arises when third-party software is provided under unacceptable license terms, or is not licensed at all. This is one of the trickiest tasks in software development, but it has a long history of best practices.

The canonical clean room development seeks to avoid trade secrets of proprietary software. But the rise of open source has resulted in the need to do a different kind of clean room project, meant to avoid the copyright in open source software, usually for GPL-licensed packages. The two situations call for slightly different approaches. A clean room process for proprietary code seeks to avoid trade secrets and copyright burdens, whereas clean room development in open source is entirely about copyright, because there are no trade secrets in open source software. In either case, a team of developers seeks to write new implementing code from scratch, so that the new code performs the same tasks, with the same inputs and outputs, as the original or “target” code.

A traditional clean room development process looks something like this:

Of course, there are far more complex processes for clean room development. Some have three teams, and most have a lot more steps. I have seen guidelines so many pages long they have a table of contents. But the above is the essence–not to mention the most my clients have the patience to read.

Not Enough Humans

The problem most companies have when performing a clean room development is that they don’t have the resources to create two separate teams. Even if they do, they usually cannot create an implementation team that has never been exposed to the target software–and doing so is particularly difficult when the target software is open source, because there is no way to prove lack of access to publicly available materials. For an open source clean room process, we usually make do with developing implementing code in an environment that does not have local access to the target code.
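One way to back up the "no local access" rule is to scan the implementation environment for stray copies of target files. The following is only a minimal sketch, assuming someone outside the implementation team has prepared a set of SHA-256 digests of the target files; the names `file_digest` and `find_target_copies` are hypothetical, and the check catches exact byte-for-byte copies only, not modified or partial ones.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_target_copies(workspace: Path, target_digests: set[str]) -> list[Path]:
    """List workspace files whose contents exactly match a known target file.

    target_digests is assumed to be a set of SHA-256 hex digests computed
    from the target code by someone who is not on the implementation team.
    """
    return sorted(
        p for p in workspace.rglob("*")
        if p.is_file() and file_digest(p) in target_digests
    )
```

An empty result does not prove the environment is clean; it only shows that no unmodified target file is sitting in the workspace.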

But now, with the advent of AI, we have an alternative way to approach clean room development.

I pause here to note that while there are those who think that all generative AI is prima facie copyright infringement, I don’t agree. As long as the model has been trained on enough inputs, it should not parrot any one input. (More on that here.) So let’s set that issue aside, because if you disagree with me, you shouldn’t be using AI coding tools at all and you should just put this article aside.

An AI that writes code (like Claude or Copilot) has probably been exposed to almost all the open source code ever written. But because that training is so broad, it is unlikely to reproduce any specific target code. So, companies struggling to staff a clean room development might consider replacing one or both of the teams with AI. As always, some human oversight is necessary to check that an AI generative process has been done correctly. But using it would still greatly reduce the headcount necessary to implement the clean room process.
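Part of that human oversight could be a mechanical comparison of the generated code against the target, run by a reviewer who is permitted to see both. The sketch below uses Python's standard `difflib` to flag runs of consecutive identical lines. The function names and the three-line threshold are illustrative assumptions, not an established standard, and a match or the absence of one says nothing conclusive about infringement.

```python
import difflib


def normalize(source: str) -> list[str]:
    """Strip surrounding whitespace and drop blank lines so that
    formatting differences alone do not hide a match."""
    return [line.strip() for line in source.splitlines() if line.strip()]


def shared_runs(generated: str, target: str, min_lines: int = 3) -> list[list[str]]:
    """Return runs of at least min_lines consecutive identical lines
    that appear in both the generated code and the target code."""
    gen, tgt = normalize(generated), normalize(target)
    matcher = difflib.SequenceMatcher(None, gen, tgt, autojunk=False)
    return [
        gen[m.a:m.a + m.size]
        for m in matcher.get_matching_blocks()
        if m.size >= min_lines
    ]
```

Anything this returns is a prompt for a human to look closer. Short idiomatic snippets can match by coincidence, and renamed identifiers will not match at all.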

Neither of these suggestions should be surprising. AI code generation greatly reduces the human effort necessary to produce code, and clean room projects are human-intensive. For an open source target, I think the use of AI as a specification team is quite interesting. For proprietary code, using AI as the implementation team may be particularly interesting, because AIs are mostly not trained on proprietary code, making the cleansing more reliable.

Always remember: wash your hands before you code!
