I am thrilled to announce that, with generous support from Mozilla and the Apache Foundation, we have officially launched FOSSDA, the Free and Open Source Stories Digital Archive. It’s time to tell the story of the free and open source movement! This project is now officially underway, thanks to all those who have helped make it happen.
Is Copyright Eating AI?
Marc Andreessen famously said that software is eating the world. But the latest and greatest software trend–generative AI–is in danger of being swallowed up by copyright law. Like a cruise ship heading for a scary iceberg, AI is in trouble, and the problems are mostly below the surface.
We now have a pair of lawsuits claiming that GitHub’s Copilot model is stealing open source code from its authors, and that companies using Stable Diffusion or other models (including Stability AI, DeviantArt, and Midjourney) are stealing images from visual artists. Both lawsuits are being prosecuted by Matthew Butterick (best known as the author of Typography for Lawyers) along with the Joseph Saveri Law Firm, a class action firm.
The Copilot lawsuit is widely touted in the press as a copyright infringement case, but in fact it doesn’t claim copyright infringement. It does claim a litany of other wrongs based on torts like removal of copyright information, breach of contract, and fraud. The Stable Diffusion suit is in fact a copyright infringement suit. More importantly, and sadly, these lawsuits are probably a bellwether of more to come.
The Copilot suit is ostensibly being brought in the name of all open source programmers. Yes, that’s right, people crusading in the name of open source–a movement intended to promote freedom to use source code–are now claiming that a neural network, designed to save programmers the onus of re-inventing the wheel when they need code to perform programming tasks, is de facto unlawful. The open source movement is wonderful in many ways, but its tendency to engage in legal maximalism to “protect” open source is sometimes disappointing.
The Stable Diffusion suit alleges copyright infringement, stating that, “The resulting image is necessarily a derivative work, because it is generated exclusively from a combination of the conditioning data and the latent images, all of which are copies of copyrighted images. It is, in short, a 21st-century collage tool.” That characterization is the essence and conclusion of the lawsuit, and one with which many AI designers would disagree.
So, all neural network developers, get ready for the lawyers, because they are coming to get you.
Fair Use or “Fair & Ethical”?
The crux of the problem is that US copyright law, despite many landmark cases, still gives us little or no guidance on when the defense of fair use applies. The Oracle v. Google case, the biggest fair use case of this century, ambled on a lengthy and astonishingly expensive road to a Supreme Court decision. As Larry Lessig famously quipped, “fair use is the right to hire a lawyer,” and the Supremes proved that true by issuing an opinion that provided little guidance outside of the specific facts of the case.
However you may feel about Google, it’s lucky that Google has the determination and resources to have spent astronomical legal fees defending the right of fair use–from books, to thumbnail photos, to news headlines, to software interface specifications. Users of the web benefit from that. If the AI industry avoids this iceberg, it will be partly because of Google’s historical unwillingness to roll over on fair use cases.
Let’s hope Microsoft (which funded OpenAI and owns GitHub) has the Google-like intestinal fortitude and money to win this battle. But if Oracle v. Google is any measure, the answer might not come for 10 years, by which time the neural network industry may have been litigated out of existence–or worse yet, limited to those large players who can fund an expensive legal defense. For startups, having a lawsuit hanging over their heads is usually a death knell, between expensive legal bills siphoning off their development resources, and investors shying away from the risk.
Tell Me What You Want, What You Really, Really Want
One perplexing aspect of the lawsuits–and likely all that will follow in their footsteps–is what best practices the plaintiffs actually would want the AI industry to adopt going forward. Butterick says his class action cases are “another step toward making AI fair & ethical for everyone.” But other than netting a hefty fee for the lawyers who bring the suit, what is the endgame, exactly?
Both lawsuits ask for permanent injunctive relief, which would essentially shut down the use of the accused models, but that is part of the playbook for litigation and probably not the result they would prefer. And even for most people who sympathize with the lawsuits, that is not the preferred endgame. Though there are lots of memes out there about Skynet, most people do not want AI to shut down, and if they do, it’s not because of copyright law.
One possible best practice would be to allow authors to specifically opt out of use of their output for ML training. (In fact, Stability has suggested this approach.) This type of approach can work when technical development bumps up against the limits of copyright law. For example, there is a “do not index” mechanism (robots.txt) for web sites that is broadly honored by large scale search engines. But such a convention would have a prodigious backlog to tag, and also, for software authors, prohibiting ML training would be antithetical to the Open Source Definition. So that probably won’t work.
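For readers who have not seen how the robots.txt convention works in practice, here is a minimal sketch in Python, using the standard library's robots.txt parser. The crawler names (ExampleMLTrainingBot, ExampleSearchBot) and the URLs are made up for illustration; no real ML crawler is being described, and whether any given crawler honors these rules is entirely up to its operator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site opts out of one (made-up) ML training
# crawler while still allowing ordinary indexing outside /private/.
ROBOTS_TXT = """\
User-agent: ExampleMLTrainingBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/gallery/painting.png"
    print(may_fetch(ROBOTS_TXT, "ExampleMLTrainingBot", url))
    print(may_fetch(ROBOTS_TXT, "ExampleSearchBot", url))
```

The point of the sketch is that the opt-out is purely a convention: the file states the site owner's wishes, and a well-behaved crawler checks it before fetching. Nothing in the mechanism itself enforces compliance.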
Another possibility is compensation for those who wrote the original material used to train the models. Over the years, there have been various attempts to compensate authors for numerous and small contributions to copyrightable works. This is primarily an information problem, and those who try to solve it usually propose a blockchain based approach, lest payment transaction costs outweigh the compensation. None has been successful yet.
Even if there were a technical solution to the information problem, it would be difficult to allocate compensation to a broad set of creators in a fair way. In the music business, there are artists’ rights organizations like ASCAP and BMI that amalgamate the power to grant blanket music performance licenses to consumers of music, like restaurants that play music over their sound systems. In fact, these rights amalgamation organizations enjoy a limited safe harbor from antitrust law, because they facilitate what would otherwise require millions of small, individual licensing deals.
But this will not work for generative AI. Performance rights organizations reward their authors roughly according to the popularity of their songs. For generative AI, it would be functionally impossible to track which work had been used, because the output is not, in fact, a copy of the original, nor even a collage–but a new work synthesized from a model trained using the original work. If compensation is not tracked to the images actually used, then we would likely see a spate of garbage images being thrown into the mix to grab some of the proceeds. It would be easier to set up a grant fund for artists generally than to track the contributions among millions of artists to a single AI-generated image.
The problem is that neural network models, and their outputs, are not copies of the original works. They are a set of probabilities (weights) that are trained based on thousands or even millions of data points. And at least as of now, it is not possible to look at ML output and determine which inputs, nodes and weights created it. ML, for now, is mostly a “black box” whose inputs and outputs are impossible to connect. In fact, the lack of reproducibility of ML has already been tagged as a social issue: if you build a model that discriminates in its output, how do you audit it? Eventually, the ML industry may solve this problem, but for now, it means there is usually a disconnect between the inputs and outputs, and that probably means that copying could never be inferred in a way that could reliably allocate compensation to the authors of inputs. That, in turn, should mean there is no copyright infringement, but the lawsuits posit otherwise.
Moreover, there is a notice problem. Each of the lawsuits alleges a claim under the Digital Millennium Copyright Act (DMCA), which prohibits removal of copyright management information (“CMI,” defined in 17 USC §1202(c)) such as copyright notices. But even assuming that some license notice, or copyright notice, would have to be communicated whenever an AI output was generated, how exactly would that happen? Would each resulting image require thousands or millions of notices? Even now, conventional users of open source code struggle greatly with management and delivery of license notices–anyone who has worked on open source compliance knows how difficult that can be. But these lawsuits make that problem look like child’s play.
If AI Dies, Who Wins?
Both of the Butterick suits are being brought as class actions–a type of lawsuit popularized in the US and still relatively unusual elsewhere. You may have gotten notices from class action lawyers asking you to opt in to a settlement class to which you belong. If you’re like me, you toss them out, because your reward for joining the class will probably be a coupon or a princely settlement of $20.
And so, who benefits from class action suits? Well, class action lawyers. When you hear that a class action suit has resulted in $6 million in damages, the lawyers probably get about $2 million (one-third). Because a class can consist of thousands of members (or in the case of the Butterick suits, probably millions), the damages allocated to the individual class members can be tiny indeed. Sometimes, the lawyers actually get bigger payouts than all the plaintiffs combined. The US class action model has been strongly criticized for being a vehicle for enrichment of plaintiff’s lawyers that provides relatively little real compensation to the plaintiffs they represent. Class action proponents use populist rhetoric and anecdotes to justify their suits, but empirical studies are relatively few, and sometimes, damning. (See for example: https://instituteforlegalreform.com/research/do-class-actions-benefit-class-members/ and https://www.tortreform.com/news/study-class-action-lawyers-often-take-more-money-from-settlements-than-class-members/)
If the AI industry is to survive, we need a clear legal rule that neural networks, and the outputs they produce, are not presumed to be copies of the data used to train them. Otherwise, the entire industry will be plagued with lawsuits that will stifle innovation and only enrich plaintiff’s lawyers. Matthew Butterick has stated that these lawsuits are an attempt to set a precedent in favor of artists, because the law is unclear. Lack of clarity causes people to act conservatively to avoid liability, and that stifles innovation. Given that the courts are unlikely to come up with a common-law rule in this decade, clarity probably needs to come in the form of a legislative amendment to the copyright law. Unless it comes soon, the generative AI industry may be in trouble.
It’s unclear whether Butterick’s suits are mostly a publicity stunt and a ploy for the plaintiff’s lawyers to make a windfall, or a selfless attempt to provide equity for authors, or somewhere in between. But one thing is sure: they will spark a cottage industry for plaintiff’s lawyers, cause crippling expenses for AI developers, and thwart innovation in the generative AI field. As the tech industry celebrates the frothy emergence of machine learning in a time of economic doom and gloom, let’s hope this nascent field doesn’t sink because of the copyright iceberg looming ahead.
Note: Since I started preparing this article for publication yesterday, an additional case was threatened in London by Getty Images regarding Stability AI. Because Getty is the single owner of so many images, and outside the US, this is not a class action suit, and may be more likely to result in a settlement.
Update February 6, 2023: The other shoe drops: Getty filed a complaint in Delaware against Stability AI.
Also, this blog post is a personal opinion, and nothing I have written here should be attributed to any of the parties involved.
Securing Open Source Software Act of 2022
A bill was recently introduced in the US Senate, entitled the Securing Open Source Software Act of 2022.
I don’t usually write much about pending legislation, because it often does not ever become law, or changes substantially before it becomes law. This bill is unlikely to be passed this year because of its timing. But it has a few interesting characteristics.
- It is a bipartisan bill, introduced by Gary Peters (D-Mich.) and Rob Portman (R-Ohio), both members of the Senate Homeland Security Committee.
- It defines both “open source software” and “open source software community.”
- It focuses on requirements for software bills of materials (SBOMs), and security concerns, drafting on the recent Executive Order on Improving the Nation’s Cybersecurity (EO 14028).
- It establishes “the duties of the Director of the Cybersecurity and Infrastructure Security Agency regarding open source software security” and requires the Director to regularly assess open source software used by the federal government. So it establishes a process, more than substance.
If I had to guess, I would say the bill seems likely to pass in some form, next year. If it does, it’s unclear exactly how it will interact with the recent EO. Also, improved security assessment is good for all software, not just open source — open source security breaches get a lot of press, but all software has potential security issues, and the government should be concerned about its use of proprietary software as well. Finally, to the extent new law establishes requirements for government, or even other customers of software, the private market is mostly ahead of these requirements already. Most software vendors know that customers are already very demanding regarding security requirements. The effect of new law could be to normalize those market demands in private sectors and government — but we will have to wait and see.
Victory for FireTek in the PyroTechnics Case
About a year ago, I wrote about a copyright case involving fireworks firing codes. This case did not get a lot of attention at the time, and it was yet another example of a plaintiff using copyright law as unexploded ordnance (if you will forgive the pun) to harass its competitors, rather than to protect works of authorship.
Fortunately, the Third Circuit recently vacated a prior injunction in the case, for lack of likelihood of success on the merits, and remanded to the district court with an order to dismiss the claim with prejudice.
The court analyzed the copyright protection of both Pyrotechnics’ digital message format, and the digital messages created with it. The opinion linked above provides interesting detail on how the messages worked.
The court said, “Pyrotechnics’s digital message format is an uncopyrightable idea and the individual digital messages described in the [copyright registration] are insufficiently original to qualify for copyright protection.” Regarding the message format, it concluded:
Pyrotechnics admits that there is no way for the control panel to communicate with the field module without using the digital message format. Because there are no other “means of achieving the [protocol’s] desired purpose” of communicating with the devices, the digital message format must be part of the uncopyrightable idea and not a protectable expression. Citing Whelan, 797 F.2d at 1236.
As to the messages using the format, the court said, “The digital message format provides rules for constructing messages with particular meanings, and individual messages are generated by applying those rules mechanically.” Because there was insufficient human creativity in creating the format, the messages were not protected by copyright. It further noted that even assuming the messages were creatively produced, there was no creativity in the structure, and ordering, because “using leading header bytes for synchronization and a trailing byte as a cyclic redundancy check are standard communication practices, not creative sequencing.” The court relied on its prior decision in Southco, Inc. v. Kanebridge, 390 F.3d 276, 282 (3d Cir. 2004) (en banc), regarding a numbering system for fasteners, from which the court drew many parallels.
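To see why the court treated header bytes and a trailing check byte as standard practice, it may help to look at how little choice is involved. The sketch below is a generic framing scheme in Python. It is not the format at issue in the case, which the opinion describes only in part; the sync bytes, the one-byte length field, the CRC-8 polynomial, and the sample payload are all assumptions chosen for illustration.

```python
# Hypothetical frame layout (NOT the format at issue in the case):
#   2 sync header bytes | 1 length byte | payload | 1 CRC-8 byte
SYNC = b"\xAA\x55"


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 over data (polynomial x^8 + x^2 + x + 1, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def build_frame(payload: bytes) -> bytes:
    """Wrap payload in sync bytes, a length byte, and a trailing CRC byte."""
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length field")
    body = SYNC + bytes([len(payload)]) + payload
    return body + bytes([crc8(body)])


def parse_frame(frame: bytes) -> bytes:
    """Check sync bytes, length, and CRC, then return the payload."""
    if len(frame) < 4 or frame[:2] != SYNC:
        raise ValueError("missing sync header")
    length = frame[2]
    if len(frame) != 3 + length + 1:
        raise ValueError("length mismatch")
    if crc8(frame[:-1]) != frame[-1]:
        raise ValueError("checksum mismatch")
    return frame[3:-1]


if __name__ == "__main__":
    frame = build_frame(b"FIRE 07")  # made-up command text
    print(frame.hex())
    print(parse_frame(frame))
```

Once the layout is fixed, every message follows mechanically from the rules: the sender has no expressive choices to make, which is essentially the court's point about applying the format.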
Fighting a David vs. Goliath Copyright Battle
I got a chance to correspond with the owner of FireTEK. Here is what he had to say about the case:
How did you become interested in fireworks systems? How did you learn to engineer them?
The first time I was in the backstage of a fireworks display it was with my system. I started this project because someone asked me if I could do this kind of system to control fireworks, in 2009. I said no a few times, but he insisted, so I decided to try to do it. But then he decided he did not need the system anymore. Also, I had hired some people to do the work, but they quit. So I started researching and learning, and did almost everything myself, and then I became passionate about developing and innovating.
Even though the project started with a lot of problems, now I could see almost all the problems I had, and that helped me make a better product. For example, a main component went out of production and I was forced to find a replacement, but the replacement I found was even better. Because I could not afford someone to do the project, I was forced to learn myself. Even this copyright trial was a learning experience, and a good thing in the end.
Copyright cases can be complicated. What encouraged you to fight the claim?
I run a very small company in Romania. Maybe the plaintiff thought I would not be able to defend myself, but that was not true. This was a difficult and expensive case for me to defend. I was also concerned that, because I was outside the US, I would not get justice from a US court. I knew I had the law on my side, but I didn’t know whether that meant I would win. I kept going, though, because this is my business and I need to protect it.
What do you think tipped the case in your favor?
To be honest I thought it would be an easy win at first, from what I have read about copyright and compatibility. I don’t think the district judge understood the differences between my product and theirs. It seemed to me that the district court opinion mostly copied the plaintiff briefs. That opinion never mentioned my main argument under 17 USC 102(b) and US Supreme Court decision in Baker v. Selden. Furthermore, the district court found the work equivalent to an object code and found fixation even though one of the authors clearly states it is “not source code that resides in a computer or in a microprocessor somewhere.” The deposit material for the plaintiff’s copyright was just a simple text created after infringement from memory of someone which was not even listed as author and which briefly describes the protocol. The judge in the appeal understood this. That opinion shows the judge studied it carefully. But I don’t think the district court understood it though I found that hard to believe.
After reading the district court opinion and some other rulings made against me, I was feeling like the main character in “The Trial” by Kafka.
What advice do you have for small businesses fighting legal claims?
It will not be an easy fight, but if you know you are right, go to fight. Very important in my case was my involvement in the strategy, briefs, arguments, and of course learning from mistakes. You know your case better than your attorney, so you should read, understand and correct the mistakes your attorney may write in the briefs.
What is next for you in your business?
Even though I won this case my goal is not to get clients with cheaper compatible products. I try to innovate and convince them to buy my product because of all it offers. The compatibility is only an argument to convince the potential client to try my product, because I know they can not switch so easily to another product. It is like you have an entire company which runs a CRM software and you decide to switch to another one if it has some advantages but you find out the cost of the switch is much higher if there is no compatibility to migrate databases for example.
Congratulations to FireTEK for winning a battle against copyright maximalism, just in time for Independence Day!
Open Source and ESG
Here are my latest musings about why open source is so much like ESG — both in legal risk assessment and investment consideration.
New Jurisdictional Ruling in Vizio Case
On May 13, 2022, the US District Court for the Central District of California issued a decision in Software Freedom Conservancy v. Vizio.
For more about the initial complaint see my previous post. In brief, SFC had sued Vizio in California state court, alleging that a violation of GPL was a breach of contract, and seeking declaratory judgment and specific performance (release of source code). SFC is not the author of the GPL code at issue, and the authors of the code are not party to the suit.
The issue in the removal motion was whether there was federal jurisdiction, sufficient to shift the lawsuit from state court to federal court. Federal courts in the US have exclusive subject matter jurisdiction over copyright claims, and diversity jurisdiction over other claims that are between parties in different states, and where damages are sought in excess of $75,000. 28 USC § 1332. But SFC’s complaint was neither a copyright infringement claim nor a claim for damages. This neatly avoided federal subject matter or diversity jurisdiction. The court said, “There is no dispute that SFC’s complaint alleges only state law claims, and the Parties agree that the action is removable only if SFC’s claims are completely preempted.”
The remaining question was whether the state law claims were pre-empted by copyright law. Preemption is a legal doctrine that reserves certain kinds of matter to the exclusive regulation of federal law. Under 17 USC § 301(a), federal law clearly pre-empts all claims that are within the scope of copyright law. Here, however, the court said the state law claim pled by SFC was different from a copyright claim. Therefore, the court in this case did not take jurisdiction over the matter away from the state court.
To support its preemption argument, Vizio argued unsuccessfully (relying on MDY v. Blizzard) that all violations of license conditions should only result in copyright infringement claims. But that was converse logic. MDY was about what cannot be a copyright claim, not what must be one.
Through the Looking Glass
The parties’ positions were unusual, because defendants generally prefer defending contract claims to copyright claims. Open source enforcers almost always bring the actions as copyright claims rather than (or in addition to) contract claims. Copyright generally affords more generous remedies than state law contract claims, including the possibility of statutory damages. Also, injunction is often available as a remedy under copyright law, but rarely under contract. So, SFC v. Vizio is an anomaly, where the plaintiff argued for contract claims and the defendant argued for copyright claims.
Moreover, there is a history of open source licensing pundits theorizing that open source licenses are “not contracts.” It has long been my view that this theory was misplaced. (I think, originally, that theory was intended to avoid contract formation arguments under the law of the 1990s, which arguments were largely sidelined in 1996 by the ProCD case (86 F.3d 1447 (7th Cir. 1996)). But I am reading between the lines there.) GPL2 famously says, “You are not required to accept this License, since you have not signed it.” But the existence of a contract depends on the facts surrounding contract formation, and not so much on what the document says. It is more accurate to say that open source licenses need not be contracts to be enforceable, because violating their conditions provides the basis for a copyright infringement claim, as clearly established in US law, mainly under Jacobsen v. Katzer. Contract claims are therefore usually a belt-and-suspenders tactic in open source enforcement.
The law has never clearly articulated the answer to the “license or contract” question, but Jacobsen v. Katzer suggested contract claims were possible, as did Artifex v. Hancom. So in sum, there is case law saying copyright claims are possible, but none saying contract claims are impossible.
Neither a Win Nor a Loss, Yet
Some open source advocates are celebrating, but that celebration is probably premature. This result does not mean the federal court is saying there is a valid state contract claim, or that the remedy sought is available, only that there is no basis for federal jurisdiction.
Assuming the ruling stands and the case goes back to state court, there are a few possibilities:
- there is a state law claim for which the remedies sought are available
- there is a state law claim but the remedies sought are not available
- there is no sustainable state law claim
SFC still has a lot of bridges to cross–the highest and widest being the availability of specific performance as a remedy for breach of contract. Most legal commentators have long believed there is no basis for such a remedy under licenses like GPL, only damages for breach. And while injunctions are often available for copyright infringement, those injunctions are negative injunctions–orders to cease and desist using the infringed material. Specific performance, sometimes called a positive injunction, is a very rare remedy under contract law, and a court forcing a technology company to disclose proprietary information due to a breach of contract is virtually unheard of. Courts tend to order people to stop doing things, not to do them. The reason for this is deeply woven into our notions of civil liberty, granting courts only limited powers. Injunctive relief is an exceptional remedy, because it impinges on civil liberty. Courts can order you to go to prison (for crimes) or pay money (for torts), but they don’t usually order you to do much else. That’s in part because they don’t have the resources to police positive injunctions. When you read in the news about a court ordering someone to do something, it is usually part of a negotiated settlement or consent decree.
This raises the question of what will happen to open source adoption if SFC is successful. The resulting risk of everyone–not just authors–having authority to sue for enforcement of GPL may significantly disrupt the open source ecosystem. Back in the late 1990s, companies were extremely reluctant to use GPL code (like Linux) due to the perceived legal risks–which were in part trumpeted as FUD by anti-open-source actors like Microsoft. But over the years, the tech industry grew comfortable that the only risk of lawsuits was from authors, and most authors preferred informal enforcement to legal process. That process of acclimation took decades. A change in this assumption could cause companies to stop using the GPL software they have grown comfortable using over the years, and it’s not clear that would be a win for the open source ecosystem.
Crypto for the Rest of Us, Part III: Smart Contracts
Welcome to another installment of me teaching myself about crypto by writing about it! In past posts, I have discussed topics like Bitcoin mining and proof-of-stake. I’m not an expert on crypto, but I like learning about technology, and these posts are an attempt to share what I have learned at a level that is not too basic, but not too advanced, for readers who are curious, like me. Actually, this topic has a lot to do with contract law, so I am more comfortable writing about it than those prior posts.
The thing I hear repeatedly about smart contracts is that they are neither smart nor contracts. Let’s figure out why people say that, and whether it is correct.
What is a Smart Contract?
A smart contract is a transaction that executes automatically on a blockchain, according to a set of programmatic rules. The most common kinds of smart contracts today are trading, investing, lending, and borrowing–financial transactions. Also, non-fungible tokens (NFTs) are created via a minting process embodied in a smart contract.
The term smart contract was coined by Nick Szabo in 1994. He wanted to apply the principles of distributed ledgers to executing transactions.
Not all blockchains support smart contracts. The most popular one that does is Ethereum. On Ethereum, smart contracts are executed via the Ethereum Virtual Machine (EVM). The EVM is isolated, in the sense that it has no access to network, file system or other processes on which the blockchain is running. That is important so that bugs in smart contracts cannot affect the entire blockchain, or other transactions.
Are they Smart?
Not in the lay sense. The quintessential example of a smart contract, suggested by Szabo, is a vending machine. If you put in the money, and press a selection button, the snack drops down. That is not very smart, but it is automatic, and eliminates the need for a human to fulfill the transaction on the seller side.
Smart contracts are smart in that they are self-executing without human intervention. In contrast, traditional legal contracts are almost always promises to do things in the future, so they need to be performed by humans, and tracked, to be sure the parties actually perform them.
Actually, smart contracts are usually nowhere near as complicated as most traditional written contracts, which often have pages of terms and conditions. At this point, smart contracts contain rules that are “fairly rudimentary.” Here is an example of a simple smart contract:
This example creates an Ether wallet. As you can see, the language looks similar to C++. The first lines of the program name the license for the code, and the language it is written in. The program logic after that is quite simple. There are many other examples here.
Smart contracts work because the EVM can send a transaction from one account to another account, and that transaction can include binary data (the payload) and Ether (the cryptocurrency). If the receiving account contains logic (the smart contract), that logic is executed with the payload as input data.
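As a rough mental model of that dispatch rule, here is a toy simulation in Python. It is emphatically not how the EVM is implemented: the Account class, the send_transaction function, and the record_deposit “contract” are all names invented for this sketch, and real contracts run as EVM bytecode rather than Python functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Account:
    balance: int = 0
    # Optional contract logic, called as code(account, sender, value, payload).
    code: Optional[Callable[..., None]] = None
    storage: dict = field(default_factory=dict)


def send_transaction(accounts: Dict[str, Account], sender: str,
                     recipient: str, value: int, payload: bytes = b"") -> None:
    """Move value from sender to recipient; run recipient's logic if it has any."""
    src, dst = accounts[sender], accounts[recipient]
    if src.balance < value:
        raise ValueError("insufficient balance")
    src.balance -= value
    dst.balance += value
    if dst.code is not None:
        # The receiving account contains logic, so it runs with the payload as input.
        dst.code(dst, sender, value, payload)


def record_deposit(account: Account, sender: str, value: int, payload: bytes) -> None:
    """Hypothetical contract logic: keep a per-sender tally and the last payload."""
    deposits = account.storage.setdefault("deposits", {})
    deposits[sender] = deposits.get(sender, 0) + value
    account.storage["last_payload"] = payload


if __name__ == "__main__":
    accounts = {
        "alice": Account(balance=10),
        "wallet": Account(code=record_deposit),
    }
    send_transaction(accounts, "alice", "wallet", 3, b"rent")
    print(accounts["alice"].balance, accounts["wallet"].balance)
    print(accounts["wallet"].storage)
```

The only thing that distinguishes a plain payment from a smart contract call in this model is whether the receiving account has code attached, which is the gist of the paragraph above.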
To execute a smart contract requires the payment of a transaction fee, called a gas fee. (Because it’s Ether.) A smart contract that is more complex will cost more gas to execute. The creator of the transaction sets both the price paid per unit of gas and the maximum amount of gas the transaction may use. If the authorized gas is insufficient, the transaction fails and its effects are rolled back, though the gas consumed is still paid for. If some gas is left after the execution, it is refunded to the creator.
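The gas accounting can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Ethereum's actual fee schedule: the per-step costs, the Receipt type, and the run_with_gas function are invented for illustration. It does follow two real rules: a transaction that runs out of gas has its state changes discarded while the gas is still charged, and unused gas goes back to the creator.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple

State = Dict[str, int]
Step = Tuple[int, Callable[[State], None]]  # (gas cost, state change)


class Receipt(NamedTuple):
    state: State
    succeeded: bool
    fee: int     # amount actually charged to the creator
    refund: int  # unused gas, valued at the gas price


def run_with_gas(steps: List[Step], state: State,
                 gas_limit: int, gas_price: int) -> Receipt:
    """Run steps against a copy of state, stopping if the gas limit is exceeded."""
    working = dict(state)  # shallow copy; steps only set top-level keys here
    gas_used = 0
    for cost, change in steps:
        if gas_used + cost > gas_limit:
            # Out of gas: discard all changes, charge the full authorized amount.
            return Receipt(dict(state), False, gas_limit * gas_price, 0)
        gas_used += cost
        change(working)
    unused = gas_limit - gas_used
    return Receipt(working, True, gas_used * gas_price, unused * gas_price)


if __name__ == "__main__":
    # Two made-up steps costing 5 and 7 units of gas.
    steps = [(5, lambda s: s.update(a=1)), (7, lambda s: s.update(b=2))]
    print(run_with_gas(steps, {}, gas_limit=20, gas_price=2))
    print(run_with_gas(steps, {}, gas_limit=10, gas_price=2))
```

In the second call the limit of 10 covers the first step but not the second, so the whole transaction is undone even though part of it ran.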
You can see some examples of smart contracts on Ethereum here.
Are they Contracts?
Yes, but the bar for formation of a contract is pretty low. And smart contracts are usually not what lawyers call executory contracts. Under law, a contract is formed when there is an offer, acceptance, and consideration (an exchange of value). Once the contract is formed, the parties must perform the terms of the contract, and until they are done performing, the contract is called an executory contract.
Smart contracts are often touted as having the advantage of not requiring a third party to authorize them. But in fact, in most modern economies, traditional contracts do not require a third party to authorize them, exactly. Or perhaps only by implication, because parties perform contracts because of the threat of a third party’s involvement. If one party to a contract breaches, the other party can sue for damages, and a neutral third party (like a court or arbitrator) can enforce the contract, based on contract law. But most contracts are never litigated. So, while traditional contracts can be executed between the parties with no third party intervention, the more accurate distinction is to say that smart contracts are concluded on the blockchain with no additional human effort.
Smart contracts happen immediately–or at least, as soon as the blockchain can process them. Of course, traditional contracts can be executed quickly, too. If you walk up to a fruit stand, hand over a dollar and take an apple, that’s a contract, and it executes immediately. But those kinds of contracts are not usually written in advance.
Unlike traditional contracts, smart contracts are transparent and irreversible, because they execute on a blockchain. Everyone can see the smart contract if they look at the blockchain. In contrast, the terms of traditional contracts are often confidential, and they can be amended or voided by the parties.
Also, traditional contracts are rarely pseudonymous. In fact, it’s important, in a traditional contract, to clearly identify the parties–which is why we include corporate names, addresses, and other identifying information when we define parties in a contract. That’s to help enforce the contract against the right person, if necessary. But blockchain transactions are by definition pseudonymous. The contracting parties do not have to know each other’s true identities. Depending on your viewpoint, this is a benefit or a drawback.
Smart contracts can stand on their own, or be paired with a traditional contract and execute only some of its provisions. For lawyers who are accustomed to traditional contracts, it may be easier to think of a smart contract as the execution of an escrow or a closing process. Money is paid upon the occurrence of a specific trigger. And like escrow agents, smart contracts need specific and unambiguous direction as to what is to be done to effectuate the transaction. And complicated instructions are generally not a good idea.
Like traditional contracts, smart contracts can suffer from ambiguity or “bugs.” Any program can fail to account for contingencies in the real world. But smart contracts are mostly quite simple, at this point. In comparison, traditional contracts are complicated, and the reason people hire lawyers to write them is that they need lawyers to consider the myriad things that can go wrong, and whether and how to account for them in the contract.
What’s Next for Smart Contracts?
First of all, they will not be replacing traditional contracts any time soon. But they may make inroads into replacing escrows or closing processes. However, pseudonymous contracting is probably insufficient for some transactions, such as when the parties have know-your-customer (KYC) requirements or require complex closing conditions.
The applications of smart contracts are likely to expand over time. While today, smart contracts mostly involve payments and similar financial transactions, they are already being used for investment transactions, automated asset management, and self-paying loans. In the future, they might be used to establish pseudonymous digital identities, transfer assets such as land title, manage capitalization tables, verify supply chains, or conduct public elections.
It’s also likely that new blockchains will support smart contracts. The market for different blockchains and cryptocurrencies is highly competitive, and the additional functionality of smart contracts can help make a blockchain more popular.
Here are some additional resources, if you want to learn more about smart contracts:
- Ethereum and smart contracts
- Solidity tutorial
- Lots of ideas about the use of smart contracts
- Cryptokitties smart contracts
Crypto for the Rest of Us, Part II: What is Proof of Stake and Why Should You Care?
In a prior post I wrote about the basics of blockchain and proof-of-work Bitcoin mining. One topic omitted from that post was the alternative consensus mechanism, proof-of-stake. Last week, the EU turned down a proposal purporting to outlaw the use of proof-of-work cryptocurrencies in the EU. Also, Ethereum–the world’s second most popular crypto blockchain–is in the process of moving from proof-of-work to proof-of-stake. So this week, I am looking at the difference between these methods of consensus, and why proof-of-stake is likely to eclipse proof-of-work in the future.
Problems with Proof of Work
The main problems with proof-of-work are that it requires a great deal of power, as well as specialized computing equipment. Power requirements are an environmental problem, and specialized equipment is an access problem–particularly during a time when semiconductors are in short supply. If you want to know more about how proof-of-work mining happens, take a look at my last post.
But proof-of-work is also designed to have a limited throughput and pace. New blocks on Bitcoin are designed to be created about once every 10 minutes, and that means transactions take 10 minutes to happen. That may sound fast, but in the world of globalized currency exchange, it is not fast enough for everyday transactions.
These, and other concerns about proof-of-work, have led cryptocurrency communities to consider an alternative called proof-of-stake.
What is Proof of Stake?
Proof-of-stake is a method of reaching consensus on the validity of new blocks in a blockchain. Adding new blocks must happen via consensus because blockchains are distributed databases with no central authority to certify them. Proof-of-stake validates new blocks via a process known as staking. Staking is similar to voting, but because different validators can stake different amounts, voting power is not equally distributed among validators. In this way, it’s more like the way voting works in corporate shares than in democratic elections.
As with proof-of-work, proof-of-stake consensus is a two-step process. First, validators compete to propose a new block based on the transactions in the queue to be cleared. Then, once a validator wins the right to propose the new block, all the validators must reach consensus that the new block is correct.
In a proof-of-stake system, any validator can propose a new block to add to the blockchain, and must choose the amount of coins to stake. There is a minimum stake required, like an ante to play a round of poker. (Ethereum’s new proof-of-stake system will require a minimum stake of 32 ETH, which at about $3,000 per ETH is roughly $96,000 as of this writing in March 2022.) The system chooses a validator at random to propose the new block, but the random choice is weighted by the amount of coins staked by each validator. This means those offering a larger stake are more likely to win the validating competition. The winning validator then proposes a new block to add to the chain.
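The weighted lottery described above can be sketched in a few lines of Python. This is a generic stake-weighted draw, not Ethereum's actual selection algorithm; the validator names, the stake amounts, and the function name `choose_proposer` are all invented for illustration.

```python
import random
from collections import Counter


def choose_proposer(stakes, minimum_stake, rng):
    """Pick one validator at random, weighted by the amount each has staked."""
    # Validators below the minimum stake (the "ante") are not in the draw at all.
    eligible = {name: amount for name, amount in stakes.items() if amount >= minimum_stake}
    if not eligible:
        raise ValueError("no validator meets the minimum stake")
    names = list(eligible)
    weights = [eligible[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


stakes = {"alice": 64, "bob": 32, "carol": 10}  # hypothetical stakes
rng = random.Random(2022)                       # seeded so the run is repeatable
wins = Counter(choose_proposer(stakes, 32, rng) for _ in range(10_000))
print(wins)
```

Over many rounds, the validator with twice the stake should win roughly twice as often, and the validator below the minimum never wins.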
All validators are responsible for attesting the blocks created by other validators. When the minimum required validators attest that the block is accurate, the block is added to the chain. (For ETH, the approval of 128 validators will be necessary to validate a new block.) The winning validator then adds the block and populates it to the nodes on the blockchain, and the system moves on to the next block.
Any validator who proposes a bad transaction or block would be slashed, meaning the validator would forfeit the stake. Also, validators will lose their entire stake if they later attempt a 51% attack on the validity of the chain. This means validators have a high incentive to maintain the blockchain and protect its integrity. Validators can also be penalized for failing to fulfill their validating duty for other validators, but these penalties are relatively smaller.
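Here is a toy sketch of the two rules just described: a block is accepted once enough validators attest to it, and a misbehaving validator forfeits its stake. It follows the description in this post rather than any real client software, and the names `block_is_accepted` and `slash`, along with the tiny threshold of 3, are assumptions made for the example.

```python
def block_is_accepted(attestations, required):
    """Return True once at least `required` validators attest the block is accurate."""
    approvals = sum(1 for approved in attestations.values() if approved)
    return approvals >= required


def slash(stakes, validator):
    """Forfeit the validator's entire stake and return the amount forfeited."""
    forfeited = stakes[validator]
    stakes[validator] = 0
    return forfeited


stakes = {"alice": 64, "bob": 32, "mallory": 32}           # hypothetical stakes
attestations = {"alice": True, "bob": True, "mallory": False}

print(block_is_accepted(attestations, required=3))  # False
print(block_is_accepted(attestations, required=2))  # True
print(slash(stakes, "mallory"))                     # 32
print(stakes["mallory"])                            # 0
```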
Why is Proof of Stake Important?
Every technology needs a killer app, and for proof-of-work, Bitcoin was the killer app. But there are lots of cryptocurrencies these days, and each has its own rules and validation requirements. The killer app for proof-of-stake is likely to be Ethereum. Ethereum is different from Bitcoin in many ways. Ethereum is capable of executing smart contracts, and significantly, is the main blockchain on which NFTs are minted and sold.
Ethereum is currently mined via proof-of-work, but is in the process of transition to validation via proof-of-stake. It will do this as part of its lengthy 2.0 release process, and initial efforts to make the change have been ongoing for some time now.
Which is Better, Proof-of-Work or Proof-of-Stake?
Keep in mind that the purpose of each consensus paradigm is to create scarcity of the minted currencies, and ensure the integrity of the blockchain. That means the consensus requirements must provide incentives for participants to validate, and disincentives for validators to corrupt the blockchain. (For a tutorial on different consensus mechanisms, here is a good video.) Here is how the two methods compare:
- Security. There seem to be differing opinions as to which method is more secure to avoid 51% attacks or spoofed chains. But most of the opinions I found say proof-of-stake is better, because the risks for validators of violating the rules are higher, so the expected costs of an attack will outweigh the expected benefits. Others call proof-of-work more secure. So, it seems the jury is still out on this point. Proof-of-work is battle-tested via its current use in Bitcoin and Ethereum. Proof-of-stake is at a more experimental stage, though some cryptocurrencies already use it.
- Speed. Proof-of-stake is potentially faster. Blockchains like the one for Bitcoin are designed to create new blocks about every ten minutes. This is actually so slow, in computing terms, that it’s considered a reason why Bitcoin is not suitable for everyday transactions. Ethereum’s current proof-of-work pace is much faster than this, due to its different design, but all proof-of-work systems are limited by the time it takes to mine the block. Moreover, proof-of-stake architectures enable a technique called sharding, which allows the blockchain to be split into shard chains, each of which is capable of processing blocks in parallel. This increases the potential speed of proof-of-stake systems.
- Environmental Impact. Proof-of-stake consumes much less energy. Staking doesn’t require the extensive computing resources of proof-of-work, which is specifically designed to be difficult and expensive to do.
- Barriers to Entry. Proof-of-stake can be viewed as having lower barriers to entry, because the minimum stake is less than the cost of equipment required to successfully perform proof-of-work. However, each system favors validators with more resources, who are more likely to win the rewards for validation.
- Concentration. Proof-of-stake can more easily become concentrated. Validators with the most can afford to risk more, and thereby get more rewards. However, proof-of-work has also become concentrated, as currently more than half the mining rewards for both Bitcoin and Ethereum are won by a few mining pools. So the relative level of concentration remains to be seen.
The EU Proposal
The EU recently considered a proposal to create a broad legislative framework for digital assets. The Markets in Crypto Assets (MiCA) proposal was a detailed plan for regulating crypto in the EU. Among many other topics, it sought to prohibit those in the EU from using proof-of-work currencies. That element of the proposal was highly controversial, given the two most popular cryptocurrencies still use a proof-of-work consensus. As soon as the proposal was made, cryptocurrency advocates started lobbying against it. On March 14, 2022, the EU Parliament voted against the proposal. However, the EU is expected to create a revised proposal by January 2025.
Regardless of the ultimate success of a revised EU proposal, and whether it includes a ban on proof-of-work cryptocurrencies, the existence of this plank in the proposal shows that the winds are changing. Environmentalist criticism of the power requirements for proof-of-work has been eroding the reputation of cryptocurrencies, already tarnished for enabling illegal transactions such as ransomware. If Ethereum’s new proof-of-stake is successful, it seems likely to eclipse proof-of-work as the consensus mechanism of choice for new cryptocurrencies.
Blockchain for the Rest of Us
Blockchain and crypto get a lot of press these days, but most people still struggle to understand them. As with many technical topics, the blogosphere is little help to the curious reader. Sometimes it uses analogies so abstract as to be useless, such as extremely complex math problems, or a highly complex computing process, which are glosses implying that the reader–or perhaps the writer as well–can’t understand the facts. Or sometimes it delves into so much detail as to be useless to most readers unless they already understand–a chicken and egg problem. This post is an attempt to strike a happy medium and explain how blockchain works, for people who are curious, but not technical experts. (Like me!)
To demystify blockchain, we need to understand a few concepts:
- What is blockchain?
- What are cryptocurrencies?
- What is mining?
- Why is the blockchain trustworthy?
What is Blockchain?
A blockchain is a kind of ledger. Most people don’t use physical ledgers, like the one pictured above, these days, but your bank statement is a ledger. A ledger is just a sequential list of transactions. Ledgers are designed to be auditable, so they allow additions but not deletions or changes. Once a transaction (like a deposit or withdrawal) is recorded, it can’t be erased or changed — it is in the ledger forever. So, for example, if your bank makes a mistake and pays the wrong amount from your account, the bank will not erase that transaction. It will credit an amount back to your account in a balancing transaction.
A distributed ledger or decentralized ledger is just a series of electronic transaction records, each of which contains details like the amount transferred and the date.
But a blockchain is different from your bank statements because it is a chain–the transaction records are linked together in a sequence that cannot be changed, even if the storage location of the records is separated. Each transaction record is called a block. (Actually, each block usually contains multiple transactions, so this is a simplification.) Unlike a paper ledger, or even your bank statement, each block also contains a cryptographic pointer to find the previous block. That way, the chain can always be reconstructed from its pieces. Anyone who wants to verify the chain can do so by following the links back one at a time. If a link is not in the sequence, it’s not legitimate.
The pointer is created with a method called a hash. Hashes have all kinds of uses in computing. In blockchain, a hash is a unique number (or series of characters) that is generated automatically based on the information in the block. It is like a fingerprint: If you have the fingerprint, and you have the block, you can tell easily whether the two of them match, and the fingerprint corresponds to the record.
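To make the idea of a chain of fingerprints concrete, here is a minimal sketch in Python. It is a toy, not how Bitcoin actually stores blocks: the block layout, the made-up transactions, and the helper names (`block_hash`, `add_block`, `chain_is_valid`) are assumptions for illustration only.

```python
import hashlib
import json

GENESIS_POINTER = "0" * 64  # placeholder pointer for the very first block


def block_hash(block):
    """Fingerprint a block: SHA-256 over its JSON form with sorted keys."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def add_block(chain, transactions):
    """Append a block that points back at the current last block."""
    previous = block_hash(chain[-1]) if chain else GENESIS_POINTER
    chain.append({"previous_hash": previous, "transactions": transactions})


def chain_is_valid(chain):
    """Follow the links back one at a time, as a verifier would."""
    for earlier, later in zip(chain, chain[1:]):
        if later["previous_hash"] != block_hash(earlier):
            return False
    return True


ledger = []
add_block(ledger, ["Alice pays Bob 5"])
add_block(ledger, ["Bob pays Carol 2"])
add_block(ledger, ["Carol pays Dave 1"])
print(chain_is_valid(ledger))  # True

# Quietly rewriting an old record breaks the link stored in the next block.
ledger[0]["transactions"] = ["Alice pays Bob 500"]
print(chain_is_valid(ledger))  # False
```

Because each block carries the fingerprint of the one before it, changing an old record changes its fingerprint, and the next block no longer points at it.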
A decentralized ledger exists on various computers known as nodes. These computer nodes all work independently in a peer-to-peer network. A node often maintains a local copy of the entire blockchain since its beginning.
So in sum:
- A blockchain is an electronic ledger
- Each block in the chain is connected to the last one using a hash
- It is decentralized because it exists independently on many computer nodes
How do Blockchains Work?
Most blockchains are permissionless and public, meaning they are not controlled by a central authority, like a bank or government, and all participants can access a copy of every transaction. That means all participants can verify the chain for themselves, without relying on a central authority. Even though blockchains are not encrypted, their information can still be secured. Bitcoin, for example, uses pseudonyms to identify parties conducting transactions, so their personal information will not be publicly available. But the pseudonymous information is accessible to anyone. You can find some examples here.
The key characteristic of public decentralized ledgers is that they can be trusted by participants without the need to trust a central authority, like a bank. This quality makes them extremely resistant to tampering, because all the copies stored across the network need to be attacked at the same time for an attack to be successful. As an analogy, suppose your identity was secured by a passport only, and anyone who had your passport could claim they were you. That would be a single point of failure in security. But in real life, many people know you, and you have multiple IDs, so even if someone stole your passport, you would be able to prove they were not you. For your identity to be truly stolen, there would need to be a vast conspiracy to remove most of the traces of your identity. Blockchains work like that. For the information to be corrupted, so many people would have to collaborate that it is highly unlikely to happen. In other words, they work by consensus, and with enough participants, consensus is quite reliable.
So in sum:
- Most blockchains are unpermissioned.
- Unpermissioned blockchains rely on consensus and transparency, instead of trust in a single authority, for legitimacy.
What is Cryptocurrency?
Blockchains have lots of uses, but one of the most popular is to legitimize and track private currency like Bitcoin.
Currency is a unit of value that can be exchanged for goods and services or saved for future use. Currency is fungible–one dollar is as good as another. The auditable and sequential qualities of ledgers, and their ability to make balancing transactions, work for currency because currency is fungible.
Currency is also scarce. Scarcity is a basic economic principle. Currencies don’t represent value if they have an infinite supply. If you could print up money on your computer printer, it would immediately lose all its value. No one would accept it for goods or services that have real-world value, because they could print up their own. In a way, currency is like a group hallucination. If we all behave as if currency has value, then it does. Once people lose confidence in the stability of currency, it loses value quickly, as in episodes of hyperinflation.
You have heard of cryptocurrencies like Bitcoin and Ether and Dogecoin. These are private currencies that are transacted on a blockchain. The blockchain is the method of making transactions, and the cryptocurrency is what is being transacted. A blockchain is not a cryptocurrency, any more than a bank ledger is money.
Unlike Dollars and Pounds and Rupees, cryptocurrencies are not usually authorized by a government. Government-authorized currencies are called fiat currencies, because the government uses its legal authority (fiat) to require that its citizens accept that currency to pay off debts. A currency meeting this requirement is called legal tender. The value of a fiat currency is based largely on the reputation of the government that issues it. While almost anything can be used as a currency — gold, stamps, or poker chips– a fiat currency is usually more stable, because the government works to manage its stability.
At least in the US, it is not exactly illegal to create your own currency. In fact, the US has a long history of private money. There are plenty of currencies issued by local governments, banks, or private citizens. These are sometimes called scrip or community currencies. Sometimes, scrip can be exchanged for anything of value, and sometimes, it can only be exchanged for specific things. A frequent flyer mile, for example, is a kind of currency that can be exchanged for airline tickets or other things of value, depending on what the issuing airline allows. Anyone can invent a currency. And anyone can invent a cryptocurrency.
But governments can issue cryptocurrencies, too, just like they do coins or paper money. At least one country (El Salvador) has adopted Bitcoin as its legal tender, and many countries are expected to create fiat cryptocurrencies in the near future. Some governments (like China) have banned private cryptocurrencies, but even in the case of China, some expect a fiat currency to eventually replace the banned community currencies.
Also, not all blockchains are created for cryptocurrencies. Blockchains have many applications other than currency transactions. NFTs, for example, are managed on blockchains. Blockchains can be used for secure records of real estate deeds or voting in elections.
So, in sum, Bitcoin is a cryptocurrency, managed on a blockchain and not a fiat currency. But:
- Currency is a generally accepted unit of value.
- All cryptocurrencies are managed on blockchains, but not all blockchains are for cryptocurrencies.
- There are many cryptocurrencies other than Bitcoin.
- Not all cryptocurrencies are community currencies. They can be fiat currencies as well.
What is Mining?
This section discusses verification of new blocks for Bitcoin. Bitcoin uses a proof of work system to verify transactions. (Others, like Ether 2.0, use a proof of stake system.)
You have probably read that mining requires a lot of computing time and energy, but what exactly are those computers doing?
At a high level, Bitcoin miners compete to verify new transactions on the blockchain. The first miner to successfully verify a block wins a reward for doing the work. The reward is paid in newly created Bitcoin (and by design, will decrease over time until there is no remaining incentive). Currently, the reward for verifying a block is 6.25 bitcoins–which as of January 2022 was worth more than $260,000. Miners also earn transaction fees based on the size and content of the transactions.
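The shrinking reward can be sketched as a simple calculation. The constants below are not stated in this post; they come from Bitcoin's published rules as I understand them (an initial reward of 50 bitcoins that halves every 210,000 blocks, counted in the smallest unit, the satoshi), and the function name `block_reward` is my own.

```python
SATOSHIS_PER_BITCOIN = 100_000_000
INITIAL_REWARD = 50 * SATOSHIS_PER_BITCOIN  # assumed starting reward, in satoshis
HALVING_INTERVAL = 210_000                  # assumed number of blocks between halvings


def block_reward(height):
    """Reward, in satoshis, for mining the block at the given height."""
    halvings = height // HALVING_INTERVAL
    # Each halving drops the reward by half; integer shifting eventually reaches 0.
    return INITIAL_REWARD >> halvings


print(block_reward(0) / SATOSHIS_PER_BITCOIN)        # 50.0
print(block_reward(630_000) / SATOSHIS_PER_BITCOIN)  # 6.25
```

Under these assumptions, the 6.25 bitcoin reward mentioned above corresponds to the period after the third halving.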
But verifying takes a lot of work–though this is where the metaphor starts to wobble, because it is computer work, not work by people with picks and shovels. Because Bitcoin mining requires intensive computing work, that work is not sensible to do via your average desktop computer. You could mine on your home computer if you wanted to try, and used the right software (for example the open source CGMiner), but you would probably not win any Bitcoin, because another miner with faster equipment would likely win the competition instead of you. Also, your electricity cost would probably be greater than your yield. So, professional Bitcoin miners use special hardware, such as Graphics Processing Units (GPUs). A GPU is a kind of computer chip that was designed to process graphics, mainly for video gaming. But because of their fast processing power, they are now popular for number-crunching applications like AI and Bitcoin mining. At this point, due to the expense of mining, many miners work in pools, and split the proceeds.
How Does Mining Work?
Bitcoin miners compete to do the proof of work to verify each new block on the chain.
The Bitcoin system resets the level of difficulty (how hard it is to verify a block) every 2,016 blocks, which happens about every two weeks. The system is designed so that a new block is expected to be created about every 10 minutes, but that is just an average target. Sometimes blocks are found more quickly, and sometimes less quickly, depending on how lucky the miners are. The current difficulty setting, and time to the next calculation, can be found here. (Difficulty is expressed in a format where the first two hexadecimal digits are the exponent and the next six hexadecimal digits are the coefficient.) Every full node re-calculates difficulty automatically and independently.
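Here is a rough sketch of how that exponent-and-coefficient format unpacks into a target number, and how a retargeting step might work. The sample value `0x1d00ffff` is the setting I understand Bitcoin started with. The `retarget` function is a simplified assumption of mine: it scales the target by how long the last 2,016 blocks actually took, and it leaves out the limits the real rules place on how far a single adjustment can move.

```python
BLOCKS_PER_PERIOD = 2016
TARGET_SECONDS_PER_BLOCK = 10 * 60
EXPECTED_SECONDS = BLOCKS_PER_PERIOD * TARGET_SECONDS_PER_BLOCK  # about two weeks


def bits_to_target(bits):
    """Expand the compact 8-hex-digit form into the full target number."""
    exponent = bits >> 24          # first two hexadecimal digits
    coefficient = bits & 0xFFFFFF  # next six hexadecimal digits
    if exponent <= 3:
        return coefficient >> (8 * (3 - exponent))
    return coefficient * 256 ** (exponent - 3)


def retarget(old_target, actual_seconds):
    """Simplified adjustment: a smaller target means a harder puzzle."""
    return old_target * actual_seconds // EXPECTED_SECONDS


target = bits_to_target(0x1D00FFFF)
print(f"{target:064x}")  # starts with 00000000ffff

# If the last 2,016 blocks took one week instead of two, the target is halved.
print(retarget(target, EXPECTED_SECONDS // 2) * 2 == target)  # True
```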
The puzzle Bitcoin miners are trying to solve, in order to win their reward, is to generate a number called a nonce that produces a hash within the difficulty tolerance set by the blockchain system. Nonce is an abbreviation for “number only used once.” A nonce is a 4-byte number. It is one of the inputs for the hash.
The most important quality of a hash algorithm is that if you use identical input, you will always get identical output. So, you can easily check to see if the input is the same input you expected, but you can’t tell from the hash what the input actually is. The fingerprint analogy holds up here — you can verify the identity of a person with a fingerprint, but fingerprints don’t tell you what the person is like; it’s an identity, not a blueprint.
The Bitcoin hash is done using SHA-256, a variant of SHA-2 (Secure Hash Algorithm 2), which was developed by the National Security Agency (NSA). This kind of hash takes in data of any length, and spits out a 256-bit (32 bytes) hash value, which is usually represented as a hexadecimal (base 16) number of 64 digits. The SHA-2 family of algorithms is patented in US patent 6829355. The United States has released the patent under a royalty-free license. If you want to see how the algorithm works, take a look at the patent disclosure. It is a complex mathematical formula, but the algorithm isn’t secret. This is what most articles and blogs mean when they refer to “math puzzles” or “complex algorithms.”
And no matter how long your input is–one character or thousands–the resulting hash using SHA-256 will be 64 hexadecimal characters. Also, the SHA-256 algorithm is designed to produce output that will appear to be a random sequence. So, the only way to get the specific hash for specific input data is to perform the hash.
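You can check these properties yourself with Python's standard `hashlib` module. This is only a small demonstration of SHA-256 on text strings; the helper name `sha256_hex` is mine, and Bitcoin hashes block data rather than simple strings like these.

```python
import hashlib


def sha256_hex(text):
    """Return the SHA-256 hash of a string as hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short = sha256_hex("a")
long = sha256_hex("a" * 10_000)

print(len(short), len(long))     # 64 64
print(sha256_hex("a") == short)  # True: identical input, identical output
print(sha256_hex("b") == short)  # False: a different input gives a different hash
```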
The implication is that the fastest solution to create a hash within the difficulty level is “brute-force,” or trying solutions at random. The SHA-256 algorithm cannot be reverse engineered. So brute force is what miners do. They make their computers generate and test as many different hashes as fast as possible, until they find a value that fits the difficulty target.
When Bitcoin miners compete to verify a new block, they use the following as input to create their hashes:
- The new transactions on the blockchain for the 10-minute period to be verified
- The hash for the previous block (which has already been verified)
- The nonce, which is generated randomly
The puzzle is solved when the resulting hash value is less than or equal to the current difficulty target value. If a nonce doesn’t work to create a hash that meets that condition, the miner moves on to the next nonce–that’s why the nonce is only used once.
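The brute-force search can be sketched as a short loop. This is a toy version under several assumptions: it hashes a simple text string once with SHA-256 rather than Bitcoin's real block format, it counts nonces upward instead of picking them at random, and it uses a very easy target so that it finishes quickly. The function name `mine` and the sample inputs are invented for the example.

```python
import hashlib


def mine(previous_hash, transactions, target, max_nonce=2 ** 32):
    """Try nonces until the hash is less than or equal to the target.

    Returns (nonce, hash_as_hex), or None if no 4-byte nonce works.
    """
    for nonce in range(max_nonce):
        header = f"{previous_hash}|{transactions}|{nonce}".encode("utf-8")
        digest = hashlib.sha256(header).hexdigest()
        if int(digest, 16) <= target:
            return nonce, digest
        # Otherwise this nonce is used up; move on to the next one.
    return None


easy_target = 2 ** 244  # roughly 1 hash in 4,096 qualifies
result = mine("00" * 32, "Alice pays Bob 5", easy_target)
print(result)
```

Lowering the target makes qualifying hashes rarer, so the loop runs longer on average. That is all "difficulty" means here.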
Professional miners submit thousands of hashes to the system per second. The more hashes they can submit in the 10-minute time period, the more likely they will win the reward. That means the miners with faster computers win more often.
The winner of the mining contest then updates the blockchain ledger by adding a newly mined block covering all of the newly verified transactions to the chain. The winning miner claims the block mining reward by adding it as a transaction on the new block. This reward comes from new coins, whereas all of the other verified transactions on the block come from existing coins.
The system then moves on to the next block to be verified. This happens about every 10 minutes.
How is the Blockchain Validated?
Blockchain works because of three concepts: mining (scarcity), validation, and trust.
Validation is not the same as mining. Validation of the chain happens at two checkpoints. First, on input: when someone wants to make a transaction on the blockchain, the transaction is sent to a node. Remember that the blockchain is stored on many independent nodes, and these nodes communicate with each other in a distributed network. So, when a transaction is sent to one node, that node shares the transaction with other nodes connected to it on the network. Those nodes, in turn, populate the transaction to other nodes in the network, until the entire network includes the same transactions.
A node validates the transaction to ensure it is in the proper form and adds the transaction to a transaction pool, which is like a clearing house where transactions await mining of the block that will include them. The pending transactions become part of a candidate block. Miners can choose to construct and mine a candidate block for some or all of the transactions in the pool, but the more they include, the higher their reward will be. (The way transactions are chosen to be included in candidate blocks depends on their age, size, priority and other factors, explained in the above link.)
When the mining proof of work has been completed, the winning miner transmits the new block to other nodes on the network. Each node validates the new block to ensure it is in the correct form, and that the winner produced a hash with the proper difficulty, and then adds it to the chain. The new block is then populated throughout the network.
Why is the Blockchain Trusted?
You may be wondering, at this point, what is the point of all this trouble. Mining via proof of work is not the same as trust–it is an incentive for maintaining the chain. It also creates scarcity for new coins. In other words, proof of work is an arbitrary task that is designed to be difficult to perform. This means new coins, created via mining, will enter the system at a regular rate.
But the proof of work, and the Bitcoin reward for doing it, is also an incentive for nodes to maintain the chain. Every mining node must maintain an entire copy of the chain. And because the reward for mining is in Bitcoin, the miners have an incentive to maintain the integrity of the chain. If the chain fails, their rewards will have no value.
How do all the nodes trust the new block? They trust it because it would be virtually impossible to mine a new block without doing the proof of work. While mining is hard, validating the mining is relatively trivial. The proof of work is how miners know they’ve spent enormous resources and reached consensus on a particular sequence of blocks, and are worthy of the reward they got.
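The gap between the effort of mining and the ease of checking can be seen in a small sketch. As before, this is a toy model using a single SHA-256 hash over a text string, with invented names (`header_hash`, `verify_block`) and an easy target, not Bitcoin's actual block format.

```python
import hashlib


def header_hash(previous_hash, transactions, nonce):
    """Hash the block inputs and return the result as a number."""
    header = f"{previous_hash}|{transactions}|{nonce}".encode("utf-8")
    return int(hashlib.sha256(header).hexdigest(), 16)


def verify_block(previous_hash, transactions, nonce, target):
    """A validator's check: one hash, one comparison."""
    return header_hash(previous_hash, transactions, nonce) <= target


target = 2 ** 246  # easy toy target
previous_hash = "00" * 32
transactions = "Alice pays Bob 5"

# The miner has to search for a nonce that works...
nonce = 0
while not verify_block(previous_hash, transactions, nonce, target):
    nonce += 1
print(f"miner tried {nonce + 1} hashes; a validator needs only 1")

# ...but anyone can confirm the winning nonce with a single hash.
print(verify_block(previous_hash, transactions, nonce, target))  # True
```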
The proof-of-work process has two important consequences. The first is consensus: all nodes in the decentralized network can easily agree on which blocks are valid, via their hash. The second is immutability: due to transparency, it is virtually impossible to fool an honestly run node into accepting any blockchain but the true one.
You may have read that blockchains are vulnerable to 51% attacks, which could happen if one party, or group of collaborating parties, control 51% of hash power of the chain. This kind of attack can lead to something called double spending, and other problems. But even if that occurred, anyone could identify the false transactions, and the price of Bitcoin would probably immediately plummet. For cryptocurrencies like Bitcoin, transparency is a deterrent to malfeasance, holding the purchasing power of Bitcoin as collateral for the integrity of the ledger.
- Transparency is why the blockchain is trusted.
- Transparency and proof-of-work create the incentives to maintain a legitimate chain.
Long-Term Viability of Bitcoin
You have certainly read in the news that Bitcoin mining uses huge amounts of energy, and is therefore ecologically unsustainable. But there are other reasons why Bitcoin faces sustainability challenges.
Because the total number of coins is limited, the incentives to mine will eventually dwindle. As the difficulty gets harder and the rewards lower, only an astronomical and sustained increase in speculative value would provide enough incentive to mine. So it’s likely that, eventually, that incentive will fail. When the mining incentive fails, the incentive to maintain the chain may also fail, and transactions will likely become too slow to be of use.
Also, in practice, Bitcoin has become less decentralized than it appears. At this point, a small number of mining cooperatives do most of the mining. The top 0.1% (about 50 miners) control nearly 50% of mining capacity. Control of the blockchain is in effect becoming centralized, and that centralized control is not transparent. As difficulty increases, this centralization will likely continue. This does not necessarily mean that the chain is likely to be corrupted, but it does tarnish the ideal of Bitcoin as a currency run by a community.
Also, the original lure of anonymous trading is waning. Anonymity has made Bitcoin notorious for illegal activity like ransomware and drug trades. As illustrated by the FBI’s recent seizure of Bitcoins, pseudonymous trading is not a failsafe to achieve true anonymity. As a pseudonym “becomes enmeshed in the public web of transactions, maintaining anonymity takes more operational security than most users can manage.”
Adding all this up, Bitcoin is likely to lose utility and value over time. In other words, the design is not truly scalable or sustainable. Perhaps owning Bitcoin will become, over time, more like owning rare coins than owning currency. But when that happens to Bitcoin, there will be many other cryptocurrencies standing ready to take its place.
- Bitcoin has a lifespan, and unlike that of traditional fiat currencies, its lifespan is finite by design.
Do You Want to Know More?
I hope this article has helped you understand more about crypto and blockchain. If you have suggestions or corrections, please contact me.
If you want to know more, here are some of the best resources I found when researching this topic:
- Coinbase’s Crypto Basics.
- The Federalist Society’s video on Bitcoin mining and on Bitcoin in monetary policy.
- Here is an example of a block, showing the nonce, the transactions it covers, and other details.
- Patrick Boyle’s video on legal tender and private money. I highly recommend Boyle’s videos on finance and economic topics.
- O’Reilly’s Mining and Consensus. This contains a wealth of detail about how validation and mining work.
- David Rosenthal’s presentation on the future of cryptocurrency. This presentation is very detailed, but extremely insightful on the sustainability of Bitcoin and other non-permissioned blockchains.
- Is blockchain “open source”? Not exactly, although all blockchains use open source software elements. (That link goes to an article on this topic that I co-wrote a few years ago, when everyone seemed to be asking me this question.)
Twelve Months of Open Source in 2021
With year two of the COVID era drawing to a close, here is a look back at some of the most interesting open source developments of 2021. Let’s hope the new Omicron wave of working from home creates some amazing new projects, and ends soon.
- January – Open source developer and open standards advocate David Recordon is named the White House Director of Technology by the transition team of incoming President Joe Biden.
- February – Mars becomes the second planet on which Linux is the dominant operating system.
- March – The community wars on with no resolution over the role of Richard Stallman in the FSF and GNU projects. Stallman was expected to resign over his comments about the Jeffrey Epstein scandal, but later refused to step down. Several members of the executive team resign in protest. The credibility of FSF is eroding.
- April – Google wins its epic battle with Oracle over copyright and APIs in the US Supreme Court, and software developers everywhere breathe a sigh of relief. Because without free rights of re-implementation, many open source projects could probably not exist.
- May – The US Government issues an executive order recognizing the importance of the software supply chain to national security and prosperity. It implies an endorsement of open source development: “The development of commercial software often lacks transparency, sufficient focus on the ability of the software to resist attack, and adequate controls to prevent tampering by malicious actors.”
- June – A Huawei dev is shamed for useless Linux contributions he submitted to meet a work performance goal. It’s telling that open source contributions have become part of corporate performance assessment. This happens on the heels of the scandal over University of Minnesota students intentionally submitting faulty patches to Linux for a research project.
- July – Weirdness follows the Audacity handover, when two forked projects, created in a froth of overreaction to the transfer of the project to Muse, start warring with each other, and bad behavior ensues.
- August – Sexy Cyborg, a YouTube influencer, metes out her own style of GPL Enforcement. This video shows her shaming a large corporation about GPL violations. Don’t let the scanty clothes fool you; she’s a savvy tech commentator.
- September – Oracle adjusts licensing for OracleJDK, allowing limited free use, and tweaking the licensing differential between OracleJDK and OpenJDK (and the community-supported fork of OpenJDK stewarded by the Eclipse project). The new license for OracleJDK now allows free internal use, including for developing and testing applications, and distribution “provided that You do not charge Your licensees any fees associated with such distribution or use of the Program, including, without limitation, fees for products that include or are bundled with a copy of the Program or for services that involve the use of the distributed Program.” For most companies, this only kicks the can down the road on the decision to use OpenJDK, or pay the piper.
- October – SFC sues Vizio for GPL Violations, in a lawsuit that attempts to rewrite the rules of open source enforcement, by initiating a non-copyright claim in state court without the participation of the software authors.
- November – Trump’s Truth Social, having run afoul of AGPL before it even launched, tries to fix the license violation, but seems unclear on the concept.
- December – Log4j is involved in a major security breach. Open source software security breaches always get a lot of press, out of proportion to proprietary software security issues. That is not to say they aren’t a danger. The real problem, of course, is the lack of a sustainable model to keep open source software updated and secure.
- All Year – Commercial Open Source Software continues to be awesome. Just a few examples of companies that are prospering and moving forward: Redis, Grafana, Starburst. For a rundown of the big deals, check out the ongoing news about financings, acquisitions and IPOs at COSS Community.
Happy New Year, everyone!