Blockchain for the Rest of Us

Blockchain and crypto get a lot of press these days, but most people still struggle to understand them. As with many technical topics, the blogosphere is little help to the curious reader. Sometimes it uses analogies so abstract as to be useless, such as “extremely complex math problems” or “a highly complex computing process,” which are glosses implying that the reader–or perhaps the writer as well–can’t understand the facts. Or sometimes it delves into so much detail as to be useless to most readers unless they already understand the topic–a chicken-and-egg problem. This post is an attempt to strike a happy medium and explain how blockchain works, for people who are curious but not technical experts. (Like me!)

To demystify blockchain, we need to understand a few concepts:

  • What is blockchain?
  • What are cryptocurrencies?
  • What is mining?
  • Why is the blockchain trustworthy?

What is Blockchain?

Accounting - Wikipedia
https://en.wikipedia.org/wiki/Accounting

A blockchain is a kind of ledger. Most people don’t use physical ledgers, like the one pictured above, these days, but your bank statement is a ledger. A ledger is just a sequential list of transactions. Ledgers are designed to be auditable, so they allow additions but not deletions or changes. Once a transaction (like a deposit or withdrawal) is recorded, it can’t be erased or changed — it is in the ledger forever. So, for example, if your bank makes a mistake and pays the wrong amount from your account, the bank will not erase that transaction. It will credit an amount back to your account in a balancing transaction.

A distributed ledger or decentralized ledger is just a series of electronic transaction records, each of which contains details like the amount transferred and the date.

But a blockchain is different from your bank statements because it is a chain–the transaction records are linked together in a sequence that cannot be changed, even if the storage location of the records is separated. Each transaction record is called a block. (Actually, each block usually contains multiple transactions, so this is a simplification.) Unlike a paper ledger, or even your bank statement, each block also contains a cryptographic pointer to find the previous block. That way, the chain can always be reconstructed from its pieces. Anyone who wants to verify the chain can do so by following the links back one at a time. If a link is not in the sequence, it’s not legitimate.

The pointer is created with a method called a hash. Hashes have all kinds of uses in computing. In blockchain, a hash is an effectively unique number (or series of characters) that is generated automatically based on the information in the block. It is like a fingerprint: If you have the fingerprint, and you have the block, you can tell easily whether the two of them match, and the fingerprint corresponds to the record.
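For readers who like to see things run, here is a minimal sketch of the fingerprint idea in Python. It uses SHA-256 (the hash function Bitcoin relies on, discussed later in this post) from Python's standard hashlib module. The fingerprint function and the sample records are my own illustration, not anything taken from a real blockchain.

```python
import hashlib

def fingerprint(record: str) -> str:
    """Return the SHA-256 hash of a record as 64 hexadecimal characters."""
    return hashlib.sha256(record.encode("utf-8")).hexdigest()

# A made-up transaction record and the fingerprint stored alongside it.
record = "Alice pays Bob 5 coins"
stored = fingerprint(record)

# Same record, same fingerprint: the check passes.
print(fingerprint("Alice pays Bob 5 coins") == stored)   # True

# Any change to the record produces a different fingerprint.
print(fingerprint("Alice pays Bob 50 coins") == stored)  # False
```

Checking a match takes one quick calculation, which is what makes the fingerprint useful for verification.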

Peer to peer network nodes

A decentralized ledger exists on various computers known as nodes. These computer nodes all work independently in a peer-to-peer network. A node often maintains a local copy of the entire blockchain since its beginning.

So in sum:

  • A blockchain is an electronic ledger
  • Each block in the chain is connected to the last one using a hash
  • It is decentralized because it exists independently on many computer nodes
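The linking described in this section can be sketched in a few lines of Python. This is a toy model under my own assumptions: the block layout (a list of transactions plus a previous_hash field), the JSON encoding, and the function names are all made up for illustration, and real blockchains use more elaborate formats.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents, including its pointer to the previous block."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def add_block(chain: list, transactions: list) -> None:
    """Append a block that points back to the current last block."""
    previous = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"transactions": transactions, "previous_hash": previous})

def chain_is_valid(chain: list) -> bool:
    """Follow the links one at a time, as a verifier would."""
    for earlier, later in zip(chain, chain[1:]):
        if later["previous_hash"] != block_hash(earlier):
            return False
    return True

chain = []
add_block(chain, ["Alice pays Bob 5"])
add_block(chain, ["Bob pays Carol 2"])
add_block(chain, ["Carol pays Dave 1"])
print(chain_is_valid(chain))  # True

# Rewriting history breaks the link stored in the following block.
chain[0]["transactions"] = ["Alice pays Bob 500"]
print(chain_is_valid(chain))  # False
```

Because each block's hash covers the pointer to the block before it, changing an old block would require recomputing every later block as well.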

How do Blockchains Work?

Most blockchains are permissionless and public, meaning they are not controlled by a central authority, like a bank or government, and all participants can access a copy of every transaction. That means all participants can verify the chain for themselves, without relying on a central authority. Even though blockchains are not encrypted, their information can still be secured. Bitcoin, for example, uses pseudonyms to identify parties conducting transactions, so their personal information will not be publicly available. But the pseudonymous information is accessible to anyone. You can find some examples here.

Here is an example of what you would see at the above link–the latest blocks and transactions for Bitcoin.

The key characteristic of public decentralized ledgers is that they can be trusted by participants without the need to trust a central authority, like a bank. This quality makes them extremely resistant to tampering, because all the copies stored across the network need to be attacked at the same time for an attack to be successful. As an analogy, suppose your identity was secured by a passport only, and anyone who had your passport could claim they were you. That would be a single point of failure in security. But in real life, many people know you, and you have multiple IDs, so even if someone stole your passport, you would be able to prove they were not you. For your identity to be truly stolen, there would need to be a vast conspiracy to remove most of the traces of your identity. Blockchains work like that. For the information to be corrupted, so many people would have to collaborate that it is highly unlikely to happen. In other words, they work by consensus, and with enough participants, consensus is quite reliable.

So in sum:

  • Most blockchains are permissionless.
  • Permissionless blockchains rely on consensus and transparency, instead of trust in a single authority, for legitimacy.

What is Cryptocurrency?

Blockchains have lots of uses, but one of the most popular is to legitimize and track private currency like Bitcoin.

Currency is a unit of value that can be exchanged for goods and services or saved for future use. Currency is fungible–one dollar is as good as another. The auditable and sequential qualities of ledgers, and their ability to make balancing transactions, work for currency because currency is fungible.

Currency is also scarce. Scarcity is a basic economic principle. Currencies don’t represent value if they have an infinite supply. If you could print up money on your computer printer, it would immediately lose all its value. No one would accept it for goods or services that have real-world value, because they could print up their own. In a way, currency is like a group hallucination. If we all behave as if currency has value, then it does. Once people lose confidence in the stability of currency, it loses value quickly, as in episodes of hyperinflation.

You have heard of cryptocurrencies like Bitcoin and Ether and Dogecoin. These are private currencies that are transacted on a blockchain. The blockchain is the method of making transactions, and the cryptocurrency is what is being transacted. A blockchain is not a cryptocurrency, any more than a bank ledger is money.

Unlike Dollars and Pounds and Rupees, cryptocurrencies are not usually authorized by a government. Government-authorized currencies are called fiat currencies, because the government uses its legal authority (fiat) to require that its citizens accept that currency to pay off debts. A currency meeting this requirement is called legal tender. The value of a fiat currency is based largely on the reputation of the government that issues it. While almost anything can be used as a currency — gold, stamps, or poker chips– a fiat currency is usually more stable, because the government works to manage its stability.

At least in the US, it is not exactly illegal to create your own currency. In fact, the US has a long history of private money. There are plenty of currencies issued by local governments, banks, or private citizens. These are sometimes called scrip or community currencies. Sometimes, scrip can be exchanged for anything of value, and sometimes, it can only be exchanged for specific things. A frequent flyer mile, for example, is a kind of currency that can be exchanged for airline tickets or other things of value, depending on what the issuing airline allows. Anyone can invent a currency. And anyone can invent a cryptocurrency.

But governments can issue cryptocurrencies, too, just like they do coins or paper money. At least one country (El Salvador) has adopted Bitcoin as its legal tender, and many countries are expected to create fiat cryptocurrencies in the near future. Some governments have banned private cryptocurrencies (like China), but even in the case of China, some expect a fiat currency to eventually replace the banned community currencies.

Also, not all blockchains are created for cryptocurrencies. Blockchains have many applications other than currency transactions. NFTs, for example, are managed on blockchains. Blockchains can be used for secure records of real estate deeds or voting in elections.

So, in sum, Bitcoin is a cryptocurrency, managed on a blockchain and not a fiat currency. But:

  • Currency is a generally accepted unit of value.
  • All cryptocurrencies are managed on blockchains, but not all blockchains are for cryptocurrencies.
  • There are many cryptocurrencies other than Bitcoin.
  • Not all cryptocurrencies are community currencies. They can be fiat currencies as well.

What is Mining?

This section discusses verification of new blocks for Bitcoin. Bitcoin uses a proof of work system to verify transactions. (Others, like Ether 2.0, use a proof of stake system.)

You have probably read that mining requires a lot of computing time and energy, but what exactly are those computers doing?

At a high level, Bitcoin miners compete to verify new transactions on the blockchain. The first miner to successfully verify a block wins a reward for doing the work. The reward is paid in Bitcoin (and by design, will decrease over time until there is no remaining incentive). Currently, the reward for verifying a block is 6.25 bitcoins–which as of January 2022 was worth more than $260,000. Miners also earn transaction fees based on the size and content of the transactions.
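To make the shrinking reward concrete, here is a rough sketch of the schedule. Two facts here go beyond what this post states: the reward started at 50 bitcoins, and it is cut in half every 210,000 blocks. The function name and the sample block heights are my own choices for illustration.

```python
HALVING_INTERVAL = 210_000            # blocks between each halving
INITIAL_SUBSIDY = 50 * 100_000_000    # 50 bitcoins, counted in satoshis
SATOSHIS_PER_BITCOIN = 100_000_000

def block_subsidy(height: int) -> int:
    """Return the new-coin reward, in satoshis, for the block at this height."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        return 0
    # Integer halving: each shift to the right divides by two and rounds down.
    return INITIAL_SUBSIDY >> halvings

# Block heights chosen for illustration; 720,000 falls in early 2022.
for height in (0, 210_000, 420_000, 720_000):
    print(height, block_subsidy(height) / SATOSHIS_PER_BITCOIN)
# 0 50.0
# 210000 25.0
# 420000 12.5
# 720000 6.25
```

Because the halving rounds down, the reward eventually reaches zero, which is why the total number of coins is capped.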

But verifying takes a lot of work–though this is where the metaphor starts to wobble, because it is computer work, not work by people with picks and shovels. Because Bitcoin mining requires intensive computing work, that work is not sensible to do via your average desktop computer. You could mine on your home computer if you wanted to try, and used the right software (for example the open source CGMiner), but you would probably not win any Bitcoin, because another miner with faster equipment would likely win the competition instead of you. Also, your electricity cost would probably be greater than your yield. So, professional Bitcoin miners use special hardware, such as Graphics Processing Units (GPUs). A GPU is a kind of computer chip that was designed to process graphics, mainly for video gaming. But because of their fast processing power, they are now popular for number-crunching applications like AI and Bitcoin mining. At this point, due to the expense of mining, many miners work in pools, and split the proceeds.

How Does Mining Work?

Bitcoin miners compete to do the proof of work to verify each new block on the chain.

The Bitcoin system resets the level of difficulty (how hard it is to verify a block) after every 2,016 blocks, which happens about every two weeks. The system is designed so that a new block is expected to be created about every 10 minutes, but that is just an average target. Sometimes blocks are found more quickly, and sometimes less quickly, depending on how lucky the miners are. The current difficulty setting, and time to the next calculation, can be found here. (Difficulty is expressed in a format where the first two hexadecimal digits are the exponent and the next six hexadecimal digits are the coefficient.) Every full node re-calculates difficulty automatically and independently.
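Here is a sketch of how that compact format expands into the full target number, and how the reset works. The sample value 0x1d00ffff is Bitcoin's original difficulty setting. The function names are mine, and the reset is simplified: as I understand it, the real rules also limit each adjustment to a factor of four (which I include) and never let the target rise above a fixed maximum (which I leave out).

```python
def bits_to_target(bits: int) -> int:
    """Expand the compact value: first two hex digits = exponent, last six = coefficient."""
    exponent = bits >> 24
    coefficient = bits & 0xFFFFFF
    # target = coefficient * 256 ** (exponent - 3), written with shifts
    if exponent <= 3:
        return coefficient >> (8 * (3 - exponent))
    return coefficient << (8 * (exponent - 3))

EXPECTED_SECONDS = 2016 * 10 * 60  # 2,016 blocks at 10 minutes each = two weeks

def retarget(old_target: int, actual_seconds: int) -> int:
    """Scale the target by how long the last 2,016 blocks actually took."""
    clamped = min(max(actual_seconds, EXPECTED_SECONDS // 4), EXPECTED_SECONDS * 4)
    return old_target * clamped // EXPECTED_SECONDS

target = bits_to_target(0x1D00FFFF)
print(f"{target:064x}")
# 00000000ffff0000000000000000000000000000000000000000000000000000

# If the last 2,016 blocks took one week instead of two, the target is halved,
# so a qualifying hash becomes twice as hard to find.
harder = retarget(target, EXPECTED_SECONDS // 2)
print(harder == target // 2)  # True
```

A smaller target means fewer hashes qualify, so a lower target is a higher difficulty.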

The puzzle Bitcoin miners are trying to solve, in order to win their reward, is to generate a number called a nonce that produces a hash within the difficulty tolerance set by the blockchain system. Nonce is an abbreviation for “number only used once.” A nonce is a 4-byte number. It is one of the inputs for the hash.

The most important quality of a hash algorithm is that if you use identical input, you will always get identical output. So, you can easily check to see if the input is the same input you expected, but you can’t tell from the hash what the input actually is. The fingerprint analogy holds up here — you can verify the identity of a person with a fingerprint, but fingerprints don’t tell you what the person is like; it’s an identity, not a blueprint.

File:Hash function.jpg
https://commons.wikimedia.org/wiki/File:Hash_function.jpg

The Bitcoin hash is done using SHA-256, a member of the SHA-2 (Secure Hash Algorithm 2) family developed by the National Security Agency (NSA). This kind of hash takes in data of any length, and spits out a 256-bit (32-byte) hash value, which is usually represented as a hexadecimal (base 16) number of 64 digits. The SHA-2 family of algorithms is patented in US patent 6829355. The United States has released the patent under a royalty-free license. If you want to see how the algorithm works, take a look at the patent disclosure. It is a complex mathematical formula, but the algorithm isn’t secret. This is what most articles and blogs mean when they refer to “math puzzles” or “complex algorithms.”

And no matter how long your input is–one character or thousands–the resulting hash using SHA-256 will be 64 hexadecimal characters. Also, the SHA-256 algorithm is designed to produce output that will appear to be a random sequence. So, the only way to get the specific hash for specific input data is to perform the hash.
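Both properties are easy to see for yourself. In this small sketch, the helper name sha256_hex and the sample inputs are my own. I have not shown the hash values themselves, because the point is only their length and how unrelated they look.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of some text as hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# One character or ten thousand: the output is always 64 hex characters.
short_hash = sha256_hex("a")
long_hash = sha256_hex("a" * 10_000)
print(len(short_hash), len(long_hash))  # 64 64

# Changing a single character of input gives an unrelated-looking hash.
first = sha256_hex("Alice pays Bob 5 coins")
second = sha256_hex("Alice pays Bob 6 coins")
differing = sum(1 for x, y in zip(first, second) if x != y)
print(first)
print(second)
print(differing, "of 64 characters differ")
```

Nothing about the second hash can be predicted from the first, even though the inputs are nearly identical.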

The implication is that the fastest solution to create a hash within the difficulty level is “brute-force,” or trying solutions at random. The SHA-256 algorithm cannot feasibly be run in reverse to find an input that produces a desired hash. So brute force is what miners do. They make their computers generate and test as many different hashes as fast as possible, until they find a value that fits the difficulty target.

When Bitcoin miners compete to verify a new block, they use the following as input to create their hashes:

  • The new transactions for the 10 minute period to be verified
  • The hash for the previous block (which has already been verified)
  • The nonce, which is generated randomly

The puzzle is solved when the resulting hash value is less than or equal to the current difficulty target value. If a nonce doesn’t work to create a hash that meets that condition, the miner moves on to the next nonce–that’s why the nonce is only used once.
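Here is a toy version of that puzzle that you can run on an ordinary computer. Everything about it is simplified and should be read as my assumptions: the inputs are joined into a plain string rather than Bitcoin's real block format, the nonce counts upward from zero rather than being picked at random (each try is independent, so the order does not matter), and the target is set absurdly easy so the search finishes quickly. I apply SHA-256 twice, which is what Bitcoin does.

```python
import hashlib

def block_hash(transactions: str, previous_hash: str, nonce: int) -> int:
    """Hash the three inputs (SHA-256 applied twice) and return it as a number."""
    data = f"{transactions}|{previous_hash}|{nonce}".encode("utf-8")
    digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
    return int.from_bytes(digest, "big")

def mine(transactions: str, previous_hash: str, target: int) -> int:
    """Try nonces one after another until the hash is at or below the target."""
    for nonce in range(2 ** 32):  # a 4-byte nonce has 2**32 possible values
        if block_hash(transactions, previous_hash, nonce) <= target:
            return nonce
    raise RuntimeError("no nonce in the 4-byte range met the target")

# A deliberately easy target: roughly 1 hash in 4,096 qualifies.
EASY_TARGET = 2 ** 244

transactions = "Alice pays Bob 5; Bob pays Carol 2"
previous_hash = "0" * 64  # placeholder for the previous block's hash

nonce = mine(transactions, previous_hash, EASY_TARGET)
print("winning nonce:", nonce)
print(block_hash(transactions, previous_hash, nonce) <= EASY_TARGET)  # True
```

Lowering the target by a factor of two doubles the expected number of tries, which is how the system tunes the difficulty.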

Professional miners compute trillions of hashes per second. The more hashes they can try in the 10-minute time period, the more likely they are to win the reward. That means the miners with faster computers win more often.

The winner of the mining contest then updates the blockchain ledger by adding a newly mined block covering all of the newly verified transactions to the chain. The winning miner claims the block mining reward by adding it as a transaction on the new block. This reward comes from new coins, whereas all of the other verified transactions on the block come from existing coins.

The system then moves on to the next block to be verified. This happens about every 10 minutes.

How is the Blockchain Validated?

Blockchain works because of three concepts: mining (scarcity), validation, and trust.

Validation is not the same as mining. Validation of the chain happens at two checkpoints. First, on input: when someone wants to make a transaction on the blockchain, the transaction is sent to a node. Remember that the blockchain is stored on many independent nodes, and these nodes communicate with each other in a distributed network. So, when a transaction is sent to one node, that node shares the transaction with other nodes connected to it on the network. Those nodes, in turn, propagate the transaction to other nodes in the network, until the entire network includes the same transactions.
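This node-to-node sharing can be sketched as a simple flood through a network. The five-node network below and the function name are invented for illustration, and real nodes add safeguards (such as checking each transaction before passing it on) that I have left out.

```python
from collections import deque

def propagate(peers: dict, start: str) -> list:
    """Pass a transaction from one node to every reachable node, neighbor by neighbor."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in peers[node]:
            if neighbor not in seen:  # nodes ignore transactions they already have
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order

# A hypothetical network: each node lists the peers it is connected to.
peers = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

# A transaction sent to node A reaches every node, even ones A is not connected to.
print(propagate(peers, "A"))  # ['A', 'B', 'C', 'D', 'E']
```

No node needs a map of the whole network. Each one only talks to its own neighbors, and the transaction still reaches everyone.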

A node validates the transaction to ensure it is in the proper form and adds the transaction to a transaction pool, which is like a clearing house where transactions await mining of the block that will include them. The pending transactions become part of a candidate block. Miners can choose to construct and mine a candidate block for some or all of the transactions in the pool, but the more they include, the higher their reward will be. (The way transactions are chosen to be included in candidate blocks depends on their age, size, priority and other factors, explained in the above link.)
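As a rough sketch of how a miner might fill a candidate block, here is one plausible policy: take the transactions that pay the highest fee for their size until the block is full. This is my own simplification, not Bitcoin's actual selection rules, and the transaction pool, sizes, and fees below are all made up.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    txid: str
    size_bytes: int
    fee: int  # in satoshis

def build_candidate_block(pool: list, max_bytes: int) -> list:
    """Pick transactions by fee per byte, highest first, until the block is full."""
    chosen = []
    used = 0
    for tx in sorted(pool, key=lambda t: t.fee / t.size_bytes, reverse=True):
        if used + tx.size_bytes <= max_bytes:
            chosen.append(tx)
            used += tx.size_bytes
    return chosen

# A hypothetical pool of pending transactions.
pool = [
    Transaction("tx1", 250, 5_000),   # 20 satoshis per byte
    Transaction("tx2", 500, 5_000),   # 10 satoshis per byte
    Transaction("tx3", 400, 12_000),  # 30 satoshis per byte
    Transaction("tx4", 300, 1_500),   # 5 satoshis per byte
]

block = build_candidate_block(pool, max_bytes=1_000)
print([tx.txid for tx in block])    # ['tx3', 'tx1', 'tx4']
print(sum(tx.fee for tx in block))  # 18500
```

Here tx2 is left waiting in the pool because it does not fit in the space remaining, even though it pays a better rate than tx4.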

When the mining proof of work has been completed, the winning miner transmits the new block to other nodes on the network. Each node validates the new block to ensure it is in the correct form, and that the winner produced a hash with the proper difficulty, and then adds it to the chain. The new block is then propagated throughout the network.

Why is the Blockchain Trusted?

You may be wondering, at this point, what is the point of all this trouble. Mining via proof of work is not the same as trust–it is an incentive for maintaining the chain. It also creates scarcity for new coins. In other words, proof of work is an arbitrary task that is designed to be difficult to perform. This means new coins, created via mining, will enter the system at a regular rate.

But the proof of work, and the Bitcoin reward for doing it, is also an incentive for nodes to maintain the chain. Every mining node must maintain an entire copy of the chain. And because the reward for mining is in Bitcoin, the miners have an incentive to maintain the integrity of the chain. If the chain fails, their rewards will have no value.

How do all the nodes trust the new block? They trust it because it would be virtually impossible to mine a new block without doing the proof of work. While mining is hard, validating the mining is relatively trivial. The proof of work is how miners know they’ve spent enormous resources and reached consensus on a particular sequence of blocks, and are worthy of the reward they got.
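The lopsidedness between mining and validating is easy to demonstrate. In this toy sketch (simplified block data, a single SHA-256, and an easy target, all my own assumptions), the miner counts how many hashes it had to try, while the validator needs exactly one.

```python
import hashlib

def pow_hash(block_data: str, nonce: int) -> int:
    """Hash the block data together with a nonce and return it as a number."""
    digest = hashlib.sha256(f"{block_data}|{nonce}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def mine(block_data: str, target: int) -> tuple:
    """Search for a winning nonce; return it with the number of hashes tried."""
    nonce = 0
    while pow_hash(block_data, nonce) > target:
        nonce += 1
    return nonce, nonce + 1

def validate(block_data: str, nonce: int, target: int) -> bool:
    """Check a claimed solution with a single hash."""
    return pow_hash(block_data, nonce) <= target

TARGET = 2 ** 244  # easy on purpose: roughly 1 hash in 4,096 qualifies
block_data = "previous hash + new transactions"  # stand-in for a real block

nonce, attempts = mine(block_data, TARGET)
print("hashes tried by the miner:", attempts)
print("hashes needed to validate: 1")
print(validate(block_data, nonce, TARGET))  # True
```

With Bitcoin's real difficulty, the gap between those two numbers is astronomically larger, but the validator's cost stays at one hash.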

The proof-of-work process has two important consequences. The first is consensus: all nodes in the decentralized network can easily agree on which blocks are valid, via their hash. The second is immutability: due to transparency, it is virtually impossible to fool an honestly run node into accepting any blockchain but the true one.

You may have read that blockchains are vulnerable to 51% attacks, which could happen if one party, or a group of collaborating parties, controls 51% of the hash power of the chain. This kind of attack can lead to something called double spending, and other problems. But even if that occurred, anyone could identify the false transactions, and the price of Bitcoin would probably immediately plummet. For cryptocurrencies like Bitcoin, transparency is a deterrent to malfeasance, holding the purchasing power of Bitcoin as collateral for the integrity of the ledger.
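Why is the magic number just over half? A back-of-the-envelope formula from the original Bitcoin white paper (not something covered elsewhere in this post) estimates the chance that an attacker who is some number of blocks behind the honest chain ever catches up. The function name and the sample percentages are mine, and this is a simplified model rather than a full security analysis.

```python
def catch_up_probability(attacker_share: float, blocks_behind: int) -> float:
    """Chance an attacker ever catches up, given their share of total hash power."""
    honest_share = 1.0 - attacker_share
    if attacker_share >= honest_share:
        return 1.0  # with half or more of the hash power, catching up is certain
    return (attacker_share / honest_share) ** blocks_behind

# How likely is an attacker to overtake the honest chain from 6 blocks behind?
for share in (0.10, 0.30, 0.45, 0.51):
    probability = catch_up_probability(share, 6)
    print(f"{share:.0%} of hash power: {probability:.6f}")
```

Below half, the attacker's odds shrink rapidly with every additional block. At half or above, the attacker eventually wins no matter how far behind they start.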

In sum:

  • Transparency is why the blockchain is trusted.
  • Transparency and proof-of-work create the incentives to maintain a legitimate chain.

Long-Term Viability of Bitcoin

You have certainly read in the news that Bitcoin mining uses huge amounts of energy, and is therefore ecologically unsustainable. But there are other reasons why Bitcoin faces sustainability challenges.

Because the total number of coins is limited, the incentives to mine will eventually dwindle. As the difficulty increases and the rewards shrink, only an astronomical and sustained increase in speculative value would provide enough incentive to mine. So it’s likely that, eventually, that incentive will fail. When the mining incentive fails, the incentive to maintain the chain may also fail, and transactions will likely become too slow to be of use.

Also, in practice, Bitcoin has become less decentralized than it appears. At this point, a small number of mining cooperatives do most of the mining. The top 0.1% (about 50 miners) control nearly 50% of mining capacity. Control of the blockchain is in effect becoming centralized, and that centralized control is not transparent. As difficulty increases, this centralization will likely continue. This does not necessarily mean that the chain is likely to be corrupted, but it does tarnish the ideal of Bitcoin as a currency run by a community.

Also, the original lure of anonymous trading is waning. Anonymity has made Bitcoin notorious for illegal activity like ransomware and drug trades. As illustrated by the FBI’s recent seizure of Bitcoins, pseudonymous trading is not a failsafe to achieve true anonymity. As a pseudonym “becomes enmeshed in the public web of transactions, maintaining anonymity takes more operational security than most users can manage.”

Adding all this up, Bitcoin is likely to lose utility and value over time. In other words, the design is not truly scalable or sustainable. Perhaps owning Bitcoin will become, over time, more like owning rare coins than owning currency. But when that happens to Bitcoin, there will be many other cryptocurrencies standing ready to take its place.

In sum:

  • Bitcoin has a lifespan, and unlike traditional fiat currencies, that lifespan is designed to be finite.

Do You Want to Know More?

I hope this article has helped you understand more about crypto and blockchain. If you have suggestions or corrections, please contact me.

If you want to know more, here are some of the best resources I found when researching this topic:

  • Coinbase’s Crypto Basics.
  • The Federalist Society’s video on Bitcoin mining and on Bitcoin in monetary policy.
  • Here is an example of a block, showing the nonce, the transactions it covers, and other details.
  • Patrick Boyle’s video on legal tender and private money. I highly recommend Boyle’s videos on finance and economic topics.
  • O’Reilly’s Mining and Consensus. This contains a wealth of detail about how validation and mining work.
  • David Rosenthal’s presentation on the future of cryptocurrency. This presentation is very detailed, but extremely insightful on the sustainability of Bitcoin and other non-permissioned blockchains.
  • Is blockchain “open source”? Not exactly, although all blockchains use open source software elements. (That link goes to an article that I co-wrote on this topic a few years ago, when everyone seemed to be asking me this question.)

Twelve Months of Open Source in 2021

With year two of the COVID era drawing to a close, here is a look back on some of the most interesting open source developments of 2021. Let’s hope the new Omicron wave of working from home creates some amazing new projects — and ends soon.

  • January – Open source developer and open standards advocate David Recordon is named the White House Director of Technology by the transition team of incoming President Joe Biden.
  • February – Mars becomes the second planet on which Linux is the dominant operating system.
  • March – The community wars on with no resolution over the role of Richard Stallman in the FSF and GNU projects. Stallman was expected to resign over his comments about the Jeffrey Epstein scandal, but later refused to step down. Several members of the executive team resign in protest. The credibility of FSF is eroding.
  • April – Google wins its epic battle with Oracle over copyright and APIs in the US Supreme Court, and software developers everywhere breathe a sigh of relief. Because without free rights of re-implementation, many open source projects could probably not exist.
  • May – The US Government issues an executive order recognizing the importance of the software supply chain to national security and prosperity. It implies an endorsement of open source development: “The development of commercial software often lacks transparency, sufficient focus on the ability of the software to resist attack, and adequate controls to prevent tampering by malicious actors.”
  • June – A Huawei dev is shamed for useless Linux contributions he submitted to meet a work performance goal. It’s telling that open source contributions have become part of corporate performance assessment. This happens on the heels of the scandal over University of Minnesota students intentionally submitting faulty patches to Linux for a research project.
  • July – Weirdness follows the Audacity handover, when two forked projects, created in a froth of overreaction to the transfer of the project to Muse, start warring with each other, and bad behavior ensues.
  • August – Sexy Cyborg, a YouTube influencer, metes out her own style of GPL Enforcement. This video shows her shaming a large corporation about GPL violations. Don’t let the scanty clothes fool you; she’s a savvy tech commentator.
  • September – Oracle adjusts licensing for OracleJDK, allowing limited free use, and tweaking the licensing differential between OracleJDK and OpenJDK (and the community-supported fork of OpenJDK stewarded by the Eclipse project). The new license for OracleJDK now allows free internal use, including for developing and testing applications, and distribution “provided that You do not charge Your licensees any fees associated with such distribution or use of the Program, including, without limitation, fees for products that include or are bundled with a copy of the Program or for services that involve the use of the distributed Program.” For most companies, this only kicks the can down the road on the decision to use OpenJDK, or pay the piper.
  • October – SFC sues Vizio for GPL Violations, in a lawsuit that attempts to rewrite the rules of open source enforcement, by initiating a non-copyright claim in state court without the participation of the software authors.
  • November – Trump’s Truth Social, having run afoul of AGPL before it even launched, tries to fix the license violation, but seems unclear on the concept.
  • December – Log4j is involved in a major security breach. Open source software security breaches always get a lot of press, out of proportion to proprietary software security issues–which doesn’t mean to say they aren’t a danger. The real problem, of course, is lack of a sustainable model to keep the open source software updated and secure.
  • All Year – Commercial Open Source Software continues to be awesome. Just a few examples of companies that are prospering and moving forward: Redis, Grafana, Starburst. For a rundown of the big deals, check out the ongoing news about financings, acquisitions and IPOs at COSS Community.

Happy New Year, everyone!

Open Source Compliance for SaaS Vendors

After a few inquiries about this topic, I thought I would share some suggestions for best practices for SaaS vendors. Open source compliance for SaaS vendors is not difficult, but there are a few important nuances to keep in mind.

As a baseline, SaaS is not a high-risk business model for open source compliance. That is because most open source licenses do not have any conditions for SaaS server-side use. Generally, running software that powers a SaaS offering is not distribution, and most compliance conditions in open source licenses are triggered by distribution. (For more on that topic, see here.)

But that doesn’t mean SaaS vendors can simply abdicate responsibility for open source compliance processes. Here are a few reasons why SaaS vendors need to pay attention to open source compliance.

Client Side Software

While most software in a SaaS platform runs on the vendor’s servers, some software always runs on the user’s computer. This is sometimes called “client-side” software, because it runs on the client computer (the user’s computer) instead of the server computer. This software is pushed out to the client computer by the server, but it actually executes locally.

The best way to understand this is to look at the software. If you are using the Chrome Browser, for example, you can press Control-U or right click and choose “view page source.” There, you will see a lot of software code. Right now, as I am writing this in WordPress, the code for this authoring page is about 3,000 lines. But most of the software enabling me to write my blog is running on the WordPress.com server.

Client side code is particularly helpful for tasks like validating an entry in a form (such as a date or address) or executing simple logic on a web page (such as asking for more data if the user gives certain answers to questions). These small tasks need not ping the server and slow down processing.

One interesting thing about this client code is that it is almost always “high level” code or “scripting language” code delivered in source code format. Mostly, it consists of HTML (the markup language for web pages), JavaScript (a procedural language particularly useful for web pages) and CSS (a formatting language for web pages).

The great thing about scripting code is that it is already in source code form, so even if it is provided under a copyleft license like LGPL,(1) there is usually no need to offer additional source code.

The hidden issue, however, is that programmers usually strip licensing notices out of HTML/CSS/Javascript before sending it to the client device. That is done so the page will load faster.

So, the challenge is how to deliver notices for that code. Keep in mind that whenever you distribute software under a copyleft license like LGPL, you must not only make source code available, but you must also deliver a copy of the license. So the question is: where do you put those notices? That can be a head scratcher. One approach is to include, on the dashboard for your SaaS system, a page with the notices and links to the licenses.

But is that compliant? Clearly, most open source licenses were not written for client code. So it can be unclear where to place the notices in order to comply with the license terms. In fact, most open source notice provisions were written long before the advent of the web, and they assume that notices will be delivered in an installation folder — which only works for software in standard computing environments like Linux or Windows.

For example, the license notice requirement for LGPL 2.1 says you must “distribute a copy of this License along with the Library” and MIT requires that “this permission notice shall be included in all copies or substantial portions of the Software.” So, it seems at least plausible that notices on a separate web page are sufficient, but every license has its own language to describe where notices must be delivered.

One more nuance here: Programmers also often modify client side code in production environments to strip out “white space.” So, for example:

<script id='wp-media-utils-js-translations'>
( function( domain, translations ) {
    var localeData = translations.locale_data[ domain ] || translations.locale_data.messages;
    localeData[""].domain = domain;
    wp.i18n.setLocaleData( localeData, domain );
} )( "default", { "locale_data": { "messages": { "": {} } } } );
</script>

Would become the far less readable:

<script id='wp-media-utils-js-translations'>(function(domain,translations){var localeData=translations.locale_data[domain]||translations.locale_data.messages;localeData[""].domain=domain;wp.i18n.setLocaleData(localeData,domain);})("default",{"locale_data":{"messages":{"":{}}}});</script>

But when you right click and view the code, or when you view it in nearly any development tool, the tool automatically inserts white space for readability. So this is generally not considered a compliance problem. The user has perfectly feasible access to the software in “the preferred form of the work for making modifications,” which is what LGPL requires. (See definition of Source Code here.)

Network Copyleft and Similar Licenses

Another potential issue for open source compliance in SaaS is network copyleft licenses. A handful of licenses still have compliance conditions in the absence of distribution. This issue applies to server-side code. The most common of these licenses is AGPL, which says:

“13. Remote Network Interaction…if you modify the Program, your modified version must prominently offer all users interacting with it remotely through a computer network … an opportunity to receive the Corresponding Source of your version by providing access to the Corresponding Source from a network server at no charge, through some standard or customary means of facilitating copying of software.”

AGPL Section 13, sometimes called the “Network Copyleft” provision

This means that if you use the software in such a way that a user can interact with it over a network, and you modify the software, you must make your modified source code available. In a way, this treats SaaS similarly to distribution under GPL. And while there are still no source code sharing conditions if you don’t modify the software, it can be hard to keep track of modifications and to trigger a new compliance review if the software is modified in the future. So most companies are extremely hesitant about using AGPL software in a SaaS platform.

AGPL is not the only network open source license out there, only the most common. Some others, for example, are:

  • Server Side Public License
  • Open Software License
  • Non-Profit Open Source License
  • Artistic 2.0
  • Apple Public Source License
  • RealNetworks Public Source License
  • Reciprocal Public License
  • Honest Public License
  • Academic Free License [Note: this license is permissive. The others are copyleft.]

Most of these licenses are rarely used, and when they are, some — notably Artistic 2.0 — are dual licensed under GPL, meaning their network requirements are easy to avoid.

Most companies that have open source compliance policies “red light” these licenses and will not use them in SaaS development. So that means, at a minimum, you must still keep track of license terms and apply your compliance policies to SaaS software.
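To make the “red light” idea concrete, here is a minimal sketch of how a policy like this might be checked against a component inventory. Everything in it is an assumption for illustration: the function name, the inventory format, the component names, and the choice of which licenses to list (written here as SPDX-style identifiers). It is not a real tool’s API, and a real policy would be set by counsel.

```javascript
// Hypothetical policy: licenses a company has chosen to "red light" for SaaS use.
// The list is an illustrative assumption, not a complete or recommended policy.
const RED_LIGHT = new Set(["AGPL-3.0-only", "SSPL-1.0", "OSL-3.0", "RPL-1.5"]);

// components: array of { name, license } records from a hypothetical inventory.
// Returns the names of components whose license is on the red-light list.
function flagRedLight(components) {
  return components
    .filter((c) => RED_LIGHT.has(c.license))
    .map((c) => c.name);
}

// Made-up inventory, for illustration only.
const inventory = [
  { name: "web-framework", license: "MIT" },
  { name: "doc-store", license: "SSPL-1.0" },
  { name: "image-lib", license: "LGPL-2.1-only" },
];

console.log(flagRedLight(inventory)); // [ 'doc-store' ]
```

A check like this only flags components for review. It says nothing about whether the code was modified, which is the fact that actually triggers the AGPL source code condition.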

Distribution is Inevitable

Finally, an important reason to keep an eye on open source compliance, for server side code in a SaaS platform, is that SaaS code almost always gets distributed–someday. While the thrust of software sales today is toward cloud deployment, there are many reasons the deployment of software regresses to on-premises distribution. For example:

  • Your company sells off the division that sells the SaaS.
  • Your customer requires a local copy on its own server:
    • Due to regulatory requirements (e.g. in highly regulated businesses like finance or health)
    • Because of security concerns
    • To avoid privacy issues arising from cross-border data movement
    • Because AWS does not maintain a local hub
  • Your company productizes an internal SaaS tool.

…and so forth.

Because any of these things could happen, you should ensure that you can distribute the software in a compliant manner, when you need to. So, you should avoid integrating GPL (or AGPL) and proprietary code in a way that could not be distributed in compliance with the open source license. This, in turn, means avoiding the use of GPL libraries (like MySQL)–or at least having a replacement at hand.

No Customer Risk

One question that sometimes comes up in commercial negotiations for SaaS vendors is whether a customer of a SaaS vendor can be “infected” by open source code. In short, the answer is no.

A pure end user of code cannot incur liability for violating an open source license. Open source licenses can be violated only by re-distributing software — or in the case of network licenses like AGPL, by modifying it and making it available to others as a service. In fact, GPL2 specifically says “Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted.”

In contrast, if the vendor is distributing software to a customer for the customer to provide its own SaaS to others, customary open source diligence is appropriate. But this isn’t a SaaS deal, of course. It’s a distribution deal.

So, end user customers of SaaS need not conduct diligence into open source licensing issues. If, for example, the SaaS vendor is using modified AGPL code to provide its service to the customer, then the customer may have a right to receive source code from the vendor. But that is a benefit, not a problem. In other words, the purpose of due diligence is to reduce the user’s risk, so even AGPL is not the user’s issue.

An update: take a look at Kyle Mitchell’s follow-up on this post. Client side JavaScript code doesn’t have a binary form, exactly, but when deployed in actual practice, it is not exactly source code, either. It can take intermediate forms that would not be considered source code, for purposes of licenses like GPL. Point your engineers to that post!

SFC Files GPL Enforcement Suit Against Vizio Advancing Novel Legal Theories

Software Freedom Conservancy filed a lawsuit in late October 2021 against Vizio, claiming violation of the GPL and LGPL with respect to its SmartCast TVs. The complaint is here. The complaint is styled first as a claim of breach of contract, and then a claim for declaratory relief.

Lawsuits to enforce GPL are still quite rare, and among them, this one is radically different in its legal structure from those that have come before. In fact, it conflicts with much of the conventional wisdom about enforcement of licenses like GPL, even principles previously enunciated by the Software Freedom Law Center and the Free Software Foundation–who have had their disagreements with Software Freedom Conservancy in the past.

Some of the novel legal arguments include:

  • Breach of Contract and Specific Performance. In the past, almost all community enforcement of GPL has been pled as a claim for copyright infringement. This complaint asks for an order for Vizio to release the source code for the product, which is the main license condition of GPL and LGPL. However, compelling a defendant to comply with a license condition is not a remedy under copyright law. In contract law, any claim for relief other than money damages is a request for specific performance–which is a very rare remedy in contract law.
  • Claim Brought in State Court. As a corollary, almost all claims to enforce GPL in the past have been brought in federal court, which has exclusive jurisdiction over copyright claims in the US. The complaint was filed in Orange County, California. State courts yield less predictable and less consistent outcomes than federal courts–and are perhaps more likely to take an unexpected view of novel legal theories.
  • No Author as Plaintiff. The complaint is brought on behalf of SFC as a plaintiff, based on its purchase of Vizio TVs. This is the opposite of past enforcement actions, brought by the copyright owners (and authors) of GPL software. The complaint cites a huge list of GPL code, including the entire Linux kernel, Busybox, coreutils, BASH, and others. None of the authors of these software packages is named as a plaintiff in the suit.
  • Focus on Consumer Electronics. This is no surprise, but it is worth noting. Consumer electronics have always been the highest risk sector for GPL enforcement. The SFC press release regarding the case emphasizes consumer rights, touching upon allegations of planned obsolescence and the environmental impact of having to replace devices instead of updating them. SFC had previously announced its intention to use its new funding to focus on embedded software and IoT.
  • Declaratory Relief. The request for declaratory relief asks the court, essentially, to declare that the GPL and LGPL are enforceable and have been violated by Vizio. Even if GPL were deemed a contract, it would be a contract between the licensor (i.e. code copyright owner) and the licensee, so SFC appears to be suing based on a theory that it–and everyone–is a third party beneficiary of the contract. This theory is extremely novel. Given the code authors are not parties, this is an end run around the standing requirements for copyright claims.

In sum, the complaint is an effort to re-write the rules of GPL enforcement. While many commentators are hailing it as a boon for free software, it could backfire. Most companies who have adopted GPL software for their products over the last 25 years have done so based on the comfort that enforcement is mostly done informally, and by authors–and that injunctive relief forcing the release of source code has never been ordered as a remedy. This comfort took many years to develop. During the 1990s and 2000s, many companies adopted GPL software with great hesitation, due to fears about the possibility of such remedies. At that time, these fears were mostly fueled by FUD promulgated by anti-GPL companies like Microsoft. But if this new means of enforcement is successful, the fear may re-ignite, and adopters may react by moving away from open source software.

Also, if any member of the public can enforce the GPL, there is a potential for multiple and conflicting lawsuits for each alleged violation–including trolls who do not have the interests of the community at heart, and–as in this case–organizations who do not necessarily have the support of the authors of the software.

At this early stage the result is hard to predict, but it seems unlikely that SFC will be able to succeed on such a great number of novel legal theories. Such a case could be complex, long and expensive. Also, most GPL enforcement claims do not result in substantial litigation, and are settled quickly–often just after the initial complaint. But if not, this lawsuit has the potential to make unprecedented law, and to substantially disrupt expectations about GPL enforcement.

An update: Vizio moved on November 29, 2021 to remove the case to federal court, alleging preemption. SFC issued an outraged blog post. Neither is remotely surprising.

And I cannot help observing something about this statement from the SFC:

We believe in complete transparency of the copyleft compliance process, and so encourage everyone to read the filings. We’ve even paid the Pacer fees and used the Recap browser plugin, so that all the documents in the case are freely available via the Recap project archives.

According to my calculations, the PACER fees for that document for SFC are somewhere between zero and 80 cents. Here is more about PACER fees and how RECAP (which is very cool) works.
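For the curious, the arithmetic behind a range like that can be sketched as follows. The sketch assumes the commonly published PACER schedule of 10 cents per page, capped at $3.00 per document, with fees waived when a quarter’s total is $30 or less. The 8-page count is my assumption, chosen only because it matches the 80-cent figure; the function names are made up for illustration.

```javascript
// All amounts are in cents to avoid floating-point rounding.
// Assumed fee schedule (not taken from the filings themselves):
const PER_PAGE_CENTS = 10;          // 10 cents per page
const DOCUMENT_CAP_CENTS = 300;     // $3.00 cap per document
const QUARTERLY_WAIVER_CENTS = 3000; // totals of $30 or less per quarter are waived

// Fee accrued for one document of the given page count.
function documentFeeCents(pages) {
  return Math.min(pages * PER_PAGE_CENTS, DOCUMENT_CAP_CENTS);
}

// Amount actually billed for a quarter, given the total accrued in that quarter.
function amountBilledCents(quarterlyTotalCents) {
  return quarterlyTotalCents <= QUARTERLY_WAIVER_CENTS ? 0 : quarterlyTotalCents;
}

const accrued = documentFeeCents(8);      // hypothetical 8-page document
console.log(accrued);                     // 80
console.log(amountBilledCents(accrued));  // 0
```

Under those assumptions, the fee accrues at 80 cents but is billed at zero unless the account’s other usage pushes the quarter over the waiver threshold.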

Trump’s Truth Social Platform Accused of Violating AGPL

Recently, accusations appeared in the press that the “Truth Social” platform is violating the terms of the Affero GPL (AGPL), which applies to the Mastodon software used to run the platform. Truth Social is run by the Trump Media and Technology Group, which recently announced a SPAC.

AGPL is a network copyleft license that requires sharing of source code, where the licensed software allows users to interact with it via a network, and the code has been modified from its upstream source.

On October 21, 2021, Mastodon’s head developer, Eugen Rochko, stated that the software used to run Truth Social “absolutely is based on Mastodon.” The Verge later reported that “Mastodon has sent former President Donald Trump’s company a formal notification” of breach. TechCrunch also reported that Mastodon had issued a “30-day ultimatum.” These reports were apparently based on a Mastodon Blog post that said:

On Oct 26, we sent a formal letter to Truth Social’s chief legal officer, requesting the source code to be made publicly available in compliance with the license. According to AGPLv3, after being notified by the copyright holder, Truth Social has 30 days to comply or the license may be permanently revoked.

https://blog.joinmastodon.org/2021/10/trumps-new-social-media-platform-found-using-mastodon-code/

The terms of use for Truth Social state that its source code is proprietary “unless otherwise indicated,” and apparently the site does not provide proper AGPL license notices.

No formal legal action has yet been filed, but that is not surprising. Unlike GPL2, AGPL3 has a cure provision, so after receiving a notice of violation, a licensee has 30 days to comply, or risk losing its license permanently, after which a formal enforcement action could be filed.
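As a quick worked example of that 30-day window, the sketch below counts 30 calendar days from October 26, 2021. Note the assumption: the Mastodon post gives only the date the letter was sent, so treating that date as the date the notice was received is a guess, and the helper name is made up for illustration.

```javascript
// Adds a number of whole days to a calendar date (YYYY-MM-DD), working in UTC
// so that daylight-saving changes cannot shift the result.
function addDays(isoDate, days) {
  const d = new Date(isoDate + "T00:00:00Z");
  d.setUTCDate(d.getUTCDate() + days);
  return d.toISOString().slice(0, 10);
}

// Assumption: notice received the same day the letter was sent.
const assumedReceipt = "2021-10-26";
const cureDeadline = addDays(assumedReceipt, 30);

console.log(cureDeadline); // 2021-11-25
```

If the notice was actually received later, the window would end correspondingly later.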

The controversial nature of the Truth Social platform has generated a lot of press already. This points up the risk of violating open source licenses in a manner that conflicts with the political beliefs of the software’s author, who may be more likely to formally enforce the license against those with whom the author does not agree.

Mastodon is a German non-profit. Germany has long been known as a plaintiff-favorable jurisdiction for software copyright claims, and a jurisdiction of choice for enforcing open source licenses.

An update: See PC Mag’s follow up from December 2, 2021. The Trump platform published the Mastodon code, but it’s not clear that is the code they are actually using. (In fact, there is no obligation to provide unmodified server-side code under AGPL.) The article says:

Trump’s “Truth Social” site now features a dedicated section labeled “open source,” which contains a Zip archive to Mastodon’s source code. “Our goal is to support the open source community no matter what your political beliefs are. That’s why the first place we go to find amazing software is the community and not ‘Big Tech,’” the site adds. …However, it appears the uploaded Zip archive is simply a barebones version of the existing Mastodon source code you can already find on GitHub.

So, it looks like the platform got some of the open source rhetoric right, but not the actual compliance.

Open Source: The Last Patent Defense?

This article appeared years ago in the OuterCurve blog, but the link to it is broken, now, so on request I have reproduced it — or at least an early draft of it — here. This was a companion piece to a conference at Santa Clara law school, but that link is broken, too! I am happy anyone cares to read one of my articles after so many years. I hope someone can use this information to go squish some patent trolls.

When Richard Stallman wrote in GPL2 “any free program is threatened constantly by software patents” he crystallized the ideological battle between open source software and the software patent business. In 1991 when GPL2 was released, that battle was in its nascent stages.  Today each of open source licensing and software patenting has come to its fullest flower, though their growth generally proceeds on orthogonal axes; most open source software is never accused of patent infringement and most software patent infringement suits don’t accuse open source software. In fact, they so seldom interact directly that the lawyers who practice in these areas do not overlap much. This means those defending patent infringement suits may not be thinking about the tactics open source patent licensing offers to patent defendants.

In patent litigation defense, every little bit helps. Today, patent defendants should be paying attention to open source licensing and its possible effect on patent infringement claims. When you are sued for patent infringement, by anyone other than a pure non-practicing entity (aka patent troll), one of your first lines of internal investigation should be the open source position of the plaintiff, and, if you are considering retaliatory patent claims, your own open source position as well.

When Open Source and Patents Mix

Patent lawyers may be surprised to know that while most companies today use open source software, most of them struggle greatly with implementing the internal controls to coordinate their use of open source software with their patent portfolio management. This means it is quite possible that a company is seeking patent protection, or seeking to enforce patents, that read on open source software the company is using or developing, a combination of activities that would often not be considered economically rational.

There have been at least two cases where defendants have successfully used open source license enforcement as a defensive tactic in a patent lawsuit. The first case is the one most often cited to support the enforceability of open source licenses; most people forget that the case started as a patent claim. In Jacobsen v. Katzer, both parties developed and distributed software for controlling model railroads — Jacobsen making his JMRI software available under an open source license free of charge, and Katzer (via his company Kamind Associates) selling commercial products under proprietary licenses. Jacobsen received a letter inviting him to license patents owned by Kamind, suggesting the patents were infringed by the JMRI software. Jacobsen filed a declaratory judgment action asking the court to rule that the patent was invalid due to prior art (or failure to disclose prior art including that of Jacobsen himself) or not infringed. As the patent case progressed, however, Jacobsen discovered that Katzer had copied some of Jacobsen’s open source software and used it in Katzer’s proprietary product, without the proper attributions and license notices. Jacobsen v. Katzer was finally settled in 2010, but only after becoming the seminal US case on open source licensing — not patent infringement — and resulting in a settlement payment by Katzer for violation of the open source license.

In Twin Peaks Software Inc. v. Red Hat, Inc., Twin Peaks Software (TPS), which made proprietary network backup software, sued Red Hat and Red Hat’s recently-acquired subsidiary Gluster. TPS claimed that the GlusterFS software (a network file system that aggregated multiple storage shares into a single volume) violated TPS’s patent covering TPS’s “Mirror File System” (MFS). Red Hat initially responded to the patent infringement suit by denying the infringement and asserting that the patent was invalid, but later brought a counterclaim alleging that the TPS products incorporated open source software from Red Hat’s product, without complying with GPL. Red Hat sought an injunction against the TPS products. The case soon ended in a settlement, suggesting that TPS thought better of pursuing its patent claims in light of the facts.

In both these cases, the patent plaintiff was using open source software of the defendant, and the patent defendant discovered a violation of the applicable open source license that it used to turn the tables on the plaintiff. In this way, open source license enforcement can be a substitute for a more traditional retaliatory patent claim. In each case, the plaintiff and defendant were in similar product markets — a very common context for patent litigation — which made the use of the defendant’s open source code by the plaintiff likely. The moral of this story, for a patent plaintiff, is that one should have a robust open source compliance program in place before asserting a patent in a related space. 

Defensive Termination

There are other more subtle tactics as well. Open source licenses — particularly those written in the last 20 years — contain two kinds of provisions that bear upon patent litigation strategy. The first, and more straightforward, is the patent license. See for instance the license in Apache 2.0, which says:

 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. 

This license only applies to contributors, so a mere re-user or re-distributor of the software does not grant any rights. However, if a company has contributed to the software, under an open source license or under a similar contribution license, that company may have granted a license that can be used as a defense to an infringement claim.

For example, suppose company P (patent plaintiff) sues company D (defendant) for patent infringement. However, company P has contributed software embodying the claims of the asserted patent to a project covered by this license. The Apache 2.0 license is a permissive license, so it may be easy for D to claim it is using software under this license. Raising this as a license defense can avoid liability, or at least create an unexpected defense that will add significant cost to prosecuting the suit.

Now consider the defensive termination provision of Apache 2.0:

If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.

This means that by filing the lawsuit, P may have given up any patent licenses it has received from any contributors to the software — which may include D or third parties who may be aligned with D. This provision applies to all licensees, not just contributors. Even if D is not a contributor with patent claims to bring, bringing the claim exposes P to potential liability. Pointing this out may shift the balance in favor of the defense. 

It’s important to understand that patent defensive termination provisions in different open source licenses have different terms. Some, like Apache 2.0, are triggered by defensive claims, but some are not. Some, like MPL, (or the corresponding “liberty or death” provision in GPL3) also trigger a termination of the copyright license, making them an even more powerful defense tool. 

Open Source Due Diligence — In Patent Litigation

So, the next time you are sued for patent infringement, you have not done your homework until you know:

  • Is the plaintiff using any open source software of yours (related to the patent or not) in violation of the license?
  • Does the asserted claim read on any open source software you are using? If so, would the complaint trigger a defensive termination provision that might apply to the plaintiff?
  • Did the plaintiff contribute to any open source project any code under terms that would include a patent license? If so, do you have a defense under that license?  

Investigating the last question can be an informational challenge, but it may not be as difficult as you think. Records regarding contributions may be available publicly, or open source projects may be willing to cooperate if it helps them defeat patent claims accusing their code. 

The drafters of open source licenses intended to use the terms of those licenses to win a war against software patents, and whether they can do that remains to be seen, but in the meantime, don’t pass up the opportunity to use the principles of open source licensing to win your battles as well.

FSF Drops Assignment Requirement for GCC

The Free Software Foundation announced on June 1, 2021 that it would no longer require an assignment of rights from contributors to the GCC (GNU Compiler Collection) project. Instead, it will require a DCO (Developer Certificate of Origin), following the practice led by the Linux Foundation for the kernel.

This move brings the GCC project into line with community practice, and it’s a welcome development. Over the years, various contributors had refused to agree to the FSF’s contribution assignment agreement, a document that is unusual in both substance and form. As to substance, while assignments for contributions were more common a couple of decades ago, today they are quite rare; most open source projects today either use license in=out (with or without a DCO), or a CLA with a non-exclusive license grant. As to form, the FSF’s assignment contains some truly unique language about patents* that patent licensing lawyers find perplexing, causing companies to balk at making contributions to FSF projects simply because they can’t parse the terms.

Given the widespread rejection by open source communities of CLAs, the FSF’s outlier stance on its contribution terms over the years has been surprising. Its premise that “Our ability to enforce the license on packages like GCC or GNU Emacs begins with a copyright assignment” was never exactly correct. It’s the kind of statement that looks good on paper, but doesn’t make so much sense in practice. It is true that only a copyright owner or exclusive licensee can enforce a copyright. See HyperQuest, Inc. v. N’Site Sols., Inc., 632 F.3d 377, 382 (7th Cir. 2011). But that’s because a court does not want to be asked by a plaintiff to enforce a copyright, when other parties, who are not before the court, have the right to grant licenses to the defendant and inoculate the defendant from the claim.

The FSF’s assignment document, like most (of the few) that are still used in open source contributions, grants a broad license back to the assignor. (“Thus, we grant back to contributors a license to use their work as they see fit. This means they are free to modify, share, and sublicense their own work under terms of their choice.”) So, the assignment does not solve the court’s problem. In effect, an assignment with a broad license back is less like an assignment, and more like a non-exclusive license (or like joint ownership, a structure universally detested by practicing IP lawyers, because it will often cause a court to refuse to hear the infringement claim).

In practice, owning most of the code is enough to bring a claim. Most open source enforcement takes place notwithstanding that one entity does not own every line of code in the code base. Assignments in CLAs, therefore, are not a best practice, because they are both discouraging to contributors, and not necessary to engage in enforcement.

This move should pave the way for more contributors to feel comfortable contributing to GCC.

*Note: The exact text of the FSF assignment document is not readily available online, though I have seen it before in my practice. If I find it, I will update this post and quote the odd patent language.

Muse Takes the Baton on the Audacity Project

Congratulations to the Audacity development team and Muse Group. In two significant developments, Audacity version 3 was released in March 2021 – its first major update in many years – and Muse Group announced that it has acquired the Audacity project and will take it forward as a free and open source project.

Audacity is a free and open source digital audio editing and recording application. Started by Dominic Mazzoni and Roger Dannenberg, it has clocked over 200 million downloads during its lifetime, and has been translated into dozens of languages. Eric Raymond once wrote of Audacity: “The central virtue of this program is that it has a superbly transparent and natural user interface, one that erects as few barriers between the user and the sound file as possible.” High praise, indeed.

Muse Group already runs the popular open source MuseScore notation software project and distributes the Tonebridge guitar effects app, as well as offering the Ultimate Guitar Tabs service – tools known to working musicians everywhere.

Here is more about the announcement from the Audacity site.

On a personal note, I had the pleasure of assisting the Audacity team in this matter. Audacity has long been one of my favorite open source projects, one with impressive technical quality and an impressive community. I am glad to see it get the resources and support it needs to continue to thrive and grow.