Snippets and Stack Overflow

I recently came across an online discussion that mentioned a very interesting article, Usage and Attribution of Stack Overflow Code Snippets in GitHub Projects, by Sebastian Baltes and Stephan Diehl.  It is a study of certain licensing issues in Stack Overflow, a discussion site for software developers.  Stack Overflow applies the CC BY-SA 3.0 license, a copyleft license for content, to contributions, and there is an ongoing debate as to the suitability of those license terms.

The study analyzes the attribution of “non-trivial” Java code snippets to estimate the rate of usage that did not comply with CC BY-SA notice requirements. The study found that “at most 1.8% of all analyzed repositories containing code from SO used the code in a way compatible with CC BY-SA 3.0. Moreover, we estimate that at most a quarter of the copied code snippets from SO are attributed as required.”
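As a rough illustration of what attributing a copied snippet might look like in practice, the sketch below generates a Java-style comment header that records the source link, the author, and the license.  The class name, field names, comment layout, and placeholder values are all assumptions made for illustration; they are not a format taken from the study, and they are not a statement of what CC BY-SA 3.0 strictly requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnippetSource:
    """Where a copied snippet came from. All fields are illustrative."""
    url: str           # link to the question or answer
    author: str        # contributor's display name
    license_name: str  # license the snippet was published under


def attribution_comment(source: SnippetSource, modified: bool = False) -> str:
    """Render a Java-style line-comment header crediting the snippet's source."""
    lines = [
        f"Source: {source.url}",
        f"Author: {source.author}",
        f"License: {source.license_name}",
    ]
    if modified:
        # Note that the snippet was changed after copying.
        lines.append("Changes: adapted from the original snippet")
    return "\n".join(f"// {line}" for line in lines)


# Hypothetical placeholder values, not a real post.
example = SnippetSource(
    url="https://stackoverflow.com/a/0000000",
    author="example-user",
    license_name="CC BY-SA 3.0",
)
print(attribution_comment(example, modified=True))
```

A header like this costs a few lines per snippet, and it makes a later code audit far easier because the origin and terms of each snippet are recorded at the point of use.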

It is a fascinating topic, and it is refreshing to find a practical and empirical analysis of a licensing issue.  (This article by Chaiyong Ragkhitwetsagul, Jens Krinke, and Rocco Oliveto also reports the results of surveys of Stack Overflow answerers and visitors to assess awareness of outdated code and software licenses.)

Those who do M&A deals and other open source compliance efforts know that the average code audit usually turns up a handful of these items.  While many so-called snippets are short and may not enjoy copyright protection, that legal conclusion can be challenging to make, and unsatisfying to the risk-averse.  To avoid uncertainty, buyers often want such snippets removed, resulting in additional engineering costs that are expended to manage small but non-zero legal risks.  It is in economic terms a tax on development activity.

A project to convert Stack Overflow code contributions to a permissive license, MIT, died on the vine in 2016.  That is unfortunate.  Discussion boards would better serve their community by requiring contributors to apply permissive licenses (or even public domain dedications or licenses with no attribution requirements) to small code examples.  At least that should be the default choice.  It seems doubtful that most contributors care enough about any copyright they may have in code snippets to apply, or enforce, significant conditions on what they contribute.  Given the choice, they would probably be happy with permissive terms.  Moreover, many of the contributions are taken from other sources and contributed without attribution of upstream license terms, which may or may not be compatible with CC BY-SA.

OSS Capital

I am thrilled to announce the launch of OSS Capital, a new venture capital fund focusing on commercial open source companies.  I will be acting as a Portfolio Partner.  OSS Capital invests in OSS startup companies.  Open source software is the future, and I am honored to be a part of OSSC, helping companies with open source business and licensing strategy.

And for those of you who are wondering…I will be continuing my law practice as well.


Ninth Circuit Affirms Thin Protection for Databases under Copyright

In Experian Information Solutions, Inc. v. Nationwide Marketing Services, Inc., No. 16-16987 (9th Cir. 2018), the Ninth Circuit affirmed the limited protection available for databases under copyright.

Plaintiff Experian Information Solutions, Inc., created its ConsumerView Database, containing names and addresses of more than 250 million consumers.  This information was valuable, because marketers will pay significant fees for accurate pairings of names and addresses.  Experian expended significant efforts to collect the data from many sources, such as real estate deeds and warranty cards.  It also used both human and automated methods to maximize the reliability of the data.

Experian discovered the basis for its lawsuit when a broker tried to sell Experian its own data — at a low price — on behalf of defendant Natimark.  Experian sued for copyright infringement, and when that claim was dismissed, trade secret misappropriation.

Examining the issue of whether such data is protectable under copyright, particularly given Experian’s significant effort to ensure the data was accurate, the Ninth Circuit held that the database was copyrightable as a compilation (disagreeing with the district court).  But it affirmed summary judgment dismissing the copyright claim because Experian did not show “bodily appropriation” of the work, in part because Natimark’s database was “materially smaller” than Experian’s.  However, it held that with proper efforts to keep the information confidential, Experian’s lists could be protected as trade secrets.  The court remanded on the trade secret issue only.

This case underscores one of the thorny doctrinal difficulties for “open data licensing.”  Data that is publicly available (and therefore not protected by any trade secret interest) has very limited copyright protection.  Absent a contract binding a recipient to limited use, it is hard to enforce a condition in a copyright license to data, because most uses other than wholesale copying would be non-infringing.  Recipients can “engineer around” a thin copyright by supersetting, subsetting, or changing the data.  Accordingly, licenses that attempt to apply a “copyleft” condition to “derivative works” of data have an even bigger challenge than corresponding software licenses.  Such conditions are premised on the power of copyright.  It is extremely difficult to tell whether one data set is “derivative” of another, and very difficult to preserve any copyright interest in the face of downstream modifications.


Open Source Project Closes as a Political Protest on Immigration, then Re-Opens

The Lerna project, a tool for managing JavaScript packages, previously licensed under MIT, recently (and briefly) added a statement purporting to revoke its license to “collaborators” with the US Immigration and Customs Enforcement (ICE).

At least one other contributor protested by asking the project to remove all his contributions.  The license change was quickly retracted.

One of Lerna’s lead developers said in a comment, “All technology is political, open source is especially political. It would not exist if not for political reasons. Open sourcing something is in itself a political act.”

I’m not commenting here on whether it is right or wrong to apply license restrictions based on ethical or political views.  It’s not generally unlawful to set whatever license restrictions you want, with a few limitations as to enforceability.  It’s not the first time that has happened — the do-no-evil license of JSON (“The Software shall be used for Good, not Evil.”) is ubiquitous and widely tolerated.  And strange license conditions, like the Chicken Dance License, have cropped up periodically.

But color me impressed: if you wanted a lesson on how not to draft or release a license restriction, this would be it.  I think this gets the prize for most drafting mistakes in the fewest words.

  • The phrase “shall not be granted” ought to make any licensing lawyer wince.  Does this mean that the license was never granted?  That it won’t be granted by the licensor(s) in the future?  That a licensee must not grant it?  (Just joking: there’s no sublicensing in open source.)  The passive voice is a dangerous drafting technique.  Perhaps its use here was a stylistic homage to the passive-voice MIT license itself:  “Permission is hereby granted…” But seriously, the problem with passive voice is that it doesn’t identify the subject of the grant or covenant.  At least the MIT license says “hereby granted,” indicating that the grant is made immediately.  Unfortunately, there is case law saying that a covenant to grant (“shall grant”) is not a present grant of license, merely a promise to grant (or in this case, not to grant) later.  Knowing the difference is one of the tricks of the licensing trade.
  • The copyright notice suffers from a common and unfortunate tendency of open source authors to apply notices that are so ambiguous and generic that they no longer constitute a notice of anything.  No specific date.  No specific owner.   The purpose of a copyright notice — assuming there is any remaining material purpose under current law — is to put the reader on notice of when the work was published and who claims the copyright.  If you can’t say either, it’s best to omit a notice.
  • What does it mean to “collaborate” with one’s own government?  If a company is on the list but did not actually “collaborate” would this limitation still be effective?  What if that collaboration is required by law?  Would this only apply to companies that collaborated beyond the requirements of law?  If not, is a license restriction that requires one to violate the law by, say, refusing a government subpoena or order, actually enforceable?
  • Defined term “ICE” is never used.  Just saying.

Laying aside pure drafting concerns, the re-licensing effort was criticized by developers for being done in the wrong way, at the wrong time in the development cycle.  There is an interesting practical discussion on license change and versioning here.

The request of the contributor to remove contributions is also interesting.  Presumably, any previous contributor to the project contributed under MIT (or less likely, another permissive set of terms).  So as a baseline, such a contributor has no legal right under copyright to revoke his license.  But perhaps the issue was that this new licensing statement purports to set terms on behalf of all “Lerna Contributors”?  At least, it claims a joint (?) copyright by all the contributors.  (Heaven help them if they try enforcing copyright terms imposed by a joint authorship team, one of whom has left the fold.)  Of course, any contributor can ask for his contributions to be removed on personal grounds, but if the contributions have been placed under MIT before, they couldn’t be removed on copyright grounds.

Below is the text of the restriction, which I have reproduced here before it disappears from the web entirely:

Copyright (c) 2015-present Lerna Contributors

The following license shall not be granted to the following entities or any subsidiary thereof due to their collaboration with US Immigration and Customs Enforcement (“ICE”):

– “Microsoft Corporation”

– “Palantir Technologies”

– “, Inc.”

– “Northeastern University”

– “Ernst & Young”

– “Thomson Reuters”

– “Motorola Solutions”

– “Deloitte Consulting LLP”

– “Johns Hopkins University”

– “Dell Inc”

– “Xerox Corporation”

– “Canon Inc”

– “Vermont State Colleges”

– “Charter Communications”

– “LinkedIn Corporation”

– “United Parcel Service Co”

A note: I know there has been a furor over Commons Clause lately, and I will post more about that eventually, as soon as I catch up.  Some have suggested that the release of Commons Clause encouraged the Lerna project action.  As far as I know they are unrelated, and it seems to me highly unlikely that the Lerna project was taking cues from Commons Clause.




UC Releases Guidance on Open Source Licensing

The University of California recently issued guidance on open source licensing, and provided resources on understanding open source licenses, particularly with a view to informing professors and other UC personnel about the process of releasing open source code developed at the university or using university resources.

Kudos to UC for doing this (go, Bears) — more university offices of technology licensing need to understand the open source paradigm and how it benefits their professors, students and staff.  Historically, universities have been laser focused on patents, and so there is a spectrum of sophistication among universities on software copyright alone, not to mention open source licensing.

These materials should be helpful to professors who — during or after their time in academia — want to start companies leveraging software and other technology developed at the university.  That can be a bit of a puzzle, as the rights of the university, the individual professor, and a newly formed company need to be sorted into buckets, and sometimes licensed from the university.  (If you want to know more about university technology transfer generally, see this article.)

The UC system promulgates some policies system-wide, and there are also policies for each campus.

(Thanks to Angus MacDonald, Senior Counsel, Intellectual Property at University of California for the heads-up on this information.)

The guidance below also provides information on common open source licenses and various UC IP policies.


More GPL Cure Period Commitments

Fourteen more companies, including Amazon, ARM, Intel, MariaDB, Pivotal and VMware, have joined in the Red Hat-led GPL Cooperation Commitment, a cure period pledge for violations of GPL and LGPL version 2 licenses, bringing the total number of pledging companies to 24.  The pledges now represent 39 percent of corporate contributions to the Linux kernel and six of the top 10 corporate contributors.  Red Hat has also invited individual Linux contributors to join.

There is a GitHub page with more information about the GPL Cooperation Commitment and instructions on how to join.

See my prior posts here and here.

Revisiting the Open Source Business Model

In the past few years, many companies have come to me asking for help drafting a “new open source license.”  As a rule, drafting a new open source software license is not a good idea, so I always hear this request with skepticism.  But lately, what is behind these requests is something new — frustration with, or perhaps misunderstanding of, existing open source business models. Inevitably, what these companies really want is some kind of proprietary license that makes source code available.  Not open source, but open in a different, and more limited, sense.

For a few years now, companies have tried alternative approaches that make source code available, but with license scope limitations that will prevent massive free riding on their development work.  For example, a few years ago, I helped with a model called the Fair Source License.  That was one approach to limited, source-available licensing that allowed free use below a certain threshold.  But this trend is far from done, as companies continue to struggle with how to balance collaborative development and sharing of source code with making money for their own efforts, or their investors.

To understand why companies have come to this difficult choice, it’s necessary to understand the business models that companies use to make money with open source software.  Without making money, they can’t survive. But making money with open source is not intuitive, and not easy.

What is an Open Source Business Model?

Open source business models are almost always either service models or “razor blades” models, the latter coming from the (possibly apocryphal) pronouncement of King Gillette, “Give ’em the razor; sell ’em the blades.”  The razor blades model works, just not for software companies selling only software.  Companies can sell hardware or other complementary products and give away open source software, and still fill the revenue coffers.  In fact, arguably, that’s what has made Linux such a darling of the entire industry. The companies contributing to Linux are, almost to a one, not selling Linux.  They are all selling something else: computers, other devices, or services. But the open source software they make does not give them a competitive edge.

A variation on this theme is that some companies have made money offering professional services, like development, system management, or maintenance services, for open source software.  But as everyone in the Valley knows, professional service models do not scale. Overhead allocation aside, the first hour of professional services costs a business just as much as the 10,000th hour.  With that model, you don’t get the multiples that investors expect and you are basically selling your personal time. All you can do in that model is work harder or work smarter, but there is a limit to how much you can do and how much you can know.  (As any lawyer can tell you.)

Some companies have done quite well selling other kinds of services — such as e-commerce or other web based services — that make heavy use of open source software.  But of course they are not selling software either; they are, like the razor blades companies, investing in their own infrastructure but selling something else.

Dual Licensing

Over time, companies who wanted to make a business selling open source software — the razors rather than the blades — have had to be more creative.  In the 2000s, a business model emerged that is usually called “dual licensing.” It was pioneered by MySQL AB, later acquired by Oracle. Dual licensing works like this:  the company releases software under a copyleft license like GPL. However, anyone integrating that software in a proprietary product would have to violate GPL to distribute its product.  Therefore, the company offers alternative, proprietary licenses, for those wishing to distribute the software as part of a proprietary product.

This model works because GPL does not allow distribution of GPL code within a proprietary product, and because vendors of proprietary products were not willing to lay open their proprietary code under a compatible license.  Therefore, the vendor of a dual licensed product places its developer community between a rock and a hard place, with an option to buy their way out.

Many people find dual licensing models confusing, and they often ask, how can anyone license GPL code under a proprietary license?  The answer is that not just anyone can do that, only the author of the software (or a subsequent copyright owner via assignment). The author of software, as owner of the copyright, has the right to choose multiple outbound licenses.  In other words, the software is not GPL unless and until the author says so. Only the author can make that unfettered choice — everyone downstream must make the dual licensing choice.

But that model only works, in an economic sense, for cases like MySQL where two conditions are met: the software is not an entire program, and the software is mainly used in distributed products.  If the software is an entire program, then anyone can use and distribute the software under GPL. The pain point comes when the GPL code must be integrated with proprietary code to make it work. In this sense, dual licensing is like an intentional “license bug” that has to be solved with a proprietary license.
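The two conditions above can be sketched as a small decision rule.  This is a deliberately simplified toy model, not legal analysis: the class, the field names, and the three example users are hypothetical, and a real answer depends on the facts and the license text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DownstreamUse:
    """Simplified description of how a downstream user uses dual licensed code."""
    distributes_product: bool           # is the software shipped to others?
    integrates_with_proprietary: bool   # is it combined with closed code?
    will_release_own_code_as_gpl: bool  # would the user open its own code?


def needs_proprietary_license(use: DownstreamUse) -> bool:
    """True when, in this toy model, GPL alone does not fit the user's plans."""
    if not use.distributes_product:
        # No distribution: in this model the GPL conditions are not triggered.
        return False
    if not use.integrates_with_proprietary:
        # A whole program can be used and distributed under GPL as-is.
        return False
    # Distributing a combined work: either open the code or buy a license.
    return not use.will_release_own_code_as_gpl


# Hypothetical users, named only for illustration.
embedded_vendor = DownstreamUse(True, True, False)
internal_user = DownstreamUse(False, True, False)
standalone_redistributor = DownstreamUse(True, False, False)

print(needs_proprietary_license(embedded_vendor))           # True
print(needs_proprietary_license(internal_user))             # False
print(needs_proprietary_license(standalone_redistributor))  # False
```

Only the first user, who ships GPL code inside a closed product and will not open its own code, is pushed toward the paid license.  That narrow target is why the model works for some software and not for others.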

And there are complications.  Dual licensed products are rarely truly community projects, because, to make the model work, contributors have to grant the vendor broad enough rights to allow the vendor to use contributions under either GPL or proprietary terms.  In other words, the vendor has to use a contribution license agreement, or CLA.  Contributors tend to balk at this requirement, resulting in demands for “license in = license out.”

Historically, dual licensing models were almost always implemented with GPL2 as the open source choice; most other licenses lack the conditions to drive private businesses to the proprietary licensing choice.  Once GPL3 and AGPL3 were released, those licenses took their place as part of the dual licensing model, because they imposed more conditions on the exercise of the license than GPL2.

But that is not the only limitation of the dual licensing model.  If the software is intended for uses (such as supporting SaaS, monitoring or development tools, or software intended for end use) that would not normally require distribution, then GPL would not drive anyone to take a proprietary license.   So, the dual licensing model waned in popularity over time. AGPL was potentially more effective in such cases, but again, only for pieces of programs, not whole ones. Today, pure dual licensing models are not so common, and have given ground to the “upsell” model described below.


In one variation on the dual licensing model, which is sometimes referred to as an upsell model, the company releases a basic “community” version of its software, saving the “enterprise” features for proprietary licensing.   This approach makes more sense in some cases than others. Enterprise editions can include features that are legitimately unique to commercial enterprise deployment — such as the ability to spin up and coordinate many simultaneous instances, servers or users.  But often they simply include attractive features that the developer wants to sell rather than give away. Even worse, some dual licensing models do not make clear the distinction between their community and enterprise editions, leaving customers unclear on which they need. 

Dual licensing and the free software movement have always been uneasy bedfellows. Dual licensors have been instrumental in establishing the enforceability of licenses like GPL; after all, they have the incentive and money to bring enforcement claims.  But free software advocates have often denigrated the pure dual licensing model or called the upsell model by pejorative terms like “crippleware.” 

Infrastructure and Free Goods

The moral rightness — or for that matter the commercial effectiveness — of dual licensing models is in the eye of the beholder.  But there is a subtler reason for this diversity of views that has nothing to do with philosophy. It is because the free software model is primarily focused on infrastructure, and the proprietary model primarily on higher level applications.

Free software for infrastructure is like banning toll roads.  If we have free roads, we can all share them, and get from place to place.  It makes sense for infrastructure to be a public good. There are lots of economic participants willing to contribute to infrastructure and the cost can be spread across many parties.  Collecting tolls is time consuming and expensive, and thwarts commerce. So we expect the government to build roads, fund the building collectively through taxes, and defray the expense through the increased commerce the roads create.

But collaborative economics don’t work so well for bicycles.  For bicycles to be useful, you definitely need roads. But you don’t need the government to build bicycles.  Private companies will build those, and in fact, will offer a dizzying array of models, prices, quality, and specialization.  The price for bicycles will be borne by those riding them, and the transaction costs are lower: it’s easier to charge once for a bicycle than every day for a road.

Similarly, free software is a great model for infrastructure, but much more difficult for applications.  While many companies will collaborate to build operating systems and web servers, they will not be so keen to collaborate to build applications.  This is not a new idea — it was behind Linus Torvalds’ position that user space programs could run on the Linux operating system, regardless of their licensing. 

New Licenses, New Models, Old Models

All this is preface to explain that today, more and more companies have decided that even dual licensing does not work as a business model for applications.  Dual licensing is about as far as an open source licensing model could be pushed without forgoing licensing revenue entirely. So now, software developers have begun to insist: I need my software to be free only for non-commercial purposes.  Or, I need my software to be free only for small users, or for some kinds of uses. And most common: I don’t want anyone to use my software to provide paid services without paying me. That is the genesis of the “new open source license” people ask me to write.  What they are saying, by implication, is that open source licensing does not work for them as a business model.

Wanting to prevent free-riding is understandable, but it is not open source.  In fact, a license with any such restriction is not open source. For example, the Open Source Definition (planks 5, 6, 8 and 10) prohibits limitations on fields of endeavor, types of users, or types of products.  In fact, open source licensing, by definition, does not limit the scope of a license; it only applies conditions to exercising the license. This distinction is subtle but important. In open source licensing, no one can stop you from doing whatever you want with the software, whether that use is commercial or non-commercial, or famously, good or evil.  Limited, or proprietary, licensing does not let you do everything you want, but only what the licensor allows. Perhaps you can use but not distribute. Perhaps you can distribute but not modify. Perhaps you can distribute only in free products. The variations are infinite.

Proprietary licensing has been around for a long time, of course, and it is not going away any time soon.  It was invented to allow software developers to make money. So it is not surprising that some developers, even though they love the collaborative aspects of open source, want to move to proprietary models.  The question is not whether they will do that; they will. The only question is how much of the open source model they can preserve in a proprietary paradigm.


One of the things people love about open source licensing is that it is (mostly) standardized.  Although there are over 100 OSI-approved open source licenses, people only use a handful of them with any frequency: BSD, MIT, Apache 2, MPL, EPL, LGPL, GPL, AGPL.  One of the boons of open source licensing is that a few letters can identify a licensing model, and those tasked with license compliance don’t have to read hundreds of licenses to know how to do their job.

But proprietary licensing is, at this point, mostly still the wild west.  If you count Creative Commons licenses — which are not really software licenses — there is a small amount of standardization.  But for the most part, the end user license for one proprietary product bears little resemblance to that for another product, other than overarching substantive terms.  Lawyers who practice in this area all have their own favorite forms, and many of them are similar in substance, but you still need to read the fine print every time.

What the technology business seems to be crying out for is standardized and easy to understand proprietary licenses.  They don’t want to pay lawyers thousands of dollars to write non-standardized licenses that licensees find hard to understand.  What they want is something more like a “smart contract”: a set of terms with minimal variations that can be quickly and easily understood but have a reasonable force of law.

Technology transactions lawyers need to understand that non-standardized and elaborate software licenses are going to become a thing of the past.  For proprietary licensing, we are at the point of “evolve or die.” But there is nothing preventing us from evolving; all we need is a pen.

For open source advocates, the choice is now.  Copyright holders have the right to set their terms, and open source terms don’t always work for businesses with commitments to the bottom line. Too much criticism of alternative models will probably result in vendors backing into proprietary licensing models that don’t preserve any of the benefits of open source.  In other words, halfway is better than none.