I recently came across an online discussion that mentioned a very interesting article: Usage and Attribution of Stack Overflow Code Snippets in GitHub Projects, by Sebastian Baltes and Stephan Diehl. It is a study of certain licensing issues on Stack Overflow, a discussion site for software developers. Stack Overflow applies the CC BY-SA 3.0 license, a copyleft license for content, to contributions, and there is an ongoing debate as to the suitability of those license terms.
The study analyzes the attribution of “non-trivial” Java code snippets to estimate the rate of usage that did not comply with the CC BY-SA notice requirements. The study found that “at most 1.8% of all analyzed repositories containing code from SO used the code in a way compatible with CC BY-SA 3.0. Moreover, we estimate that at most a quarter of the copied code snippets from SO are attributed as required.”
It is a fascinating topic, and it is refreshing to find a practical and empirical analysis of a licensing issue. (This article by Chaiyong Ragkhitwetsagul, Jens Krinke, and Rocco Oliveto also reports the results of surveys of Stack Overflow answerers and visitors to assess awareness of outdated code and software licenses.)
Those who work on M&A deals and other open source compliance efforts know that the average code audit usually turns up a handful of snippets copied from Stack Overflow. While many so-called snippets are short and may not enjoy copyright protection, that legal conclusion can be challenging to make, and unsatisfying to the risk-averse. To avoid uncertainty, buyers often want such snippets removed, resulting in additional engineering costs that are expended to manage small but non-zero legal risks. It is, in economic terms, a tax on development activity.
A project to convert Stack Overflow code contributions to a permissive license, MIT, died on the vine in 2016. That is unfortunate. Discussion boards would better serve their community by requiring contributors to apply permissive licenses (or even public domain dedications or licenses with no attribution requirements) to small code examples. At least that should be the default choice. It seems doubtful that most contributors care enough about any copyright they may have in code snippets to apply, let alone enforce, significant conditions on what they contribute. Given the choice, they would probably be happy with permissive terms. Moreover, many of the contributions are taken from other sources and contributed without attribution of upstream license terms, which may or may not be compatible with CC BY-SA.