A recent conversation got me curious about licences in the PHP world. Licensing is important and often overlooked, and it is harder than it seems. A licence tells the users of your software how they are allowed to use your package and what attribution they owe you. It also shows who is responsible should the software not work, along with other details such as how the end user can redistribute your code.
If you are unsure about the differences between licences, TLDRLegal is a great reference point.
I have had discussions with top solicitors in the past about software licensing, as some parts of licences are more technical than they appear to be.
I'm not sure how much developers in general audit their dependencies apart from the occasional `composer licenses` run. In my experience, licensing seems to be an afterthought if the code is "open source", i.e. you can see the code. This is especially so for web use, as most open source licences focus on the redistribution of code, which is not much of a concern for most end users who are, for example, running a website.
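For anyone who wants to go a step beyond eyeballing that output, here is a minimal Python sketch of a dependency audit. It assumes the report comes from `composer licenses --format=json` and has a top-level `dependencies` object whose entries each carry a `license` list. The allow-list, the function name and the sample packages are all made up for illustration, so treat this as a starting point rather than a finished tool.

```python
import json

# Hypothetical allow-list: adjust it to whatever your own policy permits.
ALLOWED = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0", "ISC"}


def flag_dependencies(report, allowed=ALLOWED):
    """Return {package: licences} for dependencies with no allowed licence.

    Several licences on one package are treated as a choice ("or"),
    so a single allowed entry is enough to pass.
    """
    flagged = {}
    for name, info in report.get("dependencies", {}).items():
        licences = info.get("license") or []
        if not any(licence in allowed for licence in licences):
            flagged[name] = licences
    return flagged


# Invented sample, shaped like the assumed JSON report.
SAMPLE_REPORT = json.loads("""
{
  "name": "example/app",
  "version": "1.0.0",
  "license": ["proprietary"],
  "dependencies": {
    "example/http-client": {"version": "2.1.0", "license": ["MIT"]},
    "example/dual": {"version": "1.4.2", "license": ["GPL-2.0-or-later", "BSD-3-Clause"]},
    "example/pdf-tool": {"version": "0.9.0", "license": ["AGPL-3.0-only"]},
    "example/unknown": {"version": "3.0.0", "license": []}
  }
}
""")

flagged = flag_dependencies(SAMPLE_REPORT)
print(flagged)
```

Packages with an empty licence list are flagged too, since no declared licence means no explicit permission.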
I wanted a more in-depth view of things.
I'm using a modified version of the script Nikita Popov wrote to analyse top packages. I look at the top 5,000 packages as ranked by Packagist in May 2021, and their chosen licences.
Licence | Packages | Downloads |
--- | --- | --- |
MIT | 3,003 (57.84 %) | 25,564,929,753 (68.87 %) |
proprietary | 647 (12.46 %) | 256,489,222 (0.69 %) |
BSD-3-Clause | 554 (10.67 %) | 8,097,729,988 (21.81 %) |
Apache-2.0 | 194 (3.74 %) | 1,023,978,630 (2.76 %) |
GPL-2.0-or-later | 139 (2.68 %) | 230,877,508 (0.62 %) |
multi-licence | 75 (1.44 %) | 268,330,518 (0.72 %) |
AFL-3.0 | 73 (1.41 %) | 42,581,359 (0.11 %) |
BSD-2-Clause | 51 (0.98 %) | 259,176,670 (0.70 %) |
GPL-2.0-only | 44 (0.85 %) | 226,729,227 (0.61 %) |
LGPL-2.1-or-later | 38 (0.73 %) | 123,345,115 (0.33 %) |
OSL-3.0 | 36 (0.69 %) | 42,701,729 (0.12 %) |
unlicenced | 33 (0.64 %) | 23,469,355 (0.06 %) |
LGPL-3.0-or-later | 30 (0.58 %) | 35,324,798 (0.10 %) |
GPL-3.0-or-later | 27 (0.52 %) | 10,299,979 (0.03 %) |
GPL-3.0-only | 27 (0.52 %) | 204,289,909 (0.55 %) |
GPL-2.0 | 23 (0.44 %) | 24,879,582 (0.07 %) |
GPL-2.0+ | 20 (0.39 %) | 20,739,800 (0.06 %) |
GPL-3.0 | 19 (0.37 %) | 21,366,374 (0.06 %) |
LGPL-2.1 | 15 (0.29 %) | 92,421,941 (0.25 %) |
LGPL-3.0 | 14 (0.27 %) | 90,561,553 (0.24 %) |
WTFPL | 11 (0.21 %) | 17,593,161 (0.05 %) |
LGPL-3.0-only | 11 (0.21 %) | 38,856,436 (0.10 %) |
GPL-3.0+ | 9 (0.17 %) | 4,149,099 (0.01 %) |
ISC | 7 (0.13 %) | 48,001,372 (0.13 %) |
Apache2 | 6 (0.12 %) | 117,641,159 (0.32 %) |
LGPL-2.1+ | 5 (0.10 %) | 6,258,049 (0.02 %) |
AGPL-3.0-only | 4 (0.08 %) | 1,814,584 (0.00 %) |
AGPL-3.0 | 4 (0.08 %) | 4,173,312 (0.01 %) |
LGPL Version 3 | 4 (0.08 %) | 13,746,058 (0.04 %) |
Unlicense | 3 (0.06 %) | 3,444,615 (0.01 %) |
MPL-2.0 | 3 (0.06 %) | 3,657,522 (0.01 %) |
PHP-3.01 | 3 (0.06 %) | 669,857 (0.00 %) |
(Apache-2.0 or GPL-2.0) | 3 (0.06 %) | 22,890,472 (0.06 %) |
LGPL-2.1-only | 3 (0.06 %) | 26,482,135 (0.07 %) |
LGPL | 3 (0.06 %) | 6,563,853 (0.02 %) |
MPL-1.1 | 3 (0.06 %) | 458,795 (0.00 %) |
LGPL-3.0+ | 2 (0.04 %) | 617,055 (0.00 %) |
BSD-4-Clause | 2 (0.04 %) | 1,589,518 (0.00 %) |
LGPL-2.0-or-later | 2 (0.04 %) | 4,589,173 (0.01 %) |
BSD | 2 (0.04 %) | 1,510,896 (0.00 %) |
unlicense | 2 (0.04 %) | 2,539,285 (0.01 %) |
Facebook Platform | 2 (0.04 %) | 26,640,023 (0.07 %) |
CC-BY-4.0 | 2 (0.04 %) | 4,517,523 (0.01 %) |
OFL-1.1 | 2 (0.04 %) | 4,517,523 (0.01 %) |
LGPL 2.1 | 1 (0.02 %) | 796,454 (0.00 %) |
QPL-1.0 | 1 (0.02 %) | 529,644 (0.00 %) |
(CC-BY-4.0 and MIT) | 1 (0.02 %) | 1,255,243 (0.00 %) |
(MIT and proprietary) | 1 (0.02 %) | 262,255 (0.00 %) |
no usage restriction | 1 (0.02 %) | 1,565,314 (0.00 %) |
LGPL-3 | 1 (0.02 %) | 641,449 (0.00 %) |
MIT or GPLv2 | 1 (0.02 %) | 247,649 (0.00 %) |
QPL 1.0 | 1 (0.02 %) | 318,659 (0.00 %) |
Proprietary | 1 (0.02 %) | 267,169 (0.00 %) |
LGPLv2 | 1 (0.02 %) | 754,871 (0.00 %) |
Fair | 1 (0.02 %) | 143,793 (0.00 %) |
GPLv3 | 1 (0.02 %) | 450,407 (0.00 %) |
lgpl-3.0 | 1 (0.02 %) | 655,485 (0.00 %) |
https://github.com/paypal/Checkout-PHP-SDK/blob/master/LICENSE | 1 (0.02 %) | 856,278 (0.00 %) |
GNU | 1 (0.02 %) | 744,876 (0.00 %) |
Public Domain | 1 (0.02 %) | 2,230,320 (0.01 %) |
Artistic-1.0 | 1 (0.02 %) | 58,007,505 (0.16 %) |
GNU General Public License V2 | 1 (0.02 %) | 3,653,965 (0.01 %) |
GNU Public License | 1 (0.02 %) | 3,834,238 (0.01 %) |
beerware | 1 (0.02 %) | 4,137,393 (0.01 %) |
PHP | 1 (0.02 %) | 725,182 (0.00 %) |
GPL-2 or New-BSD | 1 (0.02 %) | 3,047,435 (0.01 %) |
Apache | 1 (0.02 %) | 2,896,010 (0.01 %) |
AGPL-3.0-or-later | 1 (0.02 %) | 368,557 (0.00 %) |
GPL-1.0-or-later | 1 (0.02 %) | 1,533,499 (0.00 %) |
Apache 2 | 1 (0.02 %) | 1,900,426 (0.01 %) |
BSD-3-Clause-Clear | 1 (0.02 %) | 1,584,859 (0.00 %) |
GPL-2 | 1 (0.02 %) | 770,559 (0.00 %) |
MPL-1.1+ | 1 (0.02 %) | 1,790,393 (0.00 %) |
LGPL-2.0-only | 1 (0.02 %) | 727,481 (0.00 %) |
CC-BY-2.5 | 1 (0.02 %) | 977,858 (0.00 %) |
PDDL-1.0 | 1 (0.02 %) | 210,510 (0.00 %) |
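Many rows in the table are the same licence written differently ("Apache2", "Apache 2", "GPL-2.0+" and so on). A rough Python sketch of how those spellings could be folded together before counting follows. The alias map is my own small, incomplete guess at sensible mappings, not part of the original script, and the sample rows are copied from the table above.

```python
from collections import Counter

# Hand-written, deliberately incomplete alias map (an assumption, not an
# official list). Keys are lower-cased spellings seen in the table.
ALIASES = {
    "apache2": "Apache-2.0",
    "apache 2": "Apache-2.0",
    "gpl-2.0+": "GPL-2.0-or-later",
    "gpl-3.0+": "GPL-3.0-or-later",
    "lgpl-2.1+": "LGPL-2.1-or-later",
    "lgpl-3.0+": "LGPL-3.0-or-later",
    "unlicense": "Unlicense",
    "proprietary": "proprietary",
}


def normalise(label):
    """Map a free-form licence label to a canonical one where we know how."""
    cleaned = label.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def merge_counts(rows):
    """Sum package counts per normalised licence label."""
    merged = Counter()
    for label, count in rows:
        merged[normalise(label)] += count
    return merged


# A few (label, package count) rows taken from the table.
rows = [
    ("MIT", 3003),
    ("Apache-2.0", 194),
    ("Apache2", 6),
    ("Apache 2", 1),
    ("GPL-2.0-or-later", 139),
    ("GPL-2.0+", 20),
    ("proprietary", 647),
    ("Proprietary", 1),
]

merged = merge_counts(rows)
for label, count in merged.most_common():
    print(f"{label}: {count}")
```

Ambiguous labels such as "Apache", "GNU" or "LGPL" are left alone on purpose, because the version cannot be inferred from the label.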
A few things to note here. Firstly, the number of packages which suggest they don't have a licence. If this is true, these packages should be avoided (more reading on that here), as there is no explicit permission to use them.
The difference between the share of packages with a proprietary licence (12.46 %) and their share of downloads (0.69 %) is striking. That suggests either that licences are being looked at more than I thought, or that the proprietary packages are less popular (i.e. in the lower end of the 5k packages).
Many licences are not easily analysable because they are marked as proprietary. While some of these packages truly are proprietary, it's hard to tell, as each one would have to be checked manually.
These figures are not entirely accurate though, as Packagist only knows the licence indicated in the composer.json. This may or may not be the same licence as the one in the package itself.
There are four ways in which this could go wrong:
All of these cases occur in the following popular repositories:
Even Nexmo, who have some of the best documentation and a team who really know what they are doing, make these mistakes. See Nexmo/vonage-php-nexmo-bridge: composer.json says MIT, the licence file says BSD-3. So even industry-leading developers working on well-known and widely used packages make mistakes in licensing.
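A mismatch like that one (composer.json says MIT, the licence file says BSD-3) can be spotted mechanically, at least roughly. The Python sketch below guesses the licence from a few signature phrases in the licence file and compares it with the `license` field in composer.json. The phrase list is a crude heuristic of my own and the function names are hypothetical; a real tool would need a proper licence matcher.

```python
import json

# Crude signature phrases per licence (a heuristic assumption, not a
# complete matcher). Every phrase must appear for the licence to be guessed.
SIGNATURES = [
    ("MIT", ["permission is hereby granted, free of charge"]),
    ("BSD-3-Clause", ["redistribution and use in source and binary forms",
                      "neither the name"]),
    ("Apache-2.0", ["apache license", "version 2.0"]),
]


def guess_licence(text):
    """Guess a licence identifier from the licence file text, or None."""
    lowered = " ".join(text.lower().split())  # collapse line wraps
    for identifier, phrases in SIGNATURES:
        if all(phrase in lowered for phrase in phrases):
            return identifier
    return None


def declared_licences(composer_json_text):
    """Read the license field, which may be a string or a list."""
    value = json.loads(composer_json_text).get("license", [])
    return [value] if isinstance(value, str) else list(value)


def check(composer_json_text, licence_text):
    """Return 'match', 'mismatch', or 'unknown' when we cannot tell."""
    declared = declared_licences(composer_json_text)
    guessed = guess_licence(licence_text)
    if not declared or guessed is None:
        return "unknown"
    return "match" if guessed in declared else "mismatch"


composer_json = '{"name": "example/package", "license": "MIT"}'
bsd_text = """Redistribution and use in source and binary forms, with or
without modification, are permitted provided that the following conditions
are met: ... Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products ..."""

result = check(composer_json, bsd_text)
print(result)
```

Returning "unknown" instead of guessing keeps false alarms down for the many custom or proprietary licence texts.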
Other common things to do are:
A basic analysis is shown below
It's worth noting that some of the "Missing Licence" ones are actually against their own terms, as MIT for example says:
"The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software."
So not including the licence text in their own package is against their own terms.
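Checking whether a package ships a licence file at all is easy to automate. Here is a small Python sketch, assuming the licence sits in the package root under one of a handful of conventional names. The list of names is my assumption, and files named differently (for example `LICENSE-MIT`) would be missed.

```python
import tempfile
from pathlib import Path

# Conventional base names for licence files (an assumed, incomplete set).
LICENCE_STEMS = {"license", "licence", "copying", "unlicense"}


def find_licence_files(package_dir):
    """Return sorted names of likely licence files in the package root."""
    root = Path(package_dir)
    return sorted(
        path.name
        for path in root.iterdir()
        if path.is_file() and path.name.split(".")[0].lower() in LICENCE_STEMS
    )


# Demonstration on a throwaway directory standing in for a package.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "composer.json").write_text('{"license": "MIT"}')
    missing = find_licence_files(tmp)   # only composer.json exists so far

    Path(tmp, "LICENSE.md").write_text("Permission is hereby granted, ...")
    present = find_licence_files(tmp)   # now the licence file is found

print(missing)
print(present)
```

An empty result for a package that declares MIT in its composer.json is exactly the "against their own terms" case described above.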
Ideally, before releasing a package as open source, people should consider what licence they want. Each person will have their own reasons for wanting to do open source, so there's not a single licence which fits all needs. GitHub is getting better at asking you to include a licence when making a new repo, and can even import one on your behalf. The licence in the repo needs to match the one declared for the distribution method, which for PHP code will almost certainly be Composer.
Composer could check that the licence matches. All of the data I have provided here has been downloaded from Packagist and GitHub, but it's not really Composer's or Packagist's job to do this. A reminder system could be useful, similar to how you get emails about invalid JSON files.
Education could also help, but licensing is a dry subject which is hard to get right. I've been to conference talks about licensing, but they tend to be a bit preachy about the speaker's preferred licence.
A useful product would be something like Dependabot, which can check in bulk and open an issue on the repo. From my experience, once licence issues are pointed out, they are fixed quickly. The code I have written to do this analysis could be extended, although it took 7 hours to get the information for only 5,000 packages. I'm not sure how to get a higher API limit, but suggestions would be welcome! I would be interested in writing a bot to validate this.
What can you do today? Check your open source packages: make sure the licence in the package on GitHub matches the one on your Packagist profile, as both display this information.
If your package is using a known open source licence and you are unsure how to check, get in touch through Twitter and I will gladly look at your licence setup.
Thanks to Greg Bowler, Rory and Phil Walker for the editing help and helpful suggestions.