MusicBrainz Data License Issues
by Robert Kaye
rob@musicbrainz.org
January 2003


Introduction

The MusicBrainz project currently distributes its dataset using the OpenContent license. However, the Free Software Foundation frowns upon this license, and it really isn't suited for licensing factual data as it turns out. Given these issues, what license should be used for distributing the MusicBrainz dataset?

What is wrong with using the OpenContent license?

MusicBrainz uses the OpenContent license to make the data in the MusicBrainz server available to the public. This license hopes to be a GPL analogue license for general content (music, pictures, stories, etc.) and is not specifically designed to handle data that resides in an SQL server.

Richard Stallman has some objections to the OpenContent license and the GNU web page states:

This license does not qualify as free, because there are restrictions on charging money for copies. We recommend you not use this license.

If OpenContent does not solve the problems that MusicBrainz faces, then what license does? I like the spirit of the GPL, and I've considered using the GPL for MusicBrainz, but GPL uses software specific terms like source code, object code, and executable all of which do not apply to data. Richard Stallman suggested using the GNU Free Documentation License, but that license is specific to documentation and uses terms like title pages, appendices, and cover texts. This license feels like trying to force a square peg into a round hole, and I don't think it would provide adequate protection for a data set in a court of law.

After endless hours of research trying to find a fitting license I've come to the conclusion that none exist that can cover the issues that arise from trying to apply the Copyleft concept to a database. Simply trying to define what constitutes a derivative work for a database is a frustrating experience.

Consider the following issues:

  • Is a derivative work created when someone takes some data from a Copyleft database, combines it with some data from a proprietary database and then displays the data to a customer? What if that customer is or is not charged for access to that database?
  • It would seem that importing one column from a Copyleft database into a proprietary database would constitute a derivative work. But, what if a user inserts row identifiers from a Copyleft database into a proprietary database in order to cross reference them?
  • If someone takes some data from a Copyleft database and adds it to their own web page, is their web page now subject to the same license that covers the database?

A license that protects the contents of a database should be able to address the concerns above. This may seem like a tall order already, but the situation gets more complicated. In the 1991 case of Feist Publications, Inc. v. Rural Telephone Service Company, Inc., the US Supreme Court ruled that a database (a compilation work) must contain a minimum level of creativity in order to qualify for copyright protection (source). Merely listing factual data in an alphabetic listing is not enough, and anyone can legally extract portions (short of the whole compilation) of the database for their own use.

Put bluntly, factual data cannot be copyrighted.

To date, the data collected in MusicBrainz is 100% factual -- this will change over time as MusicBrainz starts collecting data like music reviews and genre listings. However, given the Feist decision, how can we protect the MusicBrainz data?


Detailed Problem Analysis

Instead of looking for one license that can cover the MusicBrainz data, we must now look for a license to cover the non-factual data and we must come to terms with the reality that factual data cannot be copyrighted.

Factual data cannot be copyrighted in the United States -- in the European Union and in lots of other countries of the world factual data can be copyrighted. But with the US as the weakest link in the chain to protect the data, it will be impossible to protect the data even if the MusicBrainz project were located outside of the US. Consider this scenario:

A citizen of a European country travels to the United States and downloads most (but not all) of the portions of the MusicBrainz data. Perhaps this person takes all of the data, except for a few obviously bad metadata entries. This person then exports the dataset by returning to the EU. Finally, said person can then take this dataset and apply a copyright notice and begin using and/or selling this data to other EU customers. While this may not be morally correct, it is perfectly legal.

Regardless of the license that MusicBrainz applies to the data, the above is legal. Lets assume for a moment that the MusicBrainz community decides to release the data under the GPL. The GPL assumes that copyright law applies to the source code/data and proceeds to give the licensee a set of rights, as long as the licensee abides by terms of the GPL. If the licensee violates any portion of the GPL, the GPL becomes null and void, and the licensee is now in direct violation of copyright law.

The EU citizen in the example above exports the GPL dataset back to the EU and then begins to sell copies of the data. That action is a clear violation of the GPL, which then causes the GPL to no longer apply, thus falling back to the existing copyright laws. But in this case, there are no copyright laws to fall back upon, and thus the MusicBrainz community has no legal recourse against the immoral citizen.

The end result is that there is no way to protect factual data. Period.

On the other hand, factual data cannot be copyrighted, and the MusicBrainz community should seek for or define a suitable license for the non-factual metadata. We'll cover this topic in much greater detail later.


A New Approach

If the factual data in MusicBrainz cannot be protected, do I just stop hacking and get a real job?

No. We need to examine our beliefs and our preconceived notions about having to protect our dataset. At the first CodeCon Conference Fred von Lohmann, who is a staff attorney for the EFF, suggested that the community stop trying to protect the dataset and instead focus on how to ensure that the dataset will always be available to the community.

Fred’s comments suggest a simple method for resolving the license issues that is not fraught with paradoxes and complex legal documents. Rather than restricting who and why someone can use the dataset, we simply need to ensure that the data has a permanent home that cannot be compromised by greed or other malicious activities.

The thought of letting go of control of the MusicBrainz data will be a new one for the MusicBrainz community. The concept itself is not new at all -- the most prominent example of letting go of control comes from Linus Torvalds. Linus, the creator of the Linux kernel which makes up the core of the GNU/Linux operating system, made the source code to his kernel available to the public in 1991, and thus relinquishing ultimate control over the code. Letting go of control over the kernel sources has spurred the development for Linux and its consequent rapid adoption in the industry.

The same will be true for MusicBrainz. Letting go of control of the data and explicitly placing the factual metadata into the Public Domain will make the metadata more useful than it is today. I am even willing to argue that we're holding back the MusicBrainz project by trying to protect the data.

Data has no intrinsic value. Data with relationships to other data has some value. Data with relationships to lots of other data is more valuable. In essence, the more connected a dataset is, the move valuable it becomes. Consider Metcalfe's law:

Metcalfe's Law states that the usefulness, or utility, of a network equals the square of the number of users.

I believe that Metcalfe's law applies to data as well:

The usefulness, or utility, of a dataset equals the square of the number of relationships contained in the dataset.

The more people/projects/companies use and link against the MusicBrainz dataset, the more useful the dataset will become. Each new relationship in the dataset and external relationships to and from the dataset will increase its value. Only by making the dataset freely available, without restrictions on its use, can the MusicBrainz dataset realize its full potential.


Threat Analysis

Lets assume that the MusicBrainz factual metadata will be explicitly released into the Public Domain. What threats exist that may make this move undesirable? Two threats become apparent:

  1. The M$ threat: M$ downloads the data and creates a service that sells this data to its customers. M$ makes money off the backs of the MusicBrainz community.
  2. The GN threat: GN purchases all of the MusicBrainz intellectual property, data and the souls and bodies of the creators of MusicBrainz. GN shuts down MusicBrainz and offers the same service for a fee to its customers.

These threats may seem serious at first glance, but a detailed breakdown shows that theses threats don't amount to much. In the first threat, M$ would have a hard time selling a product that is available for free to its customers. The free software community would make its voice heard (as it has in the past) and create an overwhelming negative PR nightmare for M$. The uproar from the community would make it clear that the service for which M$ had been charging is available for free at MusicBrainz. The idea of starting a service to charge for what is available for free is not a sound business practice.

M$ could also choose to add extra value to the product offering in order to charge money for the service. This concept is actually endorsed by the GPL, and thus is not really a threat. Furthermore, M$ could refuse to contribute changes back to the community. This means that they need to create the tools to update and maintain the dataset or take the MusicBrainz server source code and adapt it for their own use. Either one of these approaches costs lots of money in engineering time -- it would be cheaper to let the established MusicBrainz community take care of the maintenance of the dataset.

However, we all know that M$ has lots of money, and lets assume that they develop their own tools for data maintenance. It is unlikely that the paying M$ customer base would be willing to help maintain the dataset as the MusicBrainz community does. M$ is not known for having customers with noble goals about open communities. Thus, M$ would be forced to maintain the dataset themselves, and this translates into ongoing expenses, which are harder to justify than one time expenses. The dataset would also suffer in quality, since M$ cannot possibly have the breadth of users that MusicBrainz enjoys. M$ would be better off to have a friendly relationship with MusicBrainz and to send their customers to MusicBrainz for data maintenance.

The GN threat is even less important, if we take a few preventative steps. Fred von Lohmann suggests that we form a non-profit corporation that has the express bylaws that state that the MusicBrainz dataset must always be made available to the public. Furthermore, a US non-profit corporation cannot be bought by another corporation; in order to dissolve a non-profit corporation, its assets must be donated to another non- profit corporation. This will ensure that the assets of MusicBrainz cannot be legally acquired and locked away from the public.

And in the case that GN acquires the souls and bodies of the creators of MusicBrainz, the other members of the MusicBrainz community will take action to isolate compromised elements of MusicBrainz. And as stated earlier -- an isolated data set is worth a lot less than a well connected dataset.

In order for MusicBrainz to protect itself it must take a few legal and technical steps to ensure that the availability of the data cannot be compromised and that it is easier to work with MusicBrainz than to work in isolation.

The New Solution in Detail

A comprehensive solution to the MusicBrainz data license dilemma will require separate solutions for the factual and the non-factual metadata. The following steps propose such a solution:

  1. All factual metadata should be explicitly placed into the Public Domain, thereby acknowledging that factual metadata cannot be copyrighted.
  2. A MusicBrainz non-profit corporation should be created, which should adopt a set of bylaws that state that MusicBrainz will make all factual metadata created by the MusicBrainz community available to anyone who wishes to download the data.
  3. The non-profit corporation should establish a number of partnerships with groups like the Creative Commons or the Free Software Foundation, to mirror the MusicBrainz data. Having a neutral third party make the data available ensures fair access to the dataset and acts as a backup in case the MusicBrainz non- profit should cease to exist.
  4. The MusicBrainz community should work to identify any issues that arise from applying the Copyleft concept to a database of non-factual metadata. Defining a derivative work and to what extent linking to and using non-factual metadata is allowed under the Copyleft concept are only a few of the issues that need to be addressed.
  5. MusicBrainz should work with the established free software/open source license experts such as Richard Stallman and Eric Raymond to define a new license that addresses the complexities of applying the Copyleft to non-factual metadata. Creating a license that is derived from the GPL or other licenses that have had public scrutiny would be a great benefit to creating a solid license that can withstand an attack in a court of law.

Conclusion

The OpenContent license is not the right license for MusicBrainz, and as far as I can tell, no appropriate license for data has been created yet. This white paper serves as a call for participation for creating an appropriate license. If you have any thoughts on this matter, please take a moment to share them with me me or the MusicBrainz community in general.

 
Creative Commons EFF GPL LGPL This material is Open Knowledge Valid XHTML 1.1 Valid CSS 2.0
Original Design|vacubomb.com Contact details Server version: RELEASE-20071014