Context for Using Community Data License Agreements

In drafting the Community Data License Agreement (CDLA), we have been on a “listening tour,” talking to many organizations about what a license agreement for collaborating on data might look like and how it may be used. In addition to receiving many excellent and sometimes specific suggestions for improvement, we have learned that any discussion of data licensing needs to be viewed within the broader context of how the license will be used.

There will be at least three layers of potential considerations in any data sharing situation that uses the CDLA in an open community data repository. The first will be the inbound agreements or licenses that govern the contribution of data to the repository. Any organization can unilaterally publish data using a CDLA. However, when communities collaborate openly on building shared data, other considerations come into focus. The second will be the outbound license and/or the technically imposed limits on the use of the data. The third will be the charter of the organization that hosts the data. We’ve tried to share our findings on these three aspects below.

1. The Inbound License

What we are focused on initially is the development of a family of inbound licenses or contribution agreements that will enable flexibility in the use of the data going forward by avoiding a period of “license proliferation” before the community converges on a few useful forms.  As we know from experience with open source software, different license terms can prevent the combination of valuable assets and require the maintenance of separate repositories.  The use of a small number of familiar licenses will greatly reduce the friction in both contributing and using data that is made available in open repositories.  Limiting the inbound license to the terms that are essential for contributions to be made allows for maximum flexibility as both the social and technical constructs around data evolve.

2. The Outbound License or Technical Restrictions on Use

In some situations, in data as in software, the inbound license and the outbound license may be the same.  For example, where the data is valuable but not sensitive and the goal is to maximize the use of the data, the same broad rights may be appropriate for both the inbound and outbound license.  But, in other situations, the use of the data will be limited by the sensitivity of the information it contains and additional restrictions will be imposed on the use of the data by law, by agreement and/or by design.

a. By Law. Nothing in an open source license alters the obligations of a user to comply with applicable laws.  For software, we know that U.S. citizens are still bound by U.S. export restrictions even with respect to open source software (although special rules may apply), but we do not include that obligation in the license.  There was a short form permissive open source license that was approved by OSI a number of years ago that included a specific export law statement.  That license has been formally deprecated and is no longer used even by its author.  For data, it does not matter what license is applied or what the license says: everyone handling the data will be obligated to comply with the laws applicable to data in the relevant jurisdiction.

b. By Agreement. If the inbound license is permissive, or if the data is made available only from a hosted source, data repositories may impose additional restrictions on the use of the data.  In other words, the outbound license may not be the same as the inbound license.  The restrictions imposed may be intended to avoid misuse of sensitive data or violation of specific regulations known to be applicable. For example, if the use of the data is limited to a specific country, that restriction could be imposed in an outbound license.  But the additional restriction or obligation may change, or new technical controls or processes may make the contractual restriction unnecessary.  It is therefore wise not to include the restriction in the inbound license, because doing so would require a renegotiation of the inbound license in order to implement changes in the data’s availability.

c. By Design. The most effective way to impose restrictions is to build them into the architecture of the repository that is the source of the data.  Although technical restrictions, such as installing software on hardware that does not permit updates, have been highly disfavored by free software advocates, they may be necessary to enable the use of sensitive data as a shared resource.  Imposing restrictions by agreement shifts risk legally but does not reduce the actual risk of unintentional misuse.  Building a platform that automates the imposition of the restrictions, for example by allowing queries of sensitive data but limiting the form of results to prevent the distribution of the sensitive data, may be necessary to give contributors and collaborators comfort that their entrustment of the data to a specific source is reasonable.
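
As a rough illustration of a restriction imposed by design, the sketch below shows a repository object that keeps raw records private and answers only aggregate queries.  Everything in it is hypothetical: the class name RestrictedRepository, the average_by method, the field names and the MIN_GROUP_SIZE threshold are assumptions made for this example and are not part of the CDLA or of any particular platform.  Suppressing small groups is just one simple control and would not, by itself, be enough to protect sensitive data in practice.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical threshold: groups with fewer rows than this are not reported,
# so that a result cannot be traced back to one or two individuals.
MIN_GROUP_SIZE = 3


class RestrictedRepository:
    """Holds raw records privately and exposes only aggregate queries."""

    def __init__(self, records):
        # Raw rows are stored internally and never returned to callers.
        self._records = list(records)

    def average_by(self, group_field, value_field):
        """Return the average of value_field per group, omitting small groups."""
        groups = defaultdict(list)
        for row in self._records:
            groups[row[group_field]].append(row[value_field])
        return {
            key: mean(values)
            for key, values in groups.items()
            if len(values) >= MIN_GROUP_SIZE
        }


# Illustrative data only.
records = [
    {"region": "north", "age": 30},
    {"region": "north", "age": 40},
    {"region": "north", "age": 50},
    {"region": "south", "age": 52},
]

repo = RestrictedRepository(records)
result = repo.average_by("region", "age")
print(result)
```

In this sketch the "south" group contains a single record, so it is left out of the result, while the "north" group is reported only as an average.  The restriction lives in the interface rather than in a license term, so it can be tightened or relaxed later without renegotiating any agreement with contributors.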

3. The Charter of the Community that Is Entrusted with the Data

A third layer of restriction may be imposed by the Charter of the community, organization or institution that is entrusted with curating the data.  If the community is established for the purpose of making data available for a particular purpose, freedom to use the data for any purpose and in any manner within the confines of the charter will avoid restrictions that make sense in the current environment but make little or no sense in the longer term due to changes in technology, social norms or the legal landscape. Communities that curate data, particularly communities focused on specific geographies, industries or uses, will be best able to anticipate, identify and structure criteria and reviews to address the restrictions relevant to their community. For example, weather data from government sensors may not generally raise data privacy concerns; however, if the sensors are individuals’ smartphones that also track geolocation data, additional concerns may arise. It is best for the community working in these specific contexts to address the issues of sharing the data, as they are difficult for a license drafter to address in the abstract. The privacy concerns may also evolve over time (e.g. Country X could shift from banning to allowing the sharing of an individual’s geolocation data if the user consents).

We hope this context is helpful in our conversations regarding a proposed family of licenses that could serve as the standard inbound license or contribution agreement for a data repository.  We are not trying to build an entire data and regulatory infrastructure into the licenses.  We are rather trying to limit the licenses such that they resolve once and for all the contributor’s rights with respect to the data, but do not impose additional obligations that may, will or should change over time.  We do not want the licenses to impose the equivalent of required use of CD-ROM technology for delivery of data.  We want to strike the right balance to maximize both contributions and usefulness.

What happens after the inbound license will change over time.  But the data that is contributed today will continue to be valuable for decades or centuries to come. Our goal is to provide an inbound license that will not impede this evolution.