Summary of IMLS Shareable Local Authorities Forum, Cornell University, October 24-25, 2016

Summary drafted by Isabel Quintana

Workflow

Finding right mix for scalability and sustainability.
Role of data science and AI.
For both human and machine resolution we need an ensemble of information about an entity.

Provenance metadata very important for certain use cases, e.g. where identification of material is part of the scholarly enterprise. Examples: rare materials, archival data.
Not all consumers have the same needs, and tracking metadata provenance brings scalability issues. Needs must be defined by reference to real use cases and functional requirements.
We can keep the information at a basic level; or we can keep complex data but only expose a subset; or we could expose it all and allow the user to take only what they need.
Role of aggregator may be just that: to aggregate, and not necessarily to vouch for the data collected. But this makes recording provenance important.
How can we track this information? We don’t have a well-established or widely shared model, infrastructure, or process for doing this.
Intellectual property/business model questions also play into this issue.
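The options discussed above (keep provenance, but let each consumer take only what it needs) can be sketched roughly as follows. This is only an illustrative sketch, not a model proposed at the forum: the Statement fields, the view functions, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Statement:
    """One assertion about an entity, with its provenance (hypothetical model)."""
    entity: str     # identifier of the entity being described
    prop: str       # property name, e.g. "name"
    value: str      # asserted value
    source: str     # who made the assertion
    retrieved: str  # ISO date on which the aggregator collected it


def basic_view(statements):
    """Expose only entity/property/value, dropping provenance."""
    return [
        {"entity": s.entity, "property": s.prop, "value": s.value}
        for s in statements
    ]


def full_view(statements):
    """Expose everything and let the consumer take what it needs."""
    return [asdict(s) for s in statements]


def from_source(statements, source):
    """Let a consumer keep only the statements from a source it trusts."""
    return [s for s in statements if s.source == source]


# Placeholder data for illustration only.
stmts = [
    Statement("ex:person/1", "name", "Example, Pat", "library-a", "2016-10-24"),
    Statement("ex:person/1", "name", "Example, P.", "publisher-b", "2016-10-25"),
]
```

Here the aggregator stores the complex data once; whether to serve basic_view or full_view then becomes a decision driven by use case rather than by the data model.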

Whose responsibility is this? What are the obligations of providers?
Even if one or two databases exchange data, that data does not usually make it back out to all the contributors to that database.
But heavily centralized and regulated processes (e.g. NACO) also impose costs.

Some skepticism voiced about linked data.
Bridging or elimination of silos brings great potential for enriching knowledge. Importance of communities who do not participate in library authorities but have names that appear in library authorities. See e.g. RelFinder: http://www.visualdataweb.org/relfinder.php
It allows for very rich representation of relationships. What level of alignment is needed for interoperability and reuse?
What is the best way to do versioning?
Need better understanding of what data and services aggregators need to provide.
Local vs shared URIs

Mint at point of need vs requirements for stability.
Issues of trust/veracity of statements if we don’t maintain data locally and just point to another place.
Risks involved in sameAs assertions.
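One way to see the risk in sameAs assertions: if they are treated as transitive, a single mistaken link merges two otherwise separate identity clusters. The sketch below illustrates this with a small union-find structure; the URIs are invented placeholders and the class is hypothetical, not part of any existing authority service.

```python
from collections import defaultdict


class SameAsClusters:
    """Union-find over URIs: every sameAs assertion merges two clusters."""

    def __init__(self):
        self.parent = {}

    def find(self, uri):
        self.parent.setdefault(uri, uri)
        # Path halving keeps lookups fast as clusters grow.
        while self.parent[uri] != uri:
            self.parent[uri] = self.parent[self.parent[uri]]
            uri = self.parent[uri]
        return uri

    def assert_same_as(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def clusters(self):
        groups = defaultdict(set)
        for uri in list(self.parent):
            groups[self.find(uri)].add(uri)
        return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def build(assertions):
    """Apply a list of (uri, uri) sameAs pairs and return the clusters."""
    c = SameAsClusters()
    for a, b in assertions:
        c.assert_same_as(a, b)
    return c.clusters()


# Hypothetical URIs: two distinct people, each known to a local file and a hub.
good = [
    ("local:smith-1", "hubA:100"),
    ("hubA:100", "hubB:x"),
    ("local:smith-2", "hubA:200"),
]
# One mistaken assertion is enough to collapse both people into one cluster.
bad = good + [("hubB:x", "hubA:200")]
```

With the good assertions, build returns two clusters; adding the single mistaken pair yields one cluster containing all five URIs, which is why provenance and the ability to retract an assertion matter when pointing at data maintained elsewhere.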

Community

Sharing remains important but may need to work differently from the way it used to.
Different models of collaboration or sharing between various participants throughout the identity management lifecycle

Different communities of practice: national, institutional, domain, regional.
For example, ISNI and publishers exchange information, as do the Getty and museums.
Collaboration with other language communities.
Need for examples of successful cooperative models that others can learn from.

Roles/responsibilities of various participants in the process (i.e. service providers, library community, publishers, standards community, etc.)

Publishers may care about identities because metadata sells, and they care about rights management.
Stakeholders looking for guidance from metadata community about requirements; CrossRef is a model.

Centralized versus distributed work: for example, VIAF aggregates data that could be done in one workflow.
Strong interest in developing minimum viable product specifications for shared data.

Responsibility and/or ethical issues

Issues of veracity of statements about entities; tracking the provenance of statements.
Traditional authority control is driven by literary warrant, but in a wider identity management context this is a limitation.
Privacy of data, e.g.

Cultural institutions stripping out data before sharing it in Europeana
Authors who do not want (or have not approved) certain data elements shared across databases
Providers (e.g. Ringgold) retaining proprietary rights over some data.

Tooling, infrastructure

Along the continuum of human versus machine processing, we need better computer processes so that humans need to work with the data less.
Strong interest in more effective sharing of algorithms and metadata profiles for matching. Consider clearinghouse of processes, workflows, data elements, etc., but sustainability is an issue.
What are the needs for shared discovery of local authority data? How does this fit into the overall infrastructure? Could we provide, e.g., an AWS template? We should look at examples.
Reconciliation as a service. Identities layer does not need to be ILS- or vendor-specific. Potential for cross-platform (including cross-vendor) solutions.
How can we leverage infrastructure that already exists, e.g. with ISNI? Are there other potential models, e.g. CrossRef?
Role of ISNI as the clearinghouse for names (the “ultimate ID”)

Best practices

Guidelines needed on creation and maintenance of URIs.
We need best practices for updates, versioning, etc., and need to discuss who is responsible for disseminating changes, and at what point.
Need for mechanisms/services to enable identity management further upstream, e.g.

publishers could include identifiers when they send out their data
publishers could include enough information (full names, subject keywords) to facilitate disambiguation.
Enable authors to enter ID at the point when they submit the manuscript, then the ID can travel with the author/work through the workflow. What are the impediments to this workflow?

Community

Perhaps NISO can shepherd some development of best practices.
Document and share existing or pilot workflows as model for other users. Examples exist among older projects as well.
What are social (e.g. membership) barriers to acceptance of existing authority infrastructure? E.g. SNAC is arguably more democratic than NACO.
How can we get smaller libraries and publishers to do some identity management “without even realizing they are doing it”? In other words, make it as seamless, automated and low-barrier as possible.
Need for sustainable business models.
Consider other community models, e.g. http://archesproject.org, http://vocab.getty.edu.
