Outline for Final Report

Charge (Adam C)

Problem statement

Digital collection identifiers (Rick)
A long-standing issue in the presentation of our digital collections is the lack of a reliable identifier
for the collections themselves. Collections move to new machines, change their delivery platforms,
and change their default behaviors. We need to be able to locate collections reliably over a long period of time, and to discover aspects of collections so that they can interoperate with other collections.

Persistent IDs for digital preservation (Bill)
- We are working on preserving the digital objects in two large collections, the Euclid journals and the arXiv.org preprints. In general, we can view each object as having at least one content file and an object descriptor file containing metadata about the object. Most digital objects contain multiple content and metadata files. We need to be able to identify and locate the files for a long time, regardless of where they are located. Rather than changing the metadata in the descriptor file every time a file is moved (an event that occurs several times during the archival ingest and storage process), we would like to create persistent identifiers that can be mapped to the files' current locations.
- The number of component files to be preserved will be several times larger than the number of digital objects. With processing efficiency in mind, we would prefer a solution that will allow us to resolve the identifiers locally, without going out over the internet for each request for resolution.
- The digital objects' component files in our preservation system will not be directly accessible to the public; access will occur through a gated interface. The persistent identifiers need not, and should not, be public.
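
The mapping described in the points above can be sketched as a small local lookup table. This is only an illustration under stated assumptions, not a proposed design: the class name LocalResolver, the identifier "pid:example/0001", and the file paths are all hypothetical. The sketch shows the descriptor file referring to a fixed identifier while only the table entry changes when a file moves, with resolution done in-process and no network request.

```python
class LocalResolver:
    """Hypothetical in-memory map from persistent identifier to current location."""

    def __init__(self):
        self._locations = {}  # PID -> current file location

    def register(self, pid, location):
        # Refuse to reuse an identifier that is already assigned.
        if pid in self._locations:
            raise ValueError(f"PID already registered: {pid}")
        self._locations[pid] = location

    def move(self, pid, new_location):
        # Only the mapping changes; the PID stored in the descriptor does not.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_location

    def resolve(self, pid):
        # Local lookup; raises KeyError for an unknown PID.
        return self._locations[pid]


resolver = LocalResolver()
# Illustrative identifier and paths only.
resolver.register("pid:example/0001", "/ingest/staging/0001/content.pdf")
resolver.move("pid:example/0001", "/archive/store/0001/content.pdf")
current_location = resolver.resolve("pid:example/0001")
```

A real system would need durable storage for the table and access control in front of it, since these identifiers are not meant to be public.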

Requirements for an implementation (John)

Backward compatible: must not break any system at CUL that currently uses persistent identifiers, such as the PURL server.
Provide a mechanism for persistent IDs for OAI-PMH.
Optionally resolvable only within a constrained environment. Secured nameservice. For archiving. The individual
Can ensure confidentiality. System should define a mechanism for client authentication/authorization to ensure data integrity and authority control.
No dependencies on external systems for resolving local PIDs.
Every PID should be globally unique.
PID should be free of location semantics.
PID must be able to refer to multiple aspects, attributes, or behaviors of the digital object, but with a default aspect that conforms with conventional use.
Globally resolvable.
Fine-grained control of PIDs, so that groups can maintain their own sets, without having to maintain multiple PID resolvers/servers.
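
Several of the requirements above (opaque identifiers free of location semantics, multiple aspects with a default, and per-group control within a single resolver) can be illustrated together. The following is a minimal sketch under assumptions, not a recommended implementation: the class PidRegistry, the group names, the aspect name "metadata", and the example.org URLs are all hypothetical. Random UUIDs stand in for globally unique identifiers; they are unique only with very high probability, and a real scheme might use a registered naming authority instead.

```python
import uuid


class PidRegistry:
    """Hypothetical registry: opaque PIDs, named aspects, per-group ownership."""

    def __init__(self):
        self._records = {}  # PID -> {"owner": group, "aspects": {name: target}}

    def mint(self, group, default_target):
        # The PID is a random hex string, so it carries no location semantics.
        pid = uuid.uuid4().hex
        self._records[pid] = {
            "owner": group,
            "aspects": {"default": default_target},
        }
        return pid

    def set_aspect(self, group, pid, aspect, target):
        # Only the owning group may change a PID's aspects.
        record = self._records[pid]
        if record["owner"] != group:
            raise PermissionError(f"{group} does not own {pid}")
        record["aspects"][aspect] = target

    def resolve(self, pid, aspect="default"):
        # With no aspect given, the default aspect is returned.
        return self._records[pid]["aspects"][aspect]


registry = PidRegistry()
# Illustrative group names and targets only.
pid = registry.mint("group-a", "https://example.org/collection/home")
registry.set_aspect("group-a", pid, "metadata",
                    "https://example.org/collection/metadata.xml")
```

Here registry.resolve(pid) returns the default target, registry.resolve(pid, "metadata") returns the metadata target, and a call to set_aspect by any group other than "group-a" raises PermissionError.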

Methodology (Adam S)

Recommendations
(embed the rationale for each recommendation)

Resource Requirements -- we don't know what the requirements are (development? a standalone machine? server space? maintenance?) (John)

A deliverable: A Usage Document that explains how the system can be integrated into CUL collection building (Adam C)

An explicit statement of the estimated lifespan of the PID and the object it represents. (Bill)