...

●      Technical analysis of the Goldsen Archive’s interactive digital holdings and documented methodology, which will be highly instructive for comparable collections

...

●      SIP structure to support long-term preservation of objects, provisioning for future emulation development, with associated automated workflow for SIP generation and ingest, tested and vetted by CUL’s ingest process; and

●      A full report of this process, including financial and time expenses, so that other institutions might gauge the feasibility of a comparable project with their own complex media collections.

...

Our first goal is to determine use scenarios for different kinds of media art researchers. By building out use-case scenarios, we will develop a better sense of various users’ requirements for access to interactive digital assets.

For example, an arts educator might have different needs than a professional artist or art student. The needs of one might be adequately, if imperfectly, satisfied with a screen shot of an interactive artwork; another might require more information about the work’s interactive properties. A narrative description of interaction might be enough for an arts researcher, whereas a software historian might require information about file types, information architectures, original storage devices, pre-migration platforms, or code.

...

2) Collection Analysis and Selection of Classes

To develop our metadata and SIP requirements and subsequent automated ingest of complex media, we need better technical data about the contents of the collection itself. This need demands a better way of evaluating the nature and risk of the various pieces within the collection. Comparable assessments have been undertaken with collections of complex digital objects, such as the Preserving Virtual Worlds project [1]; however, no test bed for forensic assessment has been as broad, rich, complex, or wide-ranging as the interactive holdings of the Goldsen Archive.

We will capture file formats, hierarchical structure and relationships, hardware and software requirements (including operating systems and browser support), and other technical elements in an automated and systematic way. In consultation with the project advisors knowledgeable in digital forensics, and referencing similar projects already undertaken at the University of Maryland and Stanford University, we will evaluate the entire interactive collection. We will use this assessment to identify high-risk material based on risk of obsolescence for hardware, software, or browsers; material degradation or bit rot; and critical dependencies such as relational and file-structure contingencies. This assessment will establish asset categories based on information architecture and technological risk level for the next phase of the project. Ultimately, this analysis will provide a foundation for the preservation track by offering a baseline profile of the Goldsen Archive’s holdings. Establishing an automated process for collection assessment would prove invaluable for comparable collections of interactive digital material.
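To make the shape of this automated sweep concrete, the following is a minimal sketch in Python; the collection path, report name, and CSV fields are illustrative only, and the extension-based MIME guess stands in for the signature-based identification a production pass would get from tools such as DROID, Siegfried, or FITS.

```python
import csv
import hashlib
import mimetypes
from pathlib import Path

def characterize(root: Path, report: Path) -> None:
    """Walk a collection tree and record basic technical metadata
    for every file: relative path, size, an extension-based MIME
    guess, and a SHA-256 fixity digest."""
    with report.open("w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "bytes", "mime_guess", "sha256"])
        for f in sorted(root.rglob("*")):
            if not f.is_file():
                continue
            digest = hashlib.sha256()
            with f.open("rb") as fh:
                # Hash in 1 MiB chunks so large media files
                # never have to fit in memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            mime, _ = mimetypes.guess_type(f.name)
            writer.writerow([f.relative_to(root), f.stat().st_size,
                             mime or "unknown", digest.hexdigest()])

if __name__ == "__main__":
    # Hypothetical paths for illustration.
    characterize(Path("goldsen_collection"), Path("survey.csv"))
```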

To reach our goal of developing a comprehensive Submission Information Package (SIP) that will contain appropriate content, metadata, and documentation required to support long-term access to new media artworks throughout evolving technological landscapes, this analysis of artwork characteristics will inform the required documentation for various classes of works at the hardware, software, operating system, and file levels. The analysis of the collection will inform the development of groupings, or classifications of works, which share common representation information [2] and can be used to form the initial structure of a SIP, thus establishing SIP classes.
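As a rough illustration of how such groupings could be derived, the sketch below clusters works by a shared representation-information signature; the works, field names, and signature components are hypothetical stand-ins for whatever the forensic analysis actually surfaces.

```python
from collections import defaultdict

def sip_classes(works):
    """Group works into candidate SIP classes by a shared
    representation-information signature: here, the authoring
    software, target platform, and set of file formats."""
    classes = defaultdict(list)
    for work in works:
        signature = (work["software"], work["platform"],
                     frozenset(work["formats"]))
        classes[signature].append(work["title"])
    return classes

# Two hypothetical works sharing a signature fall into one class.
works = [
    {"title": "Work A", "software": "Macromedia Director 8",
     "platform": "CD-ROM", "formats": {"DIR", "CST", "AIFF"}},
    {"title": "Work B", "software": "Macromedia Director 8",
     "platform": "CD-ROM", "formats": {"DIR", "CST", "AIFF"}},
]
for signature, titles in sip_classes(works).items():
    print(signature, "->", titles)
```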

The formation of a data model that will capture those required dependencies and structural information will begin at this phase.  The data model definition will start with the identification of critical information entities that will need to be captured (processors, input/output devices, operating systems, software, libraries, file groupings), and the relationships between those entities.  This phase will lay the groundwork for the later development of the detailed attributes of each of these entities.
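One possible starting shape for that data model, expressed as Python dataclasses; the entity kinds and predicates follow the paragraph above, while the detailed attributes are deliberately left open for the later phase.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A critical information entity in the rendering environment,
    e.g. a processor, input/output device, operating system,
    software package, library, or file grouping."""
    kind: str        # "os", "software", "library", "file_group", ...
    name: str
    attributes: dict = field(default_factory=dict)  # detailed later

@dataclass
class Relationship:
    """A directed dependency between two entities,
    e.g. software 'requires' an operating system."""
    subject: Entity
    predicate: str   # "requires", "renders", "contains", ...
    object: Entity

# Hypothetical instance data for illustration.
macos9 = Entity("os", "Mac OS 9")
director = Entity("software", "Macromedia Director 8")
dependencies = [Relationship(director, "requires", macos9)]
```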

From this assessment, we propose to select two to three distinct, but related, “classes” of material to test; for instance, a “class” might consist of a group of works created with the same software, while related “classes” might represent a single software environment that functioned on CD-ROM and migrated to the web. We will, however, let our findings and the advice of our consultants guide our selection in this project phase. In making this selection, we will primarily look for categories that have: large impact, that is, information structures with potentially broad prevalence even outside the Goldsen collections; a good chance of success, appearing particularly viable for migration and potential future emulation; and scholarly value, representing especially culturally significant artworks.

...

  1. Framework and methodology for analysis and classification;
  2. XML document for the collection’s item-level metadata as captured in broad-stroke forensic analysis (a minimal sketch follows this list);
  3. Data model for classes and representation information with accompanying documentation; and
  4. Population of the data model for each class as parsed from digital forensic analysis.
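For deliverable 2, one plausible shape for the item-level XML, sketched with Python’s standard library; the element names are placeholders rather than a committed schema, and the sample values are invented.

```python
import xml.etree.ElementTree as ET

def item_record(parent, path, mime, sha256, size):
    """Append one item-level record, mirroring the fields gathered
    during the broad-stroke forensic survey."""
    item = ET.SubElement(parent, "item", path=path)
    ET.SubElement(item, "format").text = mime
    ET.SubElement(item, "fixity", algorithm="SHA-256").text = sha256
    ET.SubElement(item, "size", unit="bytes").text = str(size)
    return item

root = ET.Element("collectionSurvey", collection="Goldsen Archive")
item_record(root, "workA/main.dir", "application/x-director",
            "9f86d081...", 2048000)  # truncated digest, sample values
ET.ElementTree(root).write("item_metadata.xml",
                           encoding="utf-8", xml_declaration=True)
```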

...

With a methodology established, significant properties will be defined for the selected classes of works with the aim of addressing the detailed attributes of the various components required to render a work, both from a purely technological standpoint and as they relate to the work’s intended behaviors, display, and functionality. This will involve a breakdown of the artworks’ rendering environments to determine what technical components are mandatory for long-term preservation and access.
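To illustrate, one way such significant properties might be recorded, extending the entity sketch above; the example work and its properties are hypothetical, not findings.

```python
from dataclasses import dataclass

@dataclass
class SignificantProperty:
    """One property a rendering environment must preserve, with a
    flag for whether it is mandatory to an authentic experience."""
    name: str        # e.g. "mouse-driven navigation"
    aspect: str      # "behavior", "display", or "functionality"
    mandatory: bool
    rationale: str = ""

# Hypothetical record for a CD-ROM-era interactive work.
properties = [
    SignificantProperty("mouse-driven navigation", "behavior", True,
                        "interaction is the core of the work"),
    SignificantProperty("640x480 indexed-color display", "display", False,
                        "scaling tolerable if aspect ratio is kept"),
]
mandatory = [p.name for p in properties if p.mandatory]
```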

...

We will likely build on METS and/or RDF. A rigorous analysis of the strengths and weaknesses of each will be conducted, addressing their suitability to effectively express complex metadata, meet local metadata requirements, and meet the needs and goals of our partners and sister collections.

METS is well established in the library and digital preservation communities as a logical wrapper of information packages. It provides a means for expressing the structure of metadata and content files within a package, but is limited in how it can express relationships between those files, and between the files and the metadata that describe them. In contrast, RDF and an associated family of standards, including RDF/XML, OWL, SPARQL, and SKOS, provide a highly flexible means for expressing not only relationships between files, but between all concepts (i.e., metadata) about those files. While METS is widely adopted, and even locally implemented at Cornell for structural metadata during ingest into our digital archival repository, many organizations (including the Preserving Virtual Worlds initiative [3]) are moving toward RDF and developing OWL ontologies for defining semantic (i.e., machine-understandable) metadata, providing an additional layer of value over traditional syntactic methods of encoding metadata. Further, RDF allows for flexible and expandable description, which could aid long-term iterative expansion of preservation metadata for these objects.
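The contrast is easy to see in a few triples. Below is a minimal sketch using the rdflib library, with a made-up namespace and predicates standing in for whatever ontology the project ultimately defines.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical project namespace; the real OWL ontology would be
# developed (and disseminated) during the project itself.
GOLD = Namespace("http://example.org/goldsen#")

g = Graph()
g.bind("gold", GOLD)

work = GOLD.workA
main_file = GOLD["workA/main.dir"]

# Typed relationships between a work, its files, and metadata
# about those files -- the kind of statement a METS structMap
# cannot express directly.
g.add((work, RDF.type, GOLD.InteractiveArtwork))
g.add((work, GOLD.hasPrimaryFile, main_file))
g.add((main_file, GOLD.requiresSoftware, GOLD.Director8))
g.add((main_file, GOLD.fixitySHA256, Literal("9f86d081...")))

print(g.serialize(format="turtle"))
```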

Regardless of the decision, our goal is to produce shareable results.  If METS is chosen, a METS profile will be developed and submitted to the Library of Congress for registration and use by other organizations with similar collections.  If RDF is chosen, an OWL ontology for the expression of data about interactive media art will be developed and disseminated.  Both options will likely be tested during this project, and thus both the METS profile and the OWL ontology may be disseminated.

...

Ultimate success of this project relies on the ability of others outside of Cornell’s current institutional infrastructure to take a DIP and use its contents. To this end, and working closely with our advisory members, we will select a partner institution knowledgeable about new media artwork and preservation infrastructure to take one of the CUL packages and validate and understand its contents.
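If the disseminated packages were serialized as BagIt bags (an assumption; the proposal leaves the packaging decision open), a partner could verify completeness and fixity with a few lines using the bagit-python library.

```python
import bagit

# Assumes the received DIP is a BagIt bag; the actual packaging
# format is decided later in the project.
bag = bagit.Bag("received_dip")
try:
    bag.validate()  # checks manifests, checksums, completeness
    print("DIP is complete and all checksums verify")
except bagit.BagValidationError as err:
    print("validation failed:", err)
```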

...

Testing the developed model is an integral part of any R&D effort. Following the model development, it will be tested comprehensively for the two or three defined classes. Ideally, ingesting the classes into CUL’s digital archival repository will result in the actual preservation of the assets within the collection.

...