Join MWG for:

Update from Linked Data for Production

July 17, 2019 - 3pm

Olin Library 106g


The "Linked Data for Production" (LD4P) team will give a series of short presentations on different aspects of the grant work, with time for discussion. LD4P is an Andrew W. Mellon Foundation-funded project to identify pathways toward the implementation of linked data within a library environment.

Topics will include the Sinopia linked data catalog editor, work to support effective lookup services (QA), tools to import data from Discogs when cataloging music recordings, support for profiles in the editor, enhancing discovery, creation of a library linked data community (LD4), and the project cohort.

Please join us for a two-hour, hands-on look at existing linked data sets.

We often discuss linked data as a silver bullet for our metadata strategies, but we stop short of linked data native implementations because adequate tools are not available, decisions about which model to use are highly political, etc. Ignoring orthodoxy, politics, and the all-or-nothing mindset, this workshop will give an introduction to some practical approaches for evaluating whether an existing external dataset could be used to enhance our existing metadata strategies.

We will introduce strategies for evaluating datasets using SPARQL (the query language for RDF) through the use of Wikidata’s Query Service and possibly other SPARQL Endpoint services. We will also do some exercises using cURL (an open source command line tool) to retrieve linked data from a known URI. With access to the data we can perform analysis on whether this existing data can extend the metadata we already generate in our library platforms to support identified use cases. If we have time, we can discuss what considerations would need to be taken into account to follow through on this hybrid approach.
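As a rough illustration of the two techniques described above, here is a minimal Python sketch that prepares (a) a SPARQL query for Wikidata's Query Service and (b) a request for linked data from a known URI, which is roughly what `curl -L -H "Accept: text/turtle" <URI>` does on the command line. The example query, the User-Agent string, and the function names are our own assumptions for illustration, not workshop materials. Nothing is sent over the network until you call `fetch`.

```python
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical User-Agent; replace with something that identifies you.
USER_AGENT = "mwg-workshop-example/0.1 (illustrative sketch)"

# Example query (an assumption, not from the workshop):
# ten items that are an instance of (P31) library (Q7075).
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q7075 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""


def build_sparql_request(query, endpoint=WDQS_ENDPOINT):
    """Prepare a GET request that asks the endpoint for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": USER_AGENT,
        },
    )


def build_linked_data_request(uri, mime_type="text/turtle"):
    """Prepare a request that uses content negotiation to ask for RDF."""
    return urllib.request.Request(
        uri,
        headers={"Accept": mime_type, "User-Agent": USER_AGENT},
    )


def fetch(request, timeout=30):
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

To try it, call `fetch(build_sparql_request(EXAMPLE_QUERY))` for query results, or `fetch(build_linked_data_request("http://www.wikidata.org/entity/Q42"))` to retrieve the data behind a single URI. The results will change as Wikidata is edited.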

Please R.S.V.P. describing your level of comfort and interest with Linked Data/RDF and command line tools so we can make sure the focus of this workshop is catered to the participants.

January Code Club meetings

Is your New Year’s resolution to learn more about coding and scripting? If it is, or if you just declared it so right now, come to the first Code Club of 2019, tomorrow, Thursday, January 3, 2019 in Olin 701 from 12-1. We’ll work through the warm-ups and the exercises in Command Line Wizardry together. Bring yourself, your laptop, and your questions, and we’ll make it happen.

If you can’t make this one, never fear: we’re holding two more code clubs in January. Here are the rest:

  • Thursday, January 17, from 12-1 in Olin 701
  • Thursday, January 31, from 12-1 in Olin 701

Join us for two more events this fall!

On October 29th at 3pm in Olin 106, the MWG Steering Committee meeting will be open to all in order to brainstorm programming for the Spring. Please join us and bring your ideas for metadata-related programming. We're interested in developing programming that reaches out to all areas of the library, including discussions of how metadata affects the patron experience, how metadata connects our collections to the world, and how we can share tools and practices with one another. 

On November 15th at 11am in Mann Library's Stone Classroom, join us for "Basic Command Line Wizardry", a workshop with Dianne Dietrich. This one-hour workshop will refresh and improve your command line skills. If you have specific questions or topics you would like to see covered, please write to Dianne ahead of time with your requests (dd388 at cornell dot edu).

Thanks to Jason Kovari and Steven Folsom for presenting on the ARM BIBFRAME extension this fall! Their slides are now available in eCommons:

Starting January 4th, the club will be meeting every other Thursday from 12-2pm in Olin 701.

Over the past year and a half, the Metadata Working Group has been sponsoring workshops in metadata clean-up and curation, but we realized we needed to provide a space for people to practice those skills with support from others in CUL. We also knew that it needed to be a meeting that staff could work into their routine - so we're making it a recurring drop-in meeting, every other Thursday afternoon in Olin 701 (the Digital Humanities Lab).

A few things to mention:

  • There are a few computers in this lab but feel free to bring your own laptop if you have one.
  • The club is open to all interested CUL staff members. 
  • We follow the "Hacker School rules," which is to say that this is a safe, judgement-free space.

We can't promise we can solve your metadata problems, but if you're having difficulty, chances are that there's an easier way to do it. We hope this group can start conversations among staff trying to solve similar problems.


Hope to see you in the new year.


Friday, December 8th, 9AM-12 PM, Mann Library Computer Lab B30A
Contacts for this meeting: Marcie Suzanne Farwell + Jasmine Burns

About: Open Refine Training

Many of us are dealing with lots of "messy" data. Come learn the basics of using the data clean-up tool Open Refine in this 3-hour workshop taught by Erin Faulder and Jasmine Burns.

No experience is necessary. You will be given a data set and time to practice skills and ask questions during class.

A helpful introduction to Open Refine from Library Carpentry can be found here:

If you're interested please sign up here so we have an accurate head count.


We look forward to seeing you there.


Friday, September 15th, 1 PM, Kroch Library 2B48 RMC Lecture Room
Contacts for this meeting: Marcie Suzanne Farwell + Jasmine Burns

About: To kick off the new year of the Metadata Working Group, we have a Lightning Talks session scheduled for Friday, September 15th, at 1 PM in Kroch Library 2B48 RMC Lecture Room.

Lightning talks are informal, 5-7 minutes (tops) presentations that cover a perspective, idea, project, or question in a short time period. This allows for many different people to present on a variety of topics, in a fun atmosphere - as well as to sign up as late as the day of (as preparation is minimal). We warmly welcome anyone to give a lightning talk.

For this session, we'd like to hear about the tools you are working with to clean, query, manipulate, or otherwise work with your data. This can be things like:

  • Advanced Excel techniques
  • Open Refine
  • SQL

In your lightning talk, we'd like you to walk us through a problem you were able to solve with that particular tool. Many of us may have similar needs and can benefit from seeing the kinds of situations these tools are already used for. If you need to bring a laptop so you can properly demonstrate the tool, let us know and we can accommodate.

To sign up to give a lightning talk, just send an email (to Jasmine or Marcie) and we'll get you set up. You can sign up as late as the day of. We will also give some time for discussion and questions/answers after the talks. This meeting will be followed by a questionnaire about the tools people are most interested in having working sessions and workshops on during the next year. We will also be starting a coding club where people can practice the tools we are learning and ask questions in a low-pressure, no-judgement atmosphere.

The session is open to everyone, and we hope to see everyone September 15th at 1 PM! All are welcome.


  • Peter Martinez: OpenURL Viewer. Allows e-Resources staff to view OpenURL fields and data in a table. Helps in troubleshooting problems with links to online resources.
  • Peter Martinez: Kaltura bulk load tool (Excel). Creating an XML file for batch uploading records in Kaltura.
  • Dianne and Tre: Digitization requirements and CULAR
  • Erin Faulder: OpenRefine. Cleaning dates in EAD.
  • Dianne: Scripting tools. "Match up a bazillion filenames"

Find here a list of Metadata Working Group dates for the 2017-2018 year for planning purposes.

This year, we're keeping the approach we started last year. This means workshops, working sessions, and, we hope, deliverables from the Metadata Working Group in this academic year. Many of these meetings will therefore be used for workshop or working-session style events.

We'll start the year with a lightning session on the tools people are using to work with their metadata across CUL. We will then send out a questionnaire about the tools people are most interested in learning for the coming year and plan working sessions for the tools with the most popular rankings.

We are going to try to keep to the schedule of the 3rd Friday of every month, but due to the nature of the workshops, dates, times, and locations may vary.

  • Friday, September 15th (Kroch Library 2B48 RMC Lecture Room, 1:00-2:30 PM)

  • Friday, October 20th

  • Friday, December 8th, Open Refine Workshop (Mann Library B30A, 9AM-12PM)

  • Wednesday, January 17th, Excel Workshop (Uris B05, 9AM-12PM)

  • Friday, February 9th, Open Refine Workshop on Regular Expressions (Mann Library Stone Classroom, 10AM-12PM)

  • Friday, March 16, Adobe Bridge (Cancelled - to be rescheduled)
  • Tuesday, April 24, Excel Workshop 2: Vlookup and Pivot Tables (Mann Library Stone Classroom, 10AM-12PM)
  • Friday, May 18th, Lightning Talks: How I have used the workshops this year (RMC Lecture Room, 10:00-11:30 AM)



  • Coding Club - every other Thursday, starting January 4th (Olin 701, 12-2PM)
  • Who: Karen Estlund, the Associate Dean for Technology and Digital Strategies for Penn State Libraries.
  • What: The final Metadata Working Group meeting of the semester!
  • Karen will start by speaking about the work she is doing at Penn State and the wider library world with time for questions.

    And since this year's MWG agenda has focused on training and hands-on work, Karen has also graciously offered to lead an activity in which everyone can participate. So grab a marker and some post-it notes and get ready to model some data.

  • When: Thursday, May 11th, 11:00am-12:30pm (Note the time change since we realized it was Slope Day)
  • Where: Mann 160

Metadata Working Group Tor Session

Thanks to all who made it out to our informal session on Tor.

Here is the link to our slides for the Tor Session:

More information on our May 11th session with Karen Estlund coming shortly on our blog.


Metadata Assessment Workshop (31 Mar 17)

Thank you everyone for signing up for today’s Metadata Assessment Workshop. 

We will be meeting in Uris Lib B05 Classroom from 1 to 2:30 PM today. This classroom has computers available, and we will be walking people through setting up the hosted options for OpenRefine and Python/Bash at the beginning (so you don’t have to bring your own computer, though please do if you can). 

If you were unable to register but want to attend, please email user-9f226 or Marcie Suzanne Farwell to check first (we'll try to respond ASAP). We ask that for this particular session, people do not just drop in.

Minimal Setup 

If you have 10 minutes this morning and will be using the hosted options for the tools, please sign up for (free or one-month trial) accounts at Python Anywhere (choose the Free beginner account) and RefinePro (which offers each new account a free one-month trial period). We will alert you to these options at the beginning of the workshop as well if you don’t have time. Do not worry about any other setup; we will do that at the event.

If you are bringing your own laptop, you need Python 2.7, pip (usually included with 2.7), and OpenRefine 2.7rc1 (rc1 recommended, rc2 or 2.6 *should* both also work) installed. If you have any trouble installing these, just use the hosted versions mentioned above for now, and we can chat at the end of the workshop about how to get your laptop set up for this work later on.


We have a short time to cover a meaty topic, so treat this as an introduction to two methods for doing this work and a jumping-off point for your own daily practice.

  • Introduction / Setup Help : 10 minutes
  • Metrics for Metadata Assessment : 10 minutes
  • OpenRefine for Metadata Assessment : 30 minutes
    • Overview of OpenRefine
    • Loading a File
    • Facets
    • GREL or Google Refine Expression Language
    • Using Regular Expressions
    • Completeness Rankings
    • Export Reports
  • Python for Metadata Assessment : 30 minutes
    • Overview of Python MetadataBreaker Scripts
    • Harvesting Metadata
    • General Report
    • Looking at a Specific Field
    • Using SORT, UNIQ, GREP, Regular Expressions
    • Export Reports
  • Wrap up / Next Steps : 10 minutes
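
To give a feel for the "SORT, UNIQ, GREP, Regular Expressions" step in the outline above, here is a small Python sketch of the same idea: count how often each value appears in one field, then list the values that do not match an expected pattern. The sample values and the date pattern are made up for illustration. This is not the MetadataBreaker code itself.

```python
import re
from collections import Counter

# Hypothetical sample of date values pulled from a metadata export.
dates = ["1999", "1999", "c. 1950", "2001-05-17", "", "1999", "unknown"]


def value_counts(values):
    """Count each distinct non-empty value, most common first.

    Roughly what `sort | uniq -c | sort -rn` gives you in the shell.
    """
    return Counter(v for v in values if v).most_common()


# Assumed pattern: a four-digit year, optionally followed by -MM or -MM-DD.
ISO_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def non_matching(values, pattern=ISO_DATE):
    """Return the distinct non-empty values that fail the pattern.

    Roughly what `grep -v` with a regular expression gives you.
    """
    return sorted({v for v in values if v and not pattern.match(v)})
```

Here `value_counts(dates)` puts "1999" at the top of the list, and `non_matching(dates)` surfaces "c. 1950" and "unknown" as values worth a closer look.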

Sample Data

I’ve gotten requests to work with the following data sources:

  • eCommons
  • Fedora
  • FGDC
  • MARC
  • Solr
  • Shared shelf

The Cornell University Library Metadata Working Group is hosting a Data Scripting Bootcamp on February 20th, from 10:00 AM-4:00 PM with an hour for lunch. This will be in Mann Library B30A, and we request that participants register here.

Note: this Bootcamp is meant to cover a number of (meta)data scripting topics in a short amount of time, with the focus being on learning through practical use. You won’t be an expert in these areas by the end of this bootcamp; you won’t even be entirely comfortable with many of these concepts at the end of the bootcamp, and this day won’t cover everything.

Instead, this bootcamp is meant to be a way for you to jump in with examples of metadata work you could script in hand. It aims to give some preliminary preparation for working with the metadata tools and scripts that we will dive into in further detail in other MWG workshops this Spring.

As such, we strongly recommend you register here and provide us with examples of workflows, use cases, or data you’d like to work with using the skills in this workshop. We will try to work your examples into this day!

Proposed Outcomes for Participants:

  • Know basics of data structures with particular relevance for library metadata work at Cornell;
  • Know basics of working in a Command Line Interface with bash/shell and examples of working with metadata in this environment;
  • Know basics of SQL, with examples and use cases for metadata queries and updates in particular;
  • Be able to run simple scripts and programs, understanding how this works with relevance for metadata work;
  • Know basics of Git and version control, be able to create a repo, push a change, check version of a dataset, etc.
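
As a taste of the "metadata queries and updates" outcome listed above, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The table, column names, and sample rows are hypothetical. The SQL statements themselves (SELECT ... WHERE ... IS NULL, UPDATE ... SET) are the kind of basics the session is meant to introduce.

```python
import sqlite3

# In-memory database so the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT, language TEXT)"
)
conn.executemany(
    "INSERT INTO records (title, language) VALUES (?, ?)",
    [("Annual report", "eng"), ("Rapport annuel", "fre"), ("Field notes", None)],
)


def missing_language(conn):
    """Query: which records have no language code?"""
    rows = conn.execute(
        "SELECT id, title FROM records WHERE language IS NULL ORDER BY id"
    )
    return rows.fetchall()


def set_default_language(conn, code="und"):
    """Update: fill empty language codes; returns the number of rows changed.

    "und" is the ISO 639-2 code for an undetermined language.
    """
    cursor = conn.execute(
        "UPDATE records SET language = ? WHERE language IS NULL", (code,)
    )
    conn.commit()
    return cursor.rowcount
```

Running the query before the update is a useful habit: it shows exactly which rows the update will touch.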

Gratitude & License

This workshop material reuses open curricula from the Library Carpentry project. We have updated the lessons with examples specific to the Cornell University Library Metadata Working Group's data needs. We are not endorsed by Software, Data, or Library Carpentry.

We’ve also added curriculum materials influenced by similar introduction to scripting workshops at Library Technology conferences, so we would like to give a special thanks to Patrick Hochstenbach (Ghent University Library) & Johann Rolschewski (Berlin State Library) for their open and shared work.

As such, this workshop material is licensed under a Creative Commons Attribution 4.0 International License.

Bootcamp Sessions

Note: These times may shift. If you intend to only go to one session, please let us know by registering here. That way, we can alert you to time shifts if/when they occur for the session of interest to you.

  • Data Structures & Models (10 - 11 AM): Brief introduction to data structures and models for the metadata worked with in the workshop. Link to materials: TBD
  • *nix Command Line Interface (11 AM - 12:30 PM): Overview of the CLI for running simple data scripts. Will introduce some basic bash scripts, as well as how to run existing Python scripts (and begin to debug errors) for analyzing metadata. YOU WON’T LEARN PYTHON, but how to run Python scripts. Link to materials: TBD
  • Git for Metadata at CUL (1:30 - 3 PM): Using Git and GitHub for metadata work run previously. Link to materials: TBD
  • SQL / MySQL for CUL DATA (3 - 4 PM): Building off the previous session, learn the basics of SQL - what it does and how to run basic queries or updates on metadata managed or used throughout this workshop. Link to materials: TBD

We have a Metadata Working Group session scheduled for today, Friday, January 20th from 1 to 2:30 PM in Uris Library B05 Classroom.

This working session will focus on Metadata Application Profiles. We will first spend some time exploring what they are, how they are specified, and what purposes they serve. Then we will break into a working period where participants can write, update, or get help with their own metadata application profiles. This work will then be captured in the Metadata Working Group wiki, serving as an effort to support more shared understanding and interoperability among the many different metadata stores, needs, and methods used in the Cornell University Library systems.

Want to know more about what Metadata Application Profiles are before arriving? Wikipedia gives a decent stub on the topic. Metadata Application Profiles are metadata specifications attached (sometimes loosely, sometimes tightly) to a particular application - datastore, repository, management system, discovery indexing layer, or other. They help communicate expectations and opportunities for cross-collaboration. They also help connect implementations to desired models and standards, and document where we need to diverge from community standards.

Metadata Application Profiles can touch on descriptive, technical, administrative, structural or other (or a mix of all of the above) metadata. A popular metadata application profile example is the Digital Public Library of America Profile, which also has a machine-actionable representation.
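
To make the idea of a "machine-actionable" profile a little more concrete, here is a toy Python sketch. It writes a profile down as data (which properties are required, which may repeat) and checks a record against it. The property names and rules are invented for illustration. Real profiles, including DPLA's, are considerably richer than this.

```python
# Hypothetical profile: property name -> rules for that property.
PROFILE = {
    "title":   {"required": True,  "repeatable": False},
    "creator": {"required": False, "repeatable": True},
    "date":    {"required": True,  "repeatable": False},
}


def validate(record, profile=PROFILE):
    """Check a record (property name -> list of values) against a profile.

    Returns a list of human-readable problems; an empty list means the
    record conforms to the profile.
    """
    problems = []
    for name, rules in profile.items():
        values = record.get(name, [])
        if rules["required"] and not values:
            problems.append(f"missing required property: {name}")
        if not rules["repeatable"] and len(values) > 1:
            problems.append(f"property is not repeatable: {name}")
    for name in record:
        if name not in profile:
            problems.append(f"property not in profile: {name}")
    return problems
```

For example, `validate({"title": ["A", "B"], "subject": ["x"]})` reports that title is not repeatable, that date is missing, and that subject is not part of the profile.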

Participants only need to bring some curiosity about the topic, though we hope you will bring a particular example of either a metadata application profile you’d like to improve, or a workflow/system/application you’d like to generate a metadata application profile for. We’ll provide you with the concept, a starter metadata application profile template, examples, and guidance.

Due to the working nature of this session, we will not be recording or streaming this via Webex/Zoom, but we will aim to capture outputs and discussion points (including a list of resources) in the Metadata Working Group public wiki - see the Working Documents here: MWG Working Documents

This meeting is open to everyone. Although it is not a required prerequisite for the day-long training in February, it is strongly recommended for those less familiar with the subject.

Hi all-

We are happy to announce our Metadata Working Group schedule for January-May 2017. Please save the dates on your calendars and plan to take part if able! If you see a working session or workshop you would also like to help facilitate, email Christina or Marcie.

Note that these sessions reflect the shift of the Metadata Working Group this year towards working sessions and workshops instead of a series of external speakers. We hope to build a network of expertise and resources for supporting metadata work broadly in the Cornell University Library system.


Metadata Working Group Steering Committee

Working Session on Metadata Application Profiles: January 20th, 1:00-2:30 PM

  • Format: Interactive Working Session
  • Place: Uris Library B05 Classroom
  • Proposed Outcomes for Participants

    • Know what a metadata application profile (MAP) is and what purposes it serves
    • See examples of MAPs from Cornell Library work areas
    • Work in groups or alone to generate a MAP for specific systems, workflows, or applications
    • Expand understanding of the various uses and needs for metadata across CUL

Library Carpentry Workshop - Shell, Data Structures, Git, SQL: February 20th, 10:00 AM-4:00 PM with an hour for lunch

  • Format: All-day Workshop
  • Place: Mann Library B30A
  • Proposed Outcomes for Participants

    • Know basics of data structures with particular relevance for library metadata
    • Know basics of working in a Command Line Interface with bash/shell and examples of doing so with metadata
    • Know basics of SQL, with examples and use cases for metadata in particular
    • Be able to run simple scripts and programs, understanding how this works with relevance for metadata work
    • Know basics of Git and version control, be able to create a repo, push a change, check version of a dataset, etc.

Python & OpenRefine for Metadata Analysis Workshop: March 31st, 1:00-2:30 PM (NOTE THE DATE CHANGE FROM MARCH 17TH)

  • Format: Workshop
  • Place: Uris Library Classroom B05.
  • Registration: We’re asking you to register so we can get a headcount and best prepare this workshop. Register here: Anyone and everyone with an interest is welcome to attend.
  • Proposed Outcomes for Participants
    • Understand metrics by which to measure or assess metadata (completeness, quality, usage/frequency, ...)

    • Understand basic structure of Python scripts available for measuring metadata from various repositories and stores at Cornell

    • Understand how to use OpenRefine to assess state of datasets with particular preference for assessing metadata

      • facets

      • GREL lookups

      • completeness percentage / generation

      • Other Metadata QA metrics as defined/requested
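
As one concrete reading of the "completeness percentage" metric listed above, here is a short Python sketch that reports, for each field, the share of records with a non-empty value. The sample records and field names are hypothetical, and this is only one of several reasonable ways to define completeness.

```python
def completeness(records, fields):
    """Return {field: percentage of records with a non-empty value}."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    result = {}
    for field in fields:
        # A value counts as present only if it is not None, empty, or blank.
        filled = sum(1 for r in records if str(r.get(field) or "").strip())
        result[field] = round(100.0 * filled / total, 1)
    return result


# Hypothetical records, as they might look after loading an export.
records = [
    {"title": "Annual report", "date": "1999", "creator": "Cornell University Library"},
    {"title": "Field notes", "date": "", "creator": None},
    {"title": "Map of Ithaca", "date": "1866"},
    {"title": "  ", "date": "1901", "creator": ""},
]
```

Calling `completeness(records, ["title", "date", "creator"])` on this sample shows that creator is the weakest field, since only one of the four records has a value for it.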

Tor & Library Data Privacy Informal Presentation & Discussion: April 14th, 1:00-2:00 PM (NOTE THIS REPLACES THE ORIGINAL APRIL 21ST WORKSHOP)

Does it happen to you that when you tell your non-library friends you work with metadata, they leap to their understanding of what they’ve heard about metadata collected by institutions such as the NSA? (It happens to me) For our MWG April Session, we’re going to go ahead and explore that possible connection by having an informal presentation (given by someone who is admittedly not a Tor expert, just an enthusiast) and discussion on Internet Privacy and the Tor technology suite. We welcome anyone with an interest in the subject and any technical background. This will replace our originally scheduled April 21st workshop, which we can no longer hold this spring.

  • Format: Informal presentation and guided discussion
  • Place: Olin Library Classroom 106
  • Registration: No registration required. Show up as you’re able and interested.
  • Proposed Outcomes for Participants
    • Better understand the recent calls for increased privacy and security while online
    • Learn about projects such as the Library Freedom Project or organizations such as the Electronic Frontier Foundation
    • Understand how Tor Browsers and Tor Relays work in creating the ability to anonymously browse the Internet
    • Discuss where and how we see Cornell Library taking part in this privacy work and supporting efforts like Tor, EFF, LFP, etc.

Presentation by Invited Speaker Karen Estlund: May 11th, 11:00 AM-12:30 PM

  • Format: External Expert Presentation
  • Place: Mann 160
  • Invited Speaker: Karen Estlund, the Associate Dean for Technology and Digital Strategies for Penn State Libraries.
  • Karen will be speaking about the work she is doing at Penn State followed by an activity everyone can participate in on data modeling.