  • Bookmark this page for handy access. We will refer to it at several points in the workshop.
  • Log on to the HTRC Production Portal with your personal credentials.

    • Once signed in, click on "Algorithms" in the black navigational bar at the top of the page. Once you have done so, we will be ready to begin.
  • Worksets can be managed through the Production Portal's Blacklight instance, which you can also reach through the "Create Workset" link. Whichever route you take, you will have to log in a second time with the same credentials you use for the Production Portal.

  • I've made three collections for your use in this class. (You are not obligated to use these; you may want to use collections of your own in the exercises.)
    • ShakespeareComedies@MPaolillo - 58 dramas authored by William Shakespeare with MARC 655 field denoting "Comedies."
    • ShakespeareTragedies@MPaolillo - 54 dramas authored by William Shakespeare with MARC 655 field denoting "Tragedies."
    • ShakespearePlays@MPaolillo - A larger, CSV-formatted collection combining the contents of the other two collections.
  • Use this page when referencing custom stop word lists. I can post lists relevant to your collections if you have them.

  • A bug may prevent the results of the algorithm "Meandre_OpenNLP_Date_Entities_To_Simile" from displaying. You can display them locally with this fix:
    • Download date_entity_simile.html and save it to your machine.
    • Open it with a text editor and change this line

      <script src="https://htrc2.pti.indiana.edu/HTRC-UI-Portal2/js/timeline-api.js" type="text/javascript"></script>

      to this line

      <script src="http://api.simile-widgets.org/timeline/2.3.1/timeline-api.js" type="text/javascript"></script>
    • Save the HTML file and open it in a web browser; the data should then display. (Firefox works best with Simile.)
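
If you would rather not edit the file by hand, the same substitution can be scripted. The following is only a minimal sketch in Python, assuming the results file was saved as date_entity_simile.html in the current folder; the function names `patch_timeline_src` and `patch_file` are made up for this illustration and are not part of any HTRC tool.

```python
from pathlib import Path

# The two script URLs quoted in the fix above.
OLD_SRC = "https://htrc2.pti.indiana.edu/HTRC-UI-Portal2/js/timeline-api.js"
NEW_SRC = "http://api.simile-widgets.org/timeline/2.3.1/timeline-api.js"


def patch_timeline_src(html: str) -> str:
    """Return the HTML with the portal-hosted script URL swapped for the Simile-hosted one."""
    return html.replace(OLD_SRC, NEW_SRC)


def patch_file(path: Path) -> bool:
    """Rewrite the file in place; return True only if something changed."""
    original = path.read_text(encoding="utf-8")
    patched = patch_timeline_src(original)
    if patched == original:
        return False
    path.write_text(patched, encoding="utf-8")
    return True


if __name__ == "__main__":
    # Assumed file name and location; adjust to wherever you saved the download.
    target = Path("date_entity_simile.html")
    if not target.exists():
        print(f"{target} not found; download it first")
    elif patch_file(target):
        print("patched; open the file in a web browser")
    else:
        print("no matching script line found; nothing changed")
```

Running the script a second time is harmless: once the URL has been replaced there is nothing left to match, so the file is left untouched.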

Resources

  • HathiTrust is:

    • an international partnership of over 100 institutions.

    • a digital library containing over 13 million books, 38% of which are in the public domain. All items are fully indexed, allowing full-text search within all volumes. You can log in with your Cornell NetID to:

      • Create Collections (public or private)
      • Download PDFs of any item available in full text
    • a trustworthy preservation repository providing long-term stewardship, robust backup, continuous monitoring, and persistent identifiers for all content.

    • where Cornell University Library deposits books it digitizes at scale.
  • HathiTrust Research Center (HTRC) - a collaborative research center (jointly managed by Indiana University and the University of Illinois) dedicated to developing cutting-edge software tools and cyberinfrastructure that enable advanced computational access to large amounts of digital text.

    • HTRC Production Portal - a web-based interface to the HTRC. The Production Portal makes available the full-text indexes of all public-domain, Google-digitized deposits to HathiTrust.

    • HTRC User Community Wiki - home of the user support documentation, meeting notes, e-list addresses and sign-up information, and FAQs.

  • A summary of the algorithms is attached below.

    Attachments