Preparation

This page is a companion to the 4/8/2014 workshop for select graduate students on the HathiTrust Research Center (HTRC) Portal.
Please do bring a laptop! If you have one, bring your own. If you do not have one, please feel free to check one out at the Olin circulation desk. Having a laptop will allow you to participate in the exercises and get the most out of class.
No special software will be needed. All exercises will be done through a Web browser, without any special plugins.
You will need to obtain credentials to log in. Please follow the steps on the HTRC Getting Started FAQ for getting started with the Production Portal.

Agenda

The class is a guided exploration of the HTRC portal. We will run the algorithms to discover their capabilities, their limitations, and various strategies for addressing those limitations. This lets us explore the HTRC portal specifically and grapple firsthand with basic issues encountered in computational analysis of text.

Support tips for exercises

  • Bookmark this page for handy access. We will be referring to it at points in the workshop.

  • Log on to the HTRC Production Portal with your personal credentials.
    • Once signed in, click on "Algorithms" in the black navigational bar at the top of the page. Once you have done so, we will be ready to begin.

  • We are using three collections that I made for this class:
    • ShakespeareComedies@MPaolillo - 58 dramas authored by William Shakespeare with MARC 655 field denoting "Comedies."
    • ShakespeareTragedies@MPaolillo - 54 dramas authored by William Shakespeare with MARC 655 field denoting "Tragedies."
    • ShakespearePlays@MPaolillo - A larger CSV-formatted collection consisting of the contents of both of the other two collections.

  • Use this page when referencing custom stop word lists.

  • There may be a bug that prevents display of the results of the algorithm "Meandre_OpenNLP_Date_Entities_To_Simile". You can display the results locally by following this fix:
    • Download the date_entity_simile.html file to your machine and save it.
    • Open it with a text editor and change this line
      <script src="https://htrc2.pti.indiana.edu/HTRC-UI-Portal2/js/timeline-api.js" type="text/javascript"></script>
      to this line
      <script src="http://api.simile-widgets.org/timeline/2.3.1/timeline-api.js" type="text/javascript"></script>
    • Open the edited file with a web browser and the data should display. (Firefox works best with Simile.)

  • Worksets can be managed through the Production Portal's Blacklight instance. You can also access this by using the "Create Workset" link. Regardless of the avenue of entrance, you will have to log in a second time with the same credentials you are using in the Production Portal.
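If you would rather not edit the file by hand, the display fix for "Meandre_OpenNLP_Date_Entities_To_Simile" described in the tips above can be sketched as a short Python script. This is only an illustration, not part of the HTRC tooling: the function names are made up for this example, and it assumes that the downloaded file is named date_entity_simile.html, sits in the current directory, and is UTF-8 encoded. The two script URLs are copied from the fix above.

```python
from pathlib import Path

# Script URLs copied from the manual fix in the support tips.
OLD_SRC = "https://htrc2.pti.indiana.edu/HTRC-UI-Portal2/js/timeline-api.js"
NEW_SRC = "http://api.simile-widgets.org/timeline/2.3.1/timeline-api.js"


def patch_timeline_src(html: str) -> str:
    """Return the page text with the portal-hosted Timeline script URL swapped out."""
    return html.replace(OLD_SRC, NEW_SRC)


def patch_file(path: Path) -> bool:
    """Rewrite the file in place; return True if anything changed."""
    original = path.read_text(encoding="utf-8")  # assumption: file is UTF-8
    patched = patch_timeline_src(original)
    if patched == original:
        return False
    path.write_text(patched, encoding="utf-8")
    return True


if __name__ == "__main__":
    # Assumption: the results file was saved to the current directory.
    target = Path("date_entity_simile.html")
    if target.exists():
        print("patched" if patch_file(target) else "no matching script line found")
    else:
        print(f"{target} not found; download it from the portal first")
```

After running the script, open the patched file in a web browser as in the last step of the manual fix.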

Resources

  • HathiTrust is:
    • an international partnership of over 80 institutions.
    • a digital library containing over 11 million books, 33% of which are in the public domain. All items are fully indexed, allowing for full text search within all volumes. You can log in with your Cornell NetID to:
      • Create Collections (public or private)
      • Download PDFs of any item available in full text
    • a trustworthy preservation repository providing long-term stewardship, redundant robust backup, continuous monitoring, and persistent identifiers for all content
    • where Cornell University Library deposits books it digitizes at scale.

  • HathiTrust Research Center (HTRC) - a collaborative research center (jointly managed by Indiana University and the University of Illinois) dedicated to developing cutting-edge software tools and cyberinfrastructure that enable advanced computational access to large amounts of digital text.
    • HTRC Production Portal - a web-based interface to the HTRC. The production portal makes available all the full-text indexes of the Google-digitized deposits to HathiTrust that are in the public domain.
    • HTRC User Community Wiki - home of the user support documentation, meeting notes, elist addresses and sign-up information, and FAQs.