Google Search Appliance at Cornell

According to Uncle Ezra, in March 2006 Cornell replaced the Inktomi search engine it had been using for University website searching with a Google Search Appliance. The Google Search Appliance (GSA) is administered by the Office of Web Communications. The GSA now supports the search you see on the Cornell Identity Banner on most Cornell University web pages.

The Office of Web Communications has provided instructions for modifying the Identity Banner search dialog to limit results to a single domain; this is the Unit Search capability. It could work for searching domains ending in 'library.cornell.edu', but many of the Cornell University Library digital collections have different domain names, so a single-domain search would miss them.
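To make the single-domain limitation concrete, here is a minimal Python sketch of suffix matching on host names. It is only an illustration of the idea, not how the Search Appliance itself decides what to include; the helper name `host_in_domain` is made up for this example, and the URLs are built from hosts mentioned elsewhere on this page.

```python
from urllib.parse import urlparse


def host_in_domain(url: str, domain: str) -> bool:
    """Return True if the URL's host is `domain` itself or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


# Hosts taken from this page; the http:// scheme is assumed for illustration.
urls = [
    "http://library.cornell.edu/",
    "http://mannlib.cornell.edu/",
    "http://www.ilr.cornell.edu/library/catherwood/",
]

# Only the first URL falls under the single domain 'library.cornell.edu'.
covered = [u for u in urls if host_in_domain(u, "library.cornell.edu")]
print(covered)
```

Under these assumptions, only the first URL is covered, which is why a single-domain Unit Search is not enough for the library's sites.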

The Office of Web Communications also allows organizations to create a 'Google Search Appliance Collection'. A Collection is a list of the domains and paths you want to search. You maintain the list using an administrator interface that lets you add and remove items, provides statistics, and allows you to tell the indexing robots to check certain domains in the collection. You can then point your search dialog at the indexes in your collection so that only the paths in your collection will be searched.
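As a rough sketch of what "pointing a search dialog at a collection" can look like, the Python below builds a search URL using query parameters from the Google Search Appliance search protocol as I understand it (`q`, `site`, `client`, `output`, `proxystylesheet`). The host `gsa.example.edu`, the collection name `cul_websites`, and the function name `gsa_search_url` are all hypothetical; the real Cornell host, collection name, and front end may differ.

```python
from urllib.parse import urlencode


def gsa_search_url(base: str, query: str, collection: str,
                   frontend: str = "default_frontend") -> str:
    """Build a search URL restricted to one collection (illustrative only)."""
    params = {
        "q": query,                   # the search terms
        "site": collection,           # restrict results to this collection
        "client": frontend,           # front end to use (assumed name)
        "output": "xml_no_dtd",       # ask for XML results
        "proxystylesheet": frontend,  # render the XML with the front end's stylesheet
    }
    return base.rstrip("/") + "/search?" + urlencode(params)


# Hypothetical host and collection name, for illustration only.
url = gsa_search_url("http://gsa.example.edu", "library hours", "cul_websites")
print(url)
```

A search form would submit the same parameters, typically with the collection name in a hidden `site` field.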

Cornell University Library Websites Google Search Appliance Collection 

In September 2006 I asked Lisa Cameron-Norfleet at the Office of Web Communications to set up a Google Search Appliance Collection that I could use for searching Cornell University Library websites. She kindly (and quickly) created the collection and told me how to get into the administrator interface. I added all the Cornell University Library digital collections, library.cornell.edu, mannlib.cornell.edu, and a few other library websites. Here are the domains and paths currently in the collection.

I then pointed the search dialogs on several websites at this collection, and used some special code to display the search results. Here are the search pages that use the Google Search Appliance to search Cornell University Library websites:

What you can find with the Cornell Library GSA Collection

Statistics from Library Collection

| Mime Type | Number of Files | Average Size | Total Size | Minimum Size | Maximum Size |
|---|---|---|---|---|---|
| application/octet-stream | 979191 | 3 | 3306285 | 0 | 49421 |
| text/html | 344825 | 2666 | 9.19E+08 | 146 | 683541 |
| text/plain | 4671 | 9171 | 42838262 | 230 | 849341 |
| application/postscript | 500 | 15872 | 7936041 | 1096 | 97789 |
| application/pdf | 21812 | 21572 | 4.71E+08 | 149 | 508500 |
| application/x-shockwave-flash | 23 | 945 | 21745 | 244 | 5140 |
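The statistics page does not say how the Average Size column is computed. A quick check suggests it is the Total Size divided by the Number of Files, rounded down. The sketch below tests that guess against the four rows in the table above whose totals are given exactly; the two rows shown in scientific notation (text/html and application/pdf) are left out because their totals are rounded. The function name `floor_average` is mine, and the rounding rule is an inference from the numbers, not something the administrator interface documents.

```python
# (mime type, number of files, average size, total size), copied from the table above.
rows = [
    ("application/octet-stream", 979191, 3, 3306285),
    ("text/plain", 4671, 9171, 42838262),
    ("application/postscript", 500, 15872, 7936041),
    ("application/x-shockwave-flash", 23, 945, 21745),
]


def floor_average(total_size: int, file_count: int) -> int:
    """Average size as total divided by count, rounded down (assumed rule)."""
    return total_size // file_count


# Collect any rows where the assumed rule disagrees with the reported average.
mismatches = [
    mime for mime, count, avg, total in rows
    if floor_average(total, count) != avg
]
print(mismatches)
```

An empty list means the reported averages are consistent with the assumed rule for these rows.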

Google's page describing the device: http://www.google.com/enterprise/gsa/index.html

Cornell Google Search Appliance web page

Example Searches 

Search for this at Cornell search page: ?????? [Confluence does not show the Japanese!!!]

Here is the search link for the GSA Collection Search.


 
Maybe http://www.digitalhimalaya.com/ should be included in the collection:

The Oxford Bön Project library gateway search 

The Oxford Bön Project Google Search Appliance of CUL sites search 

Full web search for The Oxford Bön Project 


Search for 'library hours' to find which libraries are searched:
GSA Collection Search for library hours 

Library Gateway Search for library hours 

Extra

I'm not sure what the 'Search Library Pages' link on the Library Gateway page is doing - it finds things in 'library.cornell.edu' and 'mannlib.cornell.edu' and 'www.ilr.cornell.edu/library/catherwood/'.
