This is a public-facing web server critical to NMR's service offerings.

The tolerance for outage of this server is (define, please), per Ivan and Coates.

(Consequence if not available for hours? days? lost data going back how long? etc.)

 

Purpose of this write-up from Oliver, Chemistry IT Manager

This write-up represents an investment by Oliver to help bring greater clarity to problem resolution by NMR, so that the right groups work on each problem and no delays occur because a problem was routed to the wrong people. I also hope the write-up will inspire steps that help prevent a crisis when problems do occur.

 

Summary

The owner and manager of this service is Ivan Keresztes <ik54>.

Ivan is fully responsible for this server's function, continued maintenance, and operations, as well as any desired enhancements.

The server and its infrastructure reside on, or depend on, resources controlled or otherwise managed by Ivan, not Chemistry IT or CIT.

 

Key recommendations from Chemistry IT

Many of the recommendations below presume the scheduling server is critically important to NMR's service delivery.

Develop break/fix procedures for Ivan's group that do not depend on Ivan, so the group can still be served when Ivan is away, if necessary.

Document processes to ensure server software remains patched while continuing to function. Especially for a public-facing web server, this includes regularly patching, or upgrading over time, the OS, Apache, Perl, and their associated programs.

Line up, document, and test processes to ensure the server is backed up and can be restored to a point acceptably recent in the past.

 

Example: Contact CIT to determine whether they would be capable of, and willing to provide, expert support through their fee services. Information is available at <http://www.it.cornell.edu/about/atsus/iws/>. Explore whether CIT (or another firm) could serve as a backup for Ivan's knowledge of the server's set-up, which is especially important if he is away when a crisis occurs. Arranging this before there are problems can increase the chance of getting an expert and rapid response, compared with waiting until a problem occurs. CIT (or another firm) might also be able to expertly and cost-effectively help add reasonable security or functional enhancements over time.
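
As one illustration of the backup recommendation above, the following is a minimal sketch of a check that the newest backup file is recent enough. It is not a description of how the server is currently backed up. The directory layout, the function names, and the one-day threshold are all assumptions; the acceptable window still needs to be defined by NMR.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional

# Hypothetical threshold: the acceptable data-loss window has not been defined yet.
MAX_BACKUP_AGE = timedelta(days=1)


def newest_backup_age(backup_dir: Path, now: Optional[datetime] = None) -> Optional[timedelta]:
    """Return the age of the most recently modified file in backup_dir.

    Returns None when the directory contains no files at all.
    """
    now = now or datetime.now()
    mtimes = [p.stat().st_mtime for p in backup_dir.iterdir() if p.is_file()]
    if not mtimes:
        return None
    return now - datetime.fromtimestamp(max(mtimes))


def backup_is_fresh(backup_dir: Path,
                    max_age: timedelta = MAX_BACKUP_AGE,
                    now: Optional[datetime] = None) -> bool:
    """True only if at least one backup file exists and it is no older than max_age."""
    age = newest_backup_age(backup_dir, now)
    return age is not None and age <= max_age
```

A check like this only confirms that a recent backup file exists. It does not show that the server can actually be restored from it, so a periodic test restore is still needed.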

 

Contextual information

The service runs on an Apache web server running on a Linux server, and depends on files and Perl scripts.

The Linux server is hosted within Amazon Web Services (AWS), via Cornell's contract. This incurs a monthly charge (amount?). The server is managed remotely by Ivan.

N.B. The AWS charges are currently going through CIT. CIT is processing the charge to their account as a favor to Chemistry so we did not have to create an account ourselves with AWS. (This can be changed, if desired.) CIT currently has no other persistent responsibilities or connections to this server.

 

Clarifying CIT's and Chemistry IT's expectations

CIT and Chemistry IT are not responsible for break/fix of the NMR web scheduler or any of its related infrastructure.

CIT and Chemistry IT are not responsible for enhancements to the NMR web scheduler or related infrastructure.

 

With Chemistry IT's assistance, CIT did provide a generous amount of free technical consulting, implementation work, and debugging to migrate the server from the extremely old hardware in 248 Baker Lab into the Amazon Web Services (AWS) infrastructure. They ensured the networking was correctly configured. They also debugged the scheduler software to ensure it would run correctly on more current versions of the OS, Apache, and Perl. Migration occurred Tuesday, Oct. 11, 2016, from about 8:45 am to 10-ish. Ivan signed off on the migration's success on (date?). (Q: Was the last problem detected on 12/19/16, and was it subsequently resolved to completion? That incident inspired this write-up, since Chemistry IT was contacted when the problem with the NMR Scheduler was detected.)

 

Historical notes

Chemistry IT has served as trusted consultants to Ivan regarding this server, through helping to get it migrated to AWS.

Chemistry IT had hosted the hardware in 248 Baker Lab for many years, from before Oliver's arrival. The server had been in B-71 ST Olin before being moved into 248 Baker Lab. Shortly after Oliver's arrival in 2012, Oliver notified Ivan of our group's reluctance to continue hosting the server, since we were concerned we would be inadvertently drawn into dealing with a preventable crisis. The risk of a crisis was high because the service was critical to NMR's service delivery and the server's hardware was very old, as was the software it depended on. When the server was finally migrated off the hardware, that hardware was about 13 years old. Also, we judged that, for a public-facing web server, it was unacceptably neglected in terms of best practices as well as practices defined by University policy and expectations. For example, it was not being patched or updated against security vulnerabilities regularly or in a timely manner, if at all. It had been running an OS version from about October 2005 (RHEL 4.2). (Is this the correct OS and date?) Early in Oliver's tenure, Ivan hoped to re-write the scheduler. Those plans fell through over the following 4 years. The neglect of the server, and with it the potential for a crisis, kept growing as the years passed.
