The new Matrix is faster, but it is different. Learn about the differences here to reduce your aggravation.
TIP: This is an easy-to-remember web address to this page:
(1) Information related to Phase 2 Matrix testing (Full Scheraga group)
End-user reporting ticket number is INC000001223799.
How is the new Fall 2014 Matrix different than the old Matrix?
- Graphical representation comparing the old Matrix to the new Matrix, for researchers.
- Full details on the new Matrix configurations, for both researchers and support staff.
- Matrix Partitioning Configuration 2014-09-15.pdf
- Includes conventions, intentions, and associated technically-imposed limitations.
Work to be done during new Matrix testing: Who is responsible for what?
Each researcher must confirm they can access their test account on the new Matrix.
If you have trouble connecting to the new Matrix, notify ChemIT ASAP. Thank you.
- In any communication to ChemIT during testing, include the ticket number INC000001223799 in the subject line. Thank you.
Each researcher must confirm that the applications they depend on work for them on the new Matrix.
If and when you can confirm that all works as expected, please notify ChemIT.
=> Notify ChemIT and include the INC000001223799 ticket number in the email. Thank you!
If you have a problem with an application or MPI issues, please contact the appropriate Group contact.
=> Try working it out by yourself.
- You should understand how and why your application works.
- The application itself is almost never the problem. Instead, the cause is almost always how a researcher has configured their scripts to use the application.
=> If you can't fix it yourself, please contact the appropriate group member; see below for who to contact.
- Do not contact ChemIT for this, please.
Researchers in Poland should contact Czarek. Follow his instructions, please.
- To resolve your problem, Czarek may then need to work with ChemIT staff.
All others should contact the group member assigned as lead tester for the application they are having problems with. Follow their instructions, please.
- Matrix end-user application information
- To resolve your problem, a lead tester may need to work with ChemIT staff.
Deadlines for researchers' testing
DATE 1: Monday, 10/19/14: By this date, all researchers are expected to at least log in (verifying that their account works).
- KEEP? Email us once you are in and working by replying to this message using your Cornell account and keep the subject & tracking number.
DATE 2: Thursday, 10/22/14: By this date, all researchers will need to test and either approve, or report problems with, the software they depend on.
- Report details of any questions or problems you find to the appropriate researcher.
- See section above, "Each researcher must confirm that the applications they depend on work for them on the new Matrix.", for who to report to.
DATE 3: Monday, 10/27/14: By this date we expect to have all testing completed, as long as everything is indeed working as expected.
- Any delays in your testing will extend the project schedule.
Testing schedule activities by ChemIT staff
What's been done to date:
- The system, with all requested software, has been installed.
- All hardware has been tested.
- All applications have been tested by group researchers designated as application leads.
Upcoming scheduled events (dovetailing with the above deadlines for researchers)
- Thursday, Oct. 15: ChemIT makes a copy of the end-user's home directory (on the old Matrix) onto the end-user's storage directory (on the new Matrix), for testing purposes.
- Link to chart of s/w to group tester.
- Thursday, Oct. 16: End-user testing starts.
- Friday, Oct. 31: ChemIT deletes its copy of the end-user's storage directory (on the new Matrix). ChemIT retains its copy of the end-user's home directory (on the new Matrix).
- See section below, "Important notes in what data is erased. And what data is saved."
- Monday, Nov. 3: The old Matrix goes off-line so that all nodes can be converted and user data transferred to the new cluster.
Notes to get you started and keep you going
QuickStart info
During testing, the new and the old clusters are both available to you at the same time.
- Thus, you will be able to log into two different home directories, within two completely different systems:
- SSH to the new (Test) system at matrixtest.scheraga.chem.cornell.edu
- The old system remains at scheraga.chem.cornell.edu
- Your account ID and password on the new Matrix are the same as the old Matrix.
- See section below, "Get access to your account", within the "(2) Information relevant for testing and after testing" section.
- Continue to use the old Matrix cluster for your production research work.
- Use the new cluster ONLY to confirm it will work for you once we cut-over from the old Matrix. Do not use it for production research.
- Interruptions to the new Matrix during this testing phase may occur at any time, with no advance warning.
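The connection step can be sketched roughly as follows. The two hostnames are the ones given on this page; the helper name `matrix_host` and the `netID` placeholder are illustrative assumptions, not something ChemIT provides.

```shell
# Sketch only: matrix_host is a hypothetical helper, not a ChemIT-provided tool.
# It maps a short label to the hostnames given on this page.
matrix_host() {
    case "$1" in
        test) echo "matrixtest.scheraga.chem.cornell.edu" ;;  # new Matrix, testing only
        old)  echo "scheraga.chem.cornell.edu" ;;             # old Matrix, production work
        *)    echo "usage: matrix_host test|old" >&2; return 1 ;;
    esac
}

# Example (not run here): replace netID with your own account ID.
#   ssh netID@"$(matrix_host test)"
#   ssh netID@"$(matrix_host old)"
```

Keeping the two hosts behind distinct labels is one way to avoid accidentally running production work on the test system.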
Detailed info, particularly if QuickStart above is insufficient
Elaborate on "We will provide a second email with instructions for getting to the new cluster and setting up your home directory, and how storage is set up."
Particulars you need to know about the new Matrix cluster, for testing:
Your home directories, "/users/netID", are basically EMPTY of your files to start.
- Selectively choose (from storage) just the files you need for running jobs.
- For your convenience, ChemIT has pre-configured your home directory by copying your .ssh/ directory and your .tcshrc file from your /storage directory into your /home directory.
A recent snapshot copy of your data from the old Matrix has been put on the storage system in "/storage/netID".
All researchers must therefore move or copy items from their storage directory into their home directory, as they see fit.
- Storage is where non-actively used, Scheraga-related research files belong.
- In production (NOT DURING TESTING!), you will move results and data you want to save long term back to storage.
- REMEMBER: All data in "/storage/netID" will be deleted after this testing phase.
- This deletion will allow us to move your current, production data from the old Matrix, when we cut over.
- Phase One researchers only: Your /storage/netID has been carried forward from your initial testing.
- REMINDER: It, too, will be deleted before the cut-over from the old Matrix.
You thus need to move or copy files needed to run jobs from storage to your new home "/users/netID" to test jobs.
- Leave files in storage which you don't need for your jobs.
- Keeping your home directory small aids in disaster recovery for the entire group!
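As a minimal sketch of the selective copy described above: the helper name `stage_to_home` and the example file names are hypothetical, and the two directory roots are passed in as arguments rather than hard-coded. Only the items you name are copied; everything else stays in storage.

```shell
# Sketch only: stage_to_home is a hypothetical helper name.
# Copies the named files or directories from a storage directory into a home
# directory, keeping their relative paths. Everything else stays in storage.
stage_to_home() {
    src_root=$1    # e.g. /storage/netID
    dest_root=$2   # e.g. /users/netID
    shift 2
    for rel in "$@"; do
        if [ ! -e "$src_root/$rel" ]; then
            echo "skipping missing item: $src_root/$rel" >&2
            continue
        fi
        # Recreate the parent directory, then copy with permissions and times preserved.
        mkdir -p "$dest_root/$(dirname "$rel")"
        cp -pR "$src_root/$rel" "$dest_root/$(dirname "$rel")/"
    done
}

# Example (not run here; the file names are made up):
#   stage_to_home /storage/netID /users/netID project/input.dat scripts
```

Copying rather than moving leaves the storage snapshot untouched, which seems the safer choice while testing, since that snapshot is deleted afterwards anyway.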
ChemIT will maintain a chart to help the group track what does and does not work regarding MPI.
Other details
- See the storage chart for more details on disks & partitions in the new cluster, and how that compares to the old Matrix.
- All research applications are installed under “/software”.
- Instantly get more space, for temporary use, in your home directory by using the "/notbackedup" disk.
- See section above, "How is the new Fall 2014 Matrix different than the old Matrix?", for links to more details.
- Get info on the nodes on the new Matrix which are available during testing.
- See chart with full details:
- Summary info, true during Phase Two testing:
- Initially about 17 nodes will be made available to all researchers.
- This represents the newly purchased nodes and two of each of 5 types of old node.
- There are 8 new compute nodes (m108-m115) accessible. Each new node has 20 cores available.
- For testing, select old Matrix nodes have been moved to the new Matrix to ensure adequate computing capacity during testing.
- At this point, 1 or 2 of the GPU nodes will be running as just CPU nodes. GPU functions will be added at a later date if possible.
- Queuing information
- No Express queue available for testing.
- An Express queue will be provisioned when the new Matrix is in production.
- The default queue is dque instead of express, because no node has been assigned to the Express queue.
Important notes in WHAT DATA WILL BE ERASED AFTER TESTING. And what data is saved after testing.
ChemIT will save all end users' home directories as they build them up on the new Matrix.
- This is to preserve every researcher's investment in getting their research files to work on the new Matrix.
ChemIT will erase all end users' storage directories on the new Matrix, at the end of testing.
- On the day we cut over from the old Matrix to the new Matrix, ChemIT will make a final copy of each user's old Matrix home directory into their new Matrix storage directory.
Timeline for testing
??: ChemIT will create a snapshot copy of each end-user's current home directory from the current production Matrix and place it into the end-user's storage directory (on the new Matrix), for testing purposes.
Thursday, 10/16: End-user testing starts.
??: All end-users should have logged in and started to set up their new home directories.
??: End-user testing ends.
(2) Information relevant for testing and after testing
Overall information
- Text containing full details on the new Matrix configurations. For researchers and support staff.
- Includes conventions, intentions, and associated technically-imposed limitations.
- Before production, migrate from above section, "How is the new Fall 2014 Matrix different than the old Matrix?", for links to more details, once those are updated
Get access to your account
- ssh to:
- During test:
- matrixtest.scheraga.chem.cornell.edu
- When the new Matrix is in production, use the new cname:
- matrix.chem.cornell.edu
- N.B. We were going to use the same cname as was used for production before (scheraga.chem.cornell.edu).
- Use your username and your Matrix-specific password.
Learn to effectively use your home directory, along with /storage, /notbackedup, and /software.
- Your home directory is "/users/netID".
- In this location, just retain the files you need for running jobs.
- Place files into storage which you don't need for your current jobs.
- KEY: Keeping your home directory small aids in disaster recovery for the entire group.
- Instantly get more space, for temporary use, in your home directory by using the "/notbackedup" temporary disk.
- See section below, "How to run jobs requiring lots of space in one's home directory".
- Your storage directory is "/storage/netID".
- Storage is where non-active, Scheraga-related, research files belong.
- In production (NOT DURING TESTING!), you will move results and data you want to save long term back to storage.
- REMEMBER: All data in "/storage/netID" will be deleted after this testing phase.
- This deletion will allow us to move your current, production data from the old Matrix, when we cut over.
- All research applications are installed under “/software”.
How to run jobs requiring lots of space in one's home directory
- (to be added) purpose and use of /notbackedup partition to expand a user's effective home directory's storage capacity.
- If using /notbackedup is not adequate, a researcher can explain this and request a larger quota through their Group Cluster Representative (Gia, as of 10/2014).
Matrix user quotas
New user's default quotas
The group has instructed ChemIT to make the following defaults on a new user's account:
- 10GB /home
- 50GB /storage
A new user can request a larger quota through their Group Cluster Representative (Gia, as of 10/2014).
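Before requesting a larger quota, it may help to see how close a directory is to its limit. The following is a rough sketch only: `check_quota` is a hypothetical helper, it relies on the standard `du` utility, and the way the cluster actually accounts for quota may differ from what `du` reports. The limit is passed in as an argument, so use whichever default applies to your account.

```shell
# Sketch only: check_quota is a hypothetical helper, not a ChemIT tool.
# Prints how much space a directory uses and returns non-zero when it
# exceeds the limit (given in GB, with 1 GB taken as 1024*1024 KB).
check_quota() {
    dir=$1
    limit_gb=$2
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    limit_kb=$((limit_gb * 1024 * 1024))
    echo "$dir: ${used_kb} KB used of ${limit_kb} KB allowed"
    [ "$used_kb" -le "$limit_kb" ]
}

# Example (not run here): compare a storage directory with the 50GB default.
#   check_quota /storage/netID 50 || echo "over the default quota"
```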