| Node | Motherboard version | Processor | Cores | Hyperthreading on? | RAM | Hard Drive |
|---|---|---|---|---|---|---|
| headnode | Supermicro X9DRW-iF v3.2 | Dual Xeon E5-2620v2 | 8 | N | 16GB | 3ware 9750-4i RAID controller - firmware FH9X 5.12.00.007; RAID1 setup on (2) 3TB drives and back-in-time setup on additional 3TB drive |
| rh001 | Asus DSBF-DE v1006 | Dual Xeon E5420 | 8 | N | 16GB | 1TB |
| rh002 | Asus DSBF-DE v1006 | Dual Xeon E5420 | 8 | N | 16GB | 1.5TB |
| rh003 | Asus DSBF-DE v1006 | Dual Xeon E5420 | 8 | N | 12GB | 1.5TB |
| rh004 | Supermicro Z8PE-D12 v1401 | Dual Xeon E5520 | 8 | N | 12GB | 1.5TB |
| rh005 | Supermicro Z8PE-D12 v1401 | Dual Xeon E5520 | 8 | N | 12GB | 1.5TB |
| rh006 | Supermicro Z8PE-D12 v1401 | Dual Xeon E5520 | 8 | N | 12GB | 1.5TB |
| rh007 | Supermicro Z8PE-D12 v1401 | Dual Xeon E5520 | 8 | N | 12GB | 1.5TB |
| rh008 | Supermicro Z8PE-D12 v1401 | Dual Xeon E5520 | 8 | N | 12GB | 1.5TB |
| rh009 | Supermicro Z8PE-D12 v1401 | Dual Xeon E5520 | 8 | N | 12GB | 1.5TB |
| rh010 | Supermicro Z8PE-D12 v1401 | Dual Xeon E5520 | 8 | N | 12GB | 1.5TB |
| rh011 - previous HN | ? | ? | ? | ? | ? | ? |
| rh012 | Supermicro X9DRT-HF v3.3 | Dual Xeon E5-2620v2 | 12 | N | 32GB | 2TB |
| rh013 | Supermicro X9DRT-HF v3.3 | Dual Xeon E5-2620v2 | 12 | N | 32GB | 2TB |
| rh014 | Supermicro X9DRT-HF v3.3 | Dual Xeon E5-2620v2 | 12 | N | 32GB | 2TB |
| rh015 | Supermicro X9DRT-HF v3.3 | Dual Xeon E5-2620v2 | 12 | N | 32GB | 2TB |
| rh016 | Supermicro X9DRT-HF v3.3 | Dual Xeon E5-2620v2 | 12 | N | 32GB | 2TB |
| rh017 | Supermicro X9DRT-HF v3.3 | Dual Xeon E5-2620v2 | 12 | N | 32GB | 2TB |
| rh018 | Supermicro X9DRT-HF v3.3 | Dual Xeon E5-2620v2 | 12 | N | 32GB | 2TB |
| rh019 | Supermicro X9DRT-HF v3.3 | Dual Xeon E5-2620v2 | 12 | N | 32GB | 2TB |
| rh020 | Supermicro X10DRT-H v1.0a | Dual Xeon E5-2630v3 | 16 | N | 64GB | 500GB |
| rh021 | Supermicro X10DRT-H v1.0a | Dual Xeon E5-2630v3 | 16 | N | 64GB | 500GB |
| rh022 | Supermicro X10DRT-H v1.0a | Dual Xeon E5-2630v3 | 16 | N | 64GB | 500GB |
| rh023 | Supermicro X10DRT-H v1.0a | Dual Xeon E5-2630v3 | 16 | N | 64GB | 500GB |
| rh024 | Supermicro X10DRT-H v1.0a | Dual Xeon E5-2630v3 | 16 | N | 64GB | 500GB |
| rh025 | Supermicro X10DRT-H v1.0a | Dual Xeon E5-2630v3 | 16 | N | 64GB | 500GB |
| rh026 | Supermicro X10DRT-H v1.0a | Dual Xeon E5-2630v3 | 16 | N | 64GB | 500GB |
| rh027 | Supermicro X10DRT-H v2.0 | Dual Xeon E5-2630v3 | 16 | N | 64GB | 500GB |

| Applications | Version |
|---|---|
| adf | 2012.01 & 2012.10.29 & 2013.01 & 2014.04ls |
| firefly | 71g & 8_beta_linux_openmpi_1.4 |
| gaussian | g03-E.01 & g09-A.02 & g09-C.01 & g09-D.01 |
| gv | 508 |
| intel compiler | 2015.0.090 |
| mathematica | 10.2.0 (removed 8/8/2016) |
| mopac | 2009 |
| mpich | 1.2.7.p1 |
| mpich2 | 1.4.1p1 |
| openmpi | 1.4.4 & 1.6.5 & 1.8.4 |
| phonopy | 1.6.1 |
| vasp | 5.3.5 |
| vmd | 1.8.7 |
| yaehmop | 3.0.3 |

 

To check the status of the sol head node's hard drives, go to:

https://192.168.255.100:888

Log in as Administrator; the password is the root password.
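
Before opening a browser, it can help to confirm that the 3ware web interface is answering at all. The following is a minimal sketch, not an official 3ware tool: it only tests whether a TCP connection to the address above succeeds. The function names and the 5-second timeout are assumptions made for illustration.

```python
import socket
from urllib.parse import urlsplit

# URL taken from the note above; the timeout value is an assumption.
WEB_UI_URL = "https://192.168.255.100:888"


def host_and_port(url):
    """Split a URL into (host, port), defaulting to 443 for https and 80 otherwise."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def is_reachable(url, timeout=5.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False
```

Calling `is_reachable(WEB_UI_URL)` from a machine on the cluster's private network returns False when the connection times out or is refused. A True result only shows that the port accepts connections, not that the login page works.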

 

Maintenance records:


3/8/2016: Lulu: No errors detected on hard drive checks. No security updates available via YUM. Ran fsck; no errors found.


3/8/2016: Updated the BIOS on rh027 from v1.0a to v2.0 and ran into serious problems, so did not update the others in the same series, but did do the IPMI firmware updates on them. Updated rh012-rh019 from v3.0 to v3.2. Ran out of time to do the BIOS updates on rh004-rh010. - meh26

7/26/2016: Lulu: No firmware update from Michael. There are no security updates available via YUM. Checked all hard drives; all are fine. Forced fsck, no errors found. Users' home data occupies 92% of the data partition. Hoffmann group users need to clean up their home directories. These four users are using the most space:

705386768KB       lgr48

361360488KB       yt443

355886296KB       px32

354069408KB       tz265
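
A list like the one above can be ranked with a few lines of code. This is only a sketch of one way to do it, not the procedure actually used for this record: the function names are hypothetical, the input is the four lines recorded above, and the GiB conversion assumes "KB" means 1024-byte blocks as reported by `du -k`.

```python
def parse_usage(lines):
    """Parse lines such as '705386768KB  lgr48' into (user, kilobytes) pairs."""
    usage = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        size, user = fields
        # Accept '705386768KB' (as recorded here) or a bare number.
        digits = size[:-2] if size.upper().endswith("KB") else size
        if digits.isdigit():
            usage.append((user, int(digits)))
    return usage


def top_users(lines, count=4):
    """Return the `count` largest (user, kilobytes) pairs, largest first."""
    ranked = sorted(parse_usage(lines), key=lambda item: item[1], reverse=True)
    return ranked[:count]


# The four entries recorded on 7/26/2016.
recorded = [
    "705386768KB       lgr48",
    "361360488KB       yt443",
    "355886296KB       px32",
    "354069408KB       tz265",
]
for user, kb in top_users(recorded):
    # Assumes 1 KB = 1024 bytes, so kb / 1024**2 is GiB.
    print(f"{user:8s} {kb / 1024**2:8.1f} GiB")
```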

10/18/2016: Lulu: No firmware update from Michael. There are no security updates available via YUM. Checked all hard drives; all are fine. Tried to dd-image the root partition to an image file on an external hard drive. Used sgdisk to back up the disk partition table.
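
The record above does not give the exact commands, so the following is only a sketch of what such a backup might look like. The device names, mount point, and file names are hypothetical, and the 4M block size is an assumption. The code builds and prints the commands without running them.

```python
import shlex


def backup_commands(disk, root_partition, backup_dir):
    """Build (but do not run) the partition-table and root-image backup commands."""
    table_file = f"{backup_dir}/partition-table.sgdisk"
    image_file = f"{backup_dir}/root-partition.img"
    return [
        # sgdisk --backup saves the GPT partition data of `disk` to a file.
        ["sgdisk", f"--backup={table_file}", disk],
        # dd copies the root partition block by block into an image file.
        ["dd", f"if={root_partition}", f"of={image_file}", "bs=4M"],
    ]


# Hypothetical device names and mount point, for illustration only.
for command in backup_commands("/dev/sda", "/dev/sda2", "/mnt/external"):
    print(" ".join(shlex.quote(arg) for arg in command))
```

Printing the commands first leaves room to check the device names before anything is read or written. An image taken with dd from a mounted, in-use root partition may not be consistent, so booting from rescue media first is the safer choice.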

1/17/2017: Lulu: No firmware update from Michael. There are no security updates available via YUM. Checked all hard drives; all are fine. Forced fsck, no errors found. Users' home data occupies 87% of the data partition. Checked root, /usr and /var space; looks OK now.

4/18/2017: Lulu: No firmware update from Michael. There are no security updates available via YUM. Checked all hard drives; all are fine. Users' home data occupies 93% of the data partition.

7/18/2017: Michael: no need to worry about any firmware on motherboards or RAID controller.

7/18/2017: Lulu: There are about 6 security updates via yum on the head node. There are about four security updates on capsule, but the update failed and aborted. We retrieved the pre-update version from EZbackup and stored it in /data/chroot/centos6-6-restore. We will see whether we need to revert to the pre-update version. Checked all hard drives; all are fine. Forced fsck.

1/16/2018: Lulu: No security updates via yum on the head node. ("No packages needed for security;" - Meltdown won't get patched by this.) Checked all hard drives; all are fine. Forced fsck.
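
The quarterly security check could be scripted around the message quoted above. This is a rough sketch under stated assumptions: it assumes yum's security plugin is installed so that `yum --security check-update` is available, the function names are hypothetical, and treating any output without the quoted phrase as "updates needed" is a crude heuristic that will also flag yum errors.

```python
import subprocess

# Phrase quoted in the 1/16/2018 record above.
NO_SECURITY_UPDATES = "No packages needed for security"


def needs_security_updates(yum_output):
    """Return False when yum reports that no security packages are needed."""
    return NO_SECURITY_UPDATES not in yum_output


def run_security_check():
    """Run the check; assumes yum with the security plugin is installed."""
    result = subprocess.run(
        ["yum", "--security", "check-update"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return needs_security_updates(result.stdout)
```

As the record notes, a clean result here does not mean every known issue is patched. It only reflects what yum's security metadata reports.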

4/17/2018: Lulu: No security updates via yum on the head node. Checked all hard drives; all are fine. Forced fsck. Root partition was 90% full, so I cleaned up many server and job logs; it is now 69% full. Cannot access the 3ware web interface; the error message is "The server at 192.168.255.100 is taking too long to respond." 3ware commands are working. Needs more investigation.

10/16/2018: Michael did some firmware updates on compute nodes. Michael replaced the router with an EdgeRouter X. Lulu: No security updates via yum on the head node. Checked all hard drives; all are fine. Forced fsck. Root partition is 77% full. Accessed the 3ware web interface without problems.

1/17/2019: Lulu: No security updates via yum on the head node. Checked all hard drives; all are fine. Root partition was quite full, so I cleaned up many server and job logs. Root partition is now 70% full.
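
Since the root partition has filled up more than once, a small check like the following could flag it between maintenance visits. This is a sketch only: the function names are hypothetical and the 90% threshold is an assumption based on the level that prompted the 4/17/2018 cleanup. The percentage is computed as used/total, which can differ slightly from the figure `df` shows because of reserved blocks.

```python
import shutil


def percent_used(total, used):
    """Return used space as a percentage of total, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * used / total, 1)


def check_partition(path="/", threshold=90.0):
    """Return (percent, over_threshold) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    percent = percent_used(usage.total, usage.used)
    return percent, percent >= threshold


percent, too_full = check_partition("/")
print(f"/ is {percent}% full" + (" (clean up logs)" if too_full else ""))
```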

4/16/2019: Michael:

  1. Updated EdgeRouter X from 1.10.7 to 2.0.1
  2. Updated headnode BIOS from 3.2 to 3.3

4/16/2019: Lulu: Forced fsck. Data partition is 93% full. rh001 cannot boot.

7/23/2019: Lulu: The backintime drive /dev/sdb has I/O errors. I disabled mounting it and disabled the time machine backups for now. The head node has EZbackup now. Data partition is 97% full. rh001 cannot boot.

10/15/2019: Lulu: Forced fsck. The backintime drive /dev/sdb has I/O errors. rh001 does not boot.

01/22/2020: Lulu: The backintime drive /dev/sdb has I/O errors. rh001 does not boot.
