Some ORCA features used in Lancaster group jobs (such as TDDFT) require a lot of memory; for example, a PAL8 job can need 35304 MB. To avoid crashes, users need to reduce the number of parallel cores on nodes with less memory. See the table below for the memory size of each node. kml005, kml006, kml009, and kml010 have more memory.
Node | Owner | Motherboard version | Processor | Cores | Hyperthreading on | Memory | Hard drive |
---|---|---|---|---|---|---|---|
headnode | Lancaster | Asus Z8PE-D12 v1201 | Dual E5620 | 8 | No | 24GB | Dual software RAID1 2TB, backintime 2TB |
kml001 | Lancaster | Asus Z8PE-D12 v1201 | Dual E5520 | 8 | No | 24GB | Software raid0 (3) 640GB = 1.8TB |
kml002 | Lancaster | Asus Z8PE-D12 v1201 | Dual E5520 | 8 | No | 24GB | Software raid0 (3) 640GB = 1.8TB |
kml003 | Lancaster | Asus Z8PE-D12 v1201 | Dual E5520 | 8 | No | 24GB | Software raid0 (3) 640GB = 1.8TB |
kml004 | Lancaster | Asus Z8PE-D12 v1201 | Dual E5520 | 8 | No | 20GB* | Software raid0 (3) 640GB = 1.8TB |
kml005 | Lancaster | Asus Z8PE-D12 v1201 | Dual E5620 | 8 | No | 96GB | Software raid0 (4) 640GB = 2.4TB |
kml006 | Lancaster | Asus Z8PE-D12 v1201 | Dual E5620 | 8 | No | 48GB | Software raid0 (4) 640GB = 2.4TB |
kml007 | Lancaster | Asus Z8PE-D12 v1201 | Dual E5620 | 8 | No | 24GB | Software raid0 (3) 640GB = 1.8TB |
kml008 | Lancaster | Asus Z8PE-D12 v1201 | Dual E5520 | 8 | No | 24GB | Software raid0 (3) 640GB = 1.8TB |
kml009 | Crane | Supermicro X9DRT-F v3.2 | Dual E5-2620v2 | 12 | No | 64GB | Single 2TB |
kml010 | Crane | Supermicro X9DRT-F v3.2 | Dual E5-2620v2 | 12 | No | 64GB | Single 2TB |
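The core-reduction advice above can be turned into a quick per-node estimate. The sketch below assumes ORCA's memory use scales roughly linearly with the number of PAL processes (ORCA's `%maxcore` is a per-process value in MB), so the PAL8 example works out to 35304 / 8 = 4413 MB per core. The 2 GB reserved for the operating system and the treatment of 1 GB as 1024 MB are hypothetical choices, not measured values; kml004's "20GB*" is taken at face value.

```python
import math

# (cores, memory in GB) for each compute node, copied from the table above.
NODES = {
    "kml001": (8, 24), "kml002": (8, 24), "kml003": (8, 24),
    "kml004": (8, 20), "kml005": (8, 96), "kml006": (8, 48),
    "kml007": (8, 24), "kml008": (8, 24),
    "kml009": (12, 64), "kml010": (12, 64),
}

# Assumption: per-core need derived from the PAL8 example (35304 MB total).
PER_CORE_MB = 35304 / 8  # 4413 MB per process

# Hypothetical headroom left for the OS and other processes.
RESERVED_MB = 2048


def max_pal(cores, mem_gb, per_core_mb=PER_CORE_MB, reserved_mb=RESERVED_MB):
    """Largest PAL value whose estimated memory fits on the node,
    capped at the node's physical core count."""
    usable_mb = mem_gb * 1024 - reserved_mb
    fit = math.floor(usable_mb / per_core_mb)
    return max(0, min(cores, fit))


if __name__ == "__main__":
    for name, (cores, mem_gb) in NODES.items():
        print(f"{name}: {mem_gb} GB, {cores} cores -> PAL{max_pal(cores, mem_gb)}")
```

Under these assumptions the 24 GB nodes fit about PAL5 and kml004 about PAL4, while kml005, kml006, kml009, and kml010 can use all of their cores. Adjust `PER_CORE_MB` to match the `%maxcore` actually used in a given job.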
Maintenance records:
3/21/16: Updated the Supermicro X9DRT-F BIOS on kml009 and kml010 from version 3.0 to 3.2. - meh26
3/21/16 Lulu: No errors found on sda, sdb, or sdc. Forced fsck; no security updates.
8/22/16 Lulu: No firmware update from Michael. No security updates available via YUM. Checked all hard drives; all are fine. Forced fsck, no errors found. Backed up the current system to /backup/sysBackup-082216.tgz. Also removed some system crash logs to free 3GB of system space.
11/16/16 Lulu: No firmware update from Michael. No security updates available via YUM. Checked all hard drives; all are fine. Forced fsck, no errors found. Backed up the root partition as a dd image. /var/lib/mlocate/mlocate.db had grown too big (~5GB), so I modified /etc/updatedb.conf to ignore sbgrid and backup, then removed mlocate.db and rebuilt it with "updatedb". Lulu yum updated all Crane workstations; Michael updated the Crane Synology DSM.
2/15/17 Lulu: No firmware update on the cluster. No security updates available via YUM on the cluster. Checked all hard drives; all are fine. Forced fsck on the head node on 2/12 because of a power outage; no errors found. BIOS update on Crane workstations. yum updated all Crane workstations; Michael updated the Crane Synology DSM. Michael configured RESE-01 (serving Crane NFS) to boot automatically when power is restored.
5/17/17 Lulu: No firmware update on the cluster. Forced fsck on the head node. kml001's sda is failing, so kml001 has been taken offline. BIOS update on Crane workstations. yum updated the Crane NFS server (CRAN-19). yum updated all Crane workstations; Michael updated the Crane Synology DSM. Michael patched the Hyper-V server where the NFS server lives.
8/30/17 Lulu: No firmware update on the cluster. Forced fsck on the head node. BIOS update on Crane workstations. yum updated the Crane NFS server (CRAN-19). yum updated all Crane workstations; Michael updated the Crane Synology DSM. Michael updated Windows on the Hyper-V server. The Nvidia card fan on as-chm-cran-14 was failing; Oliver replaced the card. We also found that one monitor (for cran-13) has some issues.
2/21/18 Lulu: No firmware update on the cluster. Forced fsck on the head node. BIOS update on Crane workstations. yum updated the Crane NFS server (CRAN-19). yum updated all Crane workstations; Michael updated the Crane Synology DSM (the Synology took a very long time to boot, possibly because one drive has sector errors). Michael updated Windows on the Hyper-V server.