Data flow
  • Raw data: FASTQ files as received from the Genomics facility or another service provider (sometimes distributed)
  • Intermediate data: ends up as a tar file
  • Distributed derived data: product of intermediate data plus QC input from IGD

Robert Bukowski posted this message on Basecamp.

IGD data in new locations
All the IGD flowcell data is now in the following two locations: 

Tier1: 
cbsudesktop08:/data/GBSFlowCells and cbsudesktop08:/data/GBSFlowCells/IGD_old

Tier2: 
cbsudesktop08:/IGD_tier2/GBSFlowCells and cbsudesktop08:/IGD_tier2/GBSFlowCells/IGD_old 

The QC results are in the respective QC subdirectories. 

Tier1 storage is attached directly to the cbsudesktop08 machine, so any calculations run on that machine can use the files in place (for example, there is no need to copy them to /workdir). The Tier1 location is also available on cbsudesktop08 (as a symlink) as /local/GBSFlowCells/IGD, so any scripts having this location hardcoded should still work.

Tier1 is available from minilims as /mnt/CBSUdesktop08GBS/IGD; this is where the minilims application developed by Aaron should take the files from.

Tier2 storage is located on a glusterfs system composed of two different file servers, separate from cbsudesktop08 and located in the machine room on the 7th floor of Rhodes Hall. The directory /IGD_tier2 is network-mounted on cbsudesktop08 for easy access, but it should not be used in any calculations.
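
A script that accepts input paths could refuse to compute directly on Tier2. The following is only a minimal sketch of such a guard: the function name is_on_tier2 is hypothetical, and the check is purely lexical (it does not resolve symlinks or inspect mounts), so it only catches paths spelled with the /IGD_tier2 prefix named above.

```python
from pathlib import PurePosixPath

# Mount point named in the message above.
TIER2_ROOT = PurePosixPath("/IGD_tier2")


def is_on_tier2(path):
    """Return True if the path is /IGD_tier2 itself or lies under it.

    Lexical comparison only: symlinks pointing into Tier2 are not detected.
    """
    p = PurePosixPath(path)
    return p == TIER2_ROOT or TIER2_ROOT in p.parents
```

A calculation script might call this on each input path and stop with a message asking the user to work from the Tier1 copy instead.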

We are now ready to remove any remaining IGD flowcells from cbsufsrv4 and from minilims:/mnt/data_eq/nextgen/IGD_NEW. Before we do this, IGD folks, please verify that you can access the data in the new locations.
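
One way to do this verification when logged in to cbsudesktop08 is a short script that checks each directory exists and is readable. This is a sketch, not an official tool: the function name check_locations is hypothetical, the paths are copied from the message above, and the Tier1 IGD_old path is assumed to be /data/GBSFlowCells/IGD_old.

```python
import os

# Paths as listed in the message above (as seen from cbsudesktop08).
# Assumption: the Tier1 IGD_old directory is /data/GBSFlowCells/IGD_old.
NEW_LOCATIONS = [
    "/data/GBSFlowCells",
    "/data/GBSFlowCells/IGD_old",
    "/IGD_tier2/GBSFlowCells",
    "/IGD_tier2/GBSFlowCells/IGD_old",
]


def check_locations(paths):
    """Return {path: status}, status being 'ok', 'missing', or 'unreadable'."""
    report = {}
    for path in paths:
        if not os.path.isdir(path):
            report[path] = "missing"
        elif not os.access(path, os.R_OK | os.X_OK):
            # Directory exists but cannot be listed or entered by this user.
            report[path] = "unreadable"
        else:
            report[path] = "ok"
    return report


if __name__ == "__main__":
    for path, status in check_locations(NEW_LOCATIONS).items():
        print(f"{status:10s} {path}")
```

From minilims, the same check could be pointed at /mnt/CBSUdesktop08GBS/IGD instead.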

1 Comment

    • Raw data to be stored only on cbsudesktop08 and Tier2
    • Analysis data to be stored on cbsudesktop08 and minilims; moved to Tier2 when 6 months old
    • Protection level on cbsudesktop08 is RAID-6
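
The six-month rule for analysis data could be supported by a script that lists candidates for the move to Tier2. The sketch below only reports files and moves nothing. It rests on two assumptions not stated in the comment: that a file's age is measured by its modification time, and that "6 months" is approximated as 183 days. The function name tier2_candidates is hypothetical.

```python
import os
import time

# Assumption: "6 months" is approximated as 183 days.
SIX_MONTHS_SECONDS = 183 * 24 * 60 * 60


def tier2_candidates(root, now=None, max_age=SIX_MONTHS_SECONDS):
    """Return a sorted list of files under root last modified more than max_age seconds ago."""
    now = time.time() if now is None else now
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(full)
            except OSError:
                # Skip entries that vanish or cannot be examined (e.g. broken symlinks).
                continue
            if now - mtime > max_age:
                old.append(full)
    return sorted(old)
```

The returned list can be reviewed by hand before anything is copied to Tier2.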