Bioteam Pre-Proposal 20120309

IGD-GBS MiniLIMS Proposal v1_2.doc

action items

  • Rob Elshire and IGD generate minimum distribution information standard
  • Theresa defines details of billing dashboard page
  • IGD create lists of pages (data objects and dashboards?)
    • Define "page"? I thought you might have meant data objects, but I think not. If you mean "user interface", isn't it better to define these in terms of functionality and use cases rather than the implementation of "pages"? I'm not sure what you need from us.
  • Charlotte detail spec for workflow
  • IGD detailed sample submission spec
  • IGD define roles and authorization levels for each page
  • James define APIs for billing and ordering integration with CLC LIMS
    • not a pre-requisite for contract and project launch.

items for discussion

  • Pull QC data into minilims and report on it, possibly including Illumina data reports. These would only be obtainable from Peter's facility, not from other providers. FastQC is easy to run, and staff can do it themselves.
  • IGD is concerned about maintaining and modifying the QC scripts. This is likely an internal problem rather than one for Bioteam.
  • Specific discussion with Bioteam about workflow implementation per Charlotte's detailed spec.

Overview

This is a project to replace and enhance existing laboratory management systems supporting the Institute for Genomic Diversity's Genotyping By Sequencing (GBS) service.

As IGD has become more of a service facility, we need a more cohesive system for accepting samples, tracking samples, and customer billing. Currently these are all completely separate systems, with no linkage between them. We also have no centralized support, as all the web-based systems were designed by different people for different reasons. The pricing table and procedures can be found at http://www.igd.cornell.edu/index.cfm/page/projects/GBS/GBSpricing.htm

GBS paper in PLoS ONE: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0019379

Detailed Workflow Spec (slides)

workflow high level overview

  1. Project Initiation
    • Requires personal contact with lab director
    • Charlotte or Sharon sets up project in redmine
  2. Sample submission: the client goes to http://sorghumdiversity.maize.cornell.edu/ to review the required steps and upload their information in order to be approved to send DNA samples
  3. Client sends DNA samples to the lab
  4. Client sets up customer registration at https://cores.lifesciences.cornell.edu/userdev/newuser.php
  5. DNA samples are tested; if they are okay, Charlotte sets up a cost issue in redmine so Theresa knows what to bill.
  6. Customer is invoiced
  7. DNA samples begin moving through various lab steps; redmine updated at various steps
  8. SampTrac is used to update HTS database entries and create barcodes for DNA and library plates.
  9. Libraries are submitted for sequencing (sometimes more than once)
  10. Raw Data -> Data Files
    • QC scripts run
    • Pipelines vary: can be Tassel or Fei Li's pipeline
  11. Data Interpretation
    • requires knowledge of the biology / a skilled analyst
  12. Data Distribution
  13. Data storage for X period of time

Current Components

Overview

GBS Database Schema

sample submission interface fields and database schema

QC Scripts

"SampTrack" C# windows program to interface with GBS for setting up pools etc.

redmine project management

  • Project initiation
    • create project in redmine
  • Individual plate tracking
    • Can note if a plate failed somewhere in the process and link it to the redone plate.
    • Can link billing information so we know if the plate has been paid for
  • I don't like:
    • Having to click through so many screens to get to the next sample in a project
    • Hard to look at samples in groups, like all of the samples on a flowcell

Data Distribution

  • manual
    • either "hapmap table" by email, or dropbox or physical hard drive if client wants raw data

billing

current billing workflow

  • invoicing is handled through the LSCLC Cores LIMS

Theresa's comments on GBS billing issues

Misc operations

Vision for new implementations

Staff authentication and authorization

  • Can authentication be accomplished via the CLC lims and then a token be passed to minilims as a form of SSO?
  • It isn't clear from the docs whether built-in roles can be extended or if roles can be given permissions on a per-page or per data object basis.
    • IGD will prepare a list of roles and map them to the list of pages.
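
The role-to-page mapping that IGD is preparing could be captured as a simple lookup table. The following is only a sketch in Python: the role names, page names, and the `can_view` helper are hypothetical placeholders, not anything taken from minilims or the CLC LIMS.

```python
# Hypothetical role-to-page permission table. The role and page names are
# placeholders for illustration; the real lists are still to be defined by IGD.
PAGE_ROLES = {
    "billing_dashboard": {"billing_admin", "lab_manager"},
    "sample_checkin": {"lab_staff", "lab_manager"},
    "project_summary": {"lab_staff", "lab_manager", "billing_admin"},
}


def can_view(role: str, page: str) -> bool:
    """Return True if the role is listed for the page; unknown pages are denied."""
    return role in PAGE_ROLES.get(page, set())
```

A table like this would make it easy to review authorization page by page before deciding whether the built-in minilims roles can express it.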

Customer Authentication and Authorization

  • customers will log into the CLC lims (as they do for other cores)
  • CLC lims will collect and store payment information (including PCI-DSS scope)
  • customers will be handed off to the sample submission interface in minilims where they will provide GBS specific ordering information
    • implies that no additional explicit authentication is needed for minilims
  • Data distribution: see data distribution section

Project management

  • Not clear if redmine is the right solution, but it has its place now. We would need similar functionality, probably with better integration with other components.
  • What we like about redmine: it is a really nice way to see the history of a particular plate of DNA or project, including:
    • Project summary: enzyme to use, contact information, expected sample number, comments for data analysis
    • Project workflow where we can check off status and note if something did not go as expected
    • Links that cross-reference. I link all costs to plates so I know what has been paid for. We also can link redo plates with the original preps.
    • Upload submission sheet, quantification, Experion files (.pdf)
    • Note payment status
    • Track optimization results and R&D
  • It would be great if we could check in samples as received when we print the barcode and automatically notify the sender that they have arrived.

GBS Database Schema

  • May not need full implementation, especially most of "passport" info.
  • Should be extended for Stat and QC management. Right now the tables for "stat" and "qc" are really just placeholders.

Sample Submission

  • re-implement
  • sample submission change requests
  • suggest implementing as part of LSCLC Cores LIMS
    • Single sign-on with other cores; consistent customer interface for all services.
  • could populate information in a minilims module, or use custom development within the LSCLC LIMS
  • Can probably use the order forms in minilims
    • integrate with LSCLC LIMS authentication
    • consider possible dual-path authentication that supports cuwebauth as an option
  • Sample submissions are required to be in a plate format with 95 samples and 1 random blank. 
  • We ask for concentration, but we re-quantify and it would be great to be able to enter those new numbers into the database.
  • I would love a system which would notify sender when I check in the DNA so I don't have to manually send them an email.
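
The plate-format rule above (95 samples and 1 blank per plate) is the kind of check the submission interface could run before a plate is accepted. A minimal sketch in Python follows, assuming a standard 96-well layout (rows A-H, columns 1-12) and a hypothetical convention that the blank well is named "BLANK"; neither the well naming nor the marker comes from the existing submission system.

```python
import string

# Assumed 96-well layout: rows A-H, columns 1-12 (A01 ... H12).
WELLS = [f"{row}{col:02d}" for row in string.ascii_uppercase[:8] for col in range(1, 13)]
BLANK = "BLANK"  # hypothetical marker for the blank well


def validate_plate(plate: dict) -> list:
    """Return a list of problems; an empty list means the plate passes."""
    problems = []
    missing = [w for w in WELLS if w not in plate]
    extra = sorted(w for w in plate if w not in WELLS)
    if missing:
        problems.append("missing wells: " + ", ".join(missing))
    if extra:
        problems.append("unknown wells: " + ", ".join(extra))
    # Exactly one well must be the blank; it may sit at any position.
    blanks = [w for w, name in plate.items() if name.strip().upper() == BLANK]
    if len(blanks) != 1:
        problems.append(f"expected exactly 1 blank well, found {len(blanks)}")
    # Every well must carry a name (either a sample name or the blank marker).
    unnamed = sorted(w for w, name in plate.items() if not name.strip())
    if unnamed:
        problems.append("wells with no sample name: " + ", ".join(unnamed))
    return problems
```

Returning a list of problems rather than raising on the first one would let the interface show the client everything that needs fixing in a single pass.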

General Workflow notes

  • sufficient dashboard pages to see what is in queue and at what step
  • when a task is completed, update status flag for data object (sample, project, etc)

Integration with Submitting DNA Sequencing requests to service providers

  • Currently I have to enter sample information onto the Core Facility website by hand.  Automation of this would be complicated by my need to assign billing to different accounts.  In addition, after Tom has made the flowcell I go onto Wikilims to get the flowcell name and lane assignments for the samples I submitted. 
  • If it is a 384-plex I make a new name for the sample submitted that is not recorded anywhere (could be written in redmine, but I don't always remember).  The database lane_id is also appended to the name so there is a numerical way to check back, but people like a human readable name that correlates to the project.
  • If samples are going to be sequenced off site, then in addition to the above steps I need to create additional emails and packing slips.

QC Scripts

  • QC Scripts
  • Comments and requested changes for QC output
  • Besides the changes I listed above, the biggest bug is that the scripts look for the cut-site overhang first and then the barcode. Barcodes containing the cut-site overhang are not recognized, and we see them as having 0 reads. The data pipeline does barcode and cut-site recognition simultaneously, so you still get data for these barcodes.
  • I am excited by the possibility of pulling reports off the Illumina machines to further our QC. I was also introduced to a Java-based program, FastQC, which may be helpful for us.
  • Assuming QC reports are pulled into minilims and parsed, how flexibly can we query them to find statistics over time? Although each report is by lane, it contains plate information, so we would like to be able to slice and dice the data.
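
The barcode bug described above can be illustrated with a toy reconstruction. This is not the actual QC script or pipeline code: both functions, the example barcodes, and the overhang values are illustrative assumptions, written only to show why searching for the overhang first fails when a barcode contains it, and why matching barcode and overhang together does not.

```python
def find_barcode_sequential(read, barcodes, overhangs):
    """Overhang-first search: locate the earliest overhang, then treat
    everything before it as the barcode (the failing behaviour)."""
    hits = [read.find(o) for o in overhangs if o in read]
    if not hits:
        return None
    prefix = read[:min(hits)]
    return prefix if prefix in barcodes else None


def find_barcode_joint(read, barcodes, overhangs):
    """Match barcode and overhang together as a single read prefix."""
    # Try longer barcodes first so a shorter barcode plus overhang
    # is not preferred when a longer barcode also fits.
    for bc in sorted(barcodes, key=len, reverse=True):
        if any(read.startswith(bc + o) for o in overhangs):
            return bc
    return None


# Illustrative values only: one barcode ("ACAGCTA") contains an overhang.
barcodes = {"ACAGCTA", "TTGA"}
overhangs = ("CAGC", "CTGC")
read = "ACAGCTA" + "CTGC" + "GGATTACA"
```

With these values, `find_barcode_sequential(read, barcodes, overhangs)` returns `None` because the first overhang it finds lies inside the barcode, while `find_barcode_joint(read, barcodes, overhangs)` returns `"ACAGCTA"`.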

Data Distribution

  • no changes required at this time ???
  • possibly change SOP to record that data was released
  • Our current scope does not include any changes to data distribution, which is currently done manually via dropbox or hard drive.
  • detailed specification for the information/format of data and metadata distribution:
    • Rob Elshire will be producing a standard for this information (I think of it as similar to the MIAME or MIAPE standards).
      There may be different definitions for different classes of customers:
      • customers that receive only sequence data and metadata about panel in the form of a "Key File".
      • customers for whom the facility has produced genotypes. they may or may not also want the sequence data.
  • build a distribution mechanism that defines the contents of an archive file that can be accessed via a web link, including programmatically (curl or wget), with suitable notification
    • could be modelled on the distribution that our sequencing facility produces
      • Note the use of the "cntrl" parameter, which is a keycode: anyone with a link that includes that parameter can download the file without explicit authentication. Since it could be brute forced it nags at me a little, but it does make programmatic downloads easier, and security by obscurity should be sufficient for this purpose.
    • CLC will provide storage available to minilims from which files can be served. Whoever runs the pipeline analysis will assemble the archive and place it in the filesystem.
      data distribution spec:
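
As an illustration of the "cntrl" keycode idea discussed above (not the distribution spec itself), the following Python sketch generates a download link with a long random keycode and checks a supplied link against the stored value. The base URL, the `file` query parameter, and both function names are assumptions made for this example; only the `cntrl` parameter name comes from the sequencing facility's links. A keycode built from 32 random bytes is long enough that brute forcing it is not a practical concern.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit


def make_download_link(base_url: str, archive: str):
    """Return (link, keycode); the keycode would be stored alongside the archive."""
    keycode = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
    query = urlencode({"file": archive, "cntrl": keycode})
    return f"{base_url}?{query}", keycode


def keycode_matches(link: str, stored_keycode: str) -> bool:
    """Compare the link's cntrl parameter with the stored keycode in constant time."""
    supplied = parse_qs(urlsplit(link).query).get("cntrl", [""])[0]
    return secrets.compare_digest(supplied.encode(), stored_keycode.encode())
```

A link built this way can still be fetched with curl or wget without any login step, which is the property the note above wants to keep.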

Billing

new billing overview

  • minilims will provide a "dashboard" page
    • Theresa will define the information on this page. I'm not sure what "level" of view she needs: per project, per sample, or per order?
    • minilims can make a RESTful query to CLC LIMS/Billing System and return existing invoice information per project (or sample or order)
    • minilims page will provide functionality to open an existing invoice (in new window?)
    • minilims page will provide functionality to submit an HTTP GET or POST to a RESTful interface in the CLC LIMS/Billing System:
      • the request will provide an order or invoice number (available to minilims at the time the order is placed), line item description(s), price, and quantity
      • the service will create an invoice and return a reference to it so that it can be opened from minilims
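
The invoice-creation call described above might look roughly like the following Python sketch. The CLC LIMS/Billing System API has not been defined yet, so the endpoint URL, the JSON field names, and the `build_invoice_request` helper are all hypothetical. The function only builds the request object and does not send it.

```python
import json
from urllib.request import Request

# Hypothetical endpoint; the real billing API is still to be defined by James.
BILLING_URL = "https://billing.example.org/api/invoices"


def build_invoice_request(order_number, line_items):
    """Build (but do not send) a POST asking the billing system to create an invoice.

    line_items is a list of (description, price, quantity) tuples.
    """
    payload = {
        "order_number": order_number,
        "line_items": [
            {"description": d, "price": p, "quantity": q} for d, p, q in line_items
        ],
    }
    return Request(
        BILLING_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

In this sketch the service's response would carry the invoice reference that minilims stores and later uses to open the invoice from the dashboard.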

Misc operations

  • Configure new adapter plate configurations
    • self service interface/upload?
  • Build a rich enough interface for the web, or extend and maintain the C# "product"?
  • Custom DB queries. Desired added functionality:
  • Make submissions more editable. We need to be able to edit web submission fields after they are entered into the database.
    • Sometimes we find mistakes after data has been analyzed; these revisions must be tracked somehow.
    • Other things we may want to change
      • Allow edits to a submission before approval? This avoids having to reject over a small issue; we can just edit and approve.
      • Other mistakes (e.g. taxa names, standardizing project names): only a designated person can enter these, and it isn't easy. Keep it that way?