
Bioteam Pre-Proposal 20120309

IGD-GBS MiniLIMS Proposal v1_2.doc

Action items

  • Rob Elshire and IGD generate minimum distribution information standard
  • Theresa defines details of billing dashboard page
  • IGD create lists of pages (data objects and dashboards?)
  • IGD define roles and authorization levels for each page
    • define "page"? I thought you might have meant data objects, but I think not. if you mean "user interface" isn't it better to define in terms of functionality and use case? implementation of "pages". I guess I'm not totally positive what you need from us.
  • Charlotte details the workflow spec
  • IGD details the sample submission spec
  • James defines APIs for billing and ordering integration with the CLC LIMS
    • Not a prerequisite for contract and project launch.

Items for discussion

  • Pull QC data into MiniLIMS and report on it, including the possibility of Illumina data reports; the only source for those would be Peter's facility, not other providers. FastQC is easy to run and staff can do it themselves.
  • IGD concern about maintaining and modifying QC scripts; this is likely an internal problem rather than one for Bioteam.
  • Specific discussion with Bioteam about workflow implementation per Charlotte's detailed spec.

Overview

This is a project to replace and enhance existing laboratory management systems supporting the Institute for Genomic Diversity's Genotyping By Sequencing (GBS) service.

As IGD has become more of a service facility, we need a more cohesive system for accepting samples, tracking samples, and billing customers. Currently these are all completely separate systems with no linkage between them. We also have no centralized support, as all the web-based systems were designed by different people for different reasons. Pricing tables and procedures can be found here: http://www.igd.cornell.edu/index.cfm/page/projects/GBS/GBSpricing.htm

GBS Paper in PLoSone: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0019379

Current Components

Specification for new implementation

Overview: Detailed Workflow Spec (slides)

Functionality: Staff authentication and authorization (10/25/12: no interaction possible)

  • Can authentication be accomplished via the CLC LIMS, with a token then passed to MiniLIMS as a form of SSO? No.
  • It isn't clear from the MiniLIMS docs whether built-in roles can be extended, or whether roles can be given permissions on a per-page or per-data-object basis.
    • IGD will prepare a list of roles and map them to the list of pages, and we will compare how they map to MiniLIMS built-in roles versus custom development (see the role-mapping sketch below).
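One way to capture that role-to-page mapping while it is being defined is as plain data that can later be compared against MiniLIMS built-in roles. The sketch below is only illustrative; the role names, page names, and access levels are assumptions, not the actual IGD list.

    # Hypothetical role-to-page permission matrix; role/page names and access
    # levels are placeholders for the list IGD will produce.
    ROLE_PERMISSIONS = {
        "lab_staff":   {"sample_submission": "edit", "qc_reports": "view", "billing": None},
        "lab_manager": {"sample_submission": "edit", "qc_reports": "edit", "billing": "view"},
        "billing":     {"sample_submission": "view", "qc_reports": None,   "billing": "edit"},
    }

    def can(role, page, level="view"):
        """Return True if the role has at least the requested access level on the page."""
        order = {None: 0, "view": 1, "edit": 2}
        return order[ROLE_PERMISSIONS.get(role, {}).get(page)] >= order[level]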

Functionality: Customer Authentication and Authorization (10/25/12: this has been eliminated)

  • Customers will log into the CLC LIMS (as they do for other cores).
  • The CLC LIMS will collect and store payment information (including PCI-DSS scope).
  • Customers will be handed off to the sample submission interface in MiniLIMS, where they will provide GBS-specific ordering information.
    • This implies that no additional explicit authentication is needed in MiniLIMS; a token is passed from the CLC LIMS (see the handoff sketch after this list).
  • Data distribution: see data distribution section
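A minimal sketch of what that token handoff could look like, assuming a secret shared between the two systems: the CLC LIMS signs the customer id and a timestamp and appends them to the MiniLIMS submission URL, and MiniLIMS verifies the signature instead of re-authenticating. The URL, parameter names, and secret handling are assumptions, not the actual CLC LIMS mechanism.

    import hashlib, hmac, time
    from urllib.parse import urlencode

    SHARED_SECRET = b"replace-with-secret-shared-between-clc-lims-and-minilims"

    def make_handoff_url(customer_id, base="https://minilims.example.org/gbs/submit"):
        # CLC LIMS side: sign customer id + timestamp so MiniLIMS can trust the session.
        ts = str(int(time.time()))
        sig = hmac.new(SHARED_SECRET, f"{customer_id}|{ts}".encode(), hashlib.sha256).hexdigest()
        return base + "?" + urlencode({"customer": customer_id, "ts": ts, "sig": sig})

    def verify_handoff(customer_id, ts, sig, max_age=300):
        # MiniLIMS side: accept only a correctly signed, recent token.
        expected = hmac.new(SHARED_SECRET, f"{customer_id}|{ts}".encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(sig, expected) and time.time() - int(ts) < max_age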

Functionality: Project management (10/25/12: still not started)

  • What we like about Redmine: a really nice way to see the history of a particular plate of DNA or project:
    • Project summary: enzyme to use, contact information, expected sample number, comments for data analysis
    • Project workflow where we can check off status and note if something did not go as expected
    • Links that cross-reference.  I link all costs to plates so I know what has been paid for.  We also can link redo plates with the original preps.
    • Upload submission sheet, quantification, experion files (.pdf)
    • Note payment status
    • Track optimization results and R&D
  • It would be great if we could check samples in as received when we print the barcode, and automatically notify the sender that they have arrived.

Data Structure: GBS Database Schema (10/25/12: trying to replicate)

  • May not need full implementation, especially most of "passport" info.
  • Should be extended for stat and QC management; right now the tables for "stat" and "qc" are really just placeholders (see the schema sketch below).
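As a starting point for that extension, here is a minimal sketch of what a fleshed-out "qc" table might hold, using lane-level metrics of the kind discussed in the QC section below. The table and column names are assumptions, not the actual GBS schema.

    import sqlite3

    # Hypothetical replacement for the placeholder "qc" table; names are illustrative only.
    ddl = """
    CREATE TABLE IF NOT EXISTS qc (
        qc_id        INTEGER PRIMARY KEY,
        lane_id      INTEGER NOT NULL,   -- reference to the existing lane record
        flowcell     TEXT,
        total_reads  INTEGER,
        pct_barcoded REAL,               -- fraction of reads with a recognized barcode
        mean_quality REAL,
        report_path  TEXT,               -- location of the parsed FastQC/Illumina report
        run_date     TEXT
    );
    """
    with sqlite3.connect("gbs_qc_sketch.db") as conn:
        conn.executescript(ddl)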

Page(s): Sample Submission

  • re-implement
  • sample submission change requests
  • suggest implementing as part of LSCLC Cores LIMS
    • Single sign-on with other cores; consistent customer interface for all services.
  • Could populate information in a MiniLIMS module, or via custom development within the LSCLC LIMS.
  • Can probably use the order forms in MiniLIMS? No.
    • integrate with LSCLC LIMS authentication
    • consider possible dual-path authentication that supports cuwebauth as an option
  • concept slides
    • Steps 1 and 2 are literally true: the customer will authenticate using the CLC LIMS, select GBS as the service, provide payment information, and the customer session will be passed to MiniLIMS.
    • For #7, we should consider querying MiniLIMS from our core LIMS and displaying the status information along with other core services on that dashboard.
  • Sample submissions are required to be in a plate format with 95 samples and 1 random blank. - need coding (see the validation sketch after this list)
  • We ask for concentration, but we re-quantify, and it would be great to be able to enter those new numbers into the database. - done
  • I would love a system that would notify the sender when I check in the DNA so I don't have to manually send them an email.
  • sample submission validation code
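Here is a hedged sketch of what that plate-format validation could check, assuming a standard 96-well layout keyed by well id; the field names are placeholders, not the real submission sheet schema.

    # Hypothetical validator for the 95-samples-plus-1-blank plate rule.
    ROWS, COLS = "ABCDEFGH", range(1, 13)
    ALL_WELLS = {f"{r}{c:02d}" for r in ROWS for c in COLS}

    def validate_plate(samples):
        """samples: dict mapping well id (e.g. 'A01') to sample name, '' for the blank."""
        errors = []
        if set(samples) != ALL_WELLS:
            errors.append("plate must define all 96 wells exactly once")
        blanks = [w for w, name in samples.items() if not name]
        if len(blanks) != 1:
            errors.append(f"expected exactly 1 blank well, found {len(blanks)}")
        names = [n for n in samples.values() if n]
        if len(names) != len(set(names)):
            errors.append("duplicate sample names on plate")
        return errors  # an empty list means the submission passes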

Functionality/Page(s): Integration with Submitting DNA Sequencing requests to service providers

  • Currently I have to enter sample information onto the Core Facility website by hand.  Automation of this would be complicated by my need to assign billing to different accounts.  In addition, after Tom has made the flowcell I go onto Wikilims to get the flowcell name and lane assignments for the samples I submitted. 
  • If it is a 384-plex, I make a new name for the submitted sample that is not recorded anywhere (it could be written in Redmine, but I don't always remember). The database lane_id is also appended to the name so there is a numerical way to check back, but people like a human-readable name that correlates to the project.
  • If samples are going to be sequenced off site, then in addition to the above steps I need to create additional emails and packing slips.

Functionality: QC Scripts

  • QC Scripts
  • Comments and request changes for QC output
  • Besides the changes I listed above, the biggest bug is that the scripts look for the cut-site overhang and then the barcode. Barcodes containing the cut-site overhang are not recognized, and we see them as 0 reads. The data pipeline does barcode and cut-site recognition simultaneously, so you still get data for these barcodes (see the matching sketch after this list).
  • I am excited by the possibility of pulling reports off the Illumina machines to further our QC. I was also introduced to a Java-based program, FastQC, which may be helpful for us.
  • Assuming QC reports are pulled into MiniLIMS and parsed, how flexibly can we query them to find statistics over time? Although each report is per lane, it contains plate information, so we should be able to slice and dice.
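To make the barcode bug above concrete, here is a minimal sketch of the "simultaneous" approach: instead of locating the overhang first (which fails when a barcode itself contains the overhang), each read is matched against the full expected barcode+overhang prefix. The overhang shown and the example barcodes are assumptions for illustration only.

    # Match barcode and cut-site overhang together rather than overhang-first.
    OVERHANG = "CAGC"  # one ApeKI-style overhang; real scripts would handle both CAGC and CTGC

    def assign_barcode(read, barcodes):
        """Return the barcode whose barcode+overhang prefix matches the read, else None."""
        for bc in sorted(barcodes, key=len, reverse=True):  # prefer the longest match
            if read.startswith(bc + OVERHANG):
                return bc
        return None

    # This barcode contains the overhang and would be missed by an overhang-first
    # search, but is assigned correctly here.
    print(assign_barcode("ACAGCTT" + OVERHANG + "TTGACCATT", ["ACAGCTT", "GGTTA"]))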

Functionality/Page(s): Data Distribution

  • no changes required at this time ???
  • possibly change SOP to record that data was released
  • Our current scope does not include any changes to data distribution, which is currently done manually via dropbox or hard drive.
  • detailed specification for the information/format of data and metadata distribution:
    • Rob Elshire will be producing a standard for this information (in my mind I think of it like the MIAME or MIAPE standards).
      There may be different definitions for different classes of customers:
      • customers that receive only sequence data and metadata about the panel in the form of a "Key File"
      • customers for whom the facility has produced genotypes; they will also receive the sequence data
  • Build a distribution mechanism that defines the contents of an archive file, makes it accessible via a web link (including programmatically, via curl or wget), and provides suitable notification.
    • could be modelled on the distribution that our sequencing facility produces
      • Note the use of the "cntrl" parameter, which is a keycode: anyone with a link that includes that parameter can download the file without explicit authentication. Since it could be brute forced it nags at me a little, but it does make programmatic downloads easier, and security by obscurity should be sufficient for this purpose (see the keyed-link sketch after this list).
    • CLC will provide storage, available to MiniLIMS, from which files can be served; whoever runs the pipeline analysis will assemble the archive and place it in the filesystem.
      Data distribution spec: don't worry about specific files; we just need a system to distribute a zipped file.
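A minimal sketch, assuming a MiniLIMS-hosted download endpoint, of how such a keyed link could be generated: a random keycode is stored alongside the archive record and embedded in the URL, so the customer (or a script using curl/wget) can fetch the file without logging in. The host, path, parameter name, and example archive id mirror the "cntrl" style described above but are assumptions here.

    import secrets
    from urllib.parse import urlencode

    def make_download_link(archive_id, base="https://minilims.example.org/download"):
        # A long random keycode resists casual brute forcing; store (archive_id, keycode)
        # server-side and hand the resulting link to the customer.
        keycode = secrets.token_urlsafe(24)
        return base + "?" + urlencode({"id": archive_id, "cntrl": keycode})

    url = make_download_link("GBS_plate_0123.zip")
    print(url)
    # The link then works programmatically, e.g.:  wget -O GBS_plate_0123.zip "<url>"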

Functionality/Page(s): Billing

New billing overview

  • minilims will provide a "dashboard" page
    • Dashboard info from Theresa: Theresa will define the information on this page. I'm not sure what "level" of view she needs: per project, per sample, or per order?
    • minilims can make a RESTful query to CLC LIMS/Billing System and return existing invoice information per project (or sample or order)
    • minilims page will provide functionality to open an existing invoice (in new window?)
    • The MiniLIMS page will provide functionality to submit an HTTP GET or POST to a RESTful interface in the CLC LIMS/Billing System that will
      • provide an order or invoice number (available to MiniLIMS at the time the order is placed), line item description(s), price, and quantity
      • the service will create an invoice and return a reference to the invoice so that it can be opened from MiniLIMS (see the sketch after this list)
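As a concrete illustration of the call above, here is a hedged sketch from the MiniLIMS side. The endpoint path, field names, and authentication scheme are placeholders; the real API is still to be defined (see the action item for James).

    import requests

    BILLING_BASE = "https://clclims.example.org/api/billing"  # placeholder endpoint

    def create_invoice(order_number, line_items, session_token):
        """line_items: list of dicts with description, price, and quantity."""
        resp = requests.post(
            f"{BILLING_BASE}/invoices",
            json={"order": order_number, "line_items": line_items},
            headers={"Authorization": f"Bearer {session_token}"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["invoice_url"]  # a reference MiniLIMS can open in a new window

    def get_invoices(order_number, session_token):
        """Existing invoice information for the dashboard, per project/sample/order."""
        resp = requests.get(
            f"{BILLING_BASE}/invoices",
            params={"order": order_number},
            headers={"Authorization": f"Bearer {session_token}"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()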

Functionality/Page(s): Misc operations

  • Configure new adapter plate configurations - done
    • self service interface/upload?
  • Build a rich enough interface for the web? Or extend and maintain the C# "product"?
  • Custom DB queries: desired added functionality:
  • Make things more editable. We need to be able to edit web submission fields after they are entered into the database. - done
    • Sometimes we find mistakes after data has been analyzed; these revisions must be tracked somehow.
    • Other things we may want to change
      • Allow edits to submission before approval?  Keeps from having to reject over small issue, can just edit and approve.
      • Other mistakes (e.g. taxa names, standardizing project names): only a special person can enter these and it isn't easy. Keep it that way?
    • Batch edits?