Weekly Conference Call

8/29/2014 - API conference call

Having trouble integrating the PHP pages: Clay's pages require MultiViews to be off and Dave's pages require MultiViews on. Needs work.
Lukas would like the API to be compatible with pure JavaScript. This requires either JSONP or CORS. Need to see which method is easier to implement.
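
A minimal sketch of the two options, written in Python for illustration only (the API pages themselves are PHP). The helper name build_response and its parameters are hypothetical. With JSONP the JSON is wrapped in a caller-supplied function name; with CORS the plain JSON is returned together with an Access-Control-Allow-Origin header.

```python
import json
import re

# Only allow simple JavaScript identifiers (optionally dotted) as callback names,
# so a JSONP request cannot inject arbitrary script.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$.]*")


def build_response(payload, callback=None, allowed_origin="*"):
    """Return (headers, body) for a JSON payload.

    If a callback name is given the body is JSONP, otherwise plain JSON
    is returned with a CORS header. Hypothetical helper, not T3 code.
    """
    body = json.dumps(payload)
    if callback is not None:
        if not _CALLBACK_RE.fullmatch(callback):
            raise ValueError("invalid JSONP callback name")
        headers = {"Content-Type": "application/javascript"}
        return headers, "%s(%s);" % (callback, body)
    headers = {
        "Content-Type": "application/json",
        "Access-Control-Allow-Origin": allowed_origin,
    }
    return headers, body
```

JSONP only supports GET requests and needs the callback check above; CORS needs just the extra header but depends on browser support.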

8/18/2014

loading GBS on T3

loading is slow, as expected. If the program assumes that all data is new, then it completes in 24-48 hours
rebuilding allele_cache takes 2 hours
rebuilding alleles_bymarker takes 2 hours
rebuilding alleles_byline is not possible because the query to determine markers in a given line takes forever.
this table is used by clustering and line filtering (I expect the clustering routine could be rewritten to use allele_bymarker table or ignore GBS data)

8/14/2014

post code for API on GitHub
add link to fieldbook github

7/31/2014

API conference call
tablet interface
field layout - export from website
traits - export from website, import to website
traits - import to website

7/28/2014

merge public/private data
1) change import format so data can be appended
    lines (add column for breeding program, assume crop)
    phenotype results (add column for trial, breeding program)
    genotype results (create a static page of load files for markers and genotype data)

storage for GBS data
1) create 2D mysql tables for alleles_byline and alleles_bymarker using new entry for each experiment/line experiment/marker.
    relational database links to lines, markers, experiments
    con: slow to extract random set of lines and markers
2) create HDF5 files for each data set. These can be created and accessed using Python CGI scripts. Most tools and download could be modified to use this data source.
    1 file for each genotype experiment
    not relational, constraints have to be programmed in code
    slices and chunks of data can be queried
    con: difficult to have random set of lines and markers
Criteria
    performance - similar
    complexity of access - HDF5 query tools are a little more complicated, but it is easy to bring in the whole data set
    integration with website - MySQL has much better integration; Python CGI and mod_python have good tools
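
A minimal sketch of storage option 1, using Python's sqlite3 module as a stand-in for MySQL. Everything except the table name alleles_bymarker is an assumption: the column names, the line_index helper table, and the comma-separated layout of calls within a row are hypothetical and not the actual T3 schema. It illustrates the "con" noted above: to extract a random set of lines, each requested marker row has to be read in full and then sliced.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one row per experiment/marker."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE line_index (
            experiment_uid INTEGER, position INTEGER, line_uid INTEGER,
            PRIMARY KEY (experiment_uid, position));
        CREATE TABLE alleles_bymarker (
            experiment_uid INTEGER, marker_uid INTEGER, alleles TEXT,
            PRIMARY KEY (experiment_uid, marker_uid));
    """)
    return conn


def load_experiment(conn, experiment_uid, line_uids, calls_by_marker):
    """Store the line order once, then one comma-separated row per marker."""
    conn.executemany(
        "INSERT INTO line_index VALUES (?, ?, ?)",
        [(experiment_uid, i, uid) for i, uid in enumerate(line_uids)])
    conn.executemany(
        "INSERT INTO alleles_bymarker VALUES (?, ?, ?)",
        [(experiment_uid, m, ",".join(calls))
         for m, calls in calls_by_marker.items()])


def fetch_subset(conn, experiment_uid, marker_uids, line_uids):
    """Return {marker_uid: {line_uid: call}} for the requested subset."""
    wanted = set(line_uids)
    positions = [
        (pos, uid) for pos, uid in conn.execute(
            "SELECT position, line_uid FROM line_index "
            "WHERE experiment_uid = ? ORDER BY position", (experiment_uid,))
        if uid in wanted]
    result = {}
    for marker_uid in marker_uids:
        row = conn.execute(
            "SELECT alleles FROM alleles_bymarker "
            "WHERE experiment_uid = ? AND marker_uid = ?",
            (experiment_uid, marker_uid)).fetchone()
        if row is None:
            continue  # marker not scored in this experiment
        calls = row[0].split(",")  # the whole row is read, then sliced
        result[marker_uid] = {uid: calls[pos] for pos, uid in positions}
    return result
```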

7/17/2014

- loading 1.3M WCSS1 markers
- report from AOWC (JL)
- oat consensus map from Jessica Schluete
- New oat breeder in Florida wants his own T3.
- do we need sandbox version (not now)

How to merge local and public db
1) load multiple files - most imports require user input or confirmation
2) export T3 format similar to template
    this could work on a table level but for linked tables it would require special processing
    for lines - export line_records and line_synonyms except uid
    trial means would have to be rewritten
    for markers - export markers and marker_synonyms
    phenotype trials would have to be rewritten
3) change import scripts so they can be appended
need proposal for next week

instructions for exporting from agrobase then load into T3
test page speed with SSL

- T3/GrainGenes coordination document

- Experiment design tablet integration
use sandbox website and ignore plot_id or
use production website and allow user curation
- SSL certificate from Comodo

experiment design and tablet
1. options for working on sandbox
a) on sandbox the user will 1) create trial annotation 2) create experiment design 3) export field layout and trait definition 4) make measurements
The plot_id in the field book comes from the sandbox, which will not match anything in production. We can ignore the plot_id if we have a trial for each plot. Should we have a selection box for trial when loading from fieldbook?
2. give users permission to load into production database

7/14/2014

changes to marker upload page (Clay)
1. order allele optional
2. the check for duplicate names and sequences within the import file is always run. It now skips these duplicate entries when loading into database.
3. fixed bug where import_file_log table was not being updated
4. improved instructions
5. check both strands?
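
Item 2 can be sketched as follows. This is a minimal illustration, not the upload page's actual code: the function name dedupe_markers and the (name, sequence) record shape are assumptions, and comparing names case-insensitively is a choice made for the example.

```python
def dedupe_markers(records):
    """Split import records into kept and skipped duplicates.

    records: iterable of (name, sequence) tuples in file order.
    The first occurrence of a name or sequence is kept; later records
    that repeat either one are skipped so they are not loaded.
    """
    seen_names = set()
    seen_seqs = set()
    kept, skipped = [], []
    for name, seq in records:
        key_name = name.strip().lower()
        key_seq = seq.strip().upper()
        if key_name in seen_names or key_seq in seen_seqs:
            skipped.append((name, seq))
            continue
        seen_names.add(key_name)
        seen_seqs.add(key_seq)
        kept.append((name, seq))
    return kept, skipped
```

The skipped list can be shown to the user or written to a log.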

change yes/no to if/then
add title to program description
combine checkbox and tree using jsTree plugin for jquery
change "reference map" to "genome seq"
link to download just synonyms/updates/additions

American Oat Workers Conf (Dave, JL)
Private Oat instance and admin support (James)
T3 paper (Vic)
removing saved session when logged in (JL)
experiment design page, how can tablet device save measurements in production website? (Clay)

fix plot_id type in fieldbook table

requirements (trial annotation, field layout loaded)
1. save a unique id for the field layout that works across machines
2. do we want to allow specific people to upload or only allow curator?
3. problem: plot_id is unique to each machine. Could assign it arbitrarily, then translate
4. need method to verify that field layout in tablet matches layout in database.
5. could give user permission to load trial on production machine.
Proposal
work from production machine not sandbox.
require user to submit fieldbook and plot level data to curator
1. submit fieldbook 2. wait for curator to load 3. generate import file for tablet. 4. send to curator

7/7/2014

1. GBS data - when to order alleles alphabetically. Should resolve the case where strand is unknown.
    check opposite strand when doing synonym match
    discard duplicates within import file
    list the markers removed from import file and save to log file
    the allele order should not affect synonym matches
2. Manage tab (phenotype trials) - harvest date misspelled
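
One way to make the synonym match in item 1 independent of both strand and allele order is to compare a canonical key instead of the raw sequence. A minimal sketch, assuming marker sequences are written as flank[X/Y]flank with a single SNP; the function names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
_SNP_RE = re.compile(r"([ACGTN]*)\[([ACGT])/([ACGT])\]([ACGTN]*)")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_key(marker_seq):
    """Return a key that is identical for both strands and allele orders.

    The alleles are sorted alphabetically on each strand, and the
    lexicographically smaller of the two strand representations is used.
    """
    m = _SNP_RE.fullmatch(marker_seq.strip().upper())
    if m is None:
        raise ValueError("expected a sequence like ACGT[A/G]TTGC")
    left, a1, a2, right = m.groups()
    forward = (left, tuple(sorted((a1, a2))), right)
    reverse = (
        reverse_complement(right),
        tuple(sorted((reverse_complement(a1), reverse_complement(a2)))),
        reverse_complement(left),
    )
    return min(forward, reverse)
```

Two markers are treated as synonyms when their keys are equal, so AC[G/A]TT, AC[A/G]TT and the opposite-strand form AA[T/C]GT all match.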

6/30/2014

- HWWGBS marker sequence (29K duplicates out of 82K SNPs) - Jesse
- curator_data/.htaccess to increase default PHP resource limits
*D: Move to php.ini
- API wants genotype data by experiment. Need to cache it this way?
- confer with Clay
- loading Eduard's SNP markers - Vic
*C: Ref vs Alt alleles: Ref always first in sequence [/]?
*C: Parse the genotype calls from the .vcf file for loading.
- *: Dropbox overload/lockup: delete big files

6/23/2014

1. HWWGBS marker sequence (29,865 duplicates out of 82,000 SNP markers) - Vic
    loading GBS data (HWW and Cornell Master) - Vic
    how to handle GBS tags with more than one SNP
    what is dif score in file header? what should be used for assembly name?

2. Emergency planning and preparedness (federal shutdown or natural disaster) - Clay
    database backup, website, DNS
    wheat/raw 8G
    wheat/curator_data/uploads 900M

3. Updates to the download page - Clay
    independent choice of phenotype or genotype
    genotype using one genotype experiment
    definition of terms
    warnings if large download or bad filter selection

6/09/2014

Gina's Barley genotype calls, continuous values 0 to 2 - store in a new column. Drop the old Illumina raw score data. Should we combine the alleles table with the genotype_data table?
Eduard's GBS genotype data - use 128 base pairs
update INSTALL document with new R package.

User group conference call - bug in download page, should allow phenotype, genotype, or both. In experiment design should the select trial give you a list of lines?

Lab research update - should add a note to the download page that describes why the filters on the production server need higher settings

6/02/2014

Big Projects for T3

1. predefined data sets including filtered lines and markers. Used to access data from published studies. Do analysis tools use this or only download.
2. single login using Google account authentication
3. imputation of marker data
4. Automated validation -- T3 should perform checks on newly uploaded data to identify possible errors (phenotypic outliers, unlikely marker scores based on local haplotypes, ...).
5. Diversity panel selection -- T3 should be able to perform an analysis that identifies the N most genetically diverse lines, based on marker alleles, in the currently selected set of lines.
6. accelerate genotype data retrieval (HDF5? using byte storage and compression similar to TASSEL) consider data growth
7. provide option to download genotype data by experiment not consensus. email results of slow query. Do users need analysis of data? (maybe use predefined data set)
8. manage lists - a page to view/edit/save/name selections
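
For item 5, one possible approach is a greedy max-min selection over pairwise allele mismatch. The sketch below is only an assumption about how such a tool could work: the distance measure, the "--" missing-call code, and the function names are all hypothetical, and the greedy heuristic is not guaranteed to find the optimal set of N lines.

```python
from itertools import combinations

MISSING = "--"  # assumed code for a missing genotype call


def distance(a, b):
    """Fraction of markers scored in both lines where the calls differ."""
    scored = [(x, y) for x, y in zip(a, b) if MISSING not in (x, y)]
    if not scored:
        return 0.0
    return sum(x != y for x, y in scored) / len(scored)


def select_diverse(genotypes, n):
    """Greedily pick n line names from {line_name: [calls per marker]}.

    Start with the most distant pair, then repeatedly add the line whose
    nearest already-chosen line is farthest away.
    """
    names = sorted(genotypes)
    if n <= 0:
        return []
    if n >= len(names):
        return names
    if n == 1:
        return names[:1]
    first, second = max(
        combinations(names, 2),
        key=lambda p: distance(genotypes[p[0]], genotypes[p[1]]))
    chosen = [first, second]
    while len(chosen) < n:
        remaining = [x for x in names if x not in chosen]
        best = max(
            remaining,
            key=lambda x: min(distance(genotypes[x], genotypes[c])
                              for c in chosen))
        chosen.append(best)
    return chosen
```

The pairwise step is quadratic in the number of selected lines, so for large selections the distances would need to be cached or precomputed.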

9. external access to data and tools (common API)
structured data format on web pages (RDF) send journal from EBI, present idea to users
10. jBrowse, gBrowse
11. links to external websites
12. using iPlant eXceed super computer resources, TASSEL resources
13. Sandbox, plot level loading and analysis (playground) needs documentation. provide sandbox on iPlant to preserve user changes.
14. Field book integration

development time, priority

June 9th User group agenda
allele conflicts update - finding and deleting bad data
download page - option for only phenotype download
experiment design - describe agricolae, integration with field book
GBS naming, loading

June 2nd agenda
Barley GBS project

tcap oat server - can mysql/apache run multiple copies?

template files should be in sync with template page.

5/27/2014

follow up with Katmandoo and Seattle API
- start work with http://docs.breeding.apiary.io/

follow up with Trevor and Field Book development
- new on github https://github.com/trife/Field-Book
- new SQLite structure?, connection with T3 and Cassavabase
1) login API using google for single signon
2) drop box or other for upload and backup data
3) connect to T3 for traits and layout

next user group conference call - June 9th?

experiment design page
- converted layout of madii design to t3 format
- for madii the check column needs to be changed from line number to boolean
- for madii the order of check lines determines if primary or secondary
- wait for new version of madii script

GBS data
- Marker name: contig3917765_1al-5481 or IWGSC2012_3917765_1al-5481
For the different versions of some chromosome-arms: IWGSC2012_4as_v2-1185913

Big Projects meeting
- Clay send out announcement and request for topics

PAG Asia

5/19/2014

report from crop database API workshop (Dave M)
Genome Back Office - sharing genotyping data (JL, Ed Buckler, Susan McCouch)
git site - https://github.com/plantbreeding
http://docs.breeding.apiary.io/
http://www.ebi.ac.uk/rdf/documentation/uris-ebi-data
SSL configuration of email and logon (Dave H)
experiment design page (Clay)
row/column not correct
Iowa State University (Lawrence Lab) national group of maize researchers evaluating T3

5/12/2014

- request for database dump from evogene
should remove user table, add link from documentation page

- marker panels for PVP application (Vic)
we have "my Marker Panels" working

- Mark has assigned codes for proprietary lines in Cornell Master
private lines coded as NYCNL##, Vic will load malt machine first for checking

- reformatting marker sequences for synthetic (Vic)
new web script to translate AB into ACTG before loading into db
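
A minimal sketch of such a translation, assuming the input is A/B-coded genotype calls and that each marker's two nucleotide alleles are known. The function name, the argument layout, and the "--" missing-call code are assumptions, not the actual script.

```python
def translate_ab_call(call, allele_a, allele_b):
    """Translate an A/B-coded call (e.g. 'AB') into nucleotides (e.g. 'TC').

    allele_a and allele_b are the nucleotides that A and B stand for
    at this marker. Missing calls are returned as '--'.
    """
    mapping = {"A": allele_a.upper(), "B": allele_b.upper()}
    call = call.strip().upper()
    if call in ("", "-", "--", "NA"):
        return "--"
    if len(call) != 2 or any(c not in mapping for c in call):
        raise ValueError("unexpected A/B call: %r" % call)
    return "".join(mapping[c] for c in call)
```

For a marker with alleles T and C, the call AB becomes TC and BB becomes CC.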

- API for T3
wheatplus/api

- On GWAS page made changes to label of result exports. Need to review with JL and others if the page is clear

- Alleles for all lines - add genotype experiment column

5/5/2014

- Do we need to alphabetize alleles for Synop GBS data? (Vic)
- How to set an existing phenotype data point to 'missing'? (David M)
- GWAS
- added option for variance calculation method (Clay)
- interpreting the Validation plot (Vic)
- Download page - added selection option for genotype data download (Clay)
- User group meeting May 12th (Trial Design demo, Clay)
   add dialog box for upload and check lines
   add list of traits measured
   add links to android field book
   add links to phenotype import
- New template for phenotype data from multiple trials per file (DaveM)

- Methods section for manuscript (Vic)
- phenotypes for replicated checks not stored uniquely, can't edit (DaveM)
- HWW = "Hard Red Winter Wheat" [Eduard]? "Hard White Wheat"? (DaveM)

4/28/2014

T3 modules
    Presentation-abstraction-control (PAC), model-view-control (MVC)
    functional programming
    modular programming - technique that emphasizes separating the functionality of a program into independent, interchangeable modules
    Dave M will meet with Gates foundation to see if there is common ground between groups
    WebEx with developers of Katmandoo

API
    cropontology.org/api
    semantic web
    linked data

Changing variety names of lines in T3 that are also in GRIN to lab-specific code names.
     If we do this (i.e. create new unique names for those 17 lines with WB1
     through WB17 as primary T3 names), we are going to have to edit and
     re-upload all four of the phenotype trials and genotype data for the LRpanel.
     Adult Leaf Rust Response
     This request is to recode the line names to use experiment specific names (this is relatively easy)

Download page, genotype data optional
add check box to select genotype data
on map selection page find way to make page faster or background computation optional

Compare trials page, plots for more than 2 trials

marker import file for Eduard's GBS data

interchange between DArTdb, Katmandoo, T3
    DArTdb - internal LIMS at DArT
    KDDart - data storage and integration platform hosted by DArT
    Katmandoo - database and client software (trial management, pedigree (beta), Windows Mobile (beta), crossing tool, molecular marker, inventory)
    Fieldscorer 4 Android - collecting trait data in field

4/21/2014

GBS

Cornell Master - check with Lynn to code Pioneer lines

Eduard GBS - save assembly version

Download page - make genotype download optional

POPSEQ data - more wheat large data coming from Jessie, do not save imputed data

Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ)

Analysis

Karen - current page compares two trials, need to compare check/controls across all trials, locations, years

Collaboration

discuss with Lee Hickey how to get all data into one database

North American Barley Researchers Workshop, June 29 to July 2, 2014, University of Minnesota
Kevin Smith, Karen Beaubien
July 2 14:00-17:00 Special Workshop on “Big Data” at Science Teaching & Student Services, 222 Pleasant Street SE
