11/25/2014

loading markers

BLAST existing marker sets and assign primary/synonym status based on which marker was created first:

SynOp
Cornell Master
HWWAMP_GBS_2013
HWWAMP_GBS_2014
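A minimal sketch of the primary/synonym assignment. It assumes two inputs:

- BLAST hits already reduced to pairs of markers considered the same (the identity threshold is not decided here);
- a known creation date or other sortable timestamp for each marker.

Markers linked by hits are grouped with union-find, and the earliest-created marker in each group is treated as primary. The function name and input shapes are hypothetical.

```python
def assign_synonyms(created, hits):
    """Return {marker: primary_marker} for every marker in `created`.

    created: {marker_name: creation date or any sortable timestamp} (hypothetical input)
    hits:    iterable of (marker_a, marker_b) pairs that BLAST matched above an
             assumed identity threshold; both names must appear in `created`
    """
    parent = {m: m for m in created}

    def find(m):
        # Union-find lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    # Merge every pair of markers that BLAST says are the same sequence.
    for a, b in hits:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for m in created:
        groups.setdefault(find(m), []).append(m)

    # Earliest-created marker in each group is primary; ties broken by name.
    primary_of = {}
    for members in groups.values():
        primary = min(members, key=lambda m: (created[m], m))
        for m in members:
            primary_of[m] = primary
    return primary_of
```

Markers with no hits end up as their own primary.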

10/14/2014 - T3 conference call

The import template for germplasm lines should be unique for each website. This needs to be fixed on the template page.
After converting the barley website to use genetic characters, we should be able to use a common code base from git.
add description to wizard when selecting Breeding Program or Data Program
use BLAST to check for marker synonyms

10/2/2014 - API conference call

look at KDDart
read up on DAL and authentication
which part of the KDDart API do we want to use?

API

CIMMYT adopting BMS for maize and wheat
demo at PAG (germplasm, fieldbook)
meeting at PAG - day after conference

KDDart
http://software.kddart.com/help/DAL/index.html
http://www.diversityarrays.com/kddart

T3 paper

select a common single-gene trait and compare GWAS results and maps

Fieldbook design to study

8/29/2014 - API conference call

Having trouble integrating PHP pages: Clay's pages require MultiViews to be off and Dave's pages require MultiViews on. Needs work.
Lukas would like the API to be usable from pure JavaScript. This requires either JSONP or CORS; need to see which method will be easier to implement.
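To compare the two options, here is a sketch of how one API response could be served either way. The function name and header handling are assumptions, not the current API code.

- JSONP wraps the JSON in a callback call so the browser can load it with a `<script>` tag.
- CORS returns plain JSON plus an `Access-Control-Allow-Origin` header so `XMLHttpRequest`/`fetch` from another site may read it.

```python
import json
import re

# JSONP callback names are restricted to identifier-like strings so a request
# cannot inject arbitrary script into the response.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$.]*")


def build_response(payload, callback=None, allow_origin="*"):
    """Return (headers, body) for an API result (hypothetical helper)."""
    body = json.dumps(payload)
    if callback is not None:
        if not _CALLBACK_RE.fullmatch(callback):
            raise ValueError("invalid JSONP callback name")
        # JSONP: the response is script that calls the page's callback.
        return ([("Content-Type", "application/javascript")],
                f"{callback}({body});")
    # CORS: ordinary JSON plus a header allowing cross-origin reads.
    return ([("Content-Type", "application/json"),
             ("Access-Control-Allow-Origin", allow_origin)],
            body)
```

CORS keeps the API as plain JSON and works for any HTTP method. JSONP only supports GET but also works in older browsers.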

8/18/2014

loading GBS on T3

Loading is slow, as expected. If the program assumes that all data is new, it completes in 24-48 hours.
rebuilding allele_cache takes 2 hours
rebuilding alleles_bymarker takes 2 hours
rebuilding alleles_byline is not possible because the query to determine markers in a given line takes forever.
this table is used by clustering and line filtering (I expect the clustering routine could be rewritten to use the alleles_bymarker table or to ignore GBS data)
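One possible way around the slow per-line query is to build alleles_byline in a single pass over alleles_bymarker, transposing marker rows into line rows. This sketch makes three assumptions:

- the by-marker data can be read as `{marker: {line: call}}`;
- a fixed marker order is known;
- an empty string marks a missing call.

For a full GBS set it would need to run one experiment at a time to bound memory.

```python
def bymarker_to_byline(bymarker, marker_order):
    """Transpose {marker: {line: call}} into {line: [call for each marker]}.

    Calls are placed in the column given by marker_order; markers with no
    call for a line stay as "" (assumed missing-value convention).
    """
    column = {m: i for i, m in enumerate(marker_order)}
    byline = {}
    for marker, calls in bymarker.items():
        col = column[marker]
        for line, call in calls.items():
            row = byline.setdefault(line, [""] * len(marker_order))
            row[col] = call
    return byline
```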

8/14/2014

post code for API on GitHub
add link to fieldbook GitHub

7/31/2014

API conference call
tablet interface
field layout - export from website
traits - export from website, import to website
traits - import to website

7/28/2014

merge public/private data
1) change import format so data can be appended
    lines (add column for breeding program, assume crop)
    phenotype results (add column for trial, breeding program)
    genotype results (create a static page of load files for markers and genotype data)

storage for GBS data
1) create 2D MySQL tables for alleles_byline and alleles_bymarker, with a new entry for each experiment/line and each experiment/marker.
    relational database links to lines, markers, experiments
    con: slow to extract random set of lines and markers
2) create HDF5 files for each data set. These can be created and accessed using Python CGI scripts. Most tools and the download page could be modified to use this data source.
    1 file for each genotype experiment
    not relational, constraints have to be programmed in code
    slices and chunks of data can be queried
    con: difficult to have random set of lines and markers
Criteria
    performance - similar
    complexity of access - HDF5 query tools are a little more complicated, but it is easy to bring in a whole data set
    integration with website - MySQL has much better integration; Python CGI and mod_python have good tools
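A small sketch of option 1, with Python's built-in sqlite3 standing in for MySQL so it runs standalone. The schema is a hypothetical illustration, not the existing T3 tables:

- one row per experiment/line;
- all marker calls for that line packed into one string, one character per marker.

Selecting lines is a keyed lookup. Selecting an arbitrary set of markers means slicing every returned string in code, which is the "random set of lines and markers" cost listed above.

```python
import sqlite3


def create_store():
    conn = sqlite3.connect(":memory:")
    # Hypothetical encoding: one character per marker (e.g. A/B/H, '-' missing).
    conn.execute(
        "CREATE TABLE alleles_byline ("
        " experiment_id INTEGER, line_id INTEGER, alleles TEXT,"
        " PRIMARY KEY (experiment_id, line_id))")
    return conn


def load(conn, experiment_id, calls_by_line):
    """calls_by_line: {line_id: string of calls in marker-column order}."""
    conn.executemany(
        "INSERT INTO alleles_byline VALUES (?, ?, ?)",
        [(experiment_id, line, "".join(calls))
         for line, calls in calls_by_line.items()])


def subset(conn, experiment_id, line_ids, marker_cols):
    """Return {line_id: calls for the requested marker columns}."""
    placeholders = ",".join("?" * len(line_ids))
    rows = conn.execute(
        "SELECT line_id, alleles FROM alleles_byline "
        f"WHERE experiment_id = ? AND line_id IN ({placeholders})",
        [experiment_id, *line_ids])
    # Marker selection happens in application code, one string at a time.
    return {line: "".join(alleles[c] for c in marker_cols)
            for line, alleles in rows}
```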

7/17/2014

- loading 1.3M WCSS1 markers
- report from AOWC (JL)
- oat consensus map from Jessica Schluete
- New oat breeder in Florida wants his own T3.
- do we need sandbox version (not now)

...

experiment design and tablet
1. options for working on sandbox
a) on sandbox, user will 1) create trial annotation 2) create experiment design 3) export field layout and trait definitions 4) make measurements
The plot_id in the field book comes from the sandbox, so it will not match anything in production. We can ignore the plot_id if we have the trial for each plot. Should we have a selection box for trial when loading from Field Book?
2. give users permission to load into production database

7/14/2014

changes to marker upload page (Clay)
1. ordering of alleles is optional
2. the check for duplicate names and sequences within the import file is always run. It now skips these duplicate entries when loading into database.
3. fixed bug where import_file_log table was not being updated
4. improved instructions
5. check both strands?
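The duplicate check in item 2 can be sketched as below. The first occurrence of a name or sequence is kept, later duplicates are skipped, and the skipped rows are returned so they can be written to a log. Treating sequences case-insensitively is an assumption, and the function is hypothetical rather than the marker upload page's code.

```python
def split_duplicates(rows):
    """rows: list of (marker_name, sequence) from one import file.

    Returns (kept, skipped). A row is skipped if its name or its sequence
    (compared case-insensitively) already appeared earlier in the file.
    """
    seen_names, seen_seqs = set(), set()
    kept, skipped = [], []
    for name, seq in rows:
        seq_key = seq.upper()
        if name in seen_names or seq_key in seen_seqs:
            skipped.append((name, seq))
            continue
        seen_names.add(name)
        seen_seqs.add(seq_key)
        kept.append((name, seq))
    return kept, skipped
```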

...

requirements (trial annotation, field layout loaded)
1. save a unique id for the field layout that works across machines
2. do we want to allow specific people to upload or only allow curator?
3. problem: plot_id is unique to each machine. Could assign this arbitrarily, then translate.
4. need method to verify that field layout in tablet matches layout in database.
5. could give user permission to load trial on production machine.
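Items 3 and 4 could be handled together by translating through a machine-independent plot key. This sketch assumes (trial code, plot number) identifies a plot the same way on every machine; the names are hypothetical.

- The local plot_id exported to the tablet is mapped back to that key.
- The key is then looked up on the machine being loaded.
- The load is refused if the two layouts do not contain the same plots.

```python
def translate_measurements(measurements, source_plots, target_plots):
    """Re-key tablet measurements from source plot_ids to target plot_ids.

    measurements: list of (source_plot_id, trait, value) read from the tablet
    source_plots: {plot_id: (trial_code, plot_number)} on the machine that
                  produced the field layout
    target_plots: the same mapping on the machine being loaded
    """
    target_by_key = {key: pid for pid, key in target_plots.items()}
    # Item 4: verify the tablet layout matches the database layout.
    if set(source_plots.values()) != set(target_by_key):
        raise ValueError("field layout on tablet does not match database")
    return [(target_by_key[source_plots[pid]], trait, value)
            for pid, trait, value in measurements]
```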
Proposal
work from production machine not sandbox.
require user to submit fieldbook and plot level data to curator
1. submit fieldbook 2. wait for curator to load 3. generate import file for tablet. 4. send to curator

7/7/2014

1. GBS data - when to order alleles alphabetically. Should resolve the case where strand is unknown.
    check opposite strand when doing synonym match
    discard duplicates within import file
    list the markers removed from import file and save to log file
    the allele order should not affect synonym matches
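A sketch of synonym keys that ignore both allele order and strand. The key is the smaller of the forward and reverse-complement forms, so "A/G", "G/A", "T/C" and "C/T" all compare equal, and so do a sequence and its reverse complement. Two limits apply:

- Only plain A/C/G/T (and N in sequences) are assumed.
- A/T and C/G SNPs look the same on both strands, so their strand stays unresolved.

```python
_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
_SEQ_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def allele_key(alleles):
    """Order- and strand-independent key for an 'X/Y' SNP allele string."""
    forward = tuple(sorted(a.upper() for a in alleles.split("/")))
    reverse = tuple(sorted(_COMPLEMENT[a] for a in forward))
    return min(forward, reverse)


def sequence_key(seq):
    """Strand-independent key: the smaller of a sequence and its reverse complement."""
    s = seq.upper()
    rc = s.translate(_SEQ_COMPLEMENT)[::-1]
    return min(s, rc)
```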
2. Manage tab (phenotype trials) - harvest date misspelled

6/30/2014

- HWWGBS marker sequence (29K duplicates out of 82K SNPs) - Jesse
- curator_data/.htaccess to increase default PHP resource limits
*D: Move to php.ini
- API wants genotype data by experiment. Need to cache it this way?
- confer with Clay
- loading Eduard's SNP markers - Vic
*C: Ref vs Alt alleles: Ref always first in sequence [/]?
*C: Parse the genotype calls from the .vcf file for loading.
- *: Dropbox overload/lockup: delete big files
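For parsing genotype calls from a .vcf file, a minimal sketch follows the standard VCF columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, then one column per sample. The GT field holds allele indices (0 = REF, 1+ = ALT) separated by "/" or "|".

- Each call is written with the REF allele first, which is one possible answer to the Ref/Alt question above, not a decision.
- The ID column is assumed to hold the marker name.

```python
import re


def parse_vcf_calls(lines):
    """Yield (marker_id, sample, call); call is None when GT is missing."""
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        marker, ref, alt = fields[2], fields[3], fields[4]
        alleles = [ref] + alt.split(",")
        gt_index = fields[8].split(":").index("GT")
        for sample, data in zip(samples, fields[9:]):
            gt = data.split(":")[gt_index]
            idx = re.split(r"[/|]", gt)
            if "." in idx:
                call = None
            else:
                # Sorting the indices puts REF (index 0) first.
                call = "".join(alleles[i] for i in sorted(int(x) for x in idx))
            yield marker, sample, call
```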

6/23/2014

1. HWWGBS marker sequence (29,865 duplicates out of 82,000 SNP markers) - Vic
    loading GBS data (HWW and Cornell Master) - Vic
    how to handle GBS tags with more than one SNP
    what is dif score in file header? what should be used for assembly name?

...

3. Updates to the download page - Clay
    independent choice of phenotype or genotype
    genotype using one genotype experiment
    definition of terms
    warnings if large download or bad filter selection

6/09/2014

Gina's barley genotype calls, continuous values 0 to 2 - store in a new column. Drop the old Illumina raw score data. Should we combine the alleles table with the genotype_data table?
Eduard's GBS genotype data - use 128 base pairs
update INSTALL document with new R package.

...

Lab research update - should add a note to the download page explaining why the filter settings on the production server need to be higher

6/02/2014

Big Projects for T3

1. predefined data sets including filtered lines and markers, used to access data from published studies. Do analysis tools use this, or only download?
2. single login using Google account authentication
3. imputation of marker data
4. Automated validation -- T3 should perform checks on newly uploaded data to identify possible errors (phenotypic outliers, unlikely marker scores based on local haplotypes, ...).
5. Diversity panel selection -- T3 should be able to perform an analysis that identifies the N most genetically diverse lines, based on marker alleles, in the currently selected set of lines.
6. accelerate genotype data retrieval (HDF5? byte storage and compression similar to TASSEL); consider data growth
7. provide option to download genotype data by experiment, not consensus. Email results of slow queries. Do users need analysis of data? (maybe use predefined data set)
8. manage lists - a page to view/edit/save/name selections
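Item 5 (diversity panel selection) could be prototyped with greedy farthest-point (max-min) selection:

1. Start from the line with the largest total distance to all others.
2. Repeatedly add the line whose nearest already-selected line is farthest away.

This is one common heuristic, not a method T3 has chosen. The distance (fraction of differing calls at markers scored in both lines) and the "-" missing-call symbol are assumptions.

```python
def mismatch(a, b):
    """Fraction of markers, non-missing in both lines, where the calls differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def diverse_panel(genotypes, n):
    """genotypes: {line: allele string}. Greedily pick up to n diverse lines."""
    lines = sorted(genotypes)
    if n <= 0 or not lines:
        return []
    # Seed with the line farthest (in total) from all other lines.
    first = max(lines, key=lambda l: sum(mismatch(genotypes[l], genotypes[o]) for o in lines))
    chosen = [first]
    # closest[l]: distance from l to its nearest already-chosen line.
    closest = {l: mismatch(genotypes[l], genotypes[first]) for l in lines}
    while len(chosen) < min(n, len(lines)):
        nxt = max((l for l in lines if l not in chosen), key=lambda l: closest[l])
        chosen.append(nxt)
        for l in lines:
            closest[l] = min(closest[l], mismatch(genotypes[l], genotypes[nxt]))
    return chosen
```

The pairwise distance cost grows with lines × lines × markers, so for large selections it would likely need the faster genotype storage discussed in item 6.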

...

template files should be in sync with template page.

5/27/2014

follow up with Katmandoo and Seattle API
- start work with http://docs.breeding.apiary.io/

...

Big Projects meeting
- Clay send out announcement and request for topics

PAG Asia

5/19/2014

report from crop database API workshop (Dave M)
Genome Back Office - sharing genotyping data (JL, Ed Buckler, Susan McCouch)
git site - https://github.com/plantbreeding
http://docs.breeding.apiary.io/
http://www.ebi.ac.uk/rdf/documentation/uris-ebi-data
SSL configuration of email and logon (Dave H)
experiment design page (Clay)
row/column not correct
Iowa State University (Lawrence Lab) national group of maize researchers evaluating T3

5/12/2014

- request for database dump from Evogene
should remove user table, add link from documentation page

...

- Alleles for all lines - add genotype experiment column

5/5/2014

- Do we need to alphabetize alleles for SynOp GBS data? (Vic)
- How to set an existing phenotype data point to 'missing'? (David M)
- GWAS
- added option for variance calculation method (Clay)
- interpreting the Validation plot (Vic)
- Download page - added selection option for genotype data download (Clay)
- User group meeting May 12th (Trial Design demo, Clay)
   add dialog box for upload and check lines
   add list of traits measured
   add links to android field book
   add links to phenotype import
- New template for phenotype data from multiple trials per file (DaveM)

- Methods section for manuscript (Vic)
- phenotypes for replicated checks not stored uniquely, can't edit (DaveM)
- HWW = "Hard Red Winter Wheat" [Eduard]? "Hard White Wheat"? (DaveM)

4/28/2014

T3 modules
    Presentation-abstraction-control (PAC), model-view-controller (MVC)
    functional programming
    modular programming - a technique that emphasizes separating the functionality of a program into independent, interchangeable modules
    Dave M will meet with Gates foundation to see if there is common ground between groups
    WebEx with developers of Katmandoo

...

interchange between DArTdb, Katmandoo, T3
    DArTdb - internal LIMS at DArT
    KDDart - data storage and integration platform hosted by DArT
    Katmandoo - database and client software (trial management, pedigree (beta), Windows Mobile (beta), crossing tool, molecular markers, inventory)
    Fieldscorer 4 Android - collecting trait data in field

4/21/2014

GBS

Cornell Master - check with Lynn to code Pioneer lines

...