February 26 - GUI example 1/4: The Cancer Genome Project
Integrated data management and analysis to put the systemic puzzle together (for more see Mission)
February 19 - Homework review: UCSC data service API
- The reference report was published at http://www.biomedcentral.com/1471-2105/8/333.
- Affymetrix Integrated Genome Browser.
- GenBoree at Baylor uses DAS.
- DAS 2 was released in 2007 and supports both data retrieval and data submission.
- DAS workshop 2008.
February 12 and 14 - S3DB, a distributed semantic DBMS
Individual accounts will be created in class. You can also download your own copy from http://www.s3db.org/. Additional documentation can be found here.
The N3 notation for RDF is very instructive; have a look here.
For complete documentation on RDF, the right place to go is the source: W3C's reference documentation.
Example of a public RDF browser: Welkin.
Example of a commercial RDF browser: Sentient.
Function for building S3QL queries: S3QLSyntax
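To see why N3 is instructive, here is a minimal Python sketch (the course itself uses MATLAB) that writes RDF triples as N3 statements. N3 lets consecutive statements about the same subject share it, separated by `;`. The `ex:` namespace and every resource name below are hypothetical examples, not part of S3DB.

```python
from itertools import groupby

def to_n3(prefixes, triples):
    """Serialize (subject, predicate, object) triples as N3 text."""
    lines = [f"@prefix {p}: <{uri}> ." for p, uri in prefixes.items()]
    # groupby merges only *consecutive* triples that share a subject,
    # so they are written once with ';' between predicate/object pairs.
    for subject, group in groupby(triples, key=lambda t: t[0]):
        pairs = [f"{pred} {obj}" for _, pred, obj in group]
        lines.append(f"{subject} " + " ;\n    ".join(pairs) + " .")
    return "\n".join(lines)

# Hypothetical namespace and resources, for illustration only.
prefixes = {
    "ex": "http://example.org/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
triples = [
    ("ex:TP53", "rdfs:label", '"tumor protein p53"'),
    ("ex:TP53", "ex:locatedOn", "ex:chr17"),
]
print(to_n3(prefixes, triples))
```

Each line of output reads as a subject-predicate-object sentence, which is what makes N3 a convenient way to read RDF by eye.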
February 7 - more on Graphic User Interfaces
January 31 - Review of homework assignments - dynamic programming for sequence alignment and regular expressions:
HOMEWORK
(note old solutions at http://ibl.mdanderson.org/~jalmeida/IB2008/)
Write your favorite implementation of a UCSC client in MATLAB, and include with your m-files an archived HTML report describing its use, produced with cell programming + publish.
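The dynamic-programming alignment reviewed on January 31 can be sketched as a global (Needleman-Wunsch) alignment with a linear gap penalty. This is a Python sketch, not the homework solution; the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), using '-' for gaps.
    """
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,   # gap in b
                          F[i][j - 1] + gap)   # gap in a

    # Traceback from the bottom-right cell to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

print(needleman_wunsch("ACGT", "AGT"))
```

The score matrix is filled row by row, so each cell only needs its three already-computed neighbours; this is the matrix-notation view of the problem that maps directly onto a MATLAB double loop.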
January 24 - hands on session developing a client for UCSC Genome Browser (everybody):
Integrative genomics using DAS : http://www.biomedcentral.com/1471-2105/8/333
HOMEWORK
Don't forget the pending assignments: write a MATLAB function that reads an HTML table into a cell array, and the alignment homework. Both are due Thursday.
Biodas queries against UCSC Genome Database
%% Data sources command
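% Get all genomes (data sources) available. The XML response looks roughly like this (schematic, following the DAS 1.x dsn format):
% <DASDSN>
%   <DSN>
%     <SOURCE id="..." version="...">...</SOURCE>
%     <MAPMASTER>...</MAPMASTER>
%     <DESCRIPTION>...</DESCRIPTION>
%   </DSN>
%   ...
% </DASDSN>
% Parse the response with the DOM API (xmlread returns a Java DOM document):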
genomes_Dom = xmlread('http://genome.ucsc.edu/cgi-bin/das/dsn');
dsnL = genomes_Dom.getElementsByTagName('DSN');
sourceL = genomes_Dom.getElementsByTagName('SOURCE');
descrL = genomes_Dom.getElementsByTagName('DESCRIPTION'); % tag names are case-sensitive
% Alternatively, parse the response with the XML Toolbox from Geodise (download it and add it to the MATLAB path first):
xml = urlread('http://genome.ucsc.edu/cgi-bin/das/dsn');
t_parseany = xml_parseany(xml);
% When you reach a leaf node, read its attributes and text content like this:
t_parseany.DSN{1}.SOURCE{1}.ATTRIBUTE(1)
t_parseany.DSN{1}.SOURCE{1}.CONTENT
%% entry_points command
% Get all chromosomes (entry_points command). Here, hg16 refers to a specific genome assembly (Human Genome, July 2003), as returned by the dsn command.
% e.g.
urlread('http://genome.ucsc.edu/cgi-bin/das/hg16/entry_points')
%% types command
% Get annotation types for a sequence segment. Annotation types are the kinds of features annotated on the sequence, such as knownGenes, snps, mRNAs, ESTs, exons, introns and so on.
% This command gives an overview of the annotation for the segment, including the number of features of each type.
urlread('http://genome.ucsc.edu/cgi-bin/das/hg16/types?segment=4:3000000,4000000');
%% features command
% Get the features of one or more specific types for a segment. Several types and/or segments can be queried at the same time.
% e.g. retrieve all SNPs and known genes (type=snp, type=knownGene) in the segment.
urlread('http://genome.ucsc.edu/cgi-bin/das/hg16/features?segment=4:3000000,4000000;type=snp;type=knownGene');
%% dna command
% Get raw nucleotide sequence data.
urlread('http://genome.ucsc.edu/cgi-bin/das/hg16/dna?segment=chr4:30000,300100');
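The MATLAB cells above all follow one pattern: build a DAS URL from a command, an assembly, a segment and optional `type=` filters (joined with `;`), then parse the XML that comes back. The Python sketch below mirrors that pattern; `das_url` and `parse_dsn` are hypothetical helper names, and `SAMPLE_DSN` is a made-up response shaped like a DAS dsn document, not real UCSC output. No network call is made, so it runs offline.

```python
import xml.etree.ElementTree as ET

# Base URL taken from the MATLAB examples above.
DAS_ROOT = "http://genome.ucsc.edu/cgi-bin/das"

def das_url(command, dsn=None, segment=None, types=()):
    """Build a DAS query URL; parameters are joined with ';' as in the examples above."""
    if command == "dsn":
        return f"{DAS_ROOT}/dsn"
    params = [f"segment={segment}"] if segment else []
    params += [f"type={t}" for t in types]
    url = f"{DAS_ROOT}/{dsn}/{command}"
    return (url + "?" + ";".join(params)) if params else url

def parse_dsn(xml_text):
    """Return (id, name, description) for every <DSN> entry of a dsn response."""
    root = ET.fromstring(xml_text)
    sources = []
    for dsn in root.iter("DSN"):
        source = dsn.find("SOURCE")
        description = dsn.findtext("DESCRIPTION")  # None if the element is missing
        sources.append((source.get("id"), source.text, description))
    return sources

# Hypothetical sample response (not real UCSC output).
SAMPLE_DSN = """<DASDSN>
  <DSN>
    <SOURCE id="hg16" version="hg16">Human July 2003</SOURCE>
    <MAPMASTER>http://genome.ucsc.edu/cgi-bin/das/hg16</MAPMASTER>
    <DESCRIPTION>Human July 2003 Genome at UCSC</DESCRIPTION>
  </DSN>
</DASDSN>"""

print(das_url("features", "hg16", "4:3000000,4000000", ["snp", "knownGene"]))
print(parse_dsn(SAMPLE_DSN))
```

To query a live server you would pass the text of `urllib.request.urlopen(das_url("dsn")).read().decode()` to `parse_dsn`, much as `urlread` is used in the MATLAB cells.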
1. Assessing the Significance of Conserved Genomic Aberrations Using High Resolution Genomic Microarrays. http://genetics.plosjournals.org/perlserv/?request=get-document&doi=10.1371%2Fjournal.pgen.0030143
2. Computation of recurrent minimal genomic alterations from array-CGH data. http://bioinformatics.oxfordjournals.org/cgi/content/abstract/22/7/849
3. STAC: A method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments. http://www.genome.org/cgi/content/abstract/gr.5076506v1
4. SIRAC: Supervised Identification of Regions of Aberration in aCGH datasets. http://www.biomedcentral.com/content/pdf/1471-2105-8-422.pdf
5. A Robust Algorithm for Copy Number Detection Using High-Density Oligonucleotide Single Nucleotide Polymorphism Genotyping Arrays. http://cancerres.aacrjournals.org/cgi/content/abstract/65/14/6071
6. Modeling recurrent DNA copy number alterations in array CGH data. http://bioinformatics.oxfordjournals.org/cgi/content/abstract/23/13/i450
7. Efficient Calculation of Interval Scores for DNA Copy Number Data Analysis. http://www.liebertonline.com/doi/abs/10.1089/cmb.2006.13.215?journalCode=cmb
http://www.pnas.org/cgi/content/full/104/50/20007
January 22 - Data structures and data services (Pablo, Jonas):
HOMEWORK
Write a MATLAB function that reads an HTML table into a cell array.
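For comparison, here is a Python sketch of the same idea, where a list of lists of strings stands in for the MATLAB cell array. It uses the standard-library `html.parser` module; `TableReader` and `read_html_table` are hypothetical names. It assumes simple, non-nested tables, collects the rows of every table in the input into one list, and tolerates omitted `</td>` and `</tr>` tags.

```python
from html.parser import HTMLParser

class TableReader(HTMLParser):
    """Collect the text of <td>/<th> cells, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows: list of lists of strings
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text fragments of the current cell, or None

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()          # an open row is implicitly closed
            self._row = []
        elif tag in ("td", "th"):
            self._end_cell()         # an open cell is implicitly closed
            if self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

def read_html_table(html):
    reader = TableReader()
    reader.feed(html)
    reader.close()
    reader._end_row()  # flush a row left open at end of input
    return reader.rows

html = "<table><tr><th>gene</th><th>chr</th></tr><tr><td>TP53</td><td>17</td></tr></table>"
print(read_html_table(html))
```

Rows can have different lengths, which a ragged list (or a MATLAB cell array built row by row) handles without padding.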
1. Discussion of matrix notation using the homework assignment. [my solution].
2. Alignment as a similarity metric. [Presentation].
3. Discussion of the collective assignment on developing a client that will use the UCSC Genome Browser as a data service.
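One simple way to turn an alignment into a similarity value is percent identity. The Python sketch below takes two already-aligned sequences of equal length, with '-' for gaps. It divides by the full alignment length, gap columns included; this is only one of several conventions, and others divide by the length of the shorter sequence.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences have the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    # A column counts as identical only if neither side is a gap.
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)

print(percent_identity("ACGT", "A-GT"))
```

Unlike a raw alignment score, this value lies between 0 and 100, so it can be compared across pairs of sequences with different lengths.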
HOMEWORK
Since you did so well in the introductory class, today we move on to a more advanced algorithm-deployment assignment. The homework is described in the last slide of the presentation.
This class will introduce the two main components of the integrative exercise: data structures and programming languages. The exploration of these two topics will be pursued in MATLAB, a fast prototyping scientific and engineering programming environment.
In addition to the very extensive help material that comes with MATLAB (from manuals to videos; click "Help" in the top menu to find more), MathWorks' website also offers a great selection of webinars.
HOMEWORK
Today we have a small homework assignment, just to make sure we all know how to send assignments to me: write an m-function that identifies the largest element of a matrix and returns its location.
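A Python sketch of the idea follows, with a matrix given as a list of rows; `max_locations` is a hypothetical name. When the maximum occurs more than once, it reports every location, using 0-based (row, col) indices in row-major order. In MATLAB, `[r, c] = find(A == max(A(:)))` gives the corresponding 1-based indices, listed in column-major order.

```python
def max_locations(matrix):
    """Return the largest value and every (row, col) where it occurs (0-based)."""
    values = [v for row in matrix for v in row]
    if not values:
        raise ValueError("matrix is empty")
    top = max(values)
    locations = [(r, c)
                 for r, row in enumerate(matrix)
                 for c, v in enumerate(row)
                 if v == top]
    return top, locations

print(max_locations([[1, 9, 3], [9, 2, 0]]))
```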