January 24 - hands-on session: developing a client for the UCSC Genome Browser (everybody):
- BioDAS (Diogo)
- Review of interoperable solutions.
- Introduction to the Document Object Model (DOM) and how an XML document can be mapped into a regular data structure. See, for example, Matlab's XMLREAD command and compare it with this toolbox.
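To make the DOM-to-data-structure idea concrete, here is a minimal sketch (not the toolbox's method) that walks a DOM tree with an explicit stack and flattens it into a two-column cell array of {element path, text}. It assumes Matlab's xmlread and the standard Java DOM methods it exposes (getDocumentElement, getChildNodes, getNodeType, getTextContent). The sample XML is hand-written for illustration. Only leaf elements are recorded, and attributes are ignored.

```matlab
% Hand-written sample XML (illustrative only, not real UCSC output)
sampleXml = ['<DSN><SOURCE id="hg16">Human July 2003</SOURCE>' ...
             '<DESCRIPTION>Human genome sample</DESCRIPTION></DSN>'];

% xmlread takes a file name or URL, so write the sample to a temporary file
tmpFile = [tempname '.xml'];
fid = fopen(tmpFile, 'w');
fprintf(fid, '%s', sampleXml);
fclose(fid);
dom = xmlread(tmpFile);
delete(tmpFile);

% Depth-first walk with an explicit stack; each stack row is {element, parent path}
rows = cell(0, 2);
stack = {dom.getDocumentElement, ''};
while ~isempty(stack)
    node = stack{end, 1};
    nodePath = [stack{end, 2} '/' char(node.getNodeName)];
    stack(end, :) = [];

    children = node.getChildNodes;
    hasElementChild = false;
    % Push element children in reverse so they are popped in document order
    for k = children.getLength:-1:1
        child = children.item(k - 1);      % Java NodeLists are 0-based
        if child.getNodeType == 1          % 1 = ELEMENT_NODE in the W3C DOM
            stack(end + 1, :) = {child, nodePath};
            hasElementChild = true;
        end
    end

    % Each leaf element becomes one row of the flat table
    if ~hasElementChild
        rows(end + 1, :) = {nodePath, char(node.getTextContent)};
    end
end
disp(rows)
```

An explicit stack is used instead of a recursive function so the whole thing can live in one script cell.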
Integrative genomics using DAS: http://www.biomedcentral.com/1471-2105/8/333
HOMEWORK
Don't forget the pending assignments: write a Matlab function that reads an HTML table into a cell array, and also the alignment homework. This is due Thursday.
BioDAS queries against the UCSC Genome Database
%% Data sources command
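% Get all available genomes (data sources). The XML looks like this:
% <DASDSN>
%   <DSN>
%     <SOURCE id="..." version="...">...</SOURCE>
%     <MAPMASTER>...</MAPMASTER>
%     <DESCRIPTION>...</DESCRIPTION>
%   </DSN>
%   ...
% </DASDSN>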
% this code manipulates the XML through the DOM API (Matlab's xmlread)
genomes_Dom = xmlread('http://genome.ucsc.edu/cgi-bin/das/dsn');
dsnL = genomes_Dom.getElementsByTagName('DSN');
sourceL = genomes_Dom.getElementsByTagName('SOURCE');
descrL = genomes_Dom.getElementsByTagName('DESCRIPTION'); % XML tag names are case-sensitive
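getElementsByTagName returns a Java NodeList, not a Matlab array, so its entries have to be pulled out one at a time with getLength and item. A minimal sketch, assuming the DAS 1.5 DSN layout (each DSN element holds a SOURCE element with an id attribute and a DESCRIPTION element). A hand-written sample replaces the live response so the snippet runs offline; swap in the xmlread URL call for real data.

```matlab
% Hand-written sample in the DAS DSN layout (illustrative only, not real UCSC output).
% For live data: dom = xmlread('http://genome.ucsc.edu/cgi-bin/das/dsn');
sampleXml = ['<?xml version="1.0"?><DASDSN>' ...
    '<DSN><SOURCE id="hg16" version="hg16">Human July 2003</SOURCE>' ...
    '<DESCRIPTION>Human July 2003 Genome</DESCRIPTION></DSN>' ...
    '<DSN><SOURCE id="hg17" version="hg17">Human May 2004</SOURCE>' ...
    '<DESCRIPTION>Human May 2004 Genome</DESCRIPTION></DSN>' ...
    '</DASDSN>'];
tmpFile = [tempname '.xml'];
fid = fopen(tmpFile, 'w');
fprintf(fid, '%s', sampleXml);
fclose(fid);
dom = xmlread(tmpFile);
delete(tmpFile);

% Walk the DSN NodeList and read one SOURCE and one DESCRIPTION from each entry
dsnList = dom.getElementsByTagName('DSN');
n = dsnList.getLength;
ids = cell(n, 1);
descriptions = cell(n, 1);
for k = 1:n
    dsn = dsnList.item(k - 1);                    % Java NodeLists are 0-based
    sourceList = dsn.getElementsByTagName('SOURCE');
    descrList = dsn.getElementsByTagName('DESCRIPTION');
    ids{k} = char(sourceList.item(0).getAttribute('id'));
    descriptions{k} = char(descrList.item(0).getTextContent);
end
disp([ids descriptions])
```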
% this code uses the XML Toolbox from Geodise (download and install it first)
xml = urlread('http://genome.ucsc.edu/cgi-bin/das/dsn');
t_parseany = xml_parseany(xml);
% when you reach a leaf node, use this:
t_parseany.DSN{1}.SOURCE{1}.ATTRIBUTE(1)
t_parseany.DSN{1}.SOURCE{1}.CONTENT
%% entry_points command
% Get all chromosomes (entry_points command). Here, hg16 refers to a specific genome assembly (human genome, July 2003), as returned by the data sources command.
% e.g.
urlread('http://genome.ucsc.edu/cgi-bin/das/hg16/entry_points')
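Once the entry_points XML is in a string, the chromosome names and lengths can be pulled out with regexp. A minimal sketch, assuming each chromosome comes back as a SEGMENT element with id, start and stop attributes, as in DAS 1.5, and that coordinates are 1-based and inclusive. The sample string and its numbers are made up for illustration; if the server answers in an older form (for example with a size attribute instead of start/stop) the patterns need adjusting.

```matlab
% Hand-written sample of an entry_points response (illustrative values only).
% For live data: xml = urlread('http://genome.ucsc.edu/cgi-bin/das/hg16/entry_points');
xml = ['<DASEP><ENTRY_POINTS version="1.0">' ...
       '<SEGMENT id="1" start="1" stop="5000" orientation="+">1</SEGMENT>' ...
       '<SEGMENT id="2" start="1" stop="4200" orientation="+">2</SEGMENT>' ...
       '</ENTRY_POINTS></DASEP>'];

% Grab each opening SEGMENT tag, then read attributes one by one so their order does not matter
segTags = regexp(xml, '<SEGMENT\s[^>]*>', 'match');
n = numel(segTags);
segId = cell(n, 1);
segLength = zeros(n, 1);
for k = 1:n
    segId{k} = char(regexp(segTags{k}, '\sid="([^"]*)"', 'tokens', 'once'));
    segStart = str2double(char(regexp(segTags{k}, '\sstart="(\d+)"', 'tokens', 'once')));
    segStop = str2double(char(regexp(segTags{k}, '\sstop="(\d+)"', 'tokens', 'once')));
    segLength(k) = segStop - segStart + 1;   % assumes 1-based, inclusive coordinates
end
disp([segId num2cell(segLength)])
```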
%% types command
% Get the annotation types for a sequence segment. Annotation types are the kinds of features annotated on the sequence, such as knownGenes, SNPs, mRNAs, ESTs, exons, introns and so on.
% This command gives an overview of the annotation for the segment, including the number of features of each type.
urlread('http://genome.ucsc.edu/cgi-bin/das/hg16/types?segment=4:3000000,4000000');
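The per-type counts can be turned into a small table with a named-token regexp. A minimal sketch, assuming each type is reported as a TYPE element whose first attribute is id and whose text is the feature count; the sample string is hand-written and its counts are invented. A DOM-based version (as with xmlread above) would be more robust to attribute order.

```matlab
% Hand-written sample of a types response (illustrative only, counts are made up).
% For live data: xml = urlread('http://genome.ucsc.edu/cgi-bin/das/hg16/types?segment=4:3000000,4000000');
xml = ['<DASTYPES><GFF version="1.0">' ...
       '<SEGMENT id="4" start="3000000" stop="4000000">' ...
       '<TYPE id="knownGene" category="transcription">12</TYPE>' ...
       '<TYPE id="snp" category="variation">345</TYPE>' ...
       '</SEGMENT></GFF></DASTYPES>'];

% Named tokens give a struct array with fields id and count
types = regexp(xml, '<TYPE\s+id="(?<id>[^"]*)"[^>]*>(?<count>\d*)</TYPE>', 'names');
typeIds = {types.id};
typeCounts = str2double({types.count});
for k = 1:numel(types)
    fprintf('%-12s %d\n', typeIds{k}, typeCounts(k));
end
```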
%% features command
% Get one or more specific features for a sequence. Several feature types and/or segments can be queried at the same time.
% e.g. retrieve all SNPs and known genes (knownGene) for the segment.
urlread('http://genome.ucsc.edu/cgi-bin/das/hg16/features?segment=4:3000000,4000000;type=snp;type=knownGene');
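Because several segments and types can be combined in one query, it helps to build the URL from cell arrays instead of typing it by hand. A minimal sketch, assuming the same semicolon-separated segment= and type= parameters used in the URL above; the variable names are illustrative.

```matlab
% Build a DAS features URL from lists of segments and annotation types
base = 'http://genome.ucsc.edu/cgi-bin/das/hg16/features';
segments = {'4:3000000,4000000'};          % add more segments here
types = {'snp', 'knownGene'};              % add more types here

segParams = strjoin(strcat('segment=', segments), ';');
typeParams = strjoin(strcat('type=', types), ';');
url = [base '?' segParams ';' typeParams];
disp(url)
% xml = urlread(url);   % uncomment to send the query (needs network access)
```

With the values above, url is exactly the query string used in the urlread call above.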
%% dna command
%Get raw nucleotide sequence data.
urlread('http://genome.ucsc.edu/cgi-bin/das/hg16/dna?segment=chr4:30000,300100');
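The raw sequence sits inside the DNA element, usually broken over several lines. A minimal sketch that extracts it and checks its length against the requested range, assuming a SEQUENCE element with start and stop attributes wrapping a DNA element, and 1-based inclusive coordinates. The sample string is hand-written for illustration.

```matlab
% Hand-written sample of a dna response (illustrative only, not real UCSC output).
% For live data: xml = urlread('http://genome.ucsc.edu/cgi-bin/das/hg16/dna?segment=chr4:30000,300100');
xml = sprintf(['<DASDNA><SEQUENCE id="chr4" start="101" stop="112" version="1.00">' ...
               '<DNA length="12">\nacgtac\ngtacgt\n</DNA></SEQUENCE></DASDNA>']);

% Take the text between <DNA ...> and </DNA> and drop the line breaks inside it
tok = regexp(xml, '<DNA[^>]*>([^<]*)</DNA>', 'tokens', 'once');
seq = regexprep(tok{1}, '\s', '');

% Compare against the requested range, assuming 1-based inclusive coordinates
seqStart = str2double(char(regexp(xml, '<SEQUENCE[^>]*\sstart="(\d+)"', 'tokens', 'once')));
seqStop = str2double(char(regexp(xml, '<SEQUENCE[^>]*\sstop="(\d+)"', 'tokens', 'once')));
expectedLength = seqStop - seqStart + 1;
assert(length(seq) == expectedLength, 'Sequence length does not match the segment');
fprintf('%d bases: %s\n', length(seq), seq);
```

Under the same assumption, the chr4:30000,300100 query above should return 300100 - 30000 + 1 = 270101 bases.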
1 comment:
There is a paper here describing how to access the UCSC Genome Database using a C API.