Tuesday, January 29, 2008

Integrative Bioinformatics 6


January 24 - hands-on session developing a client for the UCSC Genome Browser (everybody):

  1. BioDAS (Diogo)
  2. Review of interoperable solutions.
  3. Introduction to the Document Object Model (DOM) and how an XML document can be mapped into a regular data structure. See, for example, MATLAB's XMLREAD command and compare it with the XML Toolbox from Geodise.

Integrative genomics using DAS: http://www.biomedcentral.com/1471-2105/8/333

HOMEWORK

Don't forget the pending assignments: write a MATLAB function that reads an HTML table into a cell array, and finish the alignment homework. This is due Thursday.



BioDAS queries against the UCSC Genome Database

%% Data sources command

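%Get all genomes (data sources) available. XML looks like this:
%
% <DASDSN>
%   <DSN>
%     <SOURCE id="hg18">Mar. 2006 at UCSC</SOURCE>
%     <MAPMASTER>http://genome.cse.ucsc.edu:80/cgi-bin/das/hg18</MAPMASTER>
%     <DESCRIPTION>Human Mar. 2006 Genome at UCSC</DESCRIPTION>
%   </DSN>
%   ...
% </DASDSN>
%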
% Parse the response and navigate it with the DOM API
genomes_Dom = xmlread('http://genome.ucsc.edu/cgi-bin/das/dsn');
dsnL = genomes_Dom.getElementsByTagName('DSN');
sourceL = genomes_Dom.getElementsByTagName('SOURCE');
% Tag names are case-sensitive in XML, so match the upper-case tag used in the response
descrL = genomes_Dom.getElementsByTagName('DESCRIPTION');
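The node lists returned above are Java DOM objects, so they are indexed from zero and their values come back as Java strings. A minimal sketch of walking such a list follows. It assumes the response has the DSN/SOURCE/DESCRIPTION shape shown earlier and parses a small inline sample (illustrative, not a live response) so that it runs without network access. It also relies on xmlread accepting a Java InputSource in place of a file name.

```matlab
% Inline sample standing in for the dsn response (assumed shape).
% A live query would call xmlread('http://genome.ucsc.edu/cgi-bin/das/dsn').
sampleXml = ['<DASDSN><DSN>' ...
    '<SOURCE id="hg18">Mar. 2006 at UCSC</SOURCE>' ...
    '<MAPMASTER>http://genome.cse.ucsc.edu:80/cgi-bin/das/hg18</MAPMASTER>' ...
    '<DESCRIPTION>Human Mar. 2006 Genome at UCSC</DESCRIPTION>' ...
    '</DSN></DASDSN>'];

% Wrap the string in an InputSource so xmlread can parse it directly.
src = org.xml.sax.InputSource(java.io.StringReader(sampleXml));
dom = xmlread(src);

sourceL = dom.getElementsByTagName('SOURCE');
n = sourceL.getLength();
ids = cell(n, 1);
names = cell(n, 1);
for k = 1:n
    node = sourceL.item(k - 1);               % NodeList is zero-based
    ids{k} = char(node.getAttribute('id'));   % convert Java String to char
    names{k} = char(node.getTextContent());
end
```

After the loop, `ids` holds the assembly identifiers (here `hg18`) that go into the URLs of the other commands.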

% The same query using the XML Toolbox from Geodise (download and install it first)
xml = urlread('http://genome.ucsc.edu/cgi-bin/das/dsn');
t_parseany = xml_parseany(xml);

% When you reach a leaf, read its attributes and content like this:
t_parseany.DSN{1}.SOURCE{1}.ATTRIBUTE(1)
t_parseany.DSN{1}.SOURCE{1}.CONTENT

%% entry_points command

%Get all chromosomes (entry_points command). Here, hg16 refers to a specific genome assembly (Human Genome July 2003), as returned by the data sources command.
%e.g.
urlread('http://genome.ucsc.edu/cgi-bin/das/hg16/entry_points')
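To use the entry points programmatically, the response can be turned into a struct array. The sketch below assumes each entry point is a SEGMENT element carrying `id`, `start` and `stop` attributes, as in the DAS specification. The sample XML and its coordinates are made up for illustration, and the variable names are hypothetical.

```matlab
% Made-up sample standing in for the entry_points response (assumed shape).
sampleXml = ['<DASEP><ENTRY_POINTS>' ...
    '<SEGMENT id="1" start="1" stop="1000">1</SEGMENT>' ...
    '<SEGMENT id="2" start="1" stop="2500">2</SEGMENT>' ...
    '</ENTRY_POINTS></DASEP>'];

src = org.xml.sax.InputSource(java.io.StringReader(sampleXml));
dom = xmlread(src);

segL = dom.getElementsByTagName('SEGMENT');
nSeg = segL.getLength();
entry = struct('id', cell(nSeg, 1), 'start', [], 'stop', []);
for k = 1:nSeg
    node = segL.item(k - 1);                  % NodeList is zero-based
    entry(k).id = char(node.getAttribute('id'));
    entry(k).start = str2double(char(node.getAttribute('start')));
    entry(k).stop = str2double(char(node.getAttribute('stop')));
end

% Length of each entry point, in bases.
lengths = [entry.stop] - [entry.start] + 1;
```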

%% types command

%Get annotation types for a sequence segment. Annotation types are all the features annotated for a sequence, such as knownGenes, SNPs, mRNAs, ESTs, exons, introns and so on.
%This command gives an overview of the annotation for the sequence, including the number of features of each type.
urlread('http://genome.ucsc.edu/cgi-bin/das/hg16/types?segment=4:3000000,4000000');

%% features command

%Get one or more specific features for a sequence. Several features and/or segments can be queried at the same time.
%e.g. Retrieve all SNPs and known genes for the segment.
urlread('http://genome.ucsc.edu/cgi-bin/das/hg16/features?segment=4:3000000,4000000;type=snp;type=knownGene');
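Because several segments and types can be combined in one request, it can be convenient to assemble the query string from cell arrays rather than typing it by hand. This is only a sketch: the variable names are hypothetical, and it assumes parameters are separated by semicolons as in the URL above.

```matlab
% Build a features URL from lists of segments and feature types.
base = 'http://genome.ucsc.edu/cgi-bin/das/hg16/features';
segments = {'4:3000000,4000000'};
types = {'snp', 'knownGene'};

% strcat with a cell array input returns a cell array of strings.
segPart = strcat('segment=', segments);
typePart = strcat('type=', types);

% Join all parameters with ';' and append them to the base URL.
query = strjoin([segPart, typePart], ';');
url = [base '?' query];
```

The resulting `url` can then be passed to urlread or xmlread in the same way as the hand-written URLs.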

%% dna command

%Get raw nucleotide sequence data.
urlread('http://genome.ucsc.edu/cgi-bin/das/hg16/dna?segment=chr4:30000,300100');
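The dna response is still XML, so the bases have to be extracted before they can be used as a sequence. A minimal sketch follows, assuming the nucleotides sit inside a DNA element and are broken across lines. The sample below is made up for illustration and is not real genome sequence.

```matlab
% Made-up sample standing in for the dna response (assumed shape).
sampleXml = sprintf(['<DASDNA><SEQUENCE id="4" start="1" stop="10">\n' ...
    '<DNA length="10">\nacgta\ncgtac\n</DNA></SEQUENCE></DASDNA>']);

% Capture everything between the DNA tags.
tok = regexp(sampleXml, '<DNA[^>]*>(.*?)</DNA>', 'tokens', 'once');

% Remove line breaks and other whitespace, then upper-case the bases.
seq = upper(regexprep(tok{1}, '\s+', ''));
```

With a live query, `sampleXml` would be replaced by the string returned from urlread.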







1 comment:

Diogo said...

There is a paper here describing how to access the UCSC Genome Database using a C API.