Monday, November 26, 2007

Upcoming event: 2007 Computational & Theoretical Biology Symposium: Dec. 7-9 at Rice University

Hey folks,

this is worth checking out: registration is free and the speakers will be talking about interesting topics. I just don't know why they are holding it in the week before finals, including at Rice...


We would like to invite you to attend the 4th annual Computational &
Theoretical Biology Symposium at Rice University. It will be held from
December 7 - 9 and features invited talks of more than 20 speakers from
leading institutions across the US. Participants at this annual
symposium will gain new insights into a variety of approaches in
theoretical methods of statistical mechanics, nonlinear dynamics, and
systems biology that are being developed and applied to study and
manipulate nature.

Admission to the symposium is free and open to everyone; registration is
not required. To learn more about the hosts/venue, program and invited
speakers, visit <> .

Student and Postdoc participants are encouraged to present in a poster
session to be held on Saturday, December 8th from 1:00 - 2:30 pm and
during the coffee break. Presenters are asked to send an e-mail to <> with
poster details by Thursday, November 29th so that logistical
arrangements can be made.

Best Regards,

Symposium Organizing Committee

Monday, May 28, 2007

AMIA 2007 Spring Congress Synopsis

This year's AMIA Spring Congress brought together researchers from many fields, including the medical community, nursing, informatics and basic (bench) research, as well as some industry representatives, all with a common goal: translating biomedical knowledge into medical practice. The meeting included 5 tracks, spanning from nursing informatics to clinical decision support, personalized health records and translational research, so it was a good mix of several domains, each with its own challenges and methodologies.
Adam Bosworth, vice-president of Google, opened the meeting with a blast. His vision of how health-related information will/should be handled in the not-so-distant future was mesmerizing, and his insight into how Google is prepared to address this challenge was an unexpected surprise. Here are his notes.
The translational research informatics track was one of the most interesting. It became clear that CTSA has had, and is still having, a key role in the development of a new science - Integrative Bioinformatics. Several CTSA awardees made their voices heard and lots of ideas flew around the room, from tools already being developed and evaluated to tools that are promising but still in the planning process. Integration was the keyword and the main challenge.
caBIG was also given a chance to give their 2 cents, but the most enthusiastic participants seemed to rely mostly on in-house customized tools. There were plenty of semantic web technology aficionados, including users of Protege, but no Semantic or Sloppy Databases (except ours :D).
Christopher Chute also made his presence felt; his take on the basic research vs clinical research "Chasm of Semantic Despair" was particularly insightful.

Tuesday, May 15, 2007


The internal web-based resource for IBL people can be accessed here: IBLabook. If you have problems accessing it, think you should be given access, or have recommendations for changes to what is available there, please bring it up at the weekly lab meeting, Tuesdays at 9am in HMB 13.304.

Thursday, May 10, 2007

Integrative Bioinformatics 7

XML mediated interoperability between data structures. The basic idea is very simple: [environment/structure 1] --> XML --> [environment/structure 2]. A very powerful suite of tools, formal and computational, has been developed to deal with the mediation, so we have plenty to keep you busy for a couple of hours.

For the Matlab centric exercises please note the xmlread, xmlwrite and xslt commands. Have a go at the W3C links listed in the help files of those commands. In today's class we are going to cheat and use tools that rely on Matlab structures and/or can manipulate them using XPath. I'll also explain why this is a very popular and useful cheat in any language. Note also that the best libraries are often produced by people with the same problems as you. For example see these two: XML Toolbox and XML Parser. I also wrote a library to deal with XML mediation, XML4MAT, but one of you (no names please) told me it is ugly.
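
For those who prefer to see the [structure 1] --> XML --> [structure 2] idea outside Matlab, here is a minimal sketch in Python using only the standard library. It is not one of the class tools: the function names (struct_to_xml, xml_to_struct), the "item" tag used for list entries and the sample record are all made up for illustration, and leaf values come back as strings because no type information is stored in the XML.

```python
import xml.etree.ElementTree as ET


def struct_to_xml(name, value):
    """Turn a nested dict/list/scalar into an XML element named `name`."""
    element = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(struct_to_xml(key, child))
    elif isinstance(value, list):
        # Assumption: list entries are wrapped in a hypothetical <item> tag.
        for entry in value:
            element.append(struct_to_xml("item", entry))
    else:
        element.text = str(value)
    return element


def xml_to_struct(element):
    """Rebuild a nested structure; leaf values are returned as strings."""
    children = list(element)
    if not children:
        return element.text or ""
    if all(child.tag == "item" for child in children):
        return [xml_to_struct(child) for child in children]
    return {child.tag: xml_to_struct(child) for child in children}


# Made-up record standing in for "structure 1".
sample = {"id": "S1", "tissue": "ovary", "intensities": [1.5, 2.25]}

# structure 1 --> XML
xml_text = ET.tostring(struct_to_xml("sample", sample), encoding="unicode")

# XML --> structure 2
round_trip = xml_to_struct(ET.fromstring(xml_text))

# The "cheat": address parts of the structure with an XPath-like expression.
intensity_nodes = ET.fromstring(xml_text).findall("./intensities/item")
intensities = [float(node.text) for node in intensity_nodes]
```

Note that ElementTree only supports a subset of XPath, which is enough for simple path queries like the one above.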

Thursday, May 03, 2007

Integrative Bioinformatics 6

Modelling Strings

Today we have a hands-on tutorial on modelling strings and how you can use them to represent and/or retrieve structured information. This will be a class on menial Bioinformatics.

Modelling strings relies heavily on regular expressions. Spend some time getting familiar with the concepts of patterns, lazy and greedy quantifiers, and tokens. Try the regexp function with all 5 output arguments to see what is there.
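
If you want to experiment with the same concepts outside Matlab, the sketch below uses Python's re module instead of regexp. The two input strings are invented for illustration. It shows the difference between a greedy and a lazy quantifier, and how tokens (capture groups) and match positions can be pulled out of a record.

```python
import re

# Made-up input: two tagged gene names on one line.
line = "<gene>TP53</gene><gene>BRCA1</gene>"

# Greedy: .* grabs as much as possible, so it runs to the LAST </gene>.
greedy = re.findall(r"<gene>(.*)</gene>", line)

# Lazy: .*? grabs as little as possible, so each tag pair is matched separately.
lazy = re.findall(r"<gene>(.*?)</gene>", line)

# Tokens: named groups pull the key and the value out of each "key=value" pair.
record = "sample=OV12; stage=III; ca125=35.2"
pair_pattern = r"(?P<key>\w+)=(?P<value>[^;]+)"
pairs = {m.group("key"): m.group("value") for m in re.finditer(pair_pattern, record)}

# Match positions: where the first match starts and ends in the string.
first_span = re.search(pair_pattern, record).span()
```

Here greedy holds a single, overlong token while lazy holds the two gene names, which is usually what you want when parsing data files.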

Wednesday, April 25, 2007

Integrative Bioinformatics 5

Case Study: Ovary Cancer data integration at the Kleberg Foundation

(SCRB1. conference room 4)

[Index of classes]

Let's get serious: you now have data structures that describe your (or at least Katherine's) data, so the time has come to show you how it all fits together. Instead of having the class in HMB, this time you are invited to join a workshop about this topic in the South Campus. We'll do two things:

1. I'll post a sample solution of a script that would assemble a data structure for K's data. The idea is not that my solution is the best, it is just to encourage you to post your own m-files assembling alternative data structures. This way next week we'll be generating multiple XML structures from the different solutions.

2. You are asked to listen to my short presentation at 2:10 pm and encouraged to participate in the follow-up discussion. Here's the full agenda for the afternoon in case you can and want to hear the full story:

Workshop at the Kleberg Center for Biomarker Discovery, South Campus Research Building (SCRB1. conference room 4), behind the welcome desk, to the right upon entering the building.
01:15 pm - 01:30 pm
Dr. Rahul Mitra – Kleberg Center

01:35 pm - 02:05 pm
Dr. Bryan Hennessy – Kleberg Center
Molecular Medicine Studies on ovarian and breast cancers at Kleberg Center

02:10 pm - 02:40 pm
Dr. Jonas Almeida – Kleberg Center
Data collection and analysis at Kleberg Center

02:45 pm - 03:45 pm
All: Open discussion

Wednesday, April 18, 2007

Integrative Bioinformatics 4

XML constructs

[Index of classes]

What is XML?
In this class we are going to learn what the main features of XML are, and how it can be used to build data structures (including some insights on the homework ;-) ). XML is one of the most popular and widely used representation languages on the web. Since work on it began in 1996 (originally by the SGML Editorial Review Board), it has been widely extended; XML 1.0 became a W3C Recommendation in 1998, and its fourth edition was published in 2006. Due to its flexibility, many standard development teams in biomedical research have relied on XML to implement knowledge domain representations, such as MAGE-ML, MiniML, SBML, AGML, etc.
On a different line of thought, we are also going to learn how to use S3DB through the API, by building (surprisingly!!) an XML structure that resembles SQL.

For more on XML visit the W3C recommendation and the tutorial.

Google presentations

Google is adding presentations to their Docs and Spreadsheets package.
"Presentations" is a new addition to Google Docs and Spreadsheets (GOffice), where docs are saved to an online storage facility and can be accessed from any computer. Owners of the documents can edit docs and can grant access to collaborators, who are able to modify docs and invite more collaborators, and to viewers, who can only view the most recent version of the documents or spreadsheets. It seems that about 99% of the time GOffice can replace MS Office. On top of that, GOffice has "intarwebbiness", is free, and is so easy to share!

Thursday, April 12, 2007

Nobel Lives: Sidney Brenner and John Sulston

[Sidney Brenner and John Sulston interviewed by Sarah Montague]

Sidney Brenner and John Sulston are interviewed by Sarah Montague about the research which led to their receiving the Nobel Prize.
Around minute 8:10 Sidney Brenner comments on the distinction between science and technology: "Technology is much harder than science. In Science, if something doesn't work, we change the problem! In technology we have to solve the problem :D"

A very very good interview.

Wednesday, April 11, 2007

Integrative Bioinformatics 3

[Index of classes]
Semantic Constructs using S3DB

The Semantic Web, or the Web of Data, is not so much a new technology as a vision. Putting data on the web is a challenge, and there is no standard way to do it. The first step is to build a data structure to share data. But biological data is a tangle, therefore we need a way to represent that tangle. That way is RDF.
In this class we are going to develop an Entity-Relation-Entity structure for a specific data model using S3DB and find out how to immediately input data into that structure to make it public.
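
To make the Entity-Relation-Entity idea concrete before opening S3DB, here is a small sketch in plain Python. It is not the S3DB API and it is not an RDF library: the Triple type, the match function and every entity and relation name below are hypothetical, chosen only to show how a tangle of data can be written down as subject-predicate-object statements and then queried by pattern.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One Entity-Relation-Entity statement (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical model: the first two triples describe the structure,
# the remaining ones are data entered into that structure.
triples = [
    Triple("Patient", "has", "Sample"),
    Triple("Sample", "measuredBy", "Assay"),
    Triple("patient:001", "isA", "Patient"),
    Triple("sample:A7", "isA", "Sample"),
    Triple("patient:001", "has", "sample:A7"),
]


def match(store: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Which samples does patient:001 have?
samples_of_001 = [t.obj for t in match(triples, subject="patient:001", predicate="has")]

# Which things are declared to be Patients?
patients = [t.subject for t in match(triples, predicate="isA", obj="Patient")]
```

The point of the flat list is that adding a new kind of entity or relation needs no change to the storage, only more triples.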

Tuesday, April 10, 2007

Are we Stuck?

"Should we really expect static data standards to adequately model and report data from rapidly changing technologies and accommodate novel and innovative uses of the data?"

One more insight into the difficulties, despite the benefits, of developing standards: [Nature Biotechnology - 24, 1374 - 1376 (2006)]

"The authors of data-reporting standards, such as those developed by MGED, are not likely to be the ones painstakingly entering data from their laboratory notebooks to describe experiments— it will be graduate or postdoctoral students who will be required to fulfill these standards only upon the occasion of an impending publication." (humm ... )
"Thus, the pain of meeting biological data-reporting standards is going to be experienced by those with little input into the standards themselves and little incentive to use the standards correctly."

Three Nat.Biotech on standards for Biology

The first stone was thrown by Standard operating procedures - Is biological research ready for the new wave of data-reporting standards currently under development? [Nature Biotechnology - 24, 1299 (2006)], which was shortly followed by a survey of what the community thinks about this - Systems biology standards—the community speaks [Nature Biotechnology - 25, 390 - 391 (2007)] - and an informed opinion that it is as much up to the funding stick as to the carrot of integration to make it happen - Incentivizing standards development and adoption [Nature Biotechnology - 25, 391 - 392 (2007)].

Sunday, April 08, 2007

Quantifying social group evolution

Interesting article on social evolution and how to quantify such growth (Link to article), and I wonder if there is a biological basis (Link to article) to this manifestation. In other words, how memory formation resembles social networks.

Thursday, April 05, 2007

Integrative Bioinformatics 2

[Index of classes ]

Building Data Structures

The identification of data structures is a deciding step in the ability to achieve integration. Today we'll discuss why and more importantly we'll put those principles to practice by assembling a data structure for Katherine's example data set.


Write a Matlab function that assembles a data structure for the sample data set. Submit that function back to S3DB, anywhere you want that makes sense - remember S3DB is also managing data structures.

If you are not challenged by that then write this function to assemble this data structure directly from its location on the web (I know there are two locations, Google Docs and S3DB, either should work so if you're not challenged enough try to do both ;-) ).

Wednesday, April 04, 2007

The Semantic Web at MIT Technology Review

Yet another memorable webcast of a fast-paced Tim Berners-Lee lecture. This one is in the 45 min formal seminar format:

Opening Keynote - The Semantic Web,
Tim Berners-Lee
September 29, 2004 8:40 AM
LOCATION: Kresge Auditorium
SPONSOR INFO: The Emerging Technologies Conference at MIT showcases the technologies that are poised to make a dramatic impact on the world. This two-day event is produced by Technology Review Magazine. It brings together world-renowned innovators and leaders in technology and business for keynote, panel and breakout discussions that center on the transformative technological innovations certain to improve the quality of life, create opportunities and fuel economic growth.

NOTES ON THE VIDEO (Time Index): Video length is 58:03.
At 3:22, Jason Pontin, Editor-in-Chief, Technology Review, introduces Tim Berners-Lee.
At 3:47, Tim Berners-Lee begins.
At 38:25, Q&A begins, with Bob Metcalfe, Founder, 3Com Corporation, and General Partner, Polaris Venture Partners, moderating.
At 48:03 Metcalfe asks Berners-Lee, "What web browser do you use?"

Tuesday, April 03, 2007

A Semantic Web Of Data

Tim Berners-Lee talks on MIT Technology Review

Link to Video

Tim Berners-Lee talks about what the Semantic Web of Data is and zooms in on its importance for life sciences research. A very significant part of integrative bioinformatics research is being able to reproduce the results of previous studies, and for that, access to the data has to be easy, both for humans and for applications.
Particularly insightful is his incredibly simple description of the role of RDF in the future web of data: RDF is to data as XML is to documents.

Worse Is Better

The concept known as "worse is better" holds that in software making (and perhaps in other arenas as well) it is better to start with a minimal creation and grow it as needed. Some might call this "piecemeal growth."
Thus, "risk-taking and a willingness to open one’s eyes to new possibilities and a rejection of worse-is-better make an environment where excellence is possible. Xenia invites the duende, which is battled daily because there is the possibility of failure in an aesthetic rather than merely a technical sense."

Tuesday, March 27, 2007

How much does it cost to build an ontology? Answer: 3 t-shirts, 4 coffee mugs, and one chocolate moose


Link to Paper

During two days at a conference focused on circulatory and respiratory health, 68 volunteers untrained in knowledge engineering participated in an experimental knowledge capture exercise. These volunteers created a shared vocabulary of 661 terms, linking these terms to each other and to a pre-existing upper ontology by adding 245 hyponym relationships and 340 synonym relationships. While ontology-building has proved to be an expensive and labor-intensive process using most existing methodologies, the rudimentary ontology constructed in this study was composed in only two days at a cost of only 3 t-shirts, 4 coffee mugs, and one chocolate moose. The protocol used to create and evaluate this ontology involved a targeted, web-based interface. The design and implementation of this protocol is discussed along with quantitative and qualitative assessments of the constructed ontology.

Thursday, March 22, 2007

Integrative Bioinformatics 1

[Index of classes ]

This first class will outline the three main sections covered in this course:

a) Programming environments - emphasis on the minimum toolkit made of Matlab, PHP, Javascript.
b) Data structures - emphasis on using mat, XML, RDF.
c) Integrated data management and analysis tools - S3DB, Bioinformatics Station.

This session also seeks to establish what programming tools, knowledge and computing resources the participants possess. The teaching material will be made available online with each session. Each class ends with a homework assignment, which is also the first topic in the next session.

The structure of this course will be constructive before being instructive. The first few sessions aim at making the participants familiar, and participative, with the collaborative data management and analysis tools being developed and integrated at MDACC. Therefore, the first aim of this course is to enable those who produce data and those who analyze it to interoperate. Once that is established, this constructive aim will be succeeded by an instructive emphasis on algorithm identification and deployment. This hands on computational statistics component will be complemented with some elements of emerging Integrative Bioinformatics theory.


Those who want Katherine's data, please send me your email address!

Integrative Bioinformatics

Integrative Bioinformatics (IB) has emerged as the label of choice to describe the development of integrated data management and data analysis infrastructure for the life sciences. The elaboration of the technology required, the sophistication of computational statistics methodologies involved, and the potential for formal abstract representation of very complex Biological phenomena systemically, has led to the coalescence of Integrative Bioinformatics as a research topic of its own. Accordingly, a graduate research program was put in place to train newcomers in this field.

The graduate training in integrative bioinformatics includes a formal program covering founding elements of Biology, Computer Science and Statistics. However, the defining characteristic of the training program, and indeed of the research practice in this field, is its integrative focus. Accordingly, this material is delivered in a problem solving format geared towards the identification and deployment of algorithmic solutions that interoperate with the global, mostly public, suite of bioinformatic resources. This choice of format also anticipates the graduate research itself, which includes bioinformatic tool making.

Active courses:

Integrative Bioinformatics 2008 [GS01 0123]

Location HMB 13.356, Tuesdays and Thursdays 11am-1pm


  1. [Jan 8] - Introduction (Jonas)
  2. [Jan 15] - Introduction to programming and data structures in MATLAB. (Jonas)
  3. [Jan 17] - Algorithm deployment illustrated for alignment as a metric. (Jonas)
  4. [Jan 22] - Data structures and data services (Pablo, Jonas)
  5. [Jan 24] - hands on session developing a client for UCSC Genome Browser (everybody)
  6. [Jan 29] - Document object model (DOM) and XML as vehicles for interoperability.
  7. [Jan 31] - Dynamic programming and regular expression homework.
  8. [Feb 05] - Design and practice of Graphic User Interface development.
  9. [Feb 07] - more on GUIs. Discussion of the DILS 2008 challenge.
  10. [Feb 12] - S3DB: a distributed, semantically explicit, RESTful, DBMS.
  11. [Feb 14] - Continuation of last class: S3DB.
  12. [Feb 19] - Homework review of UCSC client (Diogo > Lena > Chunyan > Rys > David)
  13. [Feb 21] - Common Standards vs Common Protocols [Romesh Stanislaus presents]
  14. [Feb 26] - GUI example 1/4 : TCGA client.

Past courses:

Integrative Bioinformatics 2007

Location HMB 13.356, Thursdays 2-4 pm

  1. [March 29 ] - Introduction (Jonas)
  2. [April 5] - Introduction to Data Structures (Jonas)
  3. [April 12] - Semantic constructs using S3DB (Lena)
  4. [April 19] - XML constructs (Romesh & Lena)
  5. [April 27] - Case Study: Ovary Cancer data integration at the Kleberg Foundation (Jonas)
  6. [May 3] - Menial Bioinformatics - modelling strings for parsing data files (Jonas).
  7. [May 10] - XSLT, XPath, XQuery, XML I/O.
  8. [May 16] - NO CLASS TODAY, we are participating in the SOA meeting at MDACC. If you want to attend please email me (Jonas) and I'll send you the directions to the seminar room.
  9. [May 24] - Last session: overview of multivariate exploratory and discriminant statistical analysis methods. Here's the goodbye and thank-you note; in 2008 we hope to have IB as a formal topic of GSBS:

-------- Original Message --------
Subject: Classes are over
Date: Thu, 31 May 2007 09:59:02 -0500
From: Jonas S Almeida
To: Helena Deus , Jonas Almeida ,,,, Helena Deus ,,,, Pablo Freire , Katherine Hale

Hi everybody,

just in case any of you missed this and is heading for the integrative bioinformatics class today: the classes ended last week with the overview of multivariate exploratory and discriminant analysis methods. I also want to take this opportunity to thank you all so much for your participation in this class. A special thank you to Katherine for bringing her datasets and Biology problems to the table and sharing them with us. I look forward to keep interacting with you. Maybe we should set specialized workshops to frame specific collaborations. In any case, I'll use this first run of the Integrative Bioinformatics class to request it to be added as a subject of GSBS.

cheers,

Jonas

Jonas S Almeida, Professor
Dept Bioinformatics and Computational Biology
Univ. Texas MDAnderson Cancer Center - unit 237
1515 Holcombe Blvd, Houston TX 77030-4009, USA
Tel: 713 792 9875 ;
fax: 305 574 5818

Tuesday, March 20, 2007

To ontology or not to ontology?

That is the million dollar (billion :)) question. Read it and come to your own conclusions...
1) New technologies will make online search more intelligent--and may even lead to a "Web 3.0." part 1 and part 2
2) The Darker Side of the Semantic Web click here
3) Adaptation, or more likely contortion, of these technologies to be biologically relevant... coming soon...

Monday, March 19, 2007

Saturday, March 17, 2007

The Third Erich L. Lehmann Symposium takes place May 16 - 19, 2007, at Rice University. Deadline for early registration and submission of contributed talks is April 22.

List of Sessions:

1. Statistical problems in the analysis of genomic and magnetic resonance imaging data.
2. Modeling correlated biomedical data.
3. Multiplicity: Developments and current issues.
4. Multiple testing and subgroup analysis.
5. Probability, Levy Process, and Applications.
6. Regularized methods of classification and estimation of nonparametric regression and covariance matrices when data is high dimensional.
7. Statistical Inference for Population Substructures via Clustering, Mixture Models and other Approaches.
8. Statistical Optimality in Bioinformatics: Theory vs Practice.

Friday, March 16, 2007

PodCast on SPARQL and the Semantic Web

In this ITConversations podcast recorded 2006-07-17 [MP3], Elias Torres, a senior software engineer at IBM and a member of several W3C working groups, gives us an overview of the Semantic Web and how RDF and SPARQL are set to become the tools of choice when extracting data from the World Wide Web. In an interview hosted by Phil Windley, Torres discusses what has happened to this technology in the past, where it is hopefully going in the near future, and what you can do today to take advantage of it.

Monday, March 12, 2007

Ontology Evolution: Not the Same as Schema Evolution

Knowledge and Information Systems, Volume 6, Number 4 / July, 2004

This paper addresses some important issues in data integration:
1) how the adoption of standards causes changes in the domain of discourse
2) how the application of the ontology to particular tasks, either data analysis or visualization of data sets, causes the emergence of concurrent ontology views (conceptualization)
3) how the translation of an ontology from one knowledge representation language to another affects the specification of the ontology

As ontology development becomes a more ubiquitous and collaborative process, ontology versioning and evolution becomes an important area of ontology research. The many similarities between database-schema evolution and ontology evolution will allow us to build on the extensive research in schema evolution. However, there are also important differences between database schemas and ontologies. The differences stem from different usage paradigms, the presence of explicit semantics and different knowledge models. A lot of problems that existed only in theory in database research come to the forefront as practical problems in ontology evolution. These differences have important implications for the development of ontology-evolution frameworks: The traditional distinction between versioning and evolution is not applicable to ontologies. There are several dimensions along which compatibility between versions must be considered. The set of change operations for ontologies is different. We must develop automatic techniques for finding similarities and differences between versions.

Wednesday, March 07, 2007

Clinical proteomics: A need to define the field and to begin to set adequate standards

[PROTEOMICS - Clinical Applications (2007) 1 (2) 148-156]

A team of 26 authors has attempted to suggest initial and as yet preliminary guidelines for clinical proteome analysis. As stated in the abstract "the aim of this manuscript is to initiate a constructive discussion about the definition of clinical proteomics, study requirements, pitfalls and (potential) use".

Note in particular Table 1 with good practice recommendations for and from experimentalists.

3) Projects

Integrated Data Management (S3DB) and Analysis (BiS)

The various projects at IBL rely on the articulation between two infrastructure resources. One is a semantic database where arbitrary data structures can be stored and managed. This work was described in two reports in Nature Biotech: the rationale at 2005 Sep; 23(9):1099-103, and the application at 2006 Sep, 24(9):1070-1071. The prototype application, S3DB, is now in use by several experimental groups. The second resource is a code distribution tool that synchronizes client machines with a data analysis environment maintained in a central repository of applications. This tool is designated as Bioinformatics Station (BiS) and, just like S3DB, its analytical modules are made publicly available as open source.

The various projects at IBL are therefore pursued as interoperable modules of a common infrastructure. Most of them rely on collaborations with extramural research groups for theory development, algorithm identification and, to a lesser degree, application deployment. This creates a de facto distributed research group for which IBL is a front end. For more information about the research work leading to the individual modules of BiS please see

Tuesday, March 06, 2007

Digital Future of the United States: Part I - The Future of the World Wide Web

Tim Berners-Lee congressional hearing March 2, 2007 on the topic "Digital Future of the United States: Part I - The Future of the World Wide Web".

For information on Tim Berners-Lee start with his entry on Wikipedia and then proceed to his own page at W3C. This hearing was hosted by the Subcommittee on Telecommunications and the Internet, Energy and Commerce Committee, of the US House of Representatives.

Note the mention of the Mayo Clinic medical record system @20min, even before the hearing starts, and a few times later. The actual hearing only starts after a 30 min introduction by the congressional sub-committee. Please add your own notes to this entry when you hear it to help the rest of us go back to particular excerpts. Here is the direct link to the podcast.

Data integration at ~ 00:41:40
Life Sciences data integration through semantic web ~ 00:44:00 mins
Web Science at min 48, this topic is expanded in Science. 2006 Aug 11;313(5788):769-71.

--- End of TBL presentation at ~min 58 ---
--- Questioning started at 1:24:30 ---

> 1:24:30 - property rights

2) People

Jonas S Almeida, PhD, Professor,
Laboratory director,
updated list of publications and brief vitae at

Romesh Stanislaus, PhD, Instructor
Dept of Bioinformatics & Computational Biology
Member IBL

Yuliya Karpievitch, Graduate Student
Dept of Bioinformatics & Computational Biology

Helena F Deus

Graduate Student
Dept Bioinformatics and Computational Biology @ MDAnderson Cancer Center
Biomathematics group @ ITQB/UNL - Portugal

Marco Vilela, Graduate Student
Dept of Bioinformatics & Computational Biology

Member IBL


Pablo Freire, Graduate Student
(Pablo, please fill in + we need your pic)


1) Mission Statement

The Integrative Bioinformatics Laboratory (IBL) conducts research on integrated management and analysis of biomolecular data.

IBL is a research unit of the Dept of Bioinformatics and Computational Biology, Division of Quantitative Sciences of The University of Texas MD Anderson Cancer Center, at Houston, Texas. IBL developed and maintains a computational framework for interoperable data management and analysis for predictive modeling in systems Biology. This integrative mission is pursued through theory development, algorithm identification and deployment of data management infrastructure. These prototypes are developed in response to the specific need for seamless systemic integration in biomedical research.

From Bytes to Bedside

From Bytes to Bedside - Data Integration and Computational Biology for Translational Cancer Research. [PLoS Comput Biol 3(2)]

Mathew JP, Taylor BS, Bader GD, Pyarajan S, Antoniotti M, Chinnaiyan AM, Sander C, Burakoff SJ, Mishra B.

Major advances in genome science and molecular technologies provide new opportunities at the interface between basic biological research and medical practice. The unprecedented completeness, accuracy, and volume of genomic and molecular data necessitate a new kind of computational biology for translational research. Key challenges are standardization of data capture and communication, organization of easily accessible repositories, and algorithms for integrated analysis based on heterogeneous sources of information. Also required are new ways of using complementary clinical and biological data, such as computational methods for predicting disease phenotype from molecular and genetic profiling. New combined experimental and computational methods hold the promise of more accurate diagnosis and prognosis as well as more effective prevention and therapy.