Integrative Bioinformatics Laboratory: 04/01/2007

Wednesday, April 25, 2007

Integrative Bioinformatics 5

Case Study: Ovarian Cancer data integration at the Kleberg Foundation

NOTE: THE LOCATION FOR THIS CLASS IS THE SOUTH CAMPUS
(SCRB1, conference room 4)

[Index of classes]

Let's get serious: you now have data structures that describe your (or at least Katherine's) data, so the time has come to show you how it all fits together. Instead of having the class in HMB, this time you are invited to join a workshop about this topic in the South Campus. We'll do two things:

1. I'll post a sample solution: a script that assembles a data structure for K's data. The idea is not that my solution is the best; it is just to encourage you to post your own m-files assembling alternative data structures. This way, next week we'll be generating multiple XML structures from the different solutions.

2. You are asked to listen to my short presentation at 2:10 pm and encouraged to participate in the follow-up discussion. Here's the full agenda for the afternoon in case you can and want to hear the full story:

Workshop at the Kleberg Center for Biomarker Discovery, South Campus Research Building (SCRB1, conference room 4), behind the welcome desk, to the right upon entering the building.
01:15 pm - 01:30 pm
Dr. Rahul Mitra – Kleberg Center
Introduction

01:35 pm - 02:05 pm
Dr. Bryan Hennessy – Kleberg Center
Molecular Medicine Studies on ovarian and breast cancers at Kleberg Center

02:10 pm - 02:40 pm
Dr. Jonas Almeida – Kleberg Center
Data collection and analysis at Kleberg Center

02:45 pm - 03:45 pm
All: Open discussion

Wednesday, April 18, 2007

Integrative Bioinformatics 4

XML constructs

[Index of classes]

What is XML?
In this class we are going to learn the main features of XML and how it can be used to build data structures (including some insights on the homework ;-) ). XML is one of the most popular and widely used representation languages on the web. Since work on it began in 1996 (originally by the SGML Editorial Review Board), it has been widely extended; XML 1.0 became a W3C Recommendation in 1998, and the fourth edition of that Recommendation was published in 2006. Due to its flexibility, many standards development teams in biomedical research have relied on XML to implement knowledge domain representations, such as MAGE-ML, MiniML, SBML, AGML, etc.
On a different line of thought, we are also going to learn how to use S3DB through the API, by building (surprisingly!!) an XML structure that resembles SQL.

For more on XML visit the W3C recommendation and the tutorial.
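
To make the idea of XML as a data structure concrete, here is a minimal sketch in Python (not the Matlab used in class, and not the S3DB API). It uses only the standard library's xml.etree.ElementTree. The record and its field names (id, tissue, assay) are hypothetical, made up for illustration only.

```python
import xml.etree.ElementTree as ET


def dict_to_xml(tag, record):
    """Build an XML element named `tag` with one child element per dict key."""
    element = ET.Element(tag)
    for key, value in record.items():
        child = ET.SubElement(element, key)
        child.text = str(value)
    return element


# Hypothetical sample record; the field names are assumptions, not real data.
sample = {"id": "S001", "tissue": "ovary", "assay": "protein array"}

# Wrap the record in a root element so more samples could be appended later.
root = ET.Element("samples")
root.append(dict_to_xml("sample", sample))

# Serialize the tree to a plain string.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Parsing the text back recovers the same fields, which is the point:
# the structure travels with the data.
parsed = ET.fromstring(xml_text)
recovered = {child.tag: child.text for child in parsed.find("sample")}
```

The same nested structure could be written with attributes instead of child elements; which is better depends on whether the values themselves may need further structure later.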

Google presentations

Google is adding presentations to their Docs and Spreadsheets package.
"Presentations" is a new addition to Google Docs and Spreadsheets (GOffice), where docs are saved to an online storage facility and can be accessed from any computer. Owners can edit docs and grant access to collaborators, who can modify docs and invite more collaborators, and to viewers, who can only view the most recent version of the documents or spreadsheets. It seems that GOffice can replace MS Office about 99% of the time. On top of that, GOffice has "intarwebbiness", is free, and is so easy to share!

Thursday, April 12, 2007

Nobel Lives: Sidney Brenner and John Sulston

[Sidney Brenner and John Sulston interviewed by Sarah Montague]

Sidney Brenner and John Sulston are interviewed by Sarah Montague about the research that led to their Nobel Prize.
Around minute 8:10, Sidney Brenner comments on the distinction between science and technology: "Technology is much harder than science. In Science, if something doesn't work, we change the problem! In technology we have to solve the problem :D"

A very very good interview.

Wednesday, April 11, 2007

Integrative Bioinformatics 3

[Index of classes]
Semantic Constructs using S3DB

The Semantic Web, or the Web of Data, is not so much a new technology as a vision. Putting data on the web is a challenge, and there is no standard way to do it. The first step is to build a data structure for sharing data. But biological data is a tangle, so we need a way to represent that tangle. That way is RDF.
In this class we are going to develop an Entity-Relation-Entity structure for a specific data model using S3DB and find out how to immediately input data into that structure to make it public.
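
As a rough illustration of what an Entity-Relation-Entity structure looks like, here is a minimal sketch in plain Python. It does not use S3DB or any RDF library; the entity and relation names (Patient_1, hasSample, and so on) are hypothetical and chosen only to show how a "tangle" reduces to a flat list of three-part statements.

```python
from collections import namedtuple

# One statement: an entity, a relation, and another entity (or a value).
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Hypothetical statements; all names are assumptions made for illustration.
triples = [
    Triple("Patient_1", "hasSample", "Sample_A"),
    Triple("Sample_A", "hasTissue", "ovary"),
    Triple("Sample_A", "measuredBy", "Assay_X"),
    Triple("Patient_2", "hasSample", "Sample_B"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching the given fields; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def follow(triples, start, *predicates):
    """Walk from `start` along a chain of relations and return the entities reached."""
    frontier = {start}
    for predicate in predicates:
        frontier = {
            t.obj
            for node in frontier
            for t in match(triples, subject=node, predicate=predicate)
        }
    return frontier


# Which tissue do Patient_1's samples come from?
tissues = follow(triples, "Patient_1", "hasSample", "hasTissue")
print(tissues)
```

Adding a new kind of relation only means appending more triples; nothing about the existing structure has to change, which is why this shape suits data models that keep evolving.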

Tuesday, April 10, 2007

Are we Stuck?

"Should we really expect static data standards to adequately model and report data from rapidly changing technologies and accommodate novel and innovative uses of the data?"

One more insight on the difficulties, despite the benefits, of developing standards: [Nature Biotechnology - 24, 1374 - 1376 (2006)]

"The authors of data-reporting standards, such as those developed by MGED, are not likely to be the ones painstakingly entering data from their laboratory notebooks to describe experiments— it will be graduate or postdoctoral students who will be required to fulfill these standards only upon the occasion of an impending publication." (humm ... )
"Thus, the pain of meeting biological data-reporting standards is going to be experienced by those with little input into the standards themselves and little incentive to use the standards correctly."

Three Nat. Biotech. pieces on standards for Biology

The first stone was thrown by Standard operating procedures - Is biological research ready for the new wave of data-reporting standards currently under development? [Nature Biotechnology - 24, 1299 (2006)]. It was shortly followed by a survey of what the community thinks about this, Systems biology standards—the community speaks [Nature Biotechnology - 25, 390 - 391 (2007)], and by an informed opinion that it is as much up to the funding stick as to the carrot of integration to make it happen, Incentivizing standards development and adoption [Nature Biotechnology - 25, 391 - 392 (2007)].

Sunday, April 08, 2007

Quantifying social group evolution

Interesting article on social evolution and how to quantify such growth (Link to article). I wonder if there is a biological basis (Link to article) to this manifestation; in other words, whether memory formation resembles social networks.

Thursday, April 05, 2007

Integrative Bioinformatics 2

[Index of classes]

Building Data Structures

The identification of data structures is a decisive step in the ability to achieve integration. Today we'll discuss why and, more importantly, we'll put those principles into practice by assembling a data structure for Katherine's example data set.

Homework:

Write a Matlab function that assembles a data structure for the sample data set. Submit that function back to S3DB, anywhere you want that makes sense - remember S3DB is also managing data structures.

If you are not challenged by that, then write the function so that it assembles the data structure directly from the data's location on the web (I know there are two locations, Google Docs and S3DB; either should work, so if you're not challenged enough try to do both ;-) ).
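
For readers who want a feel for what "assembling a data structure" means before opening Matlab, here is a minimal sketch in Python. It is not a solution to the homework: the tab-delimited table, its column names (sample_id, patient, marker, value) and its values are all invented stand-ins, since the real sample data set is not reproduced here.

```python
import csv
import io

# Hypothetical tab-delimited table standing in for the sample data set;
# the column names and values are invented for illustration.
RAW = (
    "sample_id\tpatient\tmarker\tvalue\n"
    "S1\tP1\tmarkerA\t1.5\n"
    "S1\tP1\tmarkerB\t0.7\n"
    "S2\tP2\tmarkerA\t2.1\n"
)


def assemble(text):
    """Group flat rows into one record per sample, with measurements nested inside."""
    samples = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        # Create the sample record the first time its id is seen.
        record = samples.setdefault(
            row["sample_id"], {"patient": row["patient"], "measurements": {}}
        )
        record["measurements"][row["marker"]] = float(row["value"])
    return samples


data = assemble(RAW)
print(data["S1"]["measurements"])
```

The design choice worth arguing about is what to nest under what: here samples own their measurements, but grouping by patient or by marker would be an equally valid alternative structure for the same rows.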

Wednesday, April 04, 2007

The Semantic Web at MIT Technology Review

Yet another memorable webcast of a fast-paced Tim Berners-Lee lecture, this one in the 45-minute formal seminar format:

Opening Keynote - The Semantic Web
Tim Berners-Lee
September 29, 2004 8:40 AM
LOCATION: Kresge Auditorium
SPONSOR INFO: The Emerging Technologies Conference at MIT showcases the technologies that are poised to make a dramatic impact on the world. This two-day event is produced by Technology Review Magazine. It brings together world-renowned innovators and leaders in technology and business for keynote, panel and breakout discussions that center on the transformative technological innovations certain to improve the quality of life, create opportunities and fuel economic growth.

NOTES ON THE VIDEO (Time Index): Video length is 58:03.
At 3:22, Jason Pontin, Editor-in-Chief, Technology Review, introduces Tim Berners-Lee.
At 3:47, Tim Berners-Lee begins.
At 38:25, Q&A begins, with Bob Metcalfe, Founder, 3Com Corporation, and General Partner, Polaris Venture Partners, moderating.
At 48:03 Metcalfe asks Berners-Lee, "What web browser do you use?"

Tuesday, April 03, 2007

A Semantic Web Of Data

Tim Berners-Lee talks at MIT Technology Review

Link to Video

Tim Berners-Lee talks about what the Semantic Web of Data is and zooms in on its importance for life sciences research. A very significant part of integrative bioinformatics research is being able to reproduce the results of previous studies, and for that, access to the data has to be easy, both for humans and applications.
Particularly insightful is his incredibly simple description of the role of RDF in the future web of data: RDF is to data as XML is to documents.

Worse Is Better

The concept known as "worse is better" holds that in software making (and perhaps in other arenas as well) it is better to start with a minimal creation and grow it as needed. Some might call this "piecemeal growth."
Thus, "risk-taking and a willingness to open one’s eyes to new possibilities and a rejection of worse-is-better make an environment where excellence is possible. Xenia invites the duende, which is battled daily because there is the possibility of failure in an aesthetic rather than merely a technical sense."
