Open source scientific data sharing

Consider this scenario:

A prominent scientist, X, publishes a clinical trial showing that Drug A is effective against a particular disease.

Another scientist, Y, publishes a second article reinforcing that Drug A is indeed an effective treatment.

Then several small studies show that Drug A has harmful side effects. Some of these are accepted by prominent journals, while others are rejected. Five years later, the FDA discovers, to its surprise (!), that Drug A is indeed associated with harmful side effects.

This has happened, is happening, and will happen again!

The big question is: is there any way to prevent this?

Welcome to open source scientific data sharing.

Open source data sharing is similar to open source software, where not only the end result (i.e. the software) but also the source code is given away for free. Open source data sharing should therefore aim not only to make all scientific articles available for free, but also to include the raw data generated during the experiment or clinical trial.

Thousands of scientists around the world could benefit from this data and come up with amazing results. If the data from the Vioxx and Vytorin studies had been available online, their side effect profiles would probably have been exposed much earlier by “entrepreneur” scientists.

This talk, given by Clay Shirky at TED in 2005, explains why we need to move from institutionalization towards open collaboration (which only data sharing can enable).


Of course, there are certain differences between his proposal and mine. He gives the public the authority to generate data. I am proposing that scientists generate data through careful experiments and clinical trials and publish their results, but also make the data available for “entrepreneur” scientists to analyze. It is better to have many eyes looking at the data than a few.

Beyond helping to find potential side effects in a mountain of data, open source data sharing has bigger benefits. Consider this:

Many NGOs currently fund scientific research in the hope of producing results more quickly. Stand Up To Cancer is bringing together the best scientific minds from across the world to develop new medications to fight cancer. This is excellent, but it goes back to Clay Shirky’s talk: institutionalization vs. open collaboration. The best brains can generate excellent data and are probably also the best people to analyze it. But the data they generate should be made available for all other scientists on this planet to analyze and build upon.

Historically, many inventions happened by accident. Entrepreneur scientists can make these accidents happen more often, but only if they are given access to scientific data.
