High-speed GÉANT and JANET research networks enable global collaboration on 1000 Genomes Project
Cambridge, UK, June 9 2011 – The European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI), which hosts the world’s largest public collection of molecular biology databases, is using the pan-European GÉANT research network and JANET, the UK research network, to help biologists share vital data across the globe.
EMBL-EBI relies on the fast, secure transmission of large amounts of biological data between its campus in rural Cambridgeshire and its many partners around the world. The EMBL-EBI website receives more than 3.5 million information requests every day, and EMBL-EBI transmits over 80 terabytes of data every month over the high-speed, high-bandwidth JANET and GÉANT networks. JANET transports EMBL-EBI information within the UK; GÉANT then carries it to national networks around Europe, to the US via links with Internet2 and to China via links with CERNET.
EMBL-EBI is a key partner in many global initiatives. One of these, the 1000 Genomes Project, is sequencing the genomes of 2500 people around the world and studying the minute differences that make people unique. The knowledge generated in this project is being used to advance our understanding of human health by explaining genetic susceptibility to disease or responses to particular drugs, for example. Chaired by Richard Durbin at the Wellcome Trust Sanger Institute in the UK and David Altshuler at the Broad Institute in the US, the Project includes participants from Europe, the Americas and Asia. Researchers at EMBL-EBI are responsible for creating a strategy to characterise these variations, and for creating the bioinformatics infrastructure to support the massive movement of these data.
The pilot phase of the 1000 Genomes Project, completed in 2010, created 4.9 terabases of DNA sequence and uncovered 8 million variations that had never been seen before. By its completion in 2012, the project is expected to produce between 60 and 80 terabases of sequence, the equivalent of around 250,000 gigabytes of data.
Information currently flows to EMBL-EBI from seven sequencing centres in China, Germany, the UK and the US; it is then mirrored to the National Human Genome Research Institute (NHGRI) in the US and supplied to 40 groups across the world for initial analysis. Final datasets are then accessed by another 100 groups of researchers.
“Data generated by biological experiments is doubling every five months, driven by leading-edge initiatives such as the 1000 Genomes Project. Our mission at EMBL-EBI is to make the results of these international collaborations freely available to the scientific community wherever they are located,” said Dr Paul Flicek, Head of Vertebrate Genomics at EMBL-EBI. “To do this we need an infrastructure that is robust, flexible and high performance, linking us to our partners across the globe. Our close working relationship with the JANET and GÉANT networks delivers the speed and capacity that we need, giving us confidence and allowing us to focus on sharing data to push forward scientific progress.”
By 2020, biological data generation is expected to reach thousands of times the current rate. This growth far exceeds predicted increases in storage capacity, meaning current models of centralised data resources will not be able to cope. ELIXIR, an ESFRI (European Strategy Forum on Research Infrastructures) project of global significance, aims to create a stable infrastructure for biological data in Europe that distributes information across multiple locations while keeping it available to researchers everywhere. ELIXIR is co-ordinated by EMBL-EBI and is currently entering its construction phase. Once created, ELIXIR will rely on high-speed networks such as GÉANT, JANET and other national research networks across Europe to deliver data in real time to scientists wherever they may be.
“Mapping the DNA of thousands of organisms, including the human genome, is leading to breakthroughs in medical research that can potentially deliver better health outcomes for people across the world,” said Matthew Scott, General Manager of DANTE, the organisation that has built and operates the GÉANT network on behalf of Europe’s National Research and Education Networks (NRENs). “The European Bioinformatics Institute is leading the way by making these complex, large datasets freely available to international researchers. EMBL-EBI’s use of GÉANT and JANET is at the heart of its mission to create and share information by providing direct access to its vast range of biological data resources and tools. With research now relying on fast access to distributed data resources, high speed, robust networks are at the heart of pushing back the frontiers of scientific knowledge.”