|
Bioinformatics (WP6)
Aims and Overview
We will build on the existing informatics structures developed under the EUMORPHIA project
both to track mouse and data generation and to store and disseminate
phenotype data. We will maintain the EMPReSS SOP database that underpins
EMPReSSslim. We will continue to enhance the EuroPhenome database to provide
ready access to phenome data integrated with phenotype generating SOPs. In
addition, we will develop and institute a tracking system as a component of
EuroPhenome that will allow the user to view the progress of any mutant in the
EUMODIC programme from mutant generation to primary phenotyping to secondary
phenotyping. The workpackage will bring
together results from all the phenotype screens in a single database and
provide a common interface for accessing and searching the data. By doing this
it will integrate the work of the phenotyping work packages and add value to
their individual efforts.
Workplan
The core activity of the workpackage will be to
develop:
· a system for the acquisition of phenotype data from mice phenotyped at the different centres participating in the project
· a tracking system to allow partners to identify which mice are being phenotyped, the centre(s) working on them, and the current status of experimental work on them.
At a technical level, this will require us to
implement a system for the input of phenotype data, a database to hold the
data, and a browser and analysis tools with which investigators can
subsequently examine the data.
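As a minimal sketch of what a tracking record could look like, the Python below models a mutant line moving through the stages named in the overview (mutant generation, primary phenotyping, secondary phenotyping). The class and function names, the fields, and the rule that a line may not move backwards are illustrative assumptions, not a description of the EuroPhenome implementation.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """Stages named in the overview; the strict ordering is an assumption."""
    MUTANT_GENERATION = 1
    PRIMARY_PHENOTYPING = 2
    SECONDARY_PHENOTYPING = 3


@dataclass
class LineStatus:
    """Hypothetical tracking record for one mutant line."""
    line_id: str
    stage: Stage = Stage.MUTANT_GENERATION
    centres: set = field(default_factory=set)  # centres that have worked on the line

    def advance(self, new_stage: Stage, centre: str) -> None:
        # Refuse to move a line backwards through the pipeline.
        if new_stage < self.stage:
            raise ValueError(
                f"{self.line_id}: cannot go from {self.stage.name} to {new_stage.name}"
            )
        self.stage = new_stage
        self.centres.add(centre)


def lines_at_stage(records, stage):
    """Return the sorted IDs of all lines currently at the given stage."""
    return sorted(r.line_id for r in records if r.stage == stage)
```

A partner could then ask which lines are in primary phenotyping with `lines_at_stage(records, Stage.PRIMARY_PHENOTYPING)`, and inspect `centres` on any record to see who has worked on it.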
EuroPhenome database
The database will be based on the structure of the
EuroPhenome database developed under EUMORPHIA. This currently holds much of
the information required to be stored for EUMODIC, but it will need to be
extended to include genotype information and information needed for tracking
mouse lines. The major area for development in the first instance will be the
data acquisition module. Again, we plan to take advantage of advances made
during EUMORPHIA, specifically the ontological schema developed for the
description of mouse phenotypes. In
this, the assay employed to determine phenotype information can be used as a
convenient point-of-entry for data capture by annotating it with ontology terms
describing the nature of the data being captured (i.e. what phenotypic
attribute is being measured) as well as with information about units of
measurement, acceptable maximum and minimum values and so on. We plan to make
use of this concept to provide a user-friendly and quick means for data
capture. It will be necessary to have a working, if minimal, system of this
kind in place early in the project to allow data to be entered from the time
phenotyping starts. All software developed during the project will be made
available to the community on an open source basis.
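To illustrate how an annotated assay could serve as a point of entry for data capture, the sketch below attaches an ontology term, a unit and an accepted range to one measured parameter and uses them to validate a raw entry. `ParameterSpec`, the placeholder term `EXAMPLE:0000001` and the body-weight bounds are all hypothetical and chosen only for illustration; they are not taken from the EUMORPHIA schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterSpec:
    """Hypothetical description of one measured parameter of an assay."""
    name: str
    ontology_term: str  # placeholder ID, not a real ontology accession
    unit: str
    minimum: float
    maximum: float

    def validate(self, raw: str) -> float:
        """Parse a raw entry and check it against the accepted range."""
        value = float(raw)  # raises ValueError on non-numeric input
        if not (self.minimum <= value <= self.maximum):
            raise ValueError(
                f"{self.name}: {value} {self.unit} outside "
                f"[{self.minimum}, {self.maximum}]"
            )
        return value


# Illustrative values only; real bounds would come from the relevant SOP.
body_weight = ParameterSpec(
    name="body weight",
    ontology_term="EXAMPLE:0000001",
    unit="g",
    minimum=5.0,
    maximum=80.0,
)
```

With this in place, `body_weight.validate("24.3")` returns the parsed value, while an entry such as `"500"` is rejected at the point of capture rather than after it has reached the database.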
EMPReSS SOP database
Many of the SOPs used in the project already reside
in the SOP database of EUMORPHIA (
Data dissemination
An important secondary aim will be to have a mechanism in place within the data capture system that
will enable rapid publication of phenotype data onto the web. We also aim to
make the database (or a frequently refreshed copy of it) accessible directly to
other, external software systems, for example via direct SOAP queries, in the
interests of improved data accessibility and integration. Our longer-term aim
in this respect, in collaboration with other phenotype data repositories, is to
develop an integrated system for accessing different flavours of mouse
phenotype information from a single site. A critical feature of our efforts in
data dissemination will be to support links to EUCOMM, EURExpress, EMAGE and
EMMA. The close links established between these programmes through PRIME
will ensure the development of an integrated data system. The primary data
integration module within EuroPhenome will be at the level of the mouse gene
that has been knocked out in EUMODIC.
This gene will link, via the ENSEMBL/VEGA ID, to numerous data sources,
including the expression data for that gene in EURExpress. A secondary
data integration module will be at the level of the phenotype descriptions,
using ontologies such as PATO. In EuroPhenome, a number of phenotypes
will be annotated with phenotype descriptions.
These will be linked to numerous other data sources including, for
example, expression data in EURExpress that share common descriptions of
anatomical structures.
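The gene-level integration described above amounts to grouping records from different sources under a shared gene identifier. A minimal sketch follows, assuming each source can be exported as a list of records carrying a `gene_id` field; that field name, the function name and the record layout are our assumptions for illustration, not an existing interface.

```python
from collections import defaultdict


def integrate_by_gene(sources):
    """Group records from several sources under a shared gene identifier.

    `sources` maps a source name to a list of records; each record is a dict
    with a "gene_id" key. Both the layout and the key name are assumptions.
    """
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            merged[record["gene_id"]].setdefault(source_name, []).append(record)
    return dict(merged)


# Placeholder identifiers, not real ENSEMBL/VEGA accessions.
example = integrate_by_gene({
    "phenotypes": [{"gene_id": "GENE_A", "observation": "placeholder"}],
    "expression": [
        {"gene_id": "GENE_A", "pattern": "placeholder"},
        {"gene_id": "GENE_B", "pattern": "placeholder"},
    ],
})
```

Here `example["GENE_A"]` holds both phenotype and expression records, while `example["GENE_B"]` holds expression data only, which is the kind of single-point view the integration module is meant to provide.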
Links with other phenome databases
It is important that the existing and new databases
for data on mouse functional genomics are linked either physically or by search
terms, so that data held in one database can be mined from another. This will make data more accessible and also
add value. Scientists in EUMORPHIA have
been working closely with other mouse phenotyping efforts and have formed close
links with scientists at JAX, Oak Ridge National Laboratory and RIKEN, and in Australia. They have met twice at an International
Phenome meeting organised by the PRIME
project (EC-funded Coordination Action).
It is envisaged that these links will be continued through the Casimir
project (EC-funded Coordination Action, under contract negotiation).
Further development of mouse phenotype ontologies
Capturing data from the phenotyping experiments
will provide a test of our ontology schema and it is likely that we will be
faced with unforeseen challenges in representing the data. In particular, there
remains an unresolved issue concerning the interpretation of raw data from (for
example) behavioural tests - data such as "time at periphery in an open field
test" require further interpretation to convert them into phenotype
descriptions immediately useful to scientists. We are also likely to discover
gaps in the existing ontologies, which will need to be remedied.
Furthermore, in order to provide the maximum utility, we will investigate ways
of relating phenotype data to human (clinical) phenotype data to aid inference
about the possible utility of particular mouse lines as models of human
disease. An area of particular emphasis in this respect is the description of
the full range of pathology data (as opposed to histopathology, which is
currently covered by the MPATH ontology), which is likely to be best addressed
using a combinatorial schema similar to that described for phenotypes in
general.
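To make the interpretation problem concrete, the sketch below turns raw measurements, such as time at periphery in an open field test, into a two-part description pairing the attribute measured with a qualitative change relative to controls. It is an illustration under stated assumptions only: the relative `tolerance` threshold is arbitrary and stands in for a proper statistical comparison, the attribute is free text rather than an ontology term, and the names `PhenotypeAnnotation` and `describe` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PhenotypeAnnotation:
    """Combinatorial description: what was measured plus how it differs."""
    attribute: str  # free text here; an ontology term in a real system
    quality: str    # "increased", "decreased" or "unchanged"


def describe(attribute, mutant_values, control_values, tolerance=0.2):
    """Turn raw measurements into a qualitative description.

    `tolerance` is an arbitrary relative threshold chosen for illustration;
    a real pipeline would use an appropriate statistical test instead.
    """
    baseline = mean(control_values)
    if baseline <= 0:
        raise ValueError("control mean must be positive for a relative change")
    change = (mean(mutant_values) - baseline) / baseline
    if change > tolerance:
        quality = "increased"
    elif change < -tolerance:
        quality = "decreased"
    else:
        quality = "unchanged"
    return PhenotypeAnnotation(attribute, quality)
```

Keeping the attribute and the quality as separate fields is what makes the scheme combinatorial: the same small set of qualities can be reused across any attribute, including pathology observations.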
|