[GSoC 2013] Identifying problems with gene predictions: Welcome to the Gene Prediction Project's Blog!

People involved in this project:

GSoC Student: Monica Dragan

Mentors: Anurag Priyam and Yannick Wurm

Genome sequencing is now possible at very low cost. However, obtaining accurate gene predictions remains hard with existing tools. The goal of this project is to create a tool that identifies potential problems with predicted genes, providing evidence on how a gene should be curated or whether a predicted gene should be excluded from further analyses. The validation results could also be used to improve the output of existing gene prediction tools.

The application takes as input a collection of mRNA / protein predictions (called predicted sequences) and identifies potential problems with each one by matching and comparing it with sequences available in trusted databases (called reference sequences). The tool will determine whether any of the following problems appear in a predicted sequence:

The length of the predicted sequence is not acceptable, judged against the reference sequence set.
The predicted sequence contains gaps or extra sections, judged against the reference sequence set.
Some of the regions conserved among the reference sequences are absent from the predicted sequence.
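
As a rough illustration of the first check, here is a minimal sketch in Ruby of how a predicted sequence's length could be compared with the lengths of its reference sequences. Everything in it is an assumption made for illustration: the function names (percentile, length_check), the 10th/90th percentile cut-offs, and the toy sequences are hypothetical, and the real tool may well use a different criterion.

```ruby
# Hypothetical length check, for illustration only.
# A prediction is flagged when its length falls outside the range spanned
# by the low and high percentiles of the reference sequence lengths.
# The default 10th/90th percentile cut-offs are an assumption.

# Returns the value at the given percentile (0..100) of a sorted array,
# using the nearest-rank method.
def percentile(sorted, pct)
  rank = (pct / 100.0 * sorted.length).ceil
  sorted[[rank, 1].max - 1]
end

# Compares the predicted sequence's length with the reference lengths and
# returns a hash holding the length, the accepted range and a status symbol
# (:too_short, :too_long or :ok).
def length_check(predicted_seq, reference_seqs, low_pct: 10, high_pct: 90)
  raise ArgumentError, 'no reference sequences given' if reference_seqs.empty?

  lengths = reference_seqs.map(&:length).sort
  low = percentile(lengths, low_pct)
  high = percentile(lengths, high_pct)
  len = predicted_seq.length

  status =
    if len < low
      :too_short
    elsif len > high
      :too_long
    else
      :ok
    end

  { length: len, range: [low, high], status: status }
end

# Toy reference set (made-up protein fragments of 9 to 13 residues).
references = %w[MKTAYIAKQR MKTAYIAKQRQ MKTAYIAKQRQI MKTAYIAKQ MKTAYIAKQRQIS]

p length_check('MKTA', references)        # much shorter than any reference
p length_check('MKTAYIAKQRQ', references) # within the reference range
```

With these toy references the accepted range is 9 to 13 residues, so the first prediction is reported as too short and the second as acceptable. A percentile range is only one possible choice; a criterion based on clustering the reference lengths or on a statistical test would fit the same interface.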

The main target users of this tool are biologists who want to validate data obtained in their own laboratories. In the end, the application will be easily installable as a RubyGem.

