Special Software from Greifswald Enhances Accuracy of Genome Annotation for Animals and Plants

Research

The precise determination of the structure of protein-coding genes in genome sequences is key to the biological understanding of life. The success of numerous experiments depends to a great extent on error-free genome annotation. Cataloguing the protein-coding genes in eukaryotic genomes is therefore one of the greatest challenges faced by the Earth BioGenome Project, which aims to sequence the genomes of at least 1.5 million eukaryotic species. Eukaryotes are organisms whose cells contain a nucleus; they include animals (humans among them), plants, and fungi. Individual genome projects can serve purposes such as the targeted treatment of diseases transmitted by animals, the study of gene functions in insects, or plant breeding.

 

A central problem for many genome annotation tools is their reliance on supervised learning: the underlying mathematical models require training examples consisting of genes from the target species so that their parameters can be adjusted to that species. Here the BRAKER3 team was able to build on experience gained from previous versions of the software, and the new version also incorporates combined evidence from transcriptomic and protein data in this training step. In contrast to previous versions of the tool, both types of evidence can now be considered simultaneously.
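To make the idea of supervised training more concrete, the following minimal Python sketch estimates one simple kind of species-specific parameter, relative codon frequencies, from a handful of example gene sequences. It is an illustration only and not BRAKER3's actual training procedure: the function name `codon_frequencies` and the toy sequences are hypothetical, and real gene-finding models fit far richer parameter sets from many more training genes.

```python
from collections import Counter


def codon_frequencies(training_genes):
    """Estimate relative codon frequencies from example coding sequences.

    Hypothetical helper for illustration; not part of BRAKER3.
    """
    counts = Counter()
    for seq in training_genes:
        seq = seq.upper()
        # Walk the sequence in non-overlapping triplets, ignoring any
        # trailing bases that do not form a complete codon.
        usable = len(seq) - len(seq) % 3
        for i in range(0, usable, 3):
            counts[seq[i:i + 3]] += 1
    total = sum(counts.values())
    if total == 0:
        # Without training examples there is nothing to estimate.
        return {}
    return {codon: n / total for codon, n in counts.items()}


# Made-up toy "training genes" standing in for genes of the target species.
toy_genes = ["ATGGCCGCCTAA", "ATGGCCAAATGA"]
params = codon_frequencies(toy_genes)
```

The sketch also shows why training examples from the target species matter: with no example genes, there are no parameters to estimate, and examples taken from a different species would yield frequencies that may not fit the target genome.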

 

In benchmark tests on 11 species, BRAKER3 clearly outperformed the previous versions. The improvement is most pronounced in species with large and complex genomes, such as mouse and chicken. Furthermore, the new version of the software is much more precise than alternative programmes that have been used extensively in the past.

 

“BRAKER3 represents a considerable advance in the accuracy and automation of eukaryotic genome annotation, especially for large and structurally complex genomes,” explains Lars Gabriel from the University of Greifswald’s Institute of Mathematics, lead author of the publication. “The new software version is a tool that is already being used by a large and rapidly growing number of users. The team’s efforts to design the software so that it runs in isolated packages that contain all of the components the programme requires, and therefore works on various computer systems without extra adjustments, have been particularly well received by the international research community. This principle, which is known as ‘containerization’, was decisively influenced by the excellent high-performance computing infrastructure at Greifswald’s University Computer Centre,” says Dr. Katharina Hoff from the University of Greifswald’s Institute of Mathematics. She has been working on the development of BRAKER for many years.

 

“BRAKER3 marks a significant development in bioinformatics and gives academics all over the world access to a high-performance tool for genome annotation. During the next stages of development, the developers intend to enhance and train large language models specifically for this task, as genomes can be understood as a ‘language’ of biology whose encoded genes follow a strict grammar,” explains Prof. Dr. Mario Stanke, Head of the Bioinformatics Research Group at the University of Greifswald’s Institute of Mathematics.

Further Information

For further information, please contact PD Dr. Katharina Hoff at the University of Greifswald.
Paper: https://genome.cshlp.org/content/early/2024/05/28/gr.278090.123.abstract (impact factor 9.4), publication date: 12 June 2024, DOI: 10.1101/gr.278090.123
Research group: Bioinformatics at the Institute of Mathematics and Computer Science
Software: https://github.com/Gaius-Augustus/BRAKER

 

Contact at the University of Greifswald
PD Dr. Katharina Hoff
University of Greifswald
Institute of Mathematics and Computer Science
Walther-Rathenau-Straße 47
17489 Greifswald
Tel.: +49 3834 420 4624
katharina.hoff@uni-greifswald.de
X: https://twitter.com/katharina_hoff
Bluesky: @katharinahoff.bsky.social
Fosstodon: @KatharinaHoff@fosstodon.org

 

 

