Pseudogenes are identified by means of a phylogenetic analysis. First, a species tree of the species of interest and a phylogenetic tree of the gene (or gene family) of interest are constructed. The two are then compared to identify a species that has lost the gene. Next, the genome of the species in which the gene was not found is searched for a sequence orthologous to the gene identified in the closest species. Finally, if this orthologous sequence has a disruption in its ORF (and meets other criteria, such as
602:(CDS) are discontinuous, and, to ensure their proper identification, intronic regions must be filtered. To do so, annotation pipelines must find the exon-intron boundaries, and multiple methodologies have been developed for this purpose. One solution is to use known exon boundaries for alignment; for instance, many introns begin with GT and end with AG. This approach, however, cannot detect novel boundaries, so alternatives like
(WSSD). It aligns the original reads with the assembled genome and searches for regions with a higher read depth than the average, which usually signals a duplication. Segmental duplications identified by this method but not by WGAC are likely collapsed duplications, which means that they were mistakenly aligned to the same region.
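The read-depth idea behind WSSD can be sketched in a few lines of Python. This is only an illustration, not the WSSD implementation: the function name `high_depth_windows`, the window size, and the fold-over-median cutoff are assumptions made for the example, and `depths` is assumed to be a per-base list of read depths already computed from an alignment.

```python
from statistics import mean, median

def high_depth_windows(depths, window=1000, fold=1.5):
    """Return (start, end, mean_depth) for windows with unusually high read depth."""
    if not depths:
        return []
    windows = []
    # Mean depth of each non-overlapping window along the assembly.
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        windows.append((start, start + len(chunk), mean(chunk)))
    # Use the median window depth as the "typical" single-copy depth.
    typical = median(w[2] for w in windows)
    return [w for w in windows if w[2] > fold * typical]

# Toy assembly: 9 kb at 30x depth followed by 1 kb at 90x depth.
toy_depths = [30] * 9000 + [90] * 1000
candidates = high_depth_windows(toy_depths)
```

In this toy input the last window has three times the typical depth, so it is the only one reported as a candidate duplication. A real pipeline would also need to correct for GC bias and mappability, which this sketch ignores.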
471:. Repeats are identified based on models of their structure, rather than repetition or similarity. They are capable of identifying real transposons (just like the homology-based ones), but are not biased by known elements. However, they are highly specific to each class of repeat, and, as such, are less universally applicable.
1356:(MGEs). The study of these elements is of great importance in the field of bioremediation, since recently the inoculation of wild or genetically modified strains with these MGEs has been sought in order to acquire these hydrocarbon degradation capacities. In 2013, Phale et al. published the genome annotation of a strain of
sequence is not. Therefore, performing a multiple sequence alignment yields more useful information for their prediction. Homology search may also be employed to identify RNA genes, but this procedure is complicated, especially in eukaryotes, by the presence of a large number of repeats and pseudogenes.
809:, in which every node is a particular function, and every edge (or arrow) between two nodes indicates a parent-child or subcategory-category relationship. As of 2020, GO is the most widely used controlled vocabulary for functional annotation of genes, followed by the MIPS Functional Catalog (FunCat).
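The parent-child structure described above can be illustrated with a small Python sketch. The term names and edges in `TOY_PARENTS` are a hypothetical toy hierarchy, not real GO identifiers, and `ancestors` is a helper name introduced here. The point is only that, in such a graph, a term may have more than one parent, and annotating a gene with a term implies all of that term's ancestors.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following child -> parent edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Hypothetical toy hierarchy: each key maps a term to its direct parents.
TOY_PARENTS = {
    "protein kinase activity": ["kinase activity", "catalytic activity, acting on a protein"],
    "kinase activity": ["catalytic activity"],
    "catalytic activity, acting on a protein": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
}

implied_terms = ancestors("protein kinase activity", TOY_PARENTS)
```

Here `implied_terms` contains the two direct parents, their shared parent, and the root category.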
Functional annotation of genes requires a controlled vocabulary (or ontology) to name the predicted functional features. However, because there are numerous ways to define gene functions, the annotation process may be hindered when it is performed by different research groups. As such, a standardized
analysis is of great importance in functional annotation. In bioremediation specifically, it can be applied to relate the genes of some microorganisms to their functions and to their role in the remediation of certain contaminants. This was the approach of the investigation
Annotation projects often rely on previous annotations of an organism's genome; however, these older annotations may contain errors that can propagate to new annotations. As new genome analysis technologies are developed and richer databases become available, the annotation of some older genomes may
Binary or multiclass classification methods for functional annotation generally produce less accurate results because they do not take into account the interrelations between GO terms. More advanced methods that consider these interrelations do so by either a flat or hierarchical approach, which are
usually perform a similar function. However, orthologous sequences should be treated with caution for two reasons: (1) they might have different names depending on when they were originally annotated, and (2) they may not perform the same functional role in two different organisms. Annotators
498:(A, C, G, or T) with other letters. By doing so, these regions will be marked as repetitive and downstream analyses will treat them accordingly. Repetitive regions may produce performance issues if they are not masked, and may even produce false evidence for gene annotation (for example, treating an
prediction of RNA genes in a single genome often yields inaccurate results (with an exception being miRNA), so multi-genome comparative methods are used instead. These methods are specifically concerned with the secondary structures of ncRNA, as they are conserved in related species even when their
sequence when no paralogy, orthology or xenology was found. Homology-based methods have several drawbacks, such as errors in the database, low sensitivity/specificity, inability to distinguish between paralogy and orthology, artificially high scores due to the presence of low-complexity regions, and
Gene Ontology is being used by researchers to establish disease-gene relationships, as GO helps identify novel genes and the alterations in their expression, distribution and function under different sets of conditions, such as diseased versus healthy. Databases of these disease-gene
As more sequenced genomes became available in the early and mid-2000s, along with the numerous protein sequences that were obtained experimentally, genome annotators began employing homology-based methods, launching the third generation of genome annotation. These new methods allowed annotators
start sites) connected by arrows representing the scanning of the sequence. To ensure a Markov model detects a genomic signal, it must first be trained on a series of known genomic signals. The output of Markov models in the context of annotation includes the probabilities of every kind of genomic
and described in a published article. Although describing individual genes and their products or functions is sufficient to consider this description as an annotation, the depth of analysis reported in the literature for different genomes varies widely, with some reports including additional information
aims to identify similarities and differences in genomic features, as well as to examine evolutionary relationships between organisms. Visualization tools capable of illustrating the comparative behavior between two or more genomes are essential for this approach, and can be classified into three
Pseudogenes are identified by searching for sequences that are similar to functional genes but contain mutations that disrupt their ORF. This method cannot determine the evolutionary relationship between a pseudogene and its parent gene, nor the time elapsed since the event happened.
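The ORF-disruption check can be sketched as follows in Python. This is a simplified illustration under stated assumptions: the candidate sequence is assumed to have already been found by a similarity search and trimmed so that it starts in the reading frame of its parent gene, and `orf_disruptions` is a hypothetical helper name. Real pseudogene finders also consider splice sites, truncations and other criteria.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def orf_disruptions(candidate):
    """List simple signs that a candidate coding sequence no longer has an intact ORF."""
    seq = candidate.upper()
    problems = []
    if not seq.startswith("ATG"):
        problems.append("missing start codon")
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of three (possible frameshift)")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    # A stop codon before the final codon would truncate the protein.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"premature stop codon at codon {index + 1}")
            break
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing terminal stop codon")
    return problems

intact = orf_disruptions("ATGAAATAA")        # no problems reported
disrupted = orf_disruptions("ATGTAGAAATAA")  # stop codon at the second position
```

An empty list means that no disruption was detected by these simple tests. It does not prove that the sequence is a functional gene.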
community annotation. Supervised community annotation is short-lived and limited to the duration of the event, whereas the unsupervised counterpart does not have this limitation. However, the latter has been less successful than the former presumably due to a lack of time, motivation, incentive
Community annotation approaches are effective techniques for quality control and standardization in genome annotation. An annotation jamboree that took place in 2002 led to the creation of the annotation standards used by the Sanger Institute's Human and Vertebrate Analysis Project (HAVANA).
that goes beyond a simple annotation. Furthermore, due to the size and complexity of sequenced genomes, DNA annotation is not performed manually, but is instead automated by computational means. However, the conclusions drawn from the obtained results require manual expert analysis.
796:, is involved. Every box is an ontology term that falls into one of the three GO categories and is color-coded respectively. Ontology terms are related to each other through specific qualifiers (such as "is a", "part of", etc.), which are represented by different kinds of arrows.
not only to infer genomic elements through statistical means (as in previous generations) but also to perform their task by comparing the sequence being annotated with other existing, validated sequences. These so-called combiner annotators, which perform both
Identifying repeats is difficult for two main reasons: they are poorly conserved, and their boundaries are not clearly defined. Because of this, repeat libraries must be built for the genome of interest, which can be accomplished with one of the following methods:
relationships of different organisms have been created, such as Plant-Pathogen Ontology, Plant-Associated Microbe Gene Ontology or DisGeNET. Others have been implemented in pre-existing databases, such as the Rat Disease Ontology in the Rat Genome Database.
when there is a coordinator who manages the project by requesting the annotation of specific items from a select number of experts. On the other hand, when anyone can enter a project and coordination is accomplished in a decentralized manner, it is called
located in the corresponding genome, providing not only their locations, but also their rates of expression. However, transcripts provide insufficient information for gene prediction because they might be unobtainable from some genes, they may encode
424:(which are larger elements with several copies across the genome). Repeats are a major component of both prokaryotic and eukaryotic genomes; for instance, between 0% and over 42% of prokaryotic genomes consist of repeats and three quarters of the
1133:(focus on one organism and the annotations for particular species). The latter are not necessarily linked to a specific genome database but are general-purpose browsers that can be downloaded and installed as an application on a local computer.
This scheme can only show the alignment of two genomes: one genome is represented along the horizontal axis and the other along the vertical axis, and the dots in the plot represent the genomic elements that are similar between these two
direction and their length; they are color-coded based on the cellular function or component they are part of. Represented with arrows, the transcription directions for the inner and outer genes are listed clockwise and anticlockwise,
in a self-genome comparison, thus requiring no prior information about repeat structure or sequences. The disadvantage of these methods is that they can identify any repeated sequence, not just transposons, and may include conserved
451:(CDS), making careful post-processing an indispensable step to remove these sequences. It may also leave out related regions that have degraded over time and may group elements that have no connection in their evolutionary history.
It is a combination of the jamboree and cottage industry models. It begins with an annotation workshop, followed by a decentralized collaboration to extend and refine the initial annotation. It has been used for multiple species
influences the quality of the annotation, so it is important to assess assembly quality before performing the subsequent annotation steps. In order to quantify the quality of a genome annotation, three metrics have been used:
be updated. This process, known as reannotation, can provide users with new information about the genome, including details about genes and protein functions. Reannotation is therefore a useful approach in quality control.
are placed in the middle black circle. The outer gray circle shows GC content in every section of the genome. All individual genes are placed on the outermost circle according to their position in the genome, their
A release timeline of genome annotators. The dotted boxes indicate the four different generations of genome annotators and their most representative characteristics: the first generation (blue), in which annotators used
113:, which assigns functions to these elements. This is not the only way in which it has been categorized, as several alternatives, such as dimension-based and level-based classifications, have also been proposed.
methods and homology-based annotations, and the fourth generation (orange), which addresses the identification of the non-coding regions of DNA and population-level studies represented by the pangenome
998:. They consist in the identification of homologous sequences with known DNA binding sites, or by aligning them with query proteins. Their performance is usually low because the DNA binding sequences are less
regions. Although this strategy avoids the poorly defined boundary problem that exists in other methods, it is highly dependent on assembly quality and the level of activity of transposons in the genomes in
78:, by analyzing and interpreting them in order to extract their biological significance and understand the biological processes in which they participate. Among other things, it identifies the locations of
Genome annotation is an active area of investigation and involves a number of different organizations in the life science community which publish the results of their efforts in publicly available
This representation uses multiple linear tracks to represent multiple genomes and their features where "track" is a concept that refers to a specific type of genomic feature at a genomic location.
to predict new ones. Predictors of new exon boundaries usually require efficient data-compression and alignment algorithms, but they are prone to failure in boundaries located in regions with low
search tools. Its premise is that high sequence conservation between two genomic elements implies that their function is conserved as well. Pairs of homologous sequences that appeared through
and EMBL. Some of these formats use controlled vocabularies and ontologies to define their descriptive terminologies and guarantee interoperability between analysis and visualization tools.
962:(WGAC). It aligns the entire genome to itself in order to identify repeated sequences after filtering out common repeats; it does not require having the original reads used for the assembly.
701:(also called statistical, intrinsic, or de novo). CDS prediction is based solely on the information that can be extracted from the DNA sequence. They rely on statistical methods such as the
Community annotation consists in the engagement of a community (both scientific and nonscientific) in genome annotation projects. It can be classified into the following six categories:
Genomic browsers are software products that simplify the analysis and visualization of large genomic sequence and annotation data to gain biological insight, via a graphical interface.
CDS predictors are faced with a more difficult problem because of the complex organization of eukaryotic genes. CDS prediction methods can be classified into three broad categories:
of the organism being annotated with the genome. Although it is optional, it can improve gene sequence elucidation because RNAs and proteins are direct products of coding sequences.
711:(also called empirical, evidence-driven, or extrinsic). CDS prediction is based on similarity to known sequences. Specifically, it performs alignments of the analyzed sequence with
are DNA segments of more than 1000 base pairs that are repeated in the genome with more than 90% sequence identity. Two strategies used for their identification are WGAC and WSSD:
This representation facilitates comparison of whole microbial or viral genomes. In this visualization mode, concentric circles and arcs are used to represent genomic sections.
element in every single part of the genome, and an accurate Markov model will assign high probabilities to correct annotations and low probabilities to the incorrect ones.
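The train-then-score workflow described above can be illustrated with a deliberately simple model. The Python sketch below uses a plain first-order Markov chain over nucleotides rather than the hidden Markov models used by real annotators. The function names, the pseudocount, and the toy training sequences are all assumptions made for the example, and input is assumed to be uppercase A, C, G and T only.

```python
import math

ALPHABET = "ACGT"

def train_markov_chain(sequences, pseudocount=1.0):
    """Estimate transition probabilities P(next base | current base) from examples."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    # Normalise each row so that the probabilities leaving a base sum to one.
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
        for a in ALPHABET
    }

def log_likelihood(seq, model):
    """Log-probability of the transitions in `seq` under the trained model."""
    return sum(math.log(model[a][b]) for a, b in zip(seq, seq[1:]))

# Toy "known signals": CG-rich examples, invented for illustration.
model = train_markov_chain(["CGCGCGCG", "GCGCGCGC"])
signal_score = log_likelihood("CGCGCG", model)
background_score = log_likelihood("ATATAT", model)
```

After training, a sequence resembling the training examples receives a higher log-likelihood than an unrelated one. This is the sense in which a well-trained model assigns high probabilities to correct annotations.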
distinguished by the fact that the former does not take into account the ontology structure, while the latter does. Some of these methods compress the GO terms by
The next step after genome masking usually involves aligning all available transcript and protein evidence with the analyzed genome, that is, aligning all known
805:(GO). It classifies functional properties into one of three categories (molecular function, biological process, and cellular component) and organizes them in a
regions, and the last step of structural annotation consists in identifying these features within the genome. In fact, the primary task in genome annotation is
sequences contained in the genome are predicted with the help of databases of known DNA, RNA and protein sequences, as well as other supporting information.
514:, the letters of these regions are replaced with N's. This way, for example, soft masking can be used to exclude word matches and avoid initiating an
461:) of known repeats stored in a curated database. These methods are more likely to find real transposons, even in lower quantities, when compared with
Functional annotation assigns functions to the genomic elements found by structural annotation, by relating them to biological processes such as the
640:, which is why numerous methods have been developed for this purpose. Gene prediction is a misleading term, as most gene predictors only identify
913:(ncDNA) are those that do not code for proteins. They include elements such as pseudogenes, segmental duplications, binding sites and RNA genes.
methods at a local scale, second generation (red) with genome-wide ab initio methods, third generation (green) characterized by a combination of
648:(UTRs); for this reason, CDS prediction has been proposed as a more accurate term. CDS predictors detect genome features through methods called
442:. Repeats are identified by detecting and grouping pairs of sequences at different locations whose similarity is above a minimum threshold of
1279:, in which curators go through a training period prior to annotation, and are then given access to annotation tools to continue their work.
methods are also used to generate functional annotations for novel proteins based on GO terms. Generally, they consist in constructing a
sequenced in 1995) introduced a second generation of annotators. Just like in the previous generation, they performed annotation through
accessible via the web and other electronic means. Here is an alphabetical listing of on-going projects relevant to genome annotation:
and final location of any given protein. Probabilistic methods may be paired with a controlled vocabulary, such as GO; for example,
are also found in new genomes of the same clade. Both annotation strategies constitute the fourth generation of genome annotators.
database. This analysis concluded in the localization of the upper pathway genes of naphthalene degradation, right next to the
methods, which are based solely on the information that can be extracted from the DNA sequence on a local scale, that is, one
encoding tRNA-Gly and integrase, as well as the identification of the genes encoding enzymes involved in the degradation of
1214:; although these measures are not explicitly used in annotation projects, but rather in discussions of prediction accuracy.
are the driving force behind many algorithms used within annotators of this generation; these models can be thought of as
754:, etc. It may also be used as an additional quality check by identifying elements that may have been annotated by error.
as a carbon and energy source. In order to find the MGEs of this bacterium, its genome was annotated using RAST and the
are masked by using a repeat library. Then, optionally, the masked sequence is aligned with all the available evidence (
can support a user-friendly web interface and software containerization such as MOSGA. Modern annotation pipelines for
1015:(ncRNA), produced by RNA genes, is a type of RNA that is not translated into a protein. It includes molecules such as
1008:. They employ the three-dimensional structural information of proteins to predict the locations of DNA binding sites.
A variety of software tools have been developed that allow scientists to view and share genome annotations, such as
are regions in the genome sequence that bind to and interact with specific proteins. They play an important role in
in those regions, and hard masking, apart from all of this, can also exclude masked regions from alignment scores.
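The difference between the two masking styles can be shown with a short Python sketch. The function names and the half-open `(start, end)` interval convention are assumptions made for the example. In practice the intervals would come from a repeat-finding step such as those described above.

```python
def soft_mask(sequence, intervals):
    """Lowercase the bases inside each half-open (start, end) repeat interval."""
    bases = list(sequence.upper())
    for start, end in intervals:
        bases[start:end] = [base.lower() for base in bases[start:end]]
    return "".join(bases)

def hard_mask(sequence, intervals):
    """Replace the bases inside each half-open (start, end) repeat interval with N."""
    bases = list(sequence.upper())
    for start, end in intervals:
        bases[start:end] = ["N"] * len(bases[start:end])
    return "".join(bases)

print(soft_mask("ACGTACGTAC", [(2, 6)]))  # ACgtacGTAC
print(hard_mask("ACGTACGTAC", [(2, 6)]))  # ACNNNNGTAC
```

Soft masking keeps the original bases recoverable, since converting back to uppercase restores the sequence, whereas hard masking discards them.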
A snapshot of an annotated GBK file created with Prokka. It shows the components (features) of a small portion of
develops tools for automated annotation of database records based on the textual descriptions of those records.
analysis techniques. Other genome annotators also began to focus on population-level studies represented by the
1129:(integrate sequence and annotations of multiple organisms and promote cross-species comparative analysis) and
882:(SVM) is the most widely used binary classifier in functional annotation; however, other algorithms, such as
s genome, including their positions (structural annotation) and inferred functions (functional annotation).
705:(HMM). Some methods employ two or more genomes to infer local mutation rates and patterns along the genome.
420:, which include low-complexity sequences (such as AGAGAGAG, or monopolymeric segments like TTTTTTTTT), and
Consists of a short intensive workshop with leading curators from the community. It was first used in the
1160:. Functional annotations of proteins are displayed in distinct colors and homologies in different tones.
are mutated copies of protein-coding genes that lost their coding function due to a disruption in their
became available. As such, genome annotation remains a major challenge for scientists investigating the
during protein synthesis) allowing a more efficient translation. This was also known to be the case for
based approaches are employed, which utilize information from expressed proteins often derived from
Structural annotation describes the precise location of the different elements in a genome, such as
131:(ORF) at a time. They appeared as a necessity to handle the enormous amount of data produced by the
that support each gene model. Some commonly used formats for describing annotations are GenBank,
has an automated procedure for statistically inferring associations between ontology terms and
788:. It shows the molecular functions, biological processes, and cellular components in which the
506:) Depending on the letters used for replacement, masking can be classified as soft or hard: in
Annotation is decentralized and is the result of the effort from different part-time curators.
for each GO term, which are then joined to make predictions on individual GO terms (forming a
150:, created by Rodger Staden in 1977. It performed several tasks related to annotation, such as
seeks to write articles that describe individual RNAs and RNA families in an accessible way.
genome can be annotated using various annotation tools such as FINDER. A modern annotation
categories based on the representation of the relationships between the compared genomes:
Functional annotation can be performed through probabilistic methods. The distribution of
By the 2010s, the genome sequences of more than a thousand human individuals (through the
data is available, it may be used to annotate and quantify all of the genes and their
510:, repetitive regions are indicated with lowercase letters (a, c, g, or t), whereas in
The first step of structural annotation consists in the identification and masking of
techniques developed in the late 1970s. The first software used to analyze sequencing
1084:, UTRs and alternative transcripts, and ideally should include information about the
of more than one gene, and their start and stop codons cannot be determined due to
97:, and is a necessary step in genome analysis before the sequence is deposited in a
4757:"Genome Sequence of Naphthalene-Degrading Soil Bacterium Pseudomonas putida CSV86"
Binns D, Dimmer E, Huntley R, Barrell D, O'Donovan C, Apweiler R (November 2009).
864:(PPI) networks usually place proteins with similar functions close to each other.
2464:"Genome annotation past, present, and future: how to define an ORF at each locus"
991:. Binding site prediction involves the use of one of the following two methods:
After the repetitive regions in a genome have been identified, they are masked.
Brent MR, Guigó R (June 2004). "Recent advances in gene structure prediction".
1566:"Chloroplot: An Online Program for the Versatile Plotting of Organelle Genomes"
controlled vocabulary must be employed, the most comprehensive of which is the
In the late 2000s, genome annotation shifted its attention towards identifying
The advent of complete genomes in the 1990s (the first one being the genome of
is the process of describing the structure and function of the components of a
Banerjee S, Bhandary P, Woodhouse M, Sen TZ, Wise RP, Andorf CM (April 2021).
Numanagic I, Gökkaya AS, Zhang L, Berger B, Alkan C, Hach F (September 2018).
Bright LA, Burgess SC, Chowdhary B, Swiderski CE, McCarthy FM (October 2009).
277:; by doing so, for instance, annotation pipelines ensure that core genes of a
Fickett JW (August 1996). "Finding genes by computer: the state of the art".
or combinations of domains from the existing gene/protein-level annotations.
1378:(PGAP), and the identification of nine mobile elements was possible with the
Wikipedia has multiple WikiProjects aimed at improving annotation. The
Generalized flowchart of a structural genome annotation pipeline. First, the
degradation by some bacterial strains are encoded by genes located in their
Pan X, Lin D, Zheng Y, Zhang Q, Yin Y, Cai L, et al. (February 2016).
3490:"A Literature Review of Gene Function Prediction by Modeling Gene Ontology"
1834:"Structural and functional-annotation of an equine whole genome oligoarray"
indicates whether a protein is located in a solution or membrane. Specific
729:. CDS prediction is done by a combination of both methods mentioned above.
in DNA, which was achieved thanks to the appearance of methods to analyze
3787:"Effect of Collapsed Duplications on Diversity Estimates: What to Expect"
by experts is involved to interpret the results of an annotation project.
counts. In fact, codon usage was the main strategy used by several early
Szot PS, Yang A, Wang X, Parsania C, Röhm U, Wong KH, Ho JW (May 2017).
regions in a genome contain codons with the most abundant corresponding
strain B6(T), a bacterium with thirty-one genes encoding resistance to
1125:. The former use information from databases and can be classified into
that classify DNA sequences into coding and noncoding content. Whereas
4974:, NBIS -- National Bioinformatics Infrastructure Sweden, 13 April 2022
Seemann T (July 2014). "Prokka: rapid prokaryotic genome annotation".
3738:"Fast characterization of segmental duplications in genome assemblies"
477:. Repeats are identified as disruptions of one or more sequences in a
2818:"Discovering and detecting transposable elements in genome sequences"
Stein L (July 2001). "Genome annotation: from sequence to biology".
4600:"The DisGeNET knowledge platform for disease genomics: 2019 update"
Phale PS, Paliwal V, Raju SC, Modak A, Purohit HJ (January 2013).
4325:
3221:"Evaluation of tools for long read RNA-seq splice-aware alignment"
182:, which are often present in proteins expressed at a lower level.
2567:"An integrated map of genetic variation from 1,092 human genomes"
2250:
Grantham R, Gautier C, Gouy M, Mercier R, Pavé A (January 1980).
1433:
as its sole carbon and energy source, to mention a few examples.
1311:
that harvests gene data from research databases and creates gene
927:. They may be identified using one of the following two methods:
4355:
Mazumder R, Natale DA, Julio JA, Yeh LS, Wu CH (February 2010).
2350:"Microbial gene identification using interpolated Markov models"
1760:
Koonin E, Galperin MY (2003). "Genome Annotation and Analysis".
162:(CDS) prediction methods, based on the assumption that the most
The process of describing the structure and function of a genome
3541:"Functional annotation prediction: all for one and one for all"
3129:
De Bona F, Ossowski S, Schneeberger K, Rätsch G (August 2008).
1457:
606:
algorithms exist that are trained on known exon boundaries and
465:
methods, but are biased towards previously identified families.
2726:"Genesis, effects and fates of repeats in prokaryotic genomes"
1430:
1072:
requires a descriptive output file, which should describe the
4547:
Torto-Alalibo T, Collmer CW, Gwinn-Giglio M (February 2009).
4216:"Genome (re-)annotation and open-source annotation pipelines"
2907:"Search and clustering orders of magnitude faster than BLAST"
2509:"A user's guide to the encyclopedia of DNA elements (ENCODE)"
1379:
3885:
Griffiths-Jones S (2007). "Annotating noncoding RNA genes".
3836:"An overview of the prediction of protein DNA-binding sites"
1628:"Ten steps to get started in Genome Assembly and Annotation"
1564:
Zheng S, Poczai P, Hyvönen J, Tang J, Amiryousefi A (2020).
109:, which identifies and demarcates elements in a genome, and
4804:
Trivedi VD, Jangir PK, Sharma R, Phale PS (December 2016).
4459:"The RNA WikiProject: community annotation of RNA families"
4135:"The past, present and future of genome-wide re-annotation"
1242:
Annotation is performed by a completely automated pipeline.
948:, etc.), it means that the sequence is indeed a pseudogene.
4312:
Hartl DL (April 2000). "Fly meets shotgun: shotgun wins".
2628:
The dictionary of genomics, transcriptomics and proteomics
2348:
Salzberg SL, Delcher AL, Kasif S, White O (January 1998).
2724:
Treangen TJ, Abraham AL, Touchon M, Rocha EP (May 2009).
203:
where nodes represent different genomic signals (such as
4263:
Loveland JE, Gilbert JG, Griffiths E, Harrow JL (2012).
3219:
Križanovic K, Echchiki A, Roux J, Šikic M (March 2018).
1935:
Encyclopedia of Bioinformatics and Computational Biology
812:
Some conventional methods for functional annotation are
4408:"A gene wiki for community annotation of gene function"
3346:"QuickGO: a web-based tool for Gene Ontology searching"
1402:, hydroxyphenyl acetic acid, and the recognition of an
878:) for which confidence scores are later obtained. The
3488:
Zhao Y, Wang J, Chen J, Zhang X, Guo M, Yu G (2020).
1145:
A linear comparative genome visualization of several
121:
The first generation of genome annotators used local
3131:"Optimal spliced alignments of short sequence reads"
1937:(1st ed.). Elsevier Science. pp. 195–209.
1080:
structures of each annotation, their start and stop
407:
Feature prediction (coding and noncoding sequences).
4042:"A brief introduction to web-based genome browsers"
3172:"TopHat: discovering splice junctions with RNA-Seq"
2131:Gribskov M, Devereux J, Burgess RR (January 1984).
598:(coding regions) are joined. Therefore, eukaryotic
1800:(1st ed.). Elsevier Science. pp. 49–66.
105:DNA annotation is classified into two categories:
44:. The number of genes, the genome length, and the
1362:(CSV86), a bacterium known for its preference of
195:methods, but now applied on a genome-wide scale.
2412:Soh J, Gordon PM, Sensen CW (4 September 2012).
1275:A variation of the museum model, applied in the
614:or high error-rates produced during sequencing.
582:genomes has an extra layer of difficulty due to
86:in a genome and determines what those genes do.
4861:Huo YY, Li ZY, Cheng H, Wang CS, Xu XW (2014).
4185:"Manual Annotation - Wellcome Sanger Institute"
3170:Trapnell C, Pachter L, Salzberg SL (May 2009).
2252:"Codon catalog usage and the genome hypothesis"
1764:(1st ed.). Springer US. pp. 193–226.
1540:Vertebrate and Genome Annotation Project (Vega)
838:significant variation within a protein family.
394:. The main steps of structural annotation are:
Bioinformatics: Methods and Applications. London: Academic Press. pp. 145–157.
2301:"GeneMark.hmm: new solutions for gene finding"
676:(ORFs), which are segments of DNA between the
656:that identify functional site signals such as
4696:Top EM, Springael D, Boon N (November 2002).
1983:"NCBI prokaryotic genome annotation pipeline"
1406:involved in glucose transport in the strain.
8:
3887:Annual Review of Genomics and Human Genetics
3456:. London: Academic Press. pp. 145–157.
238:and homology-based annotation, require fast
4040:Wang J, Kong L, Gao G, Luo J (March 2013).
3840:International Journal of Molecular Sciences
3588:Sinha S, Lynn AM, Desai DK (October 2020).
2816:Bergman CM, Quesneville H (November 2007).
2771:International Journal of Molecular Sciences
2299:Lukashin AV, Borodovsky M (February 1998).
1376:NCBI Prokaryotic Genome Annotation Pipeline
404:Splice identification (only in eukaryotes).
3539:Sasson O, Kaplan N, Linial M (June 2006).
89:Annotation is performed after a genome is
1731:"Medical Definition of Genome annotation"
3958:Valeev T, Yevshin I, Kolpakov F (2013).
2503:ENCODE Project Consortium (April 2011).
1798:Bioinformatics: Methods and Applications
771:
457:. Repeats are identified by similarity (
303:
214:
170:(the molecules responsible for carrying
18:
4265:"Community gene annotation in practice"
2079:Staden R, McLachlan AD (January 1982).
1556:
1458:National Center for Biomedical Ontology
966:Whole-genome Shotgun Sequence Detection
4081:Jung J, Kim JI, Yi G (December 2019).
906:Noncoding sequence function prediction
331:) of the organism being annotated. In
4214:Siezen RJ, van Hijum SA (July 2010).
3899:10.1146/annurev.genom.8.080706.092419
3311:Current Opinion in Structural Biology
1490:Encyclopedia of DNA elements (ENCODE)
1290:A community annotation is said to be
1117:Genomic browsers can be divided into
594:(non-coding regions) are removed and
428:are composed of repetitive elements.
7:
2032:"Sequence data handling by computer"
1453:genomes are Bakta, Prokka and PGAP.
1137:Comparative visualization of genomes
2986:Ejigu GF, Jung J (September 2020).
2631:(Fifth ed.). Weinheim: Wiley.
902:, thus boosting their performance.
780:(GO) ancestor chart organized as a
758:Coding sequence function prediction
494:means replacing the letters of the
4715:10.1111/j.1574-6941.2002.tb01009.x
3462:10.1016/B978-0-323-89775-4.00015-8
2418:. New York: Chapman and Hall/CRC.
1943:10.1016/B978-0-12-809633-8.20226-4
1806:10.1016/B978-0-323-89775-4.00013-4
1068:Visualization of annotations in a
398:Repeat identification and masking.
255:transcription factor binding sites
242:algorithms to identify regions of
14:
4357:"Community annotation in biology"
3834:Si J, Zhao R, Wu R (March 2015).
1429:DDT-1, a strain capable of using
996:Sequence similarity based methods
412:Repeat identification and masking
339:must be identified. Finally, the
5239:Fang H, Gough J (January 2013).
4232:10.1111/j.1751-7915.2010.00191.x
2743:10.1111/j.1574-6976.2009.00169.x
2550:
960:Whole-Genome Assembly Comparison
890:(CNN), have also been employed.
672:CDS predictors mostly deal with
5094:mosga.mathematik.uni-marburg.de
5057:10.1093/bioinformatics/btaa1003
4152:10.1186/gb-2002-3-2-comment2001
3680:Dainat J, Pontarotti P (2021).
2767:"Repetitive Elements in Humans"
1762:Sequence — Evolution — Function
858:posttranslational modifications
3658:10.1093/bioinformatics/btg1026
1645:10.12688/f1000research.13598.1
1380:Insertion Sequence (IS) Finder
560:translation initiation factors
401:Evidence alignment (optional).
1:
5251:(Database issue): D536–D544.
4867:Standards in Genomic Sciences
4133:Ouzounis CA, Karp PD (2002).
4099:10.1093/bioinformatics/btz596
3937:10.1093/bioinformatics/btu153
3754:10.1093/bioinformatics/bty586
3362:10.1093/bioinformatics/btp536
3237:10.1093/bioinformatics/btx668
3188:10.1093/bioinformatics/btp120
3148:10.1093/bioinformatics/btn300
2924:10.1093/bioinformatics/btq461
1277:Knockout Mouse Project (KOMP)
4425:10.1371/journal.pbio.0060175
3791:Genome Biology and Evolution
3639:Letovsky S, Kasif S (2003).
3278:10.1007/978-1-4939-6622-6_11
2526:10.1371/journal.pbio.1001046
2229:10.1016/0378-1119(82)90157-3
2194:10.1016/0168-9525(96)10038-X
1149:of phylogenetically related
1123:stand-alone genomic browsers
1055:Candidatus Carsonella ruddii
888:convolutional neural network
816:-based, which rely on local
588:post-transcriptional process
502:(ORF) in a transposon as an
4516:10.1007/978-1-4939-3167-5_5
4046:Briefings in Bioinformatics
3697:10.1007/978-1-0716-1503-4_2
2822:Briefings in Bioinformatics
1851:10.1186/1471-2105-10-S11-S8
1770:10.1007/978-1-4757-3783-7_6
1307:, for instance, operates a
862:protein-protein interaction
768:Protein function prediction
479:multiple sequence alignment
475:Comparative genomic methods
5315:
5011:10.1186/s12859-021-04120-9
3607:10.1186/s12859-020-03794-x
2462:Brent MR (December 2005).
2030:Staden R (November 1977).
1729:Davis CP (29 March 2021).
1269:genome annotation project.
1119:web-based genomic browsers
1107:
985:transcriptional regulation
761:
621:
5066:21.11116/0000-0006-FED4-D
4702:FEMS Microbiology Ecology
4566:10.1186/1471-2180-9-S1-S1
3323:10.1016/j.sbi.2004.05.007
2905:Edgar RC (October 2010).
2765:Liehr T (February 2021).
2730:FEMS Microbiology Reviews
2149:10.1093/nar/12.1part2.539
1583:10.3389/fgene.2020.576124
562:. To solve this problem,
4773:10.1128/genomeA.00234-12
3507:10.3389/fgene.2020.00400
2864:Nature Reviews. Genetics
2683:Nature Reviews. Genetics
1885:Nature Reviews. Genetics
1688:Nature Reviews. Genetics
1525:Mouse Genome Informatics
1515:Gene Ontology Consortium
1261:Party or jamboree model:
1186:Circular representation:
644:(CDS) and do not report
4665:10.1093/database/baw034
4281:10.1093/database/bas009
4220:Microbial Biotechnology
3960:"BioUML Genome Browser"
3687:. In Poliseno L (ed.).
1354:mobile genetic elements
1266:Drosophila melanogaster
1255:Cottage industry model:
1031:, as well as noncoding
1006:Structure based methods
856:provide information on
713:expressed sequence tags
A genome is divided into
528:expressed sequence tags
469:Structure-based methods
160:protein coding sequence
5245:Nucleic Acids Research
5171:Nucleic Acids Research
4880:10.1186/1944-3277-9-30
4604:Nucleic Acids Research
4374:10.1186/1745-6150-5-12
3997:Nucleic Acids Research
3651:(Suppl 1): i197–i204.
3398:Vu TT, Jung J (2021).
3005:10.3390/biology9090295
2354:Nucleic Acids Research
2305:Nucleic Acids Research
2256:Nucleic Acids Research
2137:Nucleic Acids Research
2085:Nucleic Acids Research
2036:Nucleic Acids Research
1987:Nucleic Acids Research
1423:, especially zinc and
1413:and identification of
1300:and/or communication.
1180:Linear representation:
953:Segmental duplications
938:Phylogeny-based method
880:support vector machine
807:directed acyclic graph
797:
782:directed acyclic graph
709:Homology-based methods
455:Homology-based methods
188:Haemophilus influenzae
55:
5133:10.1099/mgen.0.000685
3494:Frontiers in Genetics
2637:10.1002/9783527678679
2317:10.1093/nar/26.4.1107
2268:10.1093/nar/8.1.197-c
2048:10.1093/nar/4.11.4037
1570:Frontiers in Genetics
1463:As a general method,
1416:Halomonas zincidurans
1341:A great diversity of
932:Homology-based method
876:multiclass classifier
792:, a component of the
775:
734:Functional annotation
622:Further information:
574:Splice identification
444:sequence conservation
307:
300:Structural annotation
265:structure, and other
218:
111:functional annotation
107:structural annotation
22:
5219:ncbo.bioontology.org
5183:10.1093/nar/gkaa1105
5051:(22–23): 5514–5515.
4761:Genome Announcements
4508:Plant Bioinformatics
3853:10.3390/ijms16035194
3557:10.1110/ps.062185706
2784:10.3390/ijms22042072
2366:10.1093/nar/26.2.544
2097:10.1093/nar/10.1.141
1483:biological databases
1426:Stenotrophomonas sp.
1283:Gatekeeper approach:
1231:Community annotation
1165:Comparative genomics
896:matrix factorization
794:extracellular matrix
646:untranslated regions
286:1000 Genomes Project
26:Porphyra umbilicalis
5257:10.1093/nar/gks1080
5177:(D1): D1020–D1028.
4930:2016NatSR...621332P
4822:2016NatSR...638430T
4616:10.1093/nar/gkz1021
4475:10.1261/rna.1200508
4009:10.1093/nar/gkw1358
3417:10.7717/peerj.12019
2591:10.1038/nature11632
2583:2012Natur.491...56T
2143:(1 Pt 2): 539–549.
1315:on that basis. The
1197:The quality of the
1086:sequence alignments
1035:-like transcripts.
923:(ORF), making them
911:Noncoding sequences
884:k-nearest neighbors
703:hidden Markov model
674:open reading frames
608:quality information
353:open reading frames
296:and other genomes.
32:genome annotation (
23:A visualization of
5120:Microbial Genomics
4998:BMC Bioinformatics
4918:Scientific Reports
4810:Scientific Reports
4195:on 2 February 2023
4145:(2): COMMENT2001.
4059:10.1093/bib/bbs029
3803:10.1093/gbe/evy223
3594:BMC Bioinformatics
3098:10.1101/gr.6427907
3051:10.1038/nmeth.1613
2958:on 3 February 2020
2948:"Sequence masking"
2835:10.1093/bib/bbm048
2481:10.1101/gr.3866105
2182:Trends in Genetics
1999:10.1093/nar/gkw569
1838:BMC Bioinformatics
1741:on 9 February 2023
1576:(576124): 576124.
1368:aromatic compounds
1359:Pseudomonas putida
1273:Blessed annotator:
921:open reading frame
833:often refer to an
798:
618:Feature prediction
522:Evidence alignment
500:open reading frame
481:produced by large
349:
310:repetitive regions
251:non-coding regions
231:
129:open reading frame
56:
4938:10.1038/srep21332
4830:10.1038/srep38430
4610:(D1): D845–D855.
4525:978-1-4939-3167-5
4469:(12): 2462–2464.
4093:(24): 5303–5305.
3931:(14): 2068–2069.
3797:(11): 2899–2905.
3748:(17): i706–i714.
3706:978-1-0716-1503-4
3471:978-0-323-89775-4
3356:(22): 3045–3046.
3287:978-1-4939-6622-6
3141:(16): i174–i180.
2917:(19): 2460–2461.
2474:(12): 1777–1786.
2415:Genome Annotation
2042:(11): 4037–4051.
1993:(14): 6614–6624.
1952:978-0-12-811432-2
1779:978-1-4757-3783-7
1400:phenylacetic acid
1396:4-hydroxybenzoate
1328:Disease diagnosis
1199:sequence assembly
973:DNA binding sites
872:binary classifier
717:complementary DNA
612:sequence coverage
568:mass spectrometry
377:regulatory motifs
271:regulatory region
180:synonymous codons
72:genome annotation
60:molecular biology
5215:"NCBO Annotator"
4553:BMC Microbiology
4275:(2012): bas009.
4191:. Archived from
4189:www.sanger.ac.uk
3846:(3): 5194–5215.
3551:(6): 1557–1562.
3182:(9): 1105–1111.
3092:(9): 1362–1377.
2954:. Archived from
2653:on 4 August 2022
2649:. Archived from
2440:on 18 April 2023
2436:. Archived from
2311:(4): 1107–1115.
1844:(Suppl 11): S8.
1737:. Archived from
1700:10.1038/35080529
1305:Gene WikiProject
1131:species-specific
1127:multiple-species
1090:gene predictions
868:Machine learning
790:matrilin complex
652:, which include
642:coding sequences
604:machine learning
600:coding sequences
449:coding sequences
357:coding sequences
4559:(Suppl 1): S1.
4314:Nature Genetics
3964:Virtual Biology
3545:Protein Science
3086:Genome Research
2876:10.1038/nrg2814
2695:10.1038/nrg3174
2625:Kahl G (2015).
2577:(7422): 56–65.
2519:(4): e1001046.
2468:Genome Research
1897:10.1038/nrg1769
1469:protein domains
1317:RNA WikiProject
1249:Manual curation
1193:Quality control
1104:Genome browsers
989:viral infection
977:DNA replication
944:data analysis,
908:
854:sequence motifs
666:content sensors
638:gene prediction
626:
624:Gene prediction
620:
576:
524:
414:
302:
290:model organisms
259:DNA methylation
201:directed graphs
5045:Bioinformatics
5035:
4984:
4961:
4904:
4853:
4796:
4767:(1): 234–235.
4747:
4725:1854/LU-348539
4708:(2): 199–208.
4688:
4639:
4590:
4539:
4524:
4498:
4449:
4398:
4361:Biology Direct
4347:
4320:(4): 327–328.
4304:
4255:
4226:(4): 362–369.
4206:
4176:
4139:Genome Biology
4122:
4087:Bioinformatics
4073:
4052:(2): 131–143.
4032:
3983:
3977:10.12704/vb/e8
3950:
3925:Bioinformatics
3912:
3877:
3826:
3777:
3742:Bioinformatics
3728:
3705:
3672:
3645:Bioinformatics
3631:
3580:
3531:
3477:
3470:
3441:
3385:
3350:Bioinformatics
3336:
3317:(3): 264–272.
3301:
3286:
3270:Bioinformatics
3260:
3231:(5): 748–754.
3225:Bioinformatics
3211:
3176:Bioinformatics
3162:
3135:Bioinformatics
3121:
3072:
3045:(6): 469–477.
3039:Nature Methods
3029:
2969:
2938:
2911:Bioinformatics
2897:
2870:(8): 559–571.
2849:
2828:(6): 382–392.
2808:
2757:
2736:(3): 539–571.
2716:
2689:(5): 329–342.
2664:
2645:
2614:
2557:
2495:
2451:
2432:
2424:10.1201/b12682
2389:
2360:(2): 544–548.
2340:
2291:
2262:(1): r49–r62.
2242:
2223:(3): 199–209.
2207:
2188:(8): 316–320.
2172:
2120:
2091:(1): 141–156.
2071:
2022:
1966:
1951:
1918:
1891:(2): 130–141.
1875:
1821:
1814:
1785:
1778:
1752:
1721:
1694:(7): 493–503.
1337:Bioremediation
1240:Factory model:
1110:Genome browser
1108:Main article:
1105:
1102:
1070:genome browser
1065:
1062:
1045:
1042:
1010:
1009:
1003:
970:
969:
963:
950:
949:
935:
925:untranslatable
907:
904:
759:
756:
735:
732:
731:
730:
724:
706:
654:signal sensors
619:
616:
578:Annotation of
575:
572:
564:proteogenomics
523:
520:
488:
487:
472:
466:
452:
413:
410:
409:
408:
405:
402:
399:
301:
298:
288:) and several
148:Staden Package
140:DNA sequencing
118:
115:
84:coding regions
68:DNA annotation
4326:10.1038/74125
1632:F1000Research
1629:
1622:
1620:
1618:
1616:
1614:
1612:
1608:
1603:
1599:
1594:
1589:
1584:
1579:
1575:
1571:
1567:
1560:
1557:
1550:
1546:
1543:
1541:
1538:
1536:
1533:
1531:
1528:
1526:
1523:
1521:
1518:
1516:
1513:
1511:
1508:
1506:
1503:
1501:
1498:
1496:
1493:
1491:
1488:
1487:
1486:
1484:
1479:
1477:
1472:
1470:
1466:
1461:
1459:
1454:
1452:
1448:
1444:
1436:
1434:
1432:
1428:
1427:
1422:
1418:
1417:
1411:
1410:Gene Ontology
1407:
1405:
1401:
1397:
1393:
1389:
1385:
1381:
1377:
1373:
1369:
1365:
1361:
1360:
1355:
1351:
1347:
1344:
1336:
1334:
1327:
1322:
1320:
1318:
1314:
1310:
1306:
1301:
1298:
1293:
1284:
1281:
1278:
1274:
1271:
1268:
1267:
1262:
1259:
1256:
1253:
1250:
1247:
1246:Museum model:
1244:
1241:
1238:
1237:
1236:
1230:
1228:
1222:Re-annotation
1221:
1219:
1215:
1213:
1209:
1205:
1200:
1192:
1187:
1184:
1181:
1178:
1174:
1171:
1170:
1169:
1166:
1159:
1155:
1152:
1148:
1143:
1136:
1134:
1132:
1128:
1124:
1120:
1115:
1111:
1103:
1101:
1099:
1095:
1091:
1087:
1083:
1079:
1075:
1071:
1063:
1058:
1056:
1050:
1044:Visualization
1043:
1041:
1038:
1034:
1030:
1026:
1022:
1018:
1014:
1013:Noncoding RNA
1007:
1004:
1001:
997:
994:
993:
992:
990:
986:
982:
978:
974:
967:
964:
961:
958:
957:
956:
954:
947:
943:
939:
936:
933:
930:
929:
928:
926:
922:
918:
914:
912:
905:
903:
901:
897:
891:
889:
885:
881:
877:
873:
869:
865:
863:
859:
855:
851:
848:
844:
839:
836:
831:
827:
823:
819:
815:
810:
808:
804:
803:Gene Ontology
795:
791:
787:
783:
779:
778:Gene Ontology
774:
769:
765:
764:Gene Ontology
757:
755:
753:
749:
745:
741:
733:
728:
725:
722:
718:
714:
710:
707:
704:
700:
698:
694:
693:
692:
690:
686:
683:
679:
675:
671:
667:
663:
659:
655:
651:
647:
643:
639:
635:
631:
625:
617:
615:
613:
609:
605:
601:
597:
593:
589:
585:
581:
573:
571:
569:
565:
561:
557:
553:
548:
544:
539:
537:
533:
529:
521:
519:
517:
513:
509:
505:
501:
497:
493:
484:
480:
476:
473:
470:
467:
464:
460:
456:
453:
450:
445:
441:
439:
435:
434:
433:
429:
427:
423:
419:
411:
406:
403:
400:
397:
396:
395:
393:
389:
386:
382:
378:
374:
370:
366:
362:
358:
354:
346:
342:
338:
334:
330:
326:
322:
318:
315:
311:
306:
299:
297:
295:
291:
287:
282:
280:
276:
272:
268:
264:
260:
256:
252:
247:
245:
241:
237:
227:
223:
217:
213:
210:
206:
205:transcription
202:
198:
197:Markov models
194:
190:
189:
183:
181:
177:
173:
169:
165:
161:
157:
153:
149:
145:
141:
138:
134:
133:Maxam-Gilbert
130:
126:
125:
116:
114:
112:
108:
103:
100:
96:
92:
87:
85:
81:
77:
73:
69:
65:
61:
54:respectively.
52:
51:transcription
47:
43:
39:
35:
31:
28:
27:
21:
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.