
DNA annotation


Pseudogenes are identified by means of a phylogenetic analysis. First, a species tree of the species of interest and a phylogenetic tree of the gene (or gene family) of interest are constructed. The two are then compared to identify a species that has lost the gene. Next, within the genome of the species where the gene was not found, a sequence is searched for that is orthologous to the gene identified in the closest species. Finally, if this orthologous sequence has a disruption in its ORF (and it meets other criteria, such as …

… coding sequences (CDS) are discontinuous, and, to ensure their proper identification, intronic regions must be filtered out. To do so, annotation pipelines must find the exon-intron boundaries, and multiple methodologies have been developed for this purpose. One solution is to use known exon boundaries for alignment; for instance, many introns begin with GT and end with AG. This approach, however, cannot detect novel boundaries, so alternatives like …

Whole-genome shotgun sequence detection (WSSD). It aligns the original reads with the assembled genome and searches for regions with a higher read depth than the average, which usually signal duplication. Segmental duplications identified by this method but not by WGAC are likely collapsed duplications, which means that they were mistakenly aligned to the same region.

Repeats are identified based on models of their structure, rather than repetition or similarity. Such methods are capable of identifying real transposons (just like the homology-based ones), but are not biased by known elements. However, they are highly specific to each class of repeat, and, as such, are less universally applicable.

… mobile genetic elements (MGEs). The study of these elements is of great importance in the field of bioremediation, since the inoculation of wild or genetically modified strains with these MGEs has recently been sought in order to acquire these hydrocarbon degradation capacities. In 2013, Phale et al. published the genome annotation of a strain of …
sequence is not. Therefore, by performing a multiple sequence alignment, more useful information can be obtained for their prediction. Homology search may also be employed to identify RNA genes, but this procedure is complicated, especially in eukaryotes, due to the presence of a large number of repeats and pseudogenes.
GO terms are organized in a directed acyclic graph, in which every node is a particular function, and every edge (or arrow) between two nodes indicates a parent-child or subcategory-category relationship. As of 2020, GO is the most widely used controlled vocabulary for functional annotation of genes, followed by the MIPS Functional Catalog (FunCat).
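The node-and-edge description above can be illustrated with a minimal Python sketch. It stores a toy ontology as a mapping from each term to its direct parents and collects every ancestor of a term. The term names, the `PARENTS` table and the helper names are hypothetical and are not actual GO records; the idea that an annotation to a term also implies its ancestors is an assumption of this sketch.

```python
# Toy ontology: each term maps to its direct parents (hypothetical names).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalytic activity": ["root"],
    "nucleotide binding": ["binding"],
    "kinase activity": ["catalytic activity"],
    # A term may have several parents, so this is a graph rather than a tree.
    "toy ATP-dependent kinase": ["kinase activity", "nucleotide binding"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent edges from `term`."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def propagate(direct_annotations, parents=PARENTS):
    """Expand a gene's directly assigned terms with all of their ancestors."""
    expanded = set(direct_annotations)
    for term in direct_annotations:
        expanded |= ancestors(term, parents)
    return expanded
```

For example, `propagate({"nucleotide binding"})` also yields `"binding"` and `"root"`, since both lie on the path to the top of the toy graph.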
Functional annotation of genes requires a controlled vocabulary (or ontology) to name the predicted functional features. However, because there are numerous ways to define gene functions, the annotation process may be hindered when it is performed by different research groups. As such, a standardized
analysis is of great importance in functional annotation, and specifically in bioremediation it can be applied to determine the relationships between the genes of some microorganisms and their functions, as well as their role in the remediation of certain contaminants. This was the approach of the investigation
Annotation projects often rely on previous annotations of an organism's genome; however, these older annotations may contain errors that can propagate to new annotations. As new genome analysis technologies are developed and richer databases become available, the annotation of some older genomes may
Binary or multiclass classification methods for functional annotation generally produce less accurate results because they do not take into account the interrelations between GO terms. More advanced methods that consider these interrelations do so by either a flat or hierarchical approach, which are
usually perform a similar function. However, orthologous sequences should be treated with caution for two reasons: (1) they might have different names depending on when they were originally annotated, and (2) they may not perform the same functional role in two different organisms. Annotators
Masking means replacing the nucleotide letters (A, C, G, or T) with other letters. By doing so, these regions are marked as repetitive and downstream analyses treat them accordingly. Repetitive regions may cause performance issues if they are not masked, and may even produce false evidence for gene annotation (for example, treating an …
prediction of RNA genes in a single genome often yields inaccurate results (with an exception being miRNA), so multi-genome comparative methods are used instead. These methods are specifically concerned with the secondary structures of ncRNA, as they are conserved in related species even when their
sequence when no paralogy, orthology or xenology was found. Homology-based methods have several drawbacks, such as errors in the database, low sensitivity/specificity, inability to distinguish between paralogy and homology, artificially high scores due to the presence of low complexity regions, and
Gene Ontology is being used by researchers to establish a disease-gene relationship, as GO helps in the identification of novel genes, the alterations in their expression, distribution and function under a different set of conditions, such as diseased versus healthy. Databases of this disease-gene
As more sequenced genomes became available in the early and mid-2000s, coupled with the numerous protein sequences that were obtained experimentally, genome annotators began employing homology-based methods, launching the third generation of genome annotation. These new methods allowed annotators
start sites) connected by arrows representing the scanning of the sequence. To ensure a Markov model detects a genomic signal, it must first be trained on a series of known genomic signals. The output of Markov models in the context of annotation includes the probabilities of every kind of genomic
and described in a published article. Although describing individual genes and their products or functions is sufficient to consider this description as an annotation, the depth of analysis reported in the literature for different genomes varies widely, with some reports including additional information
aims to identify similarities and differences in genomic features, as well as to examine evolutionary relationships between organisms. Visualization tools capable of illustrating the comparative behavior between two or more genomes are essential for this approach, and can be classified into three
Pseudogenes are identified by searching for sequences that are similar to functional genes but contain mutations that disrupt their ORF. This method cannot determine the evolutionary relationship between a pseudogene and its parent gene, nor the time elapsed since the event happened.
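As a rough illustration of what a disruption in an ORF can look like, the sketch below inspects a candidate sequence that has already been aligned to a functional gene. It is a minimal example under stated assumptions: the function name `orf_disruptions` is hypothetical, the standard stop codons TAA, TAG and TGA are assumed, and a length that is not a multiple of three is treated as a crude proxy for a frameshift. Real pipelines use alignment-aware checks.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_disruptions(candidate):
    """List the ways in which `candidate` fails to be an intact ORF."""
    candidate = candidate.upper()
    problems = []
    if len(candidate) % 3 != 0:
        problems.append("frameshift: length is not a multiple of 3")
    if not candidate.startswith("ATG"):
        problems.append("missing start codon")
    # Split into complete codons only; trailing bases are ignored here.
    usable = len(candidate) - len(candidate) % 3
    codons = [candidate[i:i + 3] for i in range(0, usable, 3)]
    # Any stop codon before the final position truncates the product.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"premature stop codon {codon} at codon {index}")
            break
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing terminal stop codon")
    return problems
```

An empty list means that no disruption was found by these simple checks; it does not prove that the sequence is a functional gene.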
community annotation. Supervised community annotation is short-lived and limited to the duration of the event, whereas the unsupervised counterpart does not have this limitation. However, the latter has been less successful than the former presumably due to a lack of time, motivation, incentive
Community annotation approaches are useful techniques for quality control and standardization in genome annotation. An annotation jamboree that took place in 2002 led to the creation of the annotation standards used by the Sanger Institute's Human and Vertebrate Analysis Project (HAVANA).
that goes beyond a simple annotation. Furthermore, due to the size and complexity of sequenced genomes, DNA annotation is not performed manually, but is instead automated by computational means. However, the conclusions drawn from the obtained results require manual expert analysis.
… is involved. Every box is an ontology term that falls into one of the three GO categories and is color-coded accordingly. Ontology terms are related to each other through specific qualifiers (such as "is a", "part of", etc.), which are represented by different kinds of arrows.
not only to infer genomic elements through statistical means (as in previous generations) but also to perform their task by comparing the sequence being annotated with other existing, validated sequences. These so-called combiner annotators, which perform both
Identifying repeats is difficult for two main reasons: they are poorly conserved, and their boundaries are not clearly defined. Because of this, repeat libraries must be built for the genome of interest, which can be accomplished with one of the following methods:
relationships of different organisms have been created, such as Plant-Pathogen Ontology, Plant-Associated Microbe Gene Ontology or DisGeNET. Others have been implemented in pre-existing databases, such as the Rat Disease Ontology in the Rat Genome Database.
when there is a coordinator who manages the project by requesting the annotation of specific items from a select number of experts. On the other hand, when anyone can enter a project and coordination is accomplished in a decentralized manner, it is called
located in the corresponding genome, providing not only their locations, but also their rates of expression. However, transcripts provide insufficient information for gene prediction because they might be unobtainable from some genes, they may encode
… (which are larger elements with several copies across the genome). Repeats are a major component of both prokaryotic and eukaryotic genomes; for instance, between 0% and over 42% of prokaryotic genomes consist of repeats and three quarters of the …

… (focus on one organism and the annotations for particular species). The latter are not necessarily linked to a specific genome database but are general-purpose browsers that can be downloaded and installed as an application on a local computer.
This scheme can show the alignment of only two genomes: one genome is represented along the horizontal axis and the other along the vertical axis, and the dots in the plot represent the genomic elements that are similar between these two genomes.
direction and their length; they are color-coded based on the cellular function or component they are part of. Represented with arrows, the transcription directions for the inner and outer genes are listed clockwise and anticlockwise,
in a self-genome comparison, thus requiring no prior information about repeat structure or sequences. The disadvantage of these methods is that they can identify any repeated sequence, not just transposons, and may include conserved
… (CDS), making careful post-processing an indispensable step to remove these sequences. These methods may also leave out related regions that have degraded over time and may group elements that have no connection in their evolutionary history.
It is a combination of the jamboree and cottage industry models. It begins with an annotation workshop, followed by a decentralized collaboration to extend and refine the initial annotation. It has been used for multiple species
influences the quality of the annotation, so it is important to assess assembly quality before performing the subsequent annotation steps. In order to quantify the quality of a genome annotation, three metrics have been used:
be updated. This process, known as reannotation, can provide users with new information about the genome, including details about genes and protein functions. Reannotation is therefore a useful approach in quality control.
are placed in the middle black circle. The outer gray circle shows GC content in every section of the genome. All individual genes are placed on the outermost circle according to their position in the genome, their
A release timeline of genome annotators. The dotted boxes indicate the four different generations of genome annotators and their most representative characteristics. First generation (blue) where annotators used
… functional annotation, which assigns functions to these elements. This is not the only way in which it has been categorized, as several alternatives, such as dimension-based and level-based classifications, have also been proposed.
methods and homology-based annotations, and the fourth generation (orange) in which an approach to identification of the non-coding regions of DNA and study at the population level represented by the pangenome
… They consist in identifying homologous sequences with known DNA binding sites, or in aligning them with query proteins. Their performance is usually low because the DNA binding sequences are less …

regions. Although this strategy avoids the poorly defined boundary problem that exists in other methods, it is highly dependent on assembly quality and the level of activity of transposons in the genomes in
… by analyzing and interpreting them in order to extract their biological significance and understand the biological processes in which they participate. Among other things, it identifies the locations of …
Genome annotation is an active area of investigation and involves a number of different organizations in the life science community which publish the results of their efforts in publicly available
This representation uses multiple linear tracks to represent multiple genomes and their features where "track" is a concept that refers to a specific type of genomic feature at a genomic location.
to predict new ones. Predictors of new exon boundaries usually require efficient data-compression and alignment algorithms, but they are prone to failure in boundaries located in regions with low
search tools. Its premise is that high sequence conservation between two genomic elements implies that their function is conserved as well. Pairs of homologous sequences that appeared through
and EMBL. Some of these formats use controlled vocabularies and ontologies to define their descriptive terminologies and guarantee interoperability between analysis and visualization tools.
Whole-genome assembly comparison (WGAC). It aligns the entire genome to itself in order to identify repeated sequences after filtering out common repeats; it does not require the original reads used for the assembly.

Ab initio (also called statistical, intrinsic, or de novo). CDS prediction is based solely on the information that can be extracted from the DNA sequence. These methods rely on statistical methods such as the …
Community annotation consists in the engagement of a community (both scientific and nonscientific) in genome annotation projects. It can be classified into the following six categories:
Genomic browsers are software products that simplify the analysis and visualization of large genomic sequence and annotation data to gain biological insight, via a graphical interface.
CDS predictors are faced with a more difficult problem because of the complex organization of eukaryotic genes. CDS prediction methods can be classified into three broad categories:
of the organism being annotated with the genome. Although it is optional, it can improve gene sequence elucidation because RNAs and proteins are direct products of coding sequences.
Homology-based (also called empirical, evidence-driven, or extrinsic). CDS prediction is based on similarity to known sequences. Specifically, it performs alignments of the analyzed sequence with …
are DNA segments of more than 1000 base pairs that are repeated in the genome with more than 90% sequence identity. Two strategies used for their identification are WGAC and WSSD:
This representation facilitates comparison of whole microbial or viral genomes. In this visualization mode, concentric circles and arcs are used to represent genomic sections.
element in every single part of the genome, and an accurate Markov model will assign high probabilities to correct annotations and low probabilities to the incorrect ones.
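The train-then-score idea described here can be sketched in Python with a plain first-order Markov chain, which is simpler than the hidden Markov models used by real annotators. The function names, the pseudocount and the tiny training sets are assumptions made for illustration only. A positive log-odds score means the signal model explains the sequence better than the background model.

```python
import math

ALPHABET = "ACGT"


def train_markov(sequences, pseudocount=1.0):
    """Estimate transition probabilities P(next base | current base)."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    model = {}
    for current, row in counts.items():
        total = sum(row.values())
        model[current] = {base: count / total for base, count in row.items()}
    return model


def log_likelihood(model, seq):
    """Sum the log transition probabilities along the sequence."""
    return sum(math.log(model[c][n]) for c, n in zip(seq, seq[1:]))


def log_odds(signal_model, background_model, seq):
    """Positive values favor the signal model over the background model."""
    return log_likelihood(signal_model, seq) - log_likelihood(background_model, seq)
```

The pseudocount keeps unseen transitions from receiving zero probability, which would otherwise make the logarithm undefined.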
distinguished by the fact that the former does not take into account the ontology structure, while the latter does. Some of these methods compress the GO terms by
The next step after genome masking usually involves aligning all available transcript and protein evidence with the analyzed genome, that is, aligning all known
… Gene Ontology (GO). It classifies functional properties into one of three categories (molecular function, biological process, and cellular component) and organizes them in a …
regions, and the last step of structural annotation consists in identifying these features within the genome. In fact, the primary task in genome annotation is
sequences contained in the genome are predicted with the help of databases of known DNA, RNA and protein sequences, as well as other supporting information.
hard masking, the letters of these regions are replaced with N's. This way, for example, soft masking can be used to exclude word matches and avoid initiating an …

… ) of known repeats stored in a curated database. These methods are more likely to find real transposons, even in lower quantities, when compared with …
Functional annotation assigns functions to the genomic elements found by structural annotation, by relating them to biological processes such as the
gene prediction, which is why numerous methods have been developed for this purpose. Gene prediction is a misleading term, as most gene predictors only identify …
Non-coding DNA sequences (ncDNA) are those that do not code for proteins. They include elements such as pseudogenes, segmental duplications, binding sites and RNA genes.
methods at a local scale, second generation (red) with genome-wide ab initio methods, third generation (green) characterized by a combination of
… untranslated regions (UTRs); for this reason, CDS prediction has been proposed as a more accurate term. CDS predictors detect genome features through methods called …

Repeats are identified by detecting and grouping pairs of sequences at different locations whose similarity is above a minimum threshold of …

… in which curators go through a training period prior to annotation, and are then given access to annotation tools to continue their work.
methods are also used to generate functional annotations for novel proteins based on GO terms. Generally, they consist in constructing a
sequenced in 1995) introduced a second generation of annotators. Just like in the previous generation, they performed annotation through
accessible via the web and other electronic means. Here is an alphabetical listing of on-going projects relevant to genome annotation:
and final location of any given protein. Probabilistic methods may be paired with a controlled vocabulary, such as GO; for example,
are also found in new genomes of the same clade. Both annotation strategies constitute the fourth generation of genome annotators.
database. This analysis concluded in the localization of the upper pathway genes of naphthalene degradation, right next to the
methods, which are based solely on the information that can be extracted from the DNA sequence on a local scale, that is, one
encoding tRNA-Gly and integrase, as well as the identification of the genes encoding enzymes involved in the degradation of
These measures are not explicitly used in annotation projects, but rather in discussions of prediction accuracy.
are the driving force behind many algorithms used within annotators of this generation; these models can be thought of as
… etc. It may also be used as an additional quality check by identifying elements that may have been annotated in error.
as a carbon and energy source. In order to find the MGEs of this bacterium, its genome was annotated using RAST and the
are masked by using a repeat library. Then, optionally, the masked sequence is aligned with all the available evidence (
can support a user-friendly web interface and software containerization such as MOSGA. Modern annotation pipelines for
Non-coding RNA (ncRNA), produced by RNA genes, is a type of RNA that is not translated into a protein. It includes molecules such as …

… They employ the three-dimensional structural information of proteins to predict the locations of DNA binding sites.
A variety of software tools have been developed that allow scientists to view and share genome annotations, such as
are regions in the genome sequence that bind to and interact with specific proteins. They play an important role in
in those regions, and hard masking, apart from all of this, can also exclude masked regions from alignment scores.
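The difference between the two masking styles can be shown with a short Python sketch. It assumes that repeat coordinates are already known and given as 0-based, half-open `(start, end)` intervals; the function names are hypothetical and do not correspond to any particular masking tool.

```python
def soft_mask(sequence, repeats):
    """Lowercase the bases that fall inside repeat intervals."""
    # Assumption: unmasked bases are reported in uppercase.
    chars = list(sequence.upper())
    for start, end in repeats:
        for i in range(start, min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


def hard_mask(sequence, repeats):
    """Replace the bases that fall inside repeat intervals with N."""
    chars = list(sequence.upper())
    for start, end in repeats:
        for i in range(start, min(end, len(chars))):
            chars[i] = "N"
    return "".join(chars)


if __name__ == "__main__":
    genome = "ACGTTTTTTACG"
    repeats = [(3, 9)]  # the run of six T bases
    print(soft_mask(genome, repeats))  # ACGttttttACG
    print(hard_mask(genome, repeats))  # ACGNNNNNNACG
```

Soft masking keeps the original bases recoverable, whereas hard masking discards them, which is one reason downstream tools can treat the two differently.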
A snapshot of an annotated GBK file created with Prokka. It shows the components (features) of a small portion of
develops tools for automated annotation of database records based on the textual descriptions of those records.
analysis techniques. Other genome annotators also began to focus on population-level studies represented by the
… (integrate sequence and annotations of multiple organisms and promote cross-species comparative analysis) and …
s genome, including their positions (structural annotation) and inferred functions (functional annotation).
… hidden Markov model (HMM). Some methods employ two or more genomes to infer local mutation rates and patterns along the genome.

… which include low-complexity sequences (such as AGAGAGAG, or homopolymeric segments like TTTTTTTTT), and …
Consists of a short intensive workshop with leading curators from the community. It was first used in the
1211: 879: 806: 781: 712: 527: 320: 187: 4598:
Piñero J, Ramírez-Anguita JM, Saüch-Pitarch J, Ronzano F, Centeno E, Sanz F, Furlong LI (January 2020).
3272:. Methods in Molecular Biology. Vol. 1525 (Second ed.). New York: Springer. pp. 271–291. 1981:
Tatusova T, DiCuccio M, Badretdin A, Chetvernin V, Nawrocki EP, Zaslavsky L, et al. (August 2016).
… Functional annotations of proteins are displayed in distinct colors and homologies in different tones.
are mutated copies of protein-coding genes that lost their coding function due to a disruption in their
875: 747: 292:
became available. As such, genome annotation remains a major challenge for scientists investigating the
208: 178:
during protein synthesis) allowing a more efficient translation. This was also known to be the case for
163: 5167:"RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation" 1883:
Reed JL, Famili I, Thiele I, Palsson BO (February 2006). "Towards multidimensional genome annotation".
772: 4914:"Biodegradation of DDT by Stenotrophomonas sp. DDT-1: Characterization and genome functional analysis" 4649:"The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database" 3691:. Methods in Molecular Biology. Vol. 2324 (Second ed.). New York: Springer. pp. 21–34. 4925: 4817: 2578: 1482: 1446: 1207: 1203: 1164: 895: 834: 793: 555: 482: 285: 25: 2565:
Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, et al. (November 2012).
566:
based approaches are employed, which utilize information from expressed proteins often derived from
Structural annotation describes the precise location of the different elements in a genome, such as
open reading frame (ORF) at a time. They appeared as a necessity to handle the enormous amount of data produced by the …
that support each gene model. Some commonly used formats for describing annotations are GenBank,
1085: 999: 920: 817: 515: 499: 443: 352: 239: 128: 1738: 1467:
has an automated procedure for statistically inferring associations between ontology terms and
… It shows the molecular functions, biological processes, and cellular components in which the …

… ) Depending on the letters used for replacement, masking can be classified as soft or hard: in …
Annotation is decentralized and is the result of the effort from different part-time curators.
1198: 1097: 874:
for each GO term, which are then joined to make predictions on individual GO terms (forming a
871: 829: 825: 821: 813: 716: 567: 559: 458: 309: 270: 250: 243: 150:, created by Rodger Staden in 1977. It performed several tasks related to annotation, such as 136: 94: 59: 3641:"Predicting protein function from protein/protein interaction data: a probabilistic approach" 1319:
seeks to write articles that describe individual RNAs and RNA families in an accessible way.
1312: 5260: 5252: 5186: 5178: 5137: 5127: 5060: 5052: 5015: 5005: 4941: 4933: 4884: 4874: 4833: 4825: 4776: 4768: 4719: 4709: 4668: 4660: 4619: 4611: 4570: 4560: 4511: 4478: 4470: 4429: 4419: 4378: 4368: 4321: 4284: 4276: 4235: 4227: 4156: 4146: 4102: 4094: 4053: 4012: 4004: 3971: 3932: 3894: 3857: 3847: 3806: 3798: 3757: 3749: 3692: 3652: 3611: 3601: 3560: 3552: 3511: 3501: 3457: 3421: 3411: 3365: 3357: 3318: 3273: 3240: 3232: 3191: 3183: 3142: 3101: 3093: 3046: 3009: 2999: 2918: 2871: 2829: 2788: 2778: 2737: 2690: 2632: 2594: 2586: 2530: 2520: 2475: 2419: 2369: 2361: 2320: 2312: 2271: 2263: 2224: 2189: 2152: 2144: 2100: 2092: 2051: 2043: 2002: 1994: 1938: 1892: 1855: 1845: 1801: 1765: 1695: 1649: 1639: 1587: 1577: 1424: 1153: 972: 867: 603: 254: 215: 179: 3080:
Gupta N, Tanner S, Jaitly N, Adkins JN, Lipton M, Edwards R, et al. (September 2007).
1445:
genome can be annotated using various annotation tools such as FINDER. A modern annotation
3898: 3682:"Methods to Identify and Study the Evolution of Pseudogenes Using a Phylogenetic Approach" 3400:"Protein function prediction with gene ontology: from traditional to deep learning models" 2081:"Codon preference and its use in identifying protein coding regions in long DNA sequences" 1168:
categories based on the representation of the relationships between the compared genomes:
1089: 976: 841:
Functional annotation can be performed through probabilistic methods. The distribution of
661: 641: 637: 629: 623: 599: 546: 448: 356: 340: 313: 289: 284:
By the 2010s, the genome sequences of more than a thousand human individuals (through the
data is available, it may be used to annotate and quantify all of the genes and their
soft masking, repetitive regions are indicated with lowercase letters (a, c, g, or t), whereas in …
The first step of structural annotation consists in the identification and masking of
techniques developed in the late 1970s. The first software used to analyze sequencing
of more than one gene, and their start and stop codons cannot be determined due to
and is a necessary step in genome analysis before the sequence is deposited in a
(PPI) networks usually place proteins with similar functions close to each other.
Binding site prediction involves the use of one of the following two methods:
After the repetitive regions in a genome have been identified, they are masked.
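The masking step can be illustrated with a minimal Python sketch. It assumes the repeat coordinates are already known and are given as 0-based, half-open intervals; the function name `mask_repeats` and the example sequence are hypothetical. Soft masking lowercases the repeat bases, as described in this article. Replacing them with `N`, shown here as hard masking, is the usual convention but is stated as an assumption rather than taken from the text.

```python
def mask_repeats(sequence, repeat_intervals, hard=False):
    """Mask repeat intervals in a DNA sequence.

    repeat_intervals: iterable of (start, end) pairs, 0-based and half-open
    (an assumption made for this sketch).
    Soft masking (default) lowercases the repeat bases; hard masking
    replaces them with 'N'.
    """
    bases = list(sequence.upper())
    for start, end in repeat_intervals:
        # Clamp the interval to the sequence bounds before masking.
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = "N" if hard else bases[i].lower()
    return "".join(bases)


# Hypothetical example: a run of eight T's flanked by unique sequence.
genome = "ACGTACG" + "T" * 8 + "ACGT"
repeats = [(7, 15)]

soft_masked = mask_repeats(genome, repeats)
hard_masked = mask_repeats(genome, repeats, hard=True)
print(soft_masked)
print(hard_masked)
```

Soft masking keeps the original bases recoverable by uppercasing them again, whereas hard masking discards them.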
controlled vocabulary must be employed, the most comprehensive of which is the
In the late 2000s, genome annotation shifted its attention towards identifying
The advent of complete genomes in the 1990s (the first one being the genome of
is the process of describing the structure and function of the components of a
by doing so, for instance, annotation pipelines ensure that core genes of a
or combinations of domains from the existing gene/protein-level annotations.
(PGAP), and the identification of nine mobile elements was possible with the
Knowledge (XXG) has multiple WikiProjects aimed at improving annotation. The
Generalized flowchart of a structural genome annotation pipeline. First, the
degradation by some bacterial strains are encoded by genes located in their
indicates whether a protein is located in a solution or membrane. Specific
41: 5256: 5132: 5089: 4724: 4615: 4008: 2636: 2383: 2334: 2285: 2236: 2201: 2166: 2114: 729:. CDS prediction is done by a combination of both methods mentioned above. 253:
in DNA, which was achieved thanks to the appearance of methods to analyze
by experts is involved to interpret the results of an annotation project.
counts. In fact, codon usage was the main strategy used by several early
regions in a genome contain codons with the most abundant corresponding
strain B6(T), a bacterium with thirty-one genes encoding resistance to
The former use information from databases and can be classified into
that classify DNA sequences into coding and noncoding content. Whereas
Repeats are identified as disruptions of one or more sequences in a
which are often present in proteins expressed at a lower level.
as its sole carbon and energy source, to mention a few examples.
that harvests gene data from research databases and creates gene
They may be identified using one of the following two methods:
(CDS) prediction methods, based on the assumption that the most
The process of describing the structure and function of a genome
algorithms exist that are trained on known exon boundaries and
methods, but are biased towards previously identified families.
requires a descriptive output file, which should describe the
531: 324: 266: 4547:
Torto-Alalibo T, Collmer CW, Gwinn-Giglio M (February 2009).
4216:"Genome (re-)annotation and open-source annotation pipelines" 2907:"Search and clustering orders of magnitude faster than BLAST" 2509:"A user's guide to the encyclopedia of DNA elements (ENCODE)" 1379: 3885:
Griffiths-Jones S (2007). "Annotating noncoding RNA genes".
3836:"An overview of the prediction of protein DNA-binding sites" 1628:"Ten steps to get started in Genome Assembly and Annotation" 1564:
Zheng S, Poczai P, Hyvönen J, Tang J, Amiryousefi A (2020).
which identifies and demarcates elements in a genome, and
Annotation is performed by a completely automated pipeline.
etc.), it means that the sequence is indeed a pseudogene.
where nodes represent different genomic signals (such as
Some conventional methods for functional annotation are
hydroxyphenyl acetic acid, and the recognition of an
for which confidence scores are later obtained. The
A linear comparative genome visualization of several
The first generation of genome annotators used local
structures of each annotation, their start and stop
Feature prediction (coding and noncoding sequences).
4042:"A brief introduction to web-based genome browsers" 3172:"TopHat: discovering splice junctions with RNA-Seq" 2131:Gribskov M, Devereux J, Burgess RR (January 1984). 598:(coding regions) are joined. Therefore, eukaryotic 1800:(1st ed.). Elsevier Science. pp. 49–66. 105:DNA annotation is classified into two categories: 44:. The number of genes, the genome length, and the 1362:(CSV86), a bacterium known for its preference of 195:methods, but now applied on a genome-wide scale. 2857: 2855: 2853: 2412:Soh J, Gordon PM, Sensen CW (4 September 2012). 1275:A variation of the museum model, applied in the 614:or high error-rates produced during sequencing. 582:genomes has an extra layer of difficulty due to 86:in a genome and determines what those genes do. 4861:Huo YY, Li ZY, Cheng H, Wang CS, Xu XW (2014). 4185:"Manual Annotation - Wellcome Sanger Institute" 4128: 4126: 3170:Trapnell C, Pachter L, Salzberg SL (May 2009). 2252:"Codon catalog usage and the genome hypothesis" 2126: 2124: 1976: 1974: 1972: 1970: 1764:(1st ed.). Springer US. pp. 193–226. 1621: 1619: 1617: 1615: 1613: 1611: 1540:Vertebrate and Genome Annotation Project (Vega) 838:significant variation within a protein family. 394:. The main steps of structural annotation are: 3918: 3916: 3454:Bioinformatics : methods and applications 2676: 2674: 2672: 2670: 2668: 2301:"GeneMark.hmm: new solutions for gene finding" 676:(ORFs), which are segments of DNA between the 656:that identify functional site signals such as 4696:Top EM, Springael D, Boon N (November 2002). 3447: 3445: 1983:"NCBI prokaryotic genome annotation pipeline" 1406:involved in glucose transport in the strain. 8: 3887:Annual Review of Genomics and Human Genetics 3456:. London: Academic Press. pp. 145–157. 2981: 2979: 2977: 2975: 2973: 2457: 2455: 1827: 1825: 1791: 1789: 238:and homology-based annotation, require fast 4040:Wang J, Kong L, Gao G, Luo J (March 2013). 
3840:International Journal of Molecular Sciences 3588:Sinha S, Lynn AM, Desai DK (October 2020). 3483: 3481: 2816:Bergman CM, Quesneville H (November 2007). 2771:International Journal of Molecular Sciences 2620: 2618: 2299:Lukashin AV, Borodovsky M (February 1998). 1928: 1926: 1924: 1922: 1681: 1679: 1677: 1675: 1673: 1376:NCBI Prokaryotic Genome Annotation Pipeline 404:Splice identification (only in eukaryotes). 3539:Sasson O, Kaplan N, Linial M (June 2006). 2407: 2405: 2403: 2401: 2399: 2397: 2395: 2393: 89:Annotation is performed after a genome is 5264: 5190: 5141: 5131: 5064: 5019: 5009: 4945: 4888: 4878: 4837: 4780: 4723: 4713: 4672: 4623: 4574: 4564: 4482: 4433: 4423: 4382: 4372: 4288: 4239: 4160: 4150: 4106: 4057: 4016: 3975: 3861: 3851: 3810: 3761: 3656: 3615: 3605: 3564: 3515: 3505: 3425: 3415: 3393: 3391: 3389: 3369: 3244: 3195: 3146: 3105: 3013: 3003: 2922: 2833: 2792: 2782: 2741: 2598: 2534: 2524: 2479: 2373: 2324: 2275: 2156: 2104: 2055: 2006: 1859: 1849: 1731:"Medical Definition of Genome annotation" 1653: 1643: 1591: 1581: 3958:Valeev T, Yevshin I, Kolpakov F (2013). 2503:ENCODE Project Consortium (April 2011). 1798:Bioinformatics: Methods and Applications 771: 457:. Repeats are identified by similarity ( 303: 214: 170:(the molecules responsible for carrying 18: 4265:"Community gene annotation in practice" 2079:Staden R, McLachlan AD (January 1982). 1556: 1458:National Center for Biomedical Ontology 966:Whole-genome Shotgun Sequence Detection 4081:Jung J, Kim JI, Yi G (December 2019). 906:Noncoding sequence function prediction 331:) of the organism being annotated. In 4214:Siezen RJ, van Hijum SA (July 2010). 3899:10.1146/annurev.genom.8.080706.092419 3311:Current Opinion in Structural Biology 1490:Encyclopedia of DNA elements (ENCODE) 1290:A community annotation is said to be 1117:Genomic browsers can be divided into 594:(non-coding regions) are removed and 428:are composed of repetitive elements. 
7: 2032:"Sequence data handling by computer" 1453:genomes are Bakta, Prokka and PGAP. 1137:Comparative visualization of genomes 2986:Ejigu GF, Jung J (September 2020). 2631:(Fifth ed.). Weinheim: Wiley. 902:, thus boosting their performance. 780:(GO) ancestor chart organized as a 758:Coding sequence function prediction 494:means replacing the letters of the 4715:10.1111/j.1574-6941.2002.tb01009.x 3462:10.1016/B978-0-323-89775-4.00015-8 2418:. New York: Chapman and Hall/CRC. 1943:10.1016/B978-0-12-809633-8.20226-4 1806:10.1016/B978-0-323-89775-4.00013-4 1068:Visualization of annotations in a 398:Repeat identification and masking. 255:transcription factor binding sites 242:algorithms to identify regions of 14: 4357:"Community annotation in biology" 3834:Si J, Zhao R, Wu R (March 2015). 1429:DDT-1, a strain capable of using 996:Sequence similarity based methods 412:Repeat identification and masking 339:must be identified. Finally, the 5239:Fang H, Gough J (January 2013). 4232:10.1111/j.1751-7915.2010.00191.x 2743:10.1111/j.1574-6976.2009.00169.x 2550: 960:Whole-Genome Assembly Comparison 890:(CNN), have also been employed. 672:CDS predictors mostly deal with 5094:mosga.mathematik.uni-marburg.de 5057:10.1093/bioinformatics/btaa1003 4152:10.1186/gb-2002-3-2-comment2001 3680:Dainat J, Pontarotti P (2021). 2767:"Repetitive Elements in Humans" 1762:Sequence — Evolution — Function 858:posttranslational modifications 3658:10.1093/bioinformatics/btg1026 1645:10.12688/f1000research.13598.1 1380:Insertion Sequence (IS) Finder 560:translation initiation factors 401:Evidence alignment (optional). 1: 5251:(Database issue): D536–D544. 4867:Standards in Genomic Sciences 4133:Ouzounis CA, Karp PD (2002). 
4099:10.1093/bioinformatics/btz596 3937:10.1093/bioinformatics/btu153 3754:10.1093/bioinformatics/bty586 3362:10.1093/bioinformatics/btp536 3237:10.1093/bioinformatics/btx668 3188:10.1093/bioinformatics/btp120 3148:10.1093/bioinformatics/btn300 2924:10.1093/bioinformatics/btq461 1277:Knockout Mouse Project (KOMP) 4425:10.1371/journal.pbio.0060175 3791:Genome Biology and Evolution 3639:Letovsky S, Kasif S (2003). 3278:10.1007/978-1-4939-6622-6_11 2526:10.1371/journal.pbio.1001046 2229:10.1016/0378-1119(82)90157-3 2194:10.1016/0168-9525(96)10038-X 1149:of phylogenetically related 1123:stand-alone genomic browsers 1055:Candidatus Carsonella ruddii 888:convolutional neural network 816:-based, which rely on local 588:post-transcriptional process 502:(ORF) in a transposon as an 4516:10.1007/978-1-4939-3167-5_5 4046:Briefings in Bioinformatics 3697:10.1007/978-1-0716-1503-4_2 2822:Briefings in Bioinformatics 1851:10.1186/1471-2105-10-S11-S8 1770:10.1007/978-1-4757-3783-7_6 1307:, for instance, operates a 862:protein-protein interaction 768:Protein function prediction 479:multiple sequence alignment 475:Comparative genomic methods 5315: 5011:10.1186/s12859-021-04120-9 3607:10.1186/s12859-020-03794-x 2462:Brent MR (December 2005). 2030:Staden R (November 1977). 1729:Davis CP (29 March 2021). 1269:genome annotation project. 1119:web-based genomic browsers 1107: 985:transcriptional regulation 761: 621: 5066:21.11116/0000-0006-FED4-D 4702:FEMS Microbiology Ecology 4566:10.1186/1471-2180-9-S1-S1 3323:10.1016/j.sbi.2004.05.007 2905:Edgar RC (October 2010). 2765:Liehr T (February 2021). 2730:FEMS Microbiology Reviews 2149:10.1093/nar/12.1part2.539 1583:10.3389/fgene.2020.576124 562:. To solve this problem, 4773:10.1128/genomeA.00234-12 3507:10.3389/fgene.2020.00400 2864:Nature Reviews. Genetics 2683:Nature Reviews. Genetics 1885:Nature Reviews. Genetics 1688:Nature Reviews. 
Genetics 1525:Mouse Genome Informatics 1515:Gene Ontology Consortium 1261:Party or jamboree model: 1186:Circular representation: 644:(CDS) and do not report 4665:10.1093/database/baw034 4281:10.1093/database/bas009 4220:Microbial Biotechnology 3960:"BioUML Genome Browser" 3687:. In Poliseno L (ed.). 1354:mobile genetic elements 1266:Drosophila melanogaster 1255:Cottage industry model: 1031:, as well as noncoding 1006:Structure based methods 856:provide information on 713:expressed sequence tags 628:A genome is divided in 528:expressed sequence tags 469:Structure-based methods 160:protein coding sequence 5245:Nucleic Acids Research 5171:Nucleic Acids Research 4880:10.1186/1944-3277-9-30 4604:Nucleic Acids Research 4374:10.1186/1745-6150-5-12 3997:Nucleic Acids Research 3651:(Suppl 1): i197–i204. 3398:Vu TT, Jung J (2021). 3005:10.3390/biology9090295 2354:Nucleic Acids Research 2305:Nucleic Acids Research 2256:Nucleic Acids Research 2137:Nucleic Acids Research 2085:Nucleic Acids Research 2036:Nucleic Acids Research 1987:Nucleic Acids Research 1423:, especially zinc and 1413:and identification of 1300:and/or communication. 
1180:Linear representation: 1161: 1060: 953:Segmental duplications 938:Phylogeny-based method 880:support vector machine 807:directed acyclic graph 797: 782:directed acyclic graph 709:Homology-based methods 455:Homology-based methods 348: 230: 188:Haemophilus influenzae 55: 5133:10.1099/mgen.0.000685 3494:Frontiers in Genetics 2637:10.1002/9783527678679 2317:10.1093/nar/26.4.1107 2268:10.1093/nar/8.1.197-c 2048:10.1093/nar/4.11.4037 1570:Frontiers in Genetics 1463:As a general method, 1416:Halomonas zincidurans 1341:A great diversity of 1144: 1051: 932:Homology-based method 876:multiclass classifier 792:, a component of the 775: 734:Functional annotation 622:Further information: 574:Splice identification 444:sequence conservation 307: 300:Structural annotation 265:structure, and other 218: 111:functional annotation 107:structural annotation 22: 5219:ncbo.bioontology.org 5183:10.1093/nar/gkaa1105 5051:(22–23): 5514–5515. 4761:Genome Announcements 4508:Plant Bioinformatics 3853:10.3390/ijms16035194 3557:10.1110/ps.062185706 2784:10.3390/ijms22042072 2366:10.1093/nar/26.2.544 2097:10.1093/nar/10.1.141 1483:biological databases 1426:Stenotrophomonas sp. 1283:Gatekeeper approach: 1231:Community annotation 1165:Comparative genomics 896:matrix factorization 794:extracellular matrix 646:untranslated regions 286:1000 Genomes Project 26:Porphyra umbilicalis 5257:10.1093/nar/gks1080 5177:(D1): D1020–D1028. 4930:2016NatSR...621332P 4822:2016NatSR...638430T 4616:10.1093/nar/gkz1021 4475:10.1261/rna.1200508 4009:10.1093/nar/gkw1358 3417:10.7717/peerj.12019 2591:10.1038/nature11632 2583:2012Natur.491...56T 2143:(1 Pt 2): 539–549. 1315:on that basis. The 1197:The quality of the 1086:sequence alignments 1035:-like transcripts. 923:(ORF), making them 911:Noncoding sequences 884:k-nearest neighbors 703:hidden Markov model 674:open reading frames 608:quality information 353:open reading frames 296:and other genomes. 
32:genome annotation ( 23:A visualization of 5120:Microbial Genomics 4998:BMC Bioinformatics 4918:Scientific Reports 4810:Scientific Reports 4195:on 2 February 2023 4145:(2): COMMENT2001. 4059:10.1093/bib/bbs029 3803:10.1093/gbe/evy223 3594:BMC Bioinformatics 3098:10.1101/gr.6427907 3051:10.1038/nmeth.1613 2958:on 3 February 2020 2948:"Sequence masking" 2835:10.1093/bib/bbm048 2481:10.1101/gr.3866105 2182:Trends in Genetics 1999:10.1093/nar/gkw569 1838:BMC Bioinformatics 1741:on 9 February 2023 1576:(576124): 576124. 1368:aromatic compounds 1359:Pseudomonas putida 1273:Blessed annotator: 1162: 1061: 921:open reading frame 833:often refer to an 798: 618:Feature prediction 522:Evidence alignment 500:open reading frame 481:produced by large 349: 310:repetitive regions 251:non-coding regions 231: 129:open reading frame 56: 4938:10.1038/srep21332 4830:10.1038/srep38430 4610:(D1): D845–D855. 4525:978-1-4939-3167-5 4469:(12): 2462–2464. 4093:(24): 5303–5305. 3931:(14): 2068–2069. 3797:(11): 2899–2905. 3748:(17): i706–i714. 3706:978-1-0716-1503-4 3471:978-0-323-89775-4 3356:(22): 3045–3046. 3287:978-1-4939-6622-6 3141:(16): i174–i180. 2917:(19): 2460–2461. 2474:(12): 1777–1786. 2415:Genome Annotation 2042:(11): 4037–4051. 1993:(14): 6614–6624. 
1952:978-0-12-811432-2 1779:978-1-4757-3783-7 1400:phenylacetic acid 1396:4-hydroxybenzoate 1328:Disease diagnosis 1199:sequence assembly 973:DNA binding sites 872:binary classifier 717:complementary DNA 612:sequence coverage 568:mass spectrometry 377:regulatory motifs 271:regulatory region 180:synonymous codons 72:genome annotation 60:molecular biology 5304: 5279: 5278: 5268: 5236: 5230: 5229: 5227: 5225: 5215:"NCBO Annotator" 5211: 5205: 5204: 5194: 5162: 5156: 5155: 5145: 5135: 5111: 5105: 5104: 5102: 5100: 5085: 5079: 5078: 5068: 5040: 5034: 5033: 5023: 5013: 4989: 4983: 4982: 4981: 4979: 4966: 4960: 4959: 4949: 4909: 4903: 4902: 4892: 4882: 4858: 4852: 4851: 4841: 4801: 4795: 4794: 4784: 4752: 4746: 4745: 4727: 4717: 4693: 4687: 4686: 4676: 4644: 4638: 4637: 4627: 4595: 4589: 4588: 4578: 4568: 4553:BMC Microbiology 4544: 4538: 4537: 4503: 4497: 4496: 4486: 4454: 4448: 4447: 4437: 4427: 4403: 4397: 4396: 4386: 4376: 4352: 4346: 4345: 4309: 4303: 4302: 4292: 4275:(2012): bas009. 4260: 4254: 4253: 4243: 4211: 4205: 4204: 4202: 4200: 4191:. Archived from 4189:www.sanger.ac.uk 4181: 4175: 4174: 4164: 4154: 4130: 4121: 4120: 4110: 4078: 4072: 4071: 4061: 4037: 4031: 4030: 4020: 3988: 3982: 3981: 3979: 3955: 3949: 3948: 3920: 3911: 3910: 3882: 3876: 3875: 3865: 3855: 3846:(3): 5194–5215. 3831: 3825: 3824: 3814: 3782: 3776: 3775: 3765: 3733: 3727: 3726: 3686: 3677: 3671: 3670: 3660: 3636: 3630: 3629: 3619: 3609: 3585: 3579: 3578: 3568: 3551:(6): 1557–1562. 3536: 3530: 3529: 3519: 3509: 3485: 3476: 3475: 3449: 3440: 3439: 3429: 3419: 3395: 3384: 3383: 3373: 3341: 3335: 3334: 3306: 3300: 3299: 3265: 3259: 3258: 3248: 3216: 3210: 3209: 3199: 3182:(9): 1105–1111. 3167: 3161: 3160: 3150: 3126: 3120: 3119: 3109: 3092:(9): 1362–1377. 3077: 3071: 3070: 3034: 3028: 3027: 3017: 3007: 2983: 2968: 2967: 2965: 2963: 2954:. 
Archived from 2943: 2937: 2936: 2926: 2902: 2896: 2895: 2859: 2848: 2847: 2837: 2813: 2807: 2806: 2796: 2786: 2762: 2756: 2755: 2745: 2721: 2715: 2714: 2678: 2663: 2662: 2660: 2658: 2653:on 4 August 2022 2649:. Archived from 2622: 2613: 2612: 2602: 2562: 2556: 2555: 2554: 2548: 2538: 2528: 2500: 2494: 2493: 2483: 2459: 2450: 2449: 2447: 2445: 2440:on 18 April 2023 2436:. Archived from 2409: 2388: 2387: 2377: 2345: 2339: 2338: 2328: 2311:(4): 1107–1115. 2296: 2290: 2289: 2279: 2247: 2241: 2240: 2212: 2206: 2205: 2177: 2171: 2170: 2160: 2128: 2119: 2118: 2108: 2076: 2070: 2069: 2059: 2027: 2021: 2020: 2010: 1978: 1965: 1964: 1930: 1917: 1916: 1880: 1874: 1873: 1863: 1853: 1844:(Suppl 11): S8. 1829: 1820: 1819: 1793: 1784: 1783: 1757: 1751: 1750: 1748: 1746: 1737:. Archived from 1726: 1720: 1719: 1700:10.1038/35080529 1683: 1668: 1667: 1657: 1647: 1623: 1606: 1605: 1595: 1585: 1561: 1305:Gene WikiProject 1131:species-specific 1127:multiple-species 1090:gene predictions 868:Machine learning 790:matrilin complex 652:, which include 642:coding sequences 604:machine learning 600:coding sequences 449:coding sequences 357:coding sequences 5314: 5313: 5307: 5306: 5305: 5303: 5302: 5301: 5287: 5286: 5283: 5282: 5238: 5237: 5233: 5223: 5221: 5213: 5212: 5208: 5164: 5163: 5159: 5113: 5112: 5108: 5098: 5096: 5087: 5086: 5082: 5042: 5041: 5037: 4991: 4990: 4986: 4977: 4975: 4968: 4967: 4963: 4911: 4910: 4906: 4860: 4859: 4855: 4803: 4802: 4798: 4754: 4753: 4749: 4695: 4694: 4690: 4646: 4645: 4641: 4597: 4596: 4592: 4559:(Suppl 1): S1. 
4546: 4545: 4541: 4526: 4505: 4504: 4500: 4456: 4455: 4451: 4405: 4404: 4400: 4354: 4353: 4349: 4314:Nature Genetics 4311: 4310: 4306: 4262: 4261: 4257: 4213: 4212: 4208: 4198: 4196: 4183: 4182: 4178: 4132: 4131: 4124: 4080: 4079: 4075: 4039: 4038: 4034: 3990: 3989: 3985: 3964:Virtual Biology 3957: 3956: 3952: 3922: 3921: 3914: 3884: 3883: 3879: 3833: 3832: 3828: 3784: 3783: 3779: 3735: 3734: 3730: 3707: 3684: 3679: 3678: 3674: 3638: 3637: 3633: 3587: 3586: 3582: 3545:Protein Science 3538: 3537: 3533: 3487: 3486: 3479: 3472: 3451: 3450: 3443: 3397: 3396: 3387: 3343: 3342: 3338: 3308: 3307: 3303: 3288: 3267: 3266: 3262: 3218: 3217: 3213: 3169: 3168: 3164: 3128: 3127: 3123: 3086:Genome Research 3079: 3078: 3074: 3036: 3035: 3031: 2985: 2984: 2971: 2961: 2959: 2945: 2944: 2940: 2904: 2903: 2899: 2876:10.1038/nrg2814 2861: 2860: 2851: 2815: 2814: 2810: 2764: 2763: 2759: 2723: 2722: 2718: 2695:10.1038/nrg3174 2680: 2679: 2666: 2656: 2654: 2647: 2625:Kahl G (2015). 2624: 2623: 2616: 2577:(7422): 56–65. 2564: 2563: 2559: 2549: 2519:(4): e1001046. 
2502: 2501: 2497: 2468:Genome Research 2461: 2460: 2453: 2443: 2441: 2434: 2411: 2410: 2391: 2347: 2346: 2342: 2298: 2297: 2293: 2249: 2248: 2244: 2214: 2213: 2209: 2179: 2178: 2174: 2130: 2129: 2122: 2078: 2077: 2073: 2029: 2028: 2024: 1980: 1979: 1968: 1953: 1932: 1931: 1920: 1897:10.1038/nrg1769 1882: 1881: 1877: 1831: 1830: 1823: 1816: 1795: 1794: 1787: 1780: 1759: 1758: 1754: 1744: 1742: 1728: 1727: 1723: 1685: 1684: 1671: 1625: 1624: 1609: 1563: 1562: 1558: 1553: 1469:protein domains 1439: 1339: 1330: 1325: 1317:RNA WikiProject 1249:Manual curation 1233: 1224: 1195: 1193:Quality control 1139: 1112: 1106: 1104:Genome browsers 1066: 1046: 989:viral infection 977:DNA replication 944:data analysis, 908: 854:sequence motifs 770: 760: 736: 666:content sensors 638:gene prediction 626: 624:Gene prediction 620: 576: 524: 414: 302: 290:model organisms 259:DNA methylation 201:directed graphs 119: 17: 12: 11: 5: 5312: 5311: 5308: 5300: 5299: 5289: 5288: 5281: 5280: 5231: 5206: 5157: 5106: 5080: 5045:Bioinformatics 5035: 4984: 4961: 4904: 4853: 4796: 4767:(1): 234–235. 4747: 4725:1854/LU-348539 4708:(2): 199–208. 4688: 4639: 4590: 4539: 4524: 4498: 4449: 4398: 4361:Biology Direct 4347: 4320:(4): 327–328. 4304: 4255: 4226:(4): 362–369. 4206: 4176: 4139:Genome Biology 4122: 4087:Bioinformatics 4073: 4052:(2): 131–143. 4032: 3983: 3977:10.12704/vb/e8 3950: 3925:Bioinformatics 3912: 3877: 3826: 3777: 3742:Bioinformatics 3728: 3705: 3672: 3645:Bioinformatics 3631: 3580: 3531: 3477: 3470: 3441: 3385: 3350:Bioinformatics 3336: 3317:(3): 264–272. 3301: 3286: 3270:Bioinformatics 3260: 3231:(5): 748–754. 3225:Bioinformatics 3211: 3176:Bioinformatics 3162: 3135:Bioinformatics 3121: 3072: 3045:(6): 469–477. 3039:Nature Methods 3029: 2969: 2938: 2911:Bioinformatics 2897: 2870:(8): 559–571. 2849: 2828:(6): 382–392. 2808: 2757: 2736:(3): 539–571. 2716: 2689:(5): 329–342. 2664: 2645: 2614: 2557: 2495: 2451: 2432: 2424:10.1201/b12682 2389: 2360:(2): 544–548. 2340: 2291: 2262:(1): r49–r62. 
2242: 2223:(3): 199–209. 2207: 2188:(8): 316–320. 2172: 2120: 2091:(1): 141–156. 2071: 2022: 1966: 1951: 1918: 1891:(2): 130–141. 1875: 1821: 1814: 1785: 1778: 1752: 1721: 1694:(7): 493–503. 1669: 1607: 1555: 1554: 1552: 1549: 1548: 1547: 1542: 1537: 1532: 1527: 1522: 1517: 1512: 1507: 1502: 1497: 1492: 1438: 1435: 1338: 1337:Bioremediation 1335: 1329: 1326: 1324: 1321: 1288: 1287: 1280: 1270: 1258: 1252: 1243: 1240:Factory model: 1232: 1229: 1223: 1220: 1194: 1191: 1190: 1189: 1183: 1177: 1138: 1135: 1110:Genome browser 1108:Main article: 1105: 1102: 1070:genome browser 1065: 1062: 1045: 1042: 1010: 1009: 1003: 970: 969: 963: 950: 949: 935: 925:untranslatable 907: 904: 759: 756: 735: 732: 731: 730: 724: 706: 654:signal sensors 619: 616: 578:Annotation of 575: 572: 564:proteogenomics 523: 520: 488: 487: 472: 466: 452: 413: 410: 409: 408: 405: 402: 399: 301: 298: 288:) and several 148:Staden Package 140:DNA sequencing 118: 115: 84:coding regions 68:DNA annotation 15: 13: 10: 9: 6: 4: 3: 2: 5310: 5309: 5298: 5295: 5294: 5292: 5285: 5276: 5272: 5267: 5262: 5258: 5254: 5250: 5246: 5242: 5235: 5232: 5220: 5216: 5210: 5207: 5202: 5198: 5193: 5188: 5184: 5180: 5176: 5172: 5168: 5161: 5158: 5153: 5149: 5144: 5139: 5134: 5129: 5125: 5121: 5117: 5110: 5107: 5095: 5091: 5084: 5081: 5076: 5072: 5067: 5062: 5058: 5054: 5050: 5046: 5039: 5036: 5031: 5027: 5022: 5017: 5012: 5007: 5003: 4999: 4995: 4988: 4985: 4973: 4972: 4965: 4962: 4957: 4953: 4948: 4943: 4939: 4935: 4931: 4927: 4923: 4919: 4915: 4908: 4905: 4900: 4896: 4891: 4886: 4881: 4876: 4872: 4868: 4864: 4857: 4854: 4849: 4845: 4840: 4835: 4831: 4827: 4823: 4819: 4815: 4811: 4807: 4800: 4797: 4792: 4788: 4783: 4778: 4774: 4770: 4766: 4762: 4758: 4751: 4748: 4743: 4739: 4735: 4731: 4726: 4721: 4716: 4711: 4707: 4703: 4699: 4692: 4689: 4684: 4680: 4675: 4670: 4666: 4662: 4658: 4654: 4650: 4643: 4640: 4635: 4631: 4626: 4621: 4617: 4613: 4609: 4605: 4601: 4594: 4591: 4586: 4582: 4577: 4572: 4567: 4562: 4558: 4554: 4550: 4543: 
1637: 1633: 1632:F1000Research 1629: 1622: 1620: 1618: 1616: 1614: 1612: 1608: 1603: 1599: 1594: 1589: 1584: 1579: 1575: 1571: 1567: 1560: 1557: 1550: 1546: 1543: 1541: 1538: 1536: 1533: 1531: 1528: 1526: 1523: 1521: 1518: 1516: 1513: 1511: 1508: 1506: 1503: 1501: 1498: 1496: 1493: 1491: 1488: 1487: 1486: 1484: 1479: 1477: 1472: 1470: 1466: 1461: 1459: 1454: 1452: 1448: 1444: 1436: 1434: 1432: 1428: 1427: 1422: 1418: 1417: 1411: 1410:Gene Ontology 1407: 1405: 1401: 1397: 1393: 1389: 1385: 1381: 1377: 1373: 1369: 1365: 1361: 1360: 1355: 1351: 1347: 1344: 1336: 1334: 1327: 1322: 1320: 1318: 1314: 1310: 1306: 1301: 1298: 1293: 1284: 1281: 1278: 1274: 1271: 1268: 1267: 1262: 1259: 1256: 1253: 1250: 1247: 1246:Museum model: 1244: 1241: 1238: 1237: 1236: 1230: 1228: 1222:Re-annotation 1221: 1219: 1215: 1213: 1209: 1205: 1200: 1192: 1187: 1184: 1181: 1178: 1174: 1171: 1170: 1169: 1166: 1159: 1155: 1152: 1148: 1143: 1136: 1134: 1132: 1128: 1124: 1120: 1115: 1111: 1103: 1101: 1099: 1095: 1091: 1087: 1083: 1079: 1075: 1071: 1063: 1058: 1056: 1050: 1044:Visualization 1043: 1041: 1038: 1034: 1030: 1026: 1022: 1018: 1014: 1013:Noncoding RNA 1007: 1004: 1001: 997: 994: 993: 992: 990: 986: 982: 978: 974: 967: 964: 961: 958: 957: 956: 954: 947: 943: 939: 936: 933: 930: 929: 928: 926: 922: 918: 914: 912: 905: 903: 901: 897: 891: 889: 885: 881: 877: 873: 869: 865: 863: 859: 855: 851: 848: 844: 839: 836: 831: 827: 823: 819: 815: 810: 808: 804: 803:Gene Ontology 795: 791: 787: 783: 779: 778:Gene Ontology 774: 769: 765: 764:Gene Ontology 757: 755: 753: 749: 745: 741: 733: 728: 725: 722: 718: 714: 710: 707: 704: 700: 698: 694: 693: 692: 690: 686: 683: 679: 675: 671: 667: 663: 659: 655: 651: 647: 643: 639: 635: 631: 625: 617: 615: 613: 609: 605: 601: 597: 593: 589: 585: 581: 573: 571: 569: 565: 561: 557: 553: 548: 544: 539: 537: 533: 529: 521: 519: 517: 513: 509: 505: 501: 497: 493: 484: 480: 476: 473: 470: 467: 464: 460: 456: 453: 450: 445: 441: 439: 435: 434: 433: 429: 427: 423: 419: 
411: 406: 403: 400: 397: 396: 395: 393: 389: 386: 382: 378: 374: 370: 366: 362: 358: 354: 346: 342: 338: 334: 330: 326: 322: 318: 315: 311: 306: 299: 297: 295: 291: 287: 282: 280: 276: 272: 268: 264: 260: 256: 252: 247: 245: 241: 237: 227: 223: 217: 213: 210: 206: 205:transcription 202: 198: 197:Markov models 194: 190: 189: 183: 181: 177: 173: 169: 165: 161: 157: 153: 149: 145: 141: 138: 134: 133:Maxam-Gilbert 130: 126: 125: 116: 114: 112: 108: 103: 100: 96: 92: 87: 85: 81: 77: 73: 69: 65: 61: 54:respectively. 52: 51:transcription 47: 43: 39: 35: 31: 28: 27: 21: 5284: 5248: 5244: 5234: 5222:. Retrieved 5218: 5209: 5174: 5170: 5160: 5123: 5119: 5109: 5097:. Retrieved 5093: 5083: 5048: 5044: 5038: 5001: 4997: 4987: 4976:, retrieved 4970: 4964: 4924:(1): 21332. 4921: 4917: 4907: 4870: 4866: 4856: 4816:(1): 38430. 4813: 4809: 4799: 4764: 4760: 4750: 4705: 4701: 4691: 4656: 4652: 4642: 4607: 4603: 4593: 4556: 4552: 4542: 4507: 4501: 4466: 4462: 4452: 4415: 4412:PLOS Biology 4411: 4401: 4364: 4360: 4350: 4317: 4313: 4307: 4272: 4268: 4258: 4223: 4219: 4209: 4197:. Retrieved 4193:the original 4188: 4179: 4142: 4138: 4090: 4086: 4076: 4049: 4045: 4035: 4000: 3996: 3986: 3967: 3963: 3953: 3928: 3924: 3890: 3886: 3880: 3843: 3839: 3829: 3794: 3790: 3780: 3745: 3741: 3731: 3688: 3675: 3648: 3644: 3634: 3597: 3593: 3583: 3548: 3544: 3534: 3497: 3493: 3453: 3407: 3403: 3353: 3349: 3339: 3314: 3310: 3304: 3269: 3263: 3228: 3224: 3214: 3179: 3175: 3165: 3138: 3134: 3124: 3089: 3085: 3075: 3042: 3038: 3032: 2995: 2991: 2960:. Retrieved 2956:the original 2951: 2941: 2914: 2910: 2900: 2867: 2863: 2825: 2821: 2811: 2774: 2770: 2760: 2733: 2729: 2719: 2686: 2682: 2655:. Retrieved 2651:the original 2627: 2574: 2570: 2560: 2516: 2513:PLOS Biology 2512: 2498: 2471: 2467: 2442:. 
Retrieved 2438:the original 2414: 2357: 2353: 2343: 2308: 2304: 2294: 2259: 2255: 2245: 2220: 2216: 2210: 2185: 2181: 2175: 2140: 2136: 2088: 2084: 2074: 2039: 2035: 2025: 1990: 1986: 1934: 1888: 1884: 1878: 1841: 1837: 1797: 1761: 1755: 1743:. Retrieved 1739:the original 1734: 1724: 1691: 1687: 1638:(148): 148. 1635: 1631: 1573: 1569: 1559: 1480: 1473: 1462: 1455: 1440: 1425: 1421:heavy metals 1415: 1408: 1357: 1348:involved in 1340: 1331: 1323:Applications 1302: 1297:unsupervised 1296: 1291: 1289: 1282: 1272: 1264: 1260: 1254: 1245: 1239: 1234: 1225: 1216: 1196: 1185: 1179: 1176:annotations. 1172: 1163: 1147:type species 1130: 1126: 1122: 1118: 1116: 1113: 1067: 1064:File formats 1053: 1036: 1011: 1005: 995: 971: 965: 959: 951: 937: 931: 915: 909: 892: 866: 840: 811: 799: 737: 726: 708: 696: 695: 665: 653: 649: 627: 584:RNA splicing 577: 540: 525: 512:hard masking 511: 508:soft masking 507: 491: 489: 474: 468: 462: 454: 437: 436: 430: 426:human genome 415: 373:splice sites 350: 337:splice sites 283: 248: 235: 232: 225: 221: 192: 186: 184: 122: 120: 110: 106: 104: 88: 82:and all the 71: 67: 57: 40:) made with 24: 4418:(7): e175. 3893:: 279–298. 3689:Pseudogenes 2777:(4): 2072. 1735:MedicineNet 1495:Entrez Gene 1451:prokaryotic 1441:Genes in a 1364:naphthalene 1350:hydrocarbon 946:dN/dS ratio 917:Pseudogenes 850:amino acids 847:hydrophobic 843:hydrophilic 784:taken from 776:An example 748:development 719:(cDNA), or 670:prokaryotic 662:polyA sites 556:frameshifts 496:nucleotides 422:transposons 209:translation 172:amino acids 36:accession: 30:chloroplast 5224:8 February 5088:Martin R. 5004:(1): 205. 4873:(30): 30. 4659:: baw034. 4003:(9): e67. 3600:(1): 466. 3410:: e12019. 2998:(9): 295. 2952:drive5.com 1551:References 1443:eukaryotic 1388:salicylate 1366:and other 1292:supervised 1173:Dot Plots: 886:(kNN) and 762:See also: 752:metabolism 744:cell death 740:cell cycle 723:sequences. 
689:eukaryotic 580:eukaryotic 333:eukaryotic 164:translated 46:GC content 42:Chloroplot 38:MF385003.1 4367:(1): 12. 3970:(1): 15. 3723:235625288 3067:205419756 2946:Edgar R. 2505:Becker PB 1961:226248103 1343:catabolic 1208:precision 1037:Ab initio 1000:conserved 835:analogous 826:orthology 818:alignment 727:Combiners 697:Ab initio 658:promoters 634:noncoding 590:in which 516:alignment 486:question. 483:insertion 392:promoters 345:noncoding 335:genomes, 314:assembled 275:pangenome 263:chromatin 240:alignment 236:ab initio 226:ab initio 222:ab initio 193:ab initio 124:ab initio 95:assembled 91:sequenced 5291:Category 5275:23161684 5201:33270901 5152:34739369 5099:25 April 5075:33258916 5030:33879057 4978:25 April 4956:26888254 4899:25945155 4848:27924916 4791:23469351 4742:15173391 4734:19709279 4683:27009807 4653:Database 4634:31680165 4585:19278549 4534:26519402 4493:18945806 4444:18613750 4393:20167071 4334:10742085 4299:22434843 4269:Database 4250:21255336 4199:28 March 4171:11864365 4117:31350879 4068:22764121 4027:28100700 3945:24642063 3907:17506659 3872:25756377 3821:30364947 3772:30423092 3715:34165706 3667:12855458 3626:33076816 3575:16672244 3526:32391061 3436:34513334 3380:19744993 3331:15193305 3296:27896725 3255:29069314 3206:19289445 3157:18689821 3116:17690205 3059:21623353 3024:32962098 2962:25 April 2933:20709691 2884:20628352 2844:17932080 2803:33669810 2752:19396957 2703:22510764 2657:24 April 2609:23128226 2545:21526222 2490:16339376 2444:18 April 2017:27342282 1913:13107786 1905:16418748 1870:19811692 1745:17 April 1716:12044602 1708:11433356 1664:29568489 1602:33101394 1545:WormBase 1447:pipeline 1437:Software 1392:benzoate 1212:accuracy 1154:families 1029:microRNA 830:xenology 822:paralogy 814:homology 715:(ESTs), 547:isoforms 536:proteins 530:(ESTs), 459:homology 355:(ORFs), 329:proteins 244:homology 176:ribosome 99:database 64:genetics 5266:3531119 5192:7779008 5143:8743544 5090:"MOSGA" 5021:8056616 4947:4758049 4926:Bibcode 4890:4286145 
4839:5141477 4818:Bibcode 4782:3587945 4674:4805243 4625:7145631 4576:2654661 4484:2590952 4435:2443188 4384:2834641 4342:5354139 4290:3308165 4241:3815804 4108:6954651 4018:5605237 3863:4394471 3812:6239678 3763:6129265 3566:2242553 3517:7193026 3500:: 400. 3427:8395570 3371:2773257 3246:6192213 3197:2672628 3107:1950905 3015:7565776 2992:Biology 2892:6617359 2794:7922087 2711:3352427 2600:3498066 2579:Bibcode 2536:3079585 2507:(ed.). 2384:9421513 2335:9461475 2286:6986610 2237:6751939 2202:8783942 2167:6694906 2115:7063399 2008:5001611 1861:3226197 1655:5850084 1593:7545089 1535:Uniprot 1520:GeneRIF 1510:GENCODE 1505:FlyBase 1500:Ensembl 1372:glucose 1346:enzymes 1096:, GTF, 942:RNA-Seq 900:hashing 786:QuickGO 721:protein 699:methods 650:sensors 592:introns 552:operons 543:RNA-Seq 492:Masking 463:de novo 440:methods 438:De novo 418:repeats 369:repeats 365:introns 359:(CDS), 261:sites, 174:to the 146:is the 117:History 34:GenBank 5273:  5263:  5199:  5189:  5150:  5140:  5126:(11). 5073:  5028:  5018:  4954:  4944:  4897:  4887:  4846:  4836:  4789:  4779:  4740:  4732:  4681:  4671:  4632:  4622:  4583:  4573:  4532:  4522:  4491:  4481:  4442:  4432:  4391:  4381:  4340:  4332:  4297:  4287:  4248:  4238:  4169:  4162:139008 4159:  4115:  4105:  4066:  4025:  4015:  3943:  3905:  3870:  3860:  3819:  3809:  3770:  3760:  3721:  3713:  3703:  3665:  3624:  3617:574302 3614:  3573:  3563:  3524:  3514:  3468:  3434:  3424:  3378:  3368:  3329:  3294:  3284:  3253:  3243:  3204:  3194:  3155:  3114:  3104:  3065:  3057:  3022:  3012:  2931:  2890:  2882:  2842:  2801:  2791:  2750:  2709:  2701:  2643:  2607:  2597:  2571:Nature 2543:  2533:  2488:  2430:  2382:  2375:147303 2372:  2333:  2326:147337 2323:  2284:  2277:327256 2274:  2235:  2200:  2165:  2158:321069 2155:  2113:  2106:326122 2103:  2066:593900 2064:  2057:343220 2054:  2015:  2005:  1959:  1949:  1911:  1903:  1868:  1858:  1812:  1776:  1714:  1706:  1662:  1652:  1600:  1590:  1530:RefSeq 
1404:operon 1204:recall 1158:genera 1082:codons 1074:intron 1027:, and 1025:snoRNA 987:, and 981:repair 898:or by 685:codons 664:, and 630:coding 390:, and 388:codons 341:coding 327:, and 317:genome 312:of an 229:begun. 137:Sanger 76:genome 4738:S2CID 4338:S2CID 3719:S2CID 3685:(PDF) 3404:PeerJ 3063:S2CID 2888:S2CID 2707:S2CID 1957:S2CID 1909:S2CID 1712:S2CID 1476:MAKER 1384:genes 1370:over 1313:stubs 1286:data. 1151:viral 828:, or 678:start 596:exons 381:start 361:exons 294:human 279:clade 168:tRNAs 156:codon 144:reads 80:genes 5271:PMID 5226:2023 5197:PMID 5148:PMID 5101:2022 5071:PMID 5026:PMID 4980:2022 4971:GAAS 4952:PMID 4895:PMID 4844:PMID 4787:PMID 4730:PMID 4679:PMID 4657:2016 4630:PMID 4581:PMID 4530:PMID 4520:ISBN 4489:PMID 4440:PMID 4389:PMID 4330:PMID 4295:PMID 4273:2012 4246:PMID 4201:2023 4167:PMID 4113:PMID 4064:PMID 4023:PMID 3941:PMID 3903:PMID 3868:PMID 3817:PMID 3768:PMID 3711:PMID 3701:ISBN 3663:PMID 3622:PMID 3571:PMID 3522:PMID 3466:ISBN 3432:PMID 3376:PMID 3327:PMID 3292:PMID 3282:ISBN 3251:PMID 3202:PMID 3153:PMID 3112:PMID 3055:PMID 3020:PMID 2964:2023 2929:PMID 2880:PMID 2840:PMID 2799:PMID 2748:PMID 2699:PMID 2659:2023 2641:ISBN 2605:PMID 2541:PMID 2486:PMID 2446:2023 2428:ISBN 2380:PMID 2331:PMID 2282:PMID 2233:PMID 2217:Gene 2198:PMID 2163:PMID 2111:PMID 2062:PMID 2013:PMID 1947:ISBN 1901:PMID 1866:PMID 1810:ISBN 1774:ISBN 1747:2023 1704:PMID 1660:PMID 1598:PMID 1465:dcGO 1456:The 1210:and 1156:and 1121:and 1094:GFF3 1088:and 1078:exon 1033:mRNA 1021:rRNA 1017:tRNA 979:and 845:and 766:and 682:stop 680:and 660:and 632:and 586:, a 558:and 534:and 532:RNAs 504:exon 385:stop 383:and 343:and 325:RNAs 321:ESTs 269:and 207:and 154:and 152:base 135:and 93:and 62:and 5297:DNA 5261:PMC 5253:doi 5187:PMC 5179:doi 5138:PMC 5128:doi 5061:hdl 5053:doi 5016:PMC 5006:doi 4942:PMC 4934:doi 4885:PMC 4875:doi 4834:PMC 4826:doi 4777:PMC 4769:doi 4720:hdl 4710:doi 4669:PMC 4661:doi 4620:PMC 4612:doi 4571:PMC 4561:doi 4512:doi 4479:PMC 4471:doi 4463:RNA 
4430:PMC 4420:doi 4379:PMC 4369:doi 4322:doi 4285:PMC 4277:doi 4236:PMC 4228:doi 4157:PMC 4147:doi 4103:PMC 4095:doi 4054:doi 4013:PMC 4005:doi 3972:doi 3933:doi 3895:doi 3858:PMC 3848:doi 3807:PMC 3799:doi 3758:PMC 3750:doi 3693:doi 3653:doi 3612:PMC 3602:doi 3561:PMC 3553:doi 3512:PMC 3502:doi 3458:doi 3422:PMC 3412:doi 3366:PMC 3358:doi 3319:doi 3274:doi 3241:PMC 3233:doi 3192:PMC 3184:doi 3143:doi 3102:PMC 3094:doi 3047:doi 3010:PMC 3000:doi 2919:doi 2872:doi 2830:doi 2789:PMC 2779:doi 2738:doi 2691:doi 2633:doi 2595:PMC 2587:doi 2575:491 2531:PMC 2521:doi 2476:doi 2420:doi 2370:PMC 2362:doi 2321:PMC 2313:doi 2272:PMC 2264:doi 2225:doi 2190:doi 2153:PMC 2145:doi 2101:PMC 2093:doi 2052:PMC 2044:doi 2003:PMC 1995:doi 1939:doi 1893:doi 1856:PMC 1846:doi 1802:doi 1766:doi 1696:doi 1650:PMC 1640:doi 1588:PMC 1578:doi 1431:DDT 1309:bot 1098:BED 541:If 267:RNA 70:or 58:In 5293:: 5269:. 5259:. 5249:41 5247:. 5243:. 5217:. 5195:. 5185:. 5175:49 5173:. 5169:. 5146:. 5136:. 5122:. 5118:. 5092:. 5069:. 5059:. 5049:36 5047:. 5024:. 5014:. 5002:22 5000:. 4996:. 4950:. 4940:. 4932:. 4920:. 4916:. 4893:. 4883:. 4869:. 4865:. 4842:. 4832:. 4824:. 4812:. 4808:. 4785:. 4775:. 4763:. 4759:. 4736:. 4728:. 4718:. 4706:42 4704:. 4700:. 4677:. 4667:. 4655:. 4651:. 4628:. 4618:. 4608:48 4606:. 4602:. 4579:. 4569:. 4555:. 4551:. 4528:. 4518:. 4487:. 4477:. 4467:14 4465:. 4461:. 4438:. 4428:. 4414:. 4410:. 4387:. 4377:. 4363:. 4359:. 4336:. 4328:. 4318:24 4316:. 4293:. 4283:. 4271:. 4267:. 4244:. 4234:. 4222:. 4218:. 4187:. 4165:. 4155:. 4141:. 4137:. 4125:^ 4111:. 4101:. 4091:35 4089:. 4085:. 4062:. 4050:14 4048:. 4044:. 4021:. 4011:. 4001:45 3999:. 3995:. 3966:. 3962:. 3939:. 3929:30 3927:. 3915:^ 3901:. 3889:. 3866:. 3856:. 3844:16 3842:. 3838:. 3815:. 3805:. 3795:10 3793:. 3789:. 3766:. 3756:. 3746:34 3744:. 3740:. 3717:. 3709:. 3699:. 3661:. 3649:19 3647:. 3643:. 3620:. 3610:. 3598:21 3596:. 3592:. 3569:. 3559:. 3549:15 3547:. 3543:. 3520:. 3510:. 3498:11 3496:. 3492:. 3480:^ 3464:. 
3444:^ 3430:. 3420:. 3406:. 3402:. 3388:^ 3374:. 3364:. 3354:25 3352:. 3348:. 3325:. 3315:14 3313:. 3290:. 3280:. 3249:. 3239:. 3229:34 3227:. 3223:. 3200:. 3190:. 3180:25 3178:. 3174:. 3151:. 3139:24 3137:. 3133:. 3110:. 3100:. 3090:17 3088:. 3084:. 3061:. 3053:. 3041:. 3018:. 3008:. 2994:. 2990:. 2972:^ 2950:. 2927:. 2915:26 2913:. 2909:. 2886:. 2878:. 2868:11 2866:. 2852:^ 2838:. 2824:. 2820:. 2797:. 2787:. 2775:22 2773:. 2769:. 2746:. 2734:33 2732:. 2728:. 2705:. 2697:. 2687:13 2685:. 2667:^ 2639:. 2617:^ 2603:. 2593:. 2585:. 2573:. 2569:. 2539:. 2529:. 2515:. 2511:. 2484:. 2472:15 2470:. 2466:. 2454:^ 2426:. 2392:^ 2378:. 2368:. 2358:26 2356:. 2352:. 2329:. 2319:. 2309:26 2307:. 2303:. 2280:. 2270:. 2258:. 2254:. 2231:. 2221:18 2219:. 2196:. 2186:12 2184:. 2161:. 2151:. 2141:12 2139:. 2135:. 2123:^ 2109:. 2099:. 2089:10 2087:. 2083:. 2060:. 2050:. 2038:. 2034:. 2011:. 2001:. 1991:44 1989:. 1985:. 1969:^ 1955:. 1945:. 1921:^ 1907:. 1899:. 1887:. 1864:. 1854:. 1842:10 1840:. 1836:. 1824:^ 1808:. 1788:^ 1772:. 1733:. 1710:. 1702:. 1690:. 1672:^ 1658:. 1648:. 1634:. 1630:. 1610:^ 1596:. 1586:. 1574:11 1572:. 1568:. 1478:. 1398:, 1394:, 1390:, 1206:, 1023:, 1019:, 983:, 824:, 750:, 746:, 742:, 687:, 570:. 379:, 375:, 371:, 367:, 363:, 323:, 257:, 246:. 66:, 5277:. 5255:: 5228:. 5203:. 5181:: 5154:. 5130:: 5124:7 5103:. 5077:. 5063:: 5055:: 5032:. 5008:: 4958:. 4936:: 4928:: 4922:6 4901:. 4877:: 4871:9 4850:. 4828:: 4820:: 4814:6 4793:. 4771:: 4765:1 4744:. 4722:: 4712:: 4685:. 4663:: 4636:. 4614:: 4587:. 4563:: 4557:9 4536:. 4514:: 4495:. 4473:: 4446:. 4422:: 4416:6 4395:. 4371:: 4365:5 4344:. 4324:: 4301:. 4279:: 4252:. 4230:: 4224:3 4203:. 4173:. 4149:: 4143:3 4119:. 4097:: 4070:. 4056:: 4029:. 4007:: 3980:. 3974:: 3968:1 3947:. 3935:: 3909:. 3897:: 3891:8 3874:. 3850:: 3823:. 3801:: 3774:. 3752:: 3725:. 3695:: 3669:. 3655:: 3628:. 3604:: 3577:. 3555:: 3528:. 3504:: 3474:. 3460:: 3438:. 3414:: 3408:9 3382:. 3360:: 3333:. 3321:: 3298:. 3276:: 3257:. 3235:: 3208:. 
3186:: 3159:. 3145:: 3118:. 3096:: 3069:. 3049:: 3043:8 3026:. 3002:: 2996:9 2966:. 2935:. 2921:: 2894:. 2874:: 2846:. 2832:: 2826:8 2805:. 2781:: 2754:. 2740:: 2713:. 2693:: 2661:. 2635:: 2611:. 2589:: 2581:: 2547:. 2523:: 2517:9 2492:. 2478:: 2448:. 2422:: 2386:. 2364:: 2337:. 2315:: 2288:. 2266:: 2260:8 2239:. 2227:: 2204:. 2192:: 2169:. 2147:: 2117:. 2095:: 2068:. 2046:: 2040:4 2019:. 1997:: 1963:. 1941:: 1915:. 1895:: 1889:7 1872:. 1848:: 1818:. 1804:: 1782:. 1768:: 1749:. 1718:. 1698:: 1692:2 1666:. 1642:: 1636:7 1604:. 1580:: 1076:- 1057:' 1002:.


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.