and more recently, it has been defined as genetic code that is transcribed into RNA. Although the definition of a gene has evolved greatly over the last century, it has remained a challenging and controversial subject for many researchers. The ENCODE/GENCODE project uncovered even more problematic aspects of the definition, including alternative splicing (where the exons of a gene, separated by introns, can be joined in different combinations), intergenic transcription, complex patterns of dispersed regulation, non-genic conservation and the abundance of noncoding RNA genes. As GENCODE endeavours to build an encyclopaedia of genes and gene variants, these problems made it increasingly pressing for the project to arrive at an updated notion of a gene.
sites and incorrect biotypes. These are fed back to the manual annotators using the AnnoTrack tracking system. Some of these pipelines use data from other ENCODE subgroups, including RNA-seq data, histone modification data, and CAGE and Ditag data. RNA-seq data is an important new source of evidence, but generating complete gene models from it is a difficult problem. As part of GENCODE, a competition was run to assess the quality of predictions produced by various RNA-seq prediction pipelines (Refer to
search for appropriate guide sequences by listing potential binding sites for the CRISPR/Cas9 complex that are in or near transcribed regions (within 200 bp of one). For each site, the track provides possible guide sequences along with a set of predicted efficiency and specificity scores for those guides. It also provides information about potential off-targets, grouped by the number of mismatches between the off-target and the guide.
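Grouping off-targets by mismatch count can be sketched in Python. This is a simplified illustration, assuming equal-length guide and candidate sequences compared position by position (Hamming distance); it ignores the PAM, insertions and deletions, and the position-dependent weighting that real specificity scores use. The function names and the example guide are hypothetical and are not taken from the track's own code.

```python
from collections import defaultdict


def count_mismatches(guide: str, site: str) -> int:
    """Count positions at which two equal-length DNA sequences differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(1 for g, s in zip(guide.upper(), site.upper()) if g != s)


def group_off_targets(guide: str, sites: list[str]) -> dict[int, list[str]]:
    """Group candidate sites by their number of mismatches to the guide."""
    groups: dict[int, list[str]] = defaultdict(list)
    for site in sites:
        groups[count_mismatches(guide, site)].append(site)
    # Return a plain dict ordered by mismatch count (0, 1, 2, ...).
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    guide = "GACGCATAAAGATGAGACGC"  # hypothetical 20-nt guide sequence
    candidates = [
        "GACGCATAAAGATGAGACGA",  # differs at the last base
        "GACGCATTAAGATGAGACGA",  # two differences
        "GACCCATTAAGATGAGACGA",  # three differences
    ]
    for n, sites in group_off_targets(guide, candidates).items():
        print(f"{n} mismatch(es): {len(sites)} site(s)")
```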
reference chromosomes and stored in separate files, which include: gene annotation; PolyA features annotated by HAVANA; (retrotransposed) pseudogenes predicted by the Yale and UCSC pipelines but not by HAVANA; long non-coding RNAs; and tRNA structures predicted by tRNA-Scan. Some examples of lines in GTF format are shown below:
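As an illustration of the layout, the sketch below splits one GTF line into its nine TAB-separated columns. The record is constructed for the example (placeholder IDs, made-up coordinates) rather than copied from a GENCODE release, and the class and function names are hypothetical. Start and end are treated as 1-based, inclusive coordinates, so a feature's length is end - start + 1.

```python
from dataclasses import dataclass


@dataclass
class GtfRecord:
    seqname: str     # 1: chromosome name, e.g. "chr1"
    source: str      # 2: annotation source, ENSEMBL or HAVANA
    feature: str     # 3: gene, transcript, exon, CDS, UTR, ...
    start: int       # 4: genomic start location (1-based)
    end: int         # 5: genomic end location (1-based, inclusive)
    score: str       # 6: not used, normally "."
    strand: str      # 7: "+" or "-"
    frame: str       # 8: genomic phase for CDS features (0, 1, 2) or "."
    attributes: str  # 9: key "value"; pairs, left unparsed here

    @property
    def length(self) -> int:
        # Both coordinates are inclusive, hence the + 1.
        return self.end - self.start + 1


def parse_gtf_line(line: str) -> GtfRecord:
    """Split one non-comment GTF line into its nine columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    return GtfRecord(seqname, source, feature, int(start), int(end),
                     score, strand, frame, attrs)


if __name__ == "__main__":
    # Constructed example line; the IDs are placeholders, not real GENCODE IDs.
    line = ('chr1\tHAVANA\texon\t1000\t1199\t.\t+\t.\t'
            'gene_id "ENSGXXXXXXXXXXX"; transcript_id "ENSTXXXXXXXXXXX";')
    record = parse_gtf_line(line)
    print(record.seqname, record.feature, record.strand, record.length)
```

GENCODE GTF files begin with header lines starting with `#`; in this sketch those would have to be skipped before calling `parse_gtf_line`.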
part of this stage, the GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. It was envisaged that the results of the first two phases would be used to determine the best path forward for analysing the remaining 99% of the human genome in a cost-effective and comprehensive production phase.
discovery of novel loci and novel transcripts at existing loci. In addition, in response to the COVID-19 pandemic in 2020, GENCODE reviewed and improved the annotation of a set of protein-coding genes associated with SARS-CoV-2 infection.
Through advances in sequencing technologies (such as RT-PCR-seq), increased coverage from manual annotation (HAVANA group) and improvements to the automatic annotation algorithms of Ensembl, the accuracy and completeness of GENCODE annotations have been continuously refined through its iteration
Several analysis groups in the GENCODE consortium run pipelines that help the manual annotators produce models in unannotated regions and identify potentially missed or incorrect manual annotations, including completely missing loci, missing alternative isoforms, incorrect splice
In 2018, one of the latest additions to the GENCODE project was a CRISPR/Cas9 track for human and model-organism assemblies. CRISPR/Cas9 is a genome-editing technique in which a guide RNA sequence binds with high specificity to the region to be edited. The new track was designed to assist in the
The first release of the annotation of the 44 ENCODE regions was frozen on 29 April 2005 and was used in the first ENCODE Genome Annotation Assessment Project (EGASP) workshop. GENCODE Release 1 contained 416 known loci, 26 novel coding DNA sequence (CDS) loci, 82 novel transcript loci, 78 putative
RGASP is organised in a consortium framework modelled after the EGASP (ENCODE Genome Annotation Assessment Project) gene prediction workshop, and two rounds of workshops have been conducted to address different aspects of RNA-seq analysis as well as changing sequencing technologies and formats. One
A key research area of the GENCODE project was to investigate the biological significance of long non-coding RNAs (lncRNAs). To better understand lncRNA expression in humans, GENCODE created a subproject to develop custom microarray platforms capable of quantifying the transcripts in the
The definition of a "gene" has never been a trivial issue, with numerous definitions and notions proposed over the years. Genes were first conceived in the 1900s as discrete units of heredity, then it was thought of as the blueprint for protein synthesis,
The GENCODE website also contains a genome browser for human and mouse, in which any genomic region can be reached by entering the chromosome number and start-end positions (e.g. 22:30,700,000..30,900,000), an Ensembl transcript ID or Ensembl gene ID (with or without version), or a gene
Ensembl transcripts are products of the Ensembl automatic gene annotation system (a collection of gene annotation pipelines), termed the Ensembl gene build. All Ensembl transcripts are based on experimental evidence, and thus the automated pipeline relies on the mRNA and protein sequences deposited
GENCODE pipeline diagram. The schema shows the flow of data between manual annotation and automated annotation through specialized prediction pipelines to provide hints to first-pass annotation and quality control (QC). Annotated gene models are subject to experimental validation, and the AnnoTrack
The project was designed with three phases: pilot, technology development and production. The pilot stage of the ENCODE project aimed to investigate in great depth, computationally and experimentally, 44 regions totalling 30 Mb of sequence, representing approximately 1% of the human genome. As
The RNA-seq Genome Annotation Assessment Project (RGASP) is designed to assess the effectiveness of various computational methods for analysing high-quality RNA-seq data. The primary goals of RGASP are to provide an unbiased evaluation of RNA-seq alignment and transcript characterisation
was an international research effort to determine the sequence of the human genome and identify the genes that it contains. The project was coordinated by the National Institutes of Health and the U.S. Department of Energy. Additional contributors included universities across the United States and
As of 2014, the GENCODE human gene set (GENCODE Release 20) included annotation files (in GTF and GFF3 formats), FASTA files and metadata files associated with the GENCODE annotation on all genomic regions (reference chromosomes, patches, scaffolds and haplotypes). The annotation data is also provided for the
HAVANA together with automatic annotations from the Ensembl gene set. This process also adds unique full-length CDS predictions from the Ensembl protein-coding set into manually annotated genes, to provide the most complete and up-to-date annotation of the genome possible.
The GENCODE consortium was initially formed as part of the pilot phase of the ENCODE project to identify and map all protein-coding genes within the ENCODE regions (approximately 1% of the human genome). Given the initial success of the project, GENCODE now aims to build an “Encyclopedia of genes and gene
Putative loci can be verified by wet-lab experiments, and computational predictions are analysed manually. To ensure that the annotation covers the complete genome rather than just the manually annotated regions, a merged data set is created using manual annotations from
Among other achievements, GENCODE has completed the first-pass manual annotation of the mouse reference genome, has begun cooperating with the RefSeq and UniProt reference annotation databases towards annotation convergence, and has improved the annotation of lncRNAs through the
A comparison of key statistics from three major GENCODE releases up to 2014 is shown below. Although coverage, in terms of the total number of genes discovered, is steadily increasing, the number of protein-coding genes has actually decreased. This is mostly attributed to new
of the main findings from rounds 1 and 2 was the importance of read alignment for the quality of the gene predictions produced. Hence, a third round of RGASP workshops was being conducted in 2014, focusing primarily on read mapping to the genome.
were published in June 2007. The findings highlighted the success of the pilot project in creating a feasible platform and new technologies for characterising functional elements in the human genome, paving the way for genome-wide studies.
tracking system contains data from all these sources and is used to highlight differences, coordinate QC and track outcomes. Manual and automated annotation processes produce the GENCODE data set and are also used to QC the completed annotation.
The key participants of the GENCODE project have remained relatively consistent throughout its phases, with the Wellcome Trust Sanger Institute now leading the overall effort.
international partners in the United Kingdom, France, Germany, Japan and China. The Human Genome Project formally began in 1990 and was completed in 2003, two years ahead of its original schedule.
techniques. GENCODE Release 2 contained 411 known loci, 30 novel CDS loci, 81 novel transcript loci, 83 putative loci, 104 processed pseudogenes and 66 unprocessed pseudogenes.
In September 2012, the GENCODE consortium published a major paper discussing the results of GENCODE Release 7, which was frozen in December 2010.
New funding was part of NHGRI's endeavour to scale up the ENCODE Project to a production phase on the entire genome, along with additional pilot-scale studies.
(discovery, reconstruction and quantification) software, and to determine the feasibility of automated genome annotation based on transcriptome sequencing.
The most recent release of the human gene set annotations is GENCODE 36, with a freeze date of December 2020. This release uses the latest GRCh38 human
For GENCODE 7, transcript models are assigned a high or low level of support based on a new method developed to score the quality of transcripts.
A second version (Release 2) was frozen on 14 October 2005, containing updates following discoveries from experimental validations using
below). To confirm uncertain models, GENCODE also has an experimental validation pipeline using RNA sequencing and RACE.
Since its inception, GENCODE has released 36 versions of the human gene set annotations (excluding minor updates).
The latest release of the mouse gene set annotations is GENCODE M25, also with a freeze date of December 2020.
Description of key-value pairs in 9th column of the GENCODE GTF file (format: key "value")
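A minimal sketch of parsing the 9th column into key-value pairs, assuming each pair has the form key "value" followed by a semicolon and that values contain no semicolons. Values are collected into lists because a key can occur more than once on a line (GENCODE repeats the tag key, for example), and unquoted values such as the level field are accepted too. The function name is hypothetical.

```python
def parse_attributes(column9: str) -> dict[str, list[str]]:
    """Parse a GTF 9th column such as: gene_id "X"; level 2; tag "basic";

    Every value is returned as a string inside a list, because keys
    such as tag may be repeated. Assumes values contain no ';'.
    """
    attrs: dict[str, list[str]] = {}
    for item in column9.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing ';'
        key, _, value = item.partition(" ")
        # Remove surrounding whitespace and double quotes from the value.
        attrs.setdefault(key, []).append(value.strip().strip('"'))
    return attrs


if __name__ == "__main__":
    col9 = ('gene_id "ENSGXXXXXXXXXXX"; gene_type "protein_coding"; '
            'level 2; tag "basic"; tag "CCDS";')
    for key, values in parse_attributes(col9).items():
        print(key, values)
```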
eArray system, and these designs are available in a standard custom Agilent format.
Roderic Guigó (PI), Centre de Regulació Genòmica (CRG), Barcelona, Catalonia, Spain
The key summary statistics of the most recent GENCODE human gene set annotation (
GENCODE is currently progressing towards its goals in Phase 2 of the project.
Michael Tress, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
Paul Flicek (Lead PI), EMBL European Bioinformatics Institute, Cambridge, UK
Institut Municipal d'Investigació Mèdica (IMIM), Barcelona, Catalonia, Spain
GENCODE lncRNA annotation. A number of designs have been created using the
GTF file example showing the TAB-separated standard GTF columns (1-9)
A summary of key participating institutions of each phase is listed below:
Format description of GENCODE GTF file. TAB-separated standard GTF columns
Benedict Paten (PI), University of California, Santa Cruz, California, USA
project and each new GENCODE release corresponds to an Ensembl release.
Since September 2009, GENCODE has been the human gene set used by the
{gene,transcript,exon,CDS,UTR,start_codon,stop_codon,Selenocysteine}
The columns within the GENCODE GTF file format are described below.
The result will be a set of annotations including all protein-coding
chr{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y,M}
indicates the biological position of the exon in the transcript
loci, 104 processed pseudogenes and 66 unprocessed pseudogenes.
Jyoti Choudhary, Institute of Cancer Research (ICR), London, UK
(PI), Massachusetts Institute of Technology (MIT), Boston, USA
Centre de Regulació Genòmica, Barcelona, Catalonia, Spain
Centre de Regulació Genòmica, Barcelona, Catalonia, Spain
Spanish National Cancer Research Centre, Madrid, Spain
into public databases from the scientific community.
Version 7 (December 2010 freeze, GRCh37) - Ensembl 62
Team 71: Informatics (mainly HAVANA annotation group)
Genes that have more than one distinct translation
Comparison of GENCODE Human versions (Translations)
Version 20 (April 2014 freeze, GRCh38) - Ensembl 76
Comparison of GENCODE Human versions (Transcripts)
Version 10 (July 2011 freeze, GRCh37) - Ensembl 65
Massachusetts Institute of Technology, Boston, USA
(ENCyclopedia Of DNA Elements) scale-up project.
Wellcome Trust Sanger Institute, Cambridge, UK
name. The browser is powered by Biodalliance.
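The kinds of query accepted by the browser can be illustrated with a small classifier. This is a hypothetical sketch, not the GENCODE or Biodalliance implementation: it assumes regions are written as chromosome:start..end with optional thousands separators, and that Ensembl IDs consist of ENSG/ENST (human) or ENSMUSG/ENSMUST (mouse) followed by 11 digits and an optional .version suffix. Anything else is treated as a gene name.

```python
import re

# Hypothetical patterns for the query types described above.
REGION = re.compile(
    r"^(?P<chrom>(?:chr)?[0-9XYM]+):(?P<start>\d[\d,]*)\.\.(?P<end>\d[\d,]*)$"
)
ENSEMBL_ID = re.compile(r"^ENS(?:MUS)?(?P<kind>[GT])\d{11}(?:\.\d+)?$")


def classify_query(query: str) -> tuple:
    """Return ("region", chrom, start, end), ("gene_id", id),
    ("transcript_id", id) or ("gene_name", name)."""
    q = query.strip()
    m = REGION.match(q)
    if m:
        start = int(m["start"].replace(",", ""))
        end = int(m["end"].replace(",", ""))
        if start > end:
            raise ValueError("start must not be greater than end")
        return ("region", m["chrom"], start, end)
    m = ENSEMBL_ID.match(q)
    if m:
        kind = "gene_id" if m["kind"] == "G" else "transcript_id"
        # Drop the optional version suffix so both forms map to one ID.
        return (kind, q.split(".")[0])
    return ("gene_name", q)


if __name__ == "__main__":
    for q in ["22:30,700,000..30,900,000", "ENST00000000001.2", "TP53"]:
        print(q, "->", classify_query(q))
```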
Immunoglobulin/T-cell receptor gene segments
All gene features in the human and mouse genomes
Comparison of GENCODE Human versions (Genes)
Team 16: Population and Comparative Genomics
additional information as key-value pairs
University of California, Santa Cruz, USA
University of California, Berkeley, USA
Washington University in St. Louis, USA
Wellcome Sanger Institute, Cambridge, UK
Spanish National Cancer Research Centre
Encyclopædia of genes and gene variants
experimental evidence obtained using
Long non-coding RNA loci transcripts
(PI), Yale University, New Haven, USA
Massachusetts Institute of Technology
Nonsense-mediated decay transcripts
University of California, Santa Cruz
loci with transcript evidence, and
lncRNA Expression Microarray Design
Cap Analysis Gene Expression (CAGE)
University of Lausanne, Switzerland
Mouse - Release M26 (February 2021)
Human - Release 37 (February 2021)
Total no. of distinct translations
Washington University in St. Louis
alternatively transcribed variants
http://genome.cse.ucsc.edu/encode/
- partial-length protein-coding:
European Bioinformatics Institute
Vertebrate and Genome Annotation
is part of the GENCODE project.
See explanation in table below.
genomic phase (for CDS features)
Manual Annotation (HAVANA group)
Release 36, December 2020 freeze
Yale University, New Haven, USA
Timeline of the GENCODE project
Wellcome Trust Sanger Institute
(automatically annotated loci)
Automatic annotation (Ensembl)
- full-length protein-coding:
Participants, PIs and Co-PIs
Biodalliance Genome Browser
- protein-coding segments:
- polymorphic pseudogenes:
- unprocessed pseudogenes:
Small non-coding RNA genes
Protein-coding transcripts
is a scientific project in
(manually annotated loci),
Long non-coding RNA genes
- processed pseudogenes:
Wellcome Sanger Institute
GENCODE Phase 2 (Current)
The conclusions from the
research and part of the
integer-value (1-based)
Total no. of transcripts
{KNOWN,NOVEL,PUTATIVE}
{KNOWN,NOVEL,PUTATIVE}
- unitary pseudogenes:
Definition of a "gene"
genomic start location
University of Lausanne
GENCODE Scale-up Phase
Protein-coding genes
(CNIO), Madrid, Spain
UCSC Genome Browser:
Agilent Technologies
Human Genome Project
Human Genome Project
genomic end location
clusters, annotated
University of Geneva
) are shown below:
GENCODE Pilot Phase
Mouse - Half-yearly
transcript_status
Total no. of genes
Human - Quarterly
Genome annotation
list of biotypes
{ENSEMBL,HAVANA}
annotation source
Assessing quality
, California, USA
(verified loci),
ENSEXXXXXXXXXXX
transcript_name
transcript_type
list of biotypes
ENSTXXXXXXXXXXX
ENSGXXXXXXXXXXX
score (not used)
Mark B. Gerstein
, New Haven, USA
Key Participants
reference genome
Current progress
Primary citation
Harrow J, et al
chromosome name
- pseudogenes:
- pseudogenes:
Website Gencode
Research center
transcript_id
genomic strand
integer-value
Values/format
Column number
Key Statistics
Manolis Kellis
, Switzerland
2012 September
2003 September
September 2012
September 2012
Value format
of releases.
, Hinxton, UK
, Boston, USA
, Switzerland
pilot project
Miscellaneous
Subprojects
exon_number
gene_status
feature-type
Usage/Access
Pseudogenes
2007 October
2005 October
Data release
Release date
Methodology
sites, and
Categories
Categories
pseudogenes
variants”.
Open Access
Description
Challenges
gene_name
gene_type
{0,1,2,.}
2005 April
assembly.
non-coding
Data types
Key name
2007 June
frequency
See also
exon_id
gene_id
Content
232,117
Source:
22955987
captured
Ensembl
Ensembl
string
string
peptide
13,685
63,058
48,734
10,669
17,378
14,761
26,000
59,269
17,958
85,269
19,962
60,660
History
Ensembl
GENCODE
Version
License
Website
Authors
Contact
Content
GENCODE
level
{+,-}
hits.
3,554
7,569
Total
Total
RT-PCR
ENCODE
genome
Access
RGASP
RGASP
PolyA
, USA
with
Tools
The
236
409
645
236
2020
2018
and
RACE
loci
PMID
18
48
,
Web
9
8
7
.
6
5
4
3
2
1
.