GENCODE - Knowledge (XXG)

and, more recently, as genetic code that is transcribed into RNA. Although the definition of a gene has evolved greatly over the last century, it remains a challenging and controversial subject for many researchers. With the advent of the ENCODE/GENCODE project, even more problematic aspects of the definition have been uncovered, including alternative splicing (where the exons of a gene, which are separated by introns, can be joined in different combinations), intergenic transcription, complex patterns of dispersed regulation, non-genic conservation, and the abundance of noncoding RNA genes. Because GENCODE endeavours to build an encyclopaedia of genes and gene variants, these problems posed a mounting challenge for the project in arriving at an updated notion of a gene.

sites and incorrect biotypes. These are fed back to the manual annotators using the AnnoTrack tracking system. Some of these pipelines use data from other ENCODE subgroups, including RNA-seq data, histone modification data, and CAGE and Ditag data. RNA-seq data is an important new source of evidence, but generating complete gene models from it is a difficult problem. As part of GENCODE, a competition was run to assess the quality of predictions produced by various RNA-seq prediction pipelines (refer to RGASP

search for appropriate guide sequences by listing potential binding sites for the CRISPR/Cas9 complex that are next to transcribed regions, or within 200 bp of one. For each site, the track provides possible guide sequences along with a collection of predicted efficiency and specificity scores for those guide sequences. It also provides information about potential off-targets, grouped by the number of mismatches between the off-target and the guide.
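As a rough illustration of what listing potential binding sites involves, the Python sketch below scans the forward strand of a DNA string for 20-nt stretches followed by an NGG motif (the PAM commonly associated with S. pyogenes Cas9) and counts mismatches between a guide and a candidate off-target. The 20-nt length, the NGG motif, and the function names are assumptions made for this example. It is not the track's actual pipeline, and it does no efficiency or specificity scoring.

```python
import re


def find_guide_candidates(seq, guide_len=20):
    """Return (start, guide, pam) tuples for forward-strand candidate sites.

    A candidate is guide_len bases immediately followed by an NGG motif.
    Both the length and the motif are illustrative assumptions.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


def count_mismatches(guide, target):
    """Number of positions at which two equal-length sequences differ."""
    if len(guide) != len(target):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(guide, target) if a != b)
```

Off-targets could then be grouped by the value returned from `count_mismatches`, mirroring the grouping by mismatch count described above.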

reference chromosomes and stored in separate files, which include: gene annotation; PolyA features annotated by HAVANA; (retrotransposed) pseudogenes predicted by the Yale and UCSC pipelines but not by HAVANA; long non-coding RNAs; and tRNA structures predicted by tRNA-Scan. Some examples of the lines in the GTF format are shown below:

part of this stage, the GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. It was envisaged that the results of the first two phases would be used to determine the best path forward for analysing the remaining 99% of the human genome in a cost-effective and comprehensive production phase.

discovery of novel loci and novel transcripts at existing loci. In addition, because the COVID-19 pandemic in 2020 created an urgent need to support research responding to the situation, GENCODE has reviewed and improved the annotation for a set of protein-coding genes associated with SARS-CoV-2 infection.

Through advancements in sequencing technologies (such as RT-PCR-seq), increased coverage from manual annotations (HAVANA group), and improvements to automatic annotation algorithms using Ensembl, the accuracy and completeness of GENCODE annotations have been continuously refined through its successive releases.

Several analysis groups in the GENCODE consortium run pipelines that help the manual annotators produce models in unannotated regions and identify potentially missed or incorrect manual annotation, including completely missing loci, missing alternative isoforms, incorrect splice

In 2018, one of the latest additions to the GENCODE project was the CRISPR/Cas9 track on human and model organism assemblies. CRISPR is a genome editing technique that uses guide RNA sequences that bind with high specificity to the region to be edited. The new track was designed to assist in the

The first release of the annotation of the 44 ENCODE regions was frozen on 29 April 2005 and was used in the first ENCODE Genome Annotation Assessment Project (E-GASP) workshop. GENCODE Release 1 contained 416 known loci, 26 novel CDS (coding DNA sequence) loci, 82 novel transcript loci, 78 putative


RGASP is organised in a consortium framework modelled after the EGASP (ENCODE Genome Annotation Assessment Project) gene prediction workshop, and two rounds of workshops have been conducted to address different aspects of RNA-seq analysis as well as changing sequencing technologies and formats. One

A key research area of the GENCODE project was to investigate the biological significance of long non-coding RNAs (lncRNAs). To better understand lncRNA expression in humans, GENCODE created a sub-project to develop custom microarray platforms capable of quantifying the transcripts in the

The definition of a "gene" has never been a trivial issue, with numerous definitions and notions proposed throughout the years since the discovery of the human genome. Genes were first conceived in the 1900s as discrete units of heredity, were then thought of as the blueprint for protein synthesis,

The GENCODE website also contains a Genome Browser for human and mouse, in which any genomic region can be reached by giving the chromosome number and start-end position (e.g. 22:30,700,000..30,900,000), as well as by ENS transcript id (with/without version), ENS gene id (with/without version) and gene name. The browser is powered by Biodalliance.


Ensembl transcripts are products of the Ensembl automatic gene annotation system (a collection of gene annotation pipelines), termed the Ensembl gene build. All Ensembl transcripts are based on experimental evidence and thus the automated pipeline relies on the mRNAs and protein sequences deposited


GENCODE pipeline diagram. The schema shows the flow of data between manual annotation and automated annotation through specialized prediction pipelines to provide hints to first-pass annotation and quality control (QC). Annotated gene models are subject to experimental validation, and the AnnoTrack

The project was designed with three phases: pilot, technology development, and production. The pilot stage of the ENCODE project aimed to investigate in great depth, computationally and experimentally, 44 regions totaling 30 Mb of sequence, representing approximately 1% of the human genome. As

The RNA-seq Genome Annotation Assessment Project (RGASP) is designed to assess the effectiveness of various computational methods for high-quality RNA-sequence data analysis. The primary goals of RGASP are to provide an unbiased evaluation of RNA-seq alignment, transcript characterisation

The Human Genome Project was an international research effort to determine the sequence of the human genome and identify the genes that it contains. The project was coordinated by the National Institutes of Health and the U.S. Department of Energy. Additional contributors included universities across the United States and

The current GENCODE Human gene set version (GENCODE Release 20) includes annotation files (in GTF and GFF3 formats), FASTA files and METADATA files associated with the GENCODE annotation on all genomic regions (reference-chromosomes/patches/scaffolds/haplotypes). The annotation data is given with respect to the


HAVANA, together with automatic annotations from the Ensembl automatically annotated gene set. This process also adds unique full-length CDS predictions from the Ensembl protein coding set into manually annotated genes, to provide the most complete and up-to-date annotation of the genome possible.

The GENCODE consortium was initially formed as part of the pilot phase of the ENCODE project to identify and map all protein-coding genes within the ENCODE regions (approx. 1% of the human genome). Given the initial success of the project, GENCODE now aims to build an “Encyclopedia of genes and gene variants”.

Putative loci can be verified by wet-lab experiments, and computational predictions are analysed manually. Currently, to ensure that the annotation set covers the complete genome rather than just the regions that have been manually annotated, a merged data set is created using manual annotations from

Among other achievements, the first-pass manual annotation of the mouse reference genome has been completed, a cooperation with the RefSeq and UniProt reference annotation databases has been started toward achieving annotation convergence, and the annotation of lncRNAs has been improved via the

A comparison of key statistics from 3 major GENCODE releases until 2014 is shown below. It is evident that although the coverage, in terms of the total number of genes discovered, is steadily increasing, the number of protein-coding genes has actually decreased. This is mostly attributed to new experimental evidence obtained using Cap Analysis Gene Expression (CAGE) clusters, annotated PolyA sites, and peptide hits.

of the main discoveries from rounds 1 and 2 of the project was the importance of read alignment to the quality of the gene predictions produced. Hence, a third round of RGASP workshops was being conducted (as of 2014) to focus primarily on read mapping to the genome.

The conclusions from the pilot project were published in June 2007. The findings highlighted the success of the pilot project in creating a feasible platform and new technologies to characterise functional elements in the human genome, which paved the way for genome-wide studies.

tracking system contains data from all these sources and is used to highlight differences, coordinate QC, and track outcomes. Manual and automated annotation processes produce the GENCODE data set and are also used to QC the completed annotation.


The key participants of the GENCODE project have remained relatively consistent throughout its various phases, with the Wellcome Trust Sanger Institute now leading the overall efforts of the project.


international partners in the United Kingdom, France, Germany, Japan, and China. The Human Genome Project formally began in 1990 and was completed in 2003, 2 years ahead of its original schedule.


techniques. GENCODE Release 2 contained 411 known loci, 30 novel CDS loci, 81 novel transcript loci, 83 putative loci, 104 processed pseudogenes and 66 unprocessed pseudogenes.

In September 2012, the GENCODE consortium published a major paper discussing the results of GENCODE Release 7, a major release that was frozen in December 2011.

New funding was part of NHGRI's endeavour to scale up the ENCODE Project to a production phase on the entire genome, along with additional pilot-scale studies.

(discovery, reconstruction and quantification) software, and to determine the feasibility of automated genome annotations based on transcriptome sequencing.

The most recent release of the human gene set annotations is GENCODE 36, with a freeze date of December 2020. This release utilises the latest GRCh38 human reference genome assembly.


For GENCODE 7, transcript models are assigned a high or low level of support based on a new method developed to score the quality of transcripts.

A second version (release 02) was frozen on 14 October 2005, containing updates following discoveries from experimental validations using RACE and RT-PCR


below). To confirm uncertain models, GENCODE also has an experimental validation pipeline using RNA sequencing and RACE.


Since its inception, GENCODE has released 36 versions of the Human gene set annotations (excluding minor updates).

The latest release of the mouse gene set annotations is GENCODE M25, also with a freeze date of December 2020.


Description of key-value pairs in 9th column of the GENCODE GTF file (format: key "value")
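To make the key "value" layout of the ninth column concrete, here is a minimal Python sketch that splits such a field into a dictionary. The function name and the sample identifiers are hypothetical and used only for illustration. The sketch assumes each pair is terminated by a semicolon and that a value may be either quoted or a bare token (for example a numeric level). Repeated keys are collected into lists.

```python
import re

# One "key value;" pair: the value is either a quoted string or a bare token.
_ATTRIBUTE = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^;\s]+))\s*;')


def parse_attributes(field):
    """Parse a GTF column-9 string into a dict mapping key -> list of values.

    Assumes every pair ends with a semicolon; a trailing pair without one
    is ignored by this sketch.
    """
    attributes = {}
    for match in _ATTRIBUTE.finditer(field):
        key = match.group(1)
        quoted, bare = match.group(2), match.group(3)
        value = quoted if quoted is not None else bare
        attributes.setdefault(key, []).append(value)
    return attributes


# Hypothetical example field, not taken from a real GENCODE file.
example = 'gene_id "ENSG00000000001"; gene_type "protein_coding"; level 2;'
parsed = parse_attributes(example)
```

Keeping every value in a list is a design choice for this sketch: it avoids silently dropping information if a key appears more than once on a line.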


eArray system, and these designs are available in a standard custom Agilent format.


Roderic Guigo (PI), Centre de Regulació Genòmica (CRG), Barcelona, Catalonia, Spain

The key summary statistics of the most recent GENCODE Human gene set annotation (Release 36, December 2020 freeze) are shown below:


GENCODE is currently progressing towards its goals in Phase 2 of the project.


Michael Tress, Spanish National Cancer Research Centre (CNIO), Madrid, Spain


Paul Flicek (Lead PI), EMBL European Bioinformatics Institute, Cambridge, UK


Institut Municipal d'Investigació Mèdica (IMIM), Barcelona, Catalonia, Spain

GENCODE lncRNA annotation. A number of designs have been created using the Agilent Technologies

GTF file example showing the TAB-separated standard GTF columns (1-9)


A summary of key participating institutions of each phase is listed below:


Format description of GENCODE GTF file. TAB-separated standard GTF columns
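The nine TAB-separated columns can be read with a few lines of Python. The sketch below is a minimal example under the assumption that a data line has exactly nine tab-separated fields and that comment lines start with "#". The record type, field names, and sample line are hypothetical and chosen only for this illustration.

```python
from typing import NamedTuple, Optional


class GtfRecord(NamedTuple):
    chrom: str       # column 1: chromosome name
    source: str      # column 2: annotation source
    feature: str     # column 3: feature type
    start: int       # column 4: genomic start location (1-based)
    end: int         # column 5: genomic end location
    score: str       # column 6: score (not used)
    strand: str      # column 7: genomic strand
    phase: str       # column 8: genomic phase (for CDS features)
    attributes: str  # column 9: additional key-value pairs, left unparsed


def parse_gtf_line(line: str) -> Optional[GtfRecord]:
    """Split one GTF line into its nine columns; return None for comments/blanks."""
    if not line.strip() or line.startswith("#"):
        return None
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(columns))
    chrom, source, feature, start, end, score, strand, phase, attrs = columns
    return GtfRecord(chrom, source, feature, int(start), int(end),
                     score, strand, phase, attrs)


# Hypothetical example line, not taken from a real GENCODE file.
record = parse_gtf_line('chr1\tHAVANA\tgene\t100\t250\t.\t+\t.\tgene_id "ENSG00000000001";\n')
```

Because GTF coordinates are 1-based with an inclusive end, the length of a feature is `record.end - record.start + 1`.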


Benedict Paten (PI), University of California, Santa Cruz, California, USA


project and each new GENCODE release corresponds to an Ensembl release.

Since September 2009, GENCODE has been the human gene set used by the Ensembl

Column 3 (feature type): {gene,transcript,exon,CDS,UTR,start_codon,stop_codon,Selenocysteine}

The columns within the GENCODE GTF file formats are described below.

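The result will be a set of annotations including all protein-coding loci with alternatively transcribed variants, non-coding loci with transcript evidence, and pseudogenes.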

Column 1 (chromosome name): chr{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y,M}

exon_number: indicates the biological position of the exon in the transcript


loci, 104 processed pseudogenes and 66 unprocessed pseudogenes.


Jyoti Choudhary, Institute of Cancer Research (ICR), London, UK

Manolis Kellis (PI), Massachusetts Institute of Technology (MIT), Boston, USA

Mark B. Gerstein (PI), Yale University, New Haven, USA

Centre de Regulació Genòmica, Barcelona, Catalonia, Spain


Centre de Regulació Genòmica, Barcelona, Catalonia, Spain


Spanish National Cancer Research Centre, Madrid, Spain


into public databases from the scientific community.


Version 7 (December 2010 freeze, GRCh37) - Ensembl 62


Team 71: Informatics (Mainly HAVANA annotation group)


Genes that have more than one distinct translations


Comparison of GENCODE Human versions (Translations)


Version 20 (April 2014 freeze, GRCh38) - Ensembl 76

GENCODE is a scientific project in genome research and part of the ENCODE (ENCyclopedia Of DNA Elements) scale-up project.

GENCODE
Content
  Description: Encyclopædia of genes and gene variants
  Data types captured: All gene features in Human & mouse genome
Contact
  Research center: Wellcome Trust Sanger Institute
  Authors: Harrow J, et al.
  Primary citation: PMID 22955987
  Release date: September 2012
Access
  Website: Gencode
Tools
  Web: UCSC Genome Browser: http://genome.cse.ucsc.edu/encode/
Miscellaneous
  License: Open Access
  Data release frequency: Human - Quarterly; Mouse - Half yearly
  Version: Human - Release 37 (February 2021); Mouse - Release M26 (February 2021)

[Figure: Timeline of the GENCODE project. Milestones: 2003 September; 2005 April; 2005 October; 2007 June; 2007 October; 2012 September; 2018; 2020]

Key participating institutions by phase:

GENCODE Phase 2 (Current) | GENCODE Scale-up Phase | GENCODE Pilot Phase
Wellcome Sanger Institute, Cambridge, UK | Wellcome Sanger Institute, Cambridge, UK | Wellcome Trust Sanger Institute, Cambridge, UK (Team 16: Population and Comparative Genomics; Team 71: Informatics, mainly HAVANA annotation group)
Centre de Regulació Genòmica, Barcelona, Catalonia, Spain | Centre de Regulació Genòmica, Barcelona, Catalonia, Spain | Institut Municipal d'Investigació Mèdica (IMIM), Barcelona, Catalonia, Spain
University of Lausanne, Switzerland | University of Lausanne, Switzerland | University of Geneva, Switzerland
University of California, Santa Cruz, California, USA | University of California, Santa Cruz, USA | Washington University in St. Louis, USA
Massachusetts Institute of Technology, Boston, USA | Massachusetts Institute of Technology, Boston, USA | University of California, Berkeley, USA
Yale University, New Haven, USA | Yale University, New Haven, USA | European Bioinformatics Institute, Hinxton, UK
Spanish National Cancer Research Centre (CNIO), Madrid, Spain | Spanish National Cancer Research Centre, Madrid, Spain | (none)
(none) | Washington University in St. Louis, USA | (none)

Key statistics (Release 36, December 2020 freeze):

Categories | Total | Categories | Total
Total No of Genes | 60,660 | Total No of Transcripts | 232,117
Protein-coding genes | 19,962 | Protein-coding transcripts | 85,269
Long non-coding RNA genes | 17,958 | - full length protein-coding: | 59,269
Small non-coding RNA genes | 7,569 | - partial length protein-coding: | 26,000
Pseudogenes | 14,761 | Nonsense mediated decay transcripts | 17,378
- processed pseudogenes: | 10,669 | Long non-coding RNA loci transcripts | 48,734
- unprocessed pseudogenes: | 3,554 | |
- unitary pseudogenes: | 236 | |
- polymorphic pseudogenes: | 48 | |
- pseudogenes: | 18 | |
Immunoglobulin/T-cell receptor gene segments | 645 | Total No of distinct translations | 63,058
- protein coding segments: | 409 | Genes that have more than one distinct translations | 13,685
- pseudogenes: | 236 | |

Version 10 (July 2011 freeze, GRCh37) - Ensembl 65

[Figures: Comparison of GENCODE Human versions (Genes); Comparison of GENCODE Human versions (Transcripts)]

Ensembl is part of the GENCODE project.

Standard GTF columns:

Column number | Content | Values/format
1 | chromosome name | chr{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y,M}
2 | annotation source | {ENSEMBL,HAVANA}
3 | feature-type | {gene,transcript,exon,CDS,UTR,start_codon,stop_codon,Selenocysteine}
4 | genomic start location | integer-value (1-based)
5 | genomic end location | integer-value
6 | score (not used) | .
7 | genomic strand | {+,-}
8 | genomic phase (for CDS features) | {0,1,2,.}
9 | additional information as key-value pairs | See explanation in table below.

Key-value pairs in column 9:

Key name | Value format
gene_id | ENSGXXXXXXXXXXX
transcript_id | ENSTXXXXXXXXXXX
gene_type | list of biotypes
gene_status | {KNOWN,NOVEL,PUTATIVE}
gene_name | string
transcript_type | list of biotypes
transcript_status | {KNOWN,NOVEL,PUTATIVE}
transcript_name | string
exon_number | indicates the biological position of the exon in the transcript
exon_id | ENSEXXXXXXXXXXX
level | 1 (verified loci), 2 (manually annotated loci), 3 (automatically annotated loci)

See also: Genome annotation; Vertebrate and Genome Annotation

External links: Official GENCODE pages
