Consensus CDS Project - Knowledge (XXG)

CCDS Project
Description: Convergence towards a standard set of gene annotations
Research center: National Center for Biotechnology Information; European Bioinformatics Institute; University of California, Santa Cruz; Wellcome Trust Sanger Institute
Authors: Kim D. Pruitt
Primary citation: Pruitt KD, et al (2009)
Release date: 2009
Website: https://www.ncbi.nlm.nih.gov/projects/CCDS/CcdsBrowse.cgi
Version: CCDS Release 24

The Consensus Coding Sequence (CCDS) Project is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies. The CCDS project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented by the National Center for Biotechnology Information (NCBI), Ensembl, and UCSC Genome Browser. The integrity of the CCDS dataset is maintained through stringent quality assurance testing and on-going manual curation.

Motivation and background

Biological and biomedical research has come to rely on accurate and consistent annotation of genes and their products on genome assemblies. Reference annotations of genomes are available from various sources, each with their own independent goals and policies, which results in some annotation variation.

The CCDS project was established to identify a gold standard set of protein-coding gene annotations that are identically annotated on the human and mouse reference genome assemblies by the participating annotation groups. The CCDS gene sets that have been arrived at by consensus of the different partners now consist of over 18,000 human and over 20,000 mouse genes (see CCDS release history). The CCDS dataset is increasingly representing more alternative splicing events with each new release.

Contributing groups

Participating annotation groups include:
- National Center for Biotechnology Information (NCBI)
- European Bioinformatics Institute (EBI)
- Wellcome Trust Sanger Institute (WTSI)
- HUGO Gene Nomenclature Committee (HGNC)
- Mouse Genome Informatics (MGI)

Manual annotation is provided by:
- Reference Sequence (RefSeq) at NCBI
- Human and Vertebrate Analysis and Annotation (HAVANA) at WTSI

Defining the CCDS gene set

"Consensus" is defined as protein-coding regions that agree at the start codon, stop codon, and splice junctions, and for which the prediction meets quality assurance benchmarks. A combination of manual and automated genome annotations provided by NCBI and Ensembl (which incorporates manual HAVANA annotations) are compared to identify annotations with matching genomic coordinates.

Quality assurance testing

In order to ensure that CDSs are of high quality, multiple quality assurance (QA) tests are performed (Table 1). All tests are performed following the annotation comparison step of each CCDS build and are independent of individual annotation group QA tests performed prior to the annotation comparison.

Table 1: Examples of the types of CCDS QA tests performed prior to acceptance of CCDS candidates

QA test                                Purpose of the test
Subject to NMD                         Checks for transcripts that may be subject to nonsense-mediated decay (NMD)
Low quality                            Checks for low coding propensity
Non-consensus splice sites             Checks for non-canonical splice sites
Predicted pseudogene                   Checks for genes that are predicted to be pseudogenes by UCSC
Too short                              Checks for transcripts or proteins that are unusually short, typically <100 amino acids
Ortholog not found/not conserved       Checks for genes that are not conserved and/or are not in a HomoloGene cluster
CDS start or stop not in alignment     Checks for a start or stop codon in the reference genome sequence
Internal stop                          Checks for the presence of an internal stop codon in the genomic sequence
NCBI:Ensembl protein length different  Checks if the protein encoded by the NCBI RefSeq is the same length as the EBI/WTSI protein
NCBI:Ensembl low percent identity      Checks for >99% overall identity between the NCBI and EBI/WTSI proteins
Gene discontinued                      Checks if the GeneID is no longer valid

Annotations that fail QA tests undergo a round of manual checking that may improve results or reach a decision to reject annotation matches based on QA failure.

Review process

The CCDS database is unique in that the review process must be carried out by multiple collaborators, and agreement must be reached before any changes can be made. This is made possible with a collaborator coordination system that includes a work process flow and forums for analysis and discussion.

The CCDS database operates an internal website that serves multiple purposes including curator communication, collaborator voting, providing special reports and tracking the status of CCDS representations. When a collaborating CCDS group member identifies a CCDS ID that may need review, a voting process is employed to decide on the final outcome.

Manual curation

Coordinated manual curation is supported by a restricted-access website and a discussion e-mail list. CCDS curation guidelines were established to address specific conflicts that were observed at a higher frequency. Establishment of CCDS curation guidelines has helped to make the CCDS curation process more efficient by reducing the number of conflicting votes and time spent in discussion to reach a consensus agreement. The CCDS curation guidelines are available online.

Curation policies established for the CCDS data set have been integrated into the RefSeq and HAVANA annotation guidelines and thus, new annotations provided by both groups are more likely to be concordant and result in addition of a CCDS ID. These standards address specific problem areas, are not a comprehensive set of annotation guidelines, and do not restrict the annotation policies of any collaborating group. Examples include standardized curation guidelines for selection of the initiation codon and interpretation of upstream ORFs and transcripts that are predicted to be candidates for nonsense-mediated decay. Curation occurs continuously, and any of the collaborating centers can flag a CCDS ID as a potential update or withdrawal.

Conflicting opinions are addressed by consulting with scientific experts or other annotation curation groups such as the HUGO Gene Nomenclature Committee (HGNC) and Mouse Genome Informatics (MGI). If a conflict cannot be resolved, then collaborators agree to withdraw the CCDS ID until more information becomes available.

Curation challenges and annotation guidelines

Nonsense-mediated decay (NMD): NMD is the most powerful mRNA surveillance process. NMD eliminates defective mRNA before it can be translated into protein. This is important because if the defective mRNA is translated, the truncated protein may cause disease. Different mechanisms have been proposed to explain NMD; one being the exon junction complex (EJC) model. In this model, if the stop codon is >50 nt upstream of the last exon-exon junction, the transcript is assumed to be an NMD candidate. The CCDS collaborators use a conservative method, based on the EJC model, to screen mRNA transcripts. Any transcripts determined to be NMD candidates are excluded from the CCDS data set except in the following situations:
- all transcripts at one particular locus are assessed to be NMD candidates; however, the locus is already known to be a protein-coding region;
- there is experimental evidence suggesting that a functional protein is produced from the NMD candidate transcript.

Previously, NMD candidate transcripts were considered to be protein-coding transcripts by both RefSeq and HAVANA, and thereby, these NMD candidate transcripts were represented in the CCDS data set. The RefSeq group and the HAVANA project have subsequently revised their annotation policies.

Multiple in-frame translation start sites: Multiple factors contribute to translation initiation, such as upstream open reading frames (uORFs), secondary structure and the sequence context around the translation initiation site. A common start site is defined within the Kozak consensus sequence: (GCC)GCCACCAUGG in vertebrates. The bracketed (GCC) is a motif of unknown biological impact. There are variations within the Kozak consensus sequence; for example, either G or A is observed three nucleotides upstream (at position -3) of the AUG. Bases between positions -3 and +4 of the Kozak sequence have the most significant impact on translational efficiency. Hence, the sequence (A/G)NNAUGG is defined as a strong Kozak signal in the CCDS project.

According to the scanning mechanism, the small ribosomal subunit can initiate translation from the first start codon it reaches. There are exceptions to the scanning model:
- when the initiation site is not surrounded by a strong Kozak signal, which results in leaky scanning. Thereby, the ribosome skips this AUG and initiates translation from a downstream start site;
- when a shorter ORF can allow the ribosome to re-initiate translation at a downstream ORF.

According to the CCDS annotation guidelines, the longest ORF must be annotated except when there is experimental evidence that an internal start site is used to initiate translation. Additionally, other types of new data, such as ribosome profiling data, can be used to identify start codons. The CCDS data set records one translation initiation site per CCDS ID. Any alternative start sites may be used for translation and will be stated in a CCDS public note.

Upstream open reading frames: AUG initiation codons located within transcript leaders are known as upstream AUGs (uAUGs). Sometimes, uAUGs are associated with uORFs. uORFs are found in approximately 50% of human and mouse transcripts. The existence of uORFs is another challenge for the CCDS data set. The scanning mechanism for translation initiation suggests that small ribosomal subunits (40S) bind at the 5’ end of a nascent mRNA transcript and scan for the first AUG start codon. It is possible that a uAUG is recognised first, and the corresponding uORF is then translated. The translated uORF could be an NMD candidate, although studies have shown that some uORFs can avoid NMD. The average size limit for uORFs that will escape NMD is approximately 35 amino acids. It also has been suggested that uORFs inhibit translation of the downstream gene by trapping a ribosome initiation complex and causing the ribosome to dissociate from the mRNA transcript before it reaches the protein-coding regions. Currently, no studies have reported the global impact of uORFs on translational regulation.

The current CCDS annotation guidelines allow the inclusion of mRNA transcripts containing uORFs if they meet the following two biological requirements:
- the mRNA transcript has a strong Kozak signal;
- the mRNA transcript is either ≥ 35 amino acids or overlaps with the primary open reading frame.

Read-through transcripts: Read-through transcripts are also known as conjoined genes or co-transcribed genes. Read-through transcripts are defined as transcripts combining at least part of one exon from each of two or more distinct known (partner) genes which lie on the same chromosome in the same orientation. The biological function of read-through transcripts and their corresponding protein molecules remain unknown. However, the definition of a read-through gene in the CCDS data set is that the individual partner genes must be distinct, and the read-through transcripts must share ≥ 1 exon (or ≥ 2 splice sites except in the case of a shared terminal exon) with each of the distinct shorter loci. Transcripts are not considered to be read-through transcripts in the following circumstances:
- when transcripts are produced from overlapping genes but do not share the same splice sites;
- when transcripts are translated from genes that have nested structures relative to each other. In this instance, the CCDS collaborators and the HGNC have agreed that the read-through transcript be represented as a separate locus.

Quality of reference genome sequence: As the CCDS data set is built to represent genomic annotations of human and mouse, the quality problems with the human and mouse reference genome sequences become another challenge. Quality problems occur when the reference genome is misassembled. A misassembled genome may therefore contain premature stop codons, frame-shift indels, or likely polymorphic pseudogenes. Once these quality problems are identified, the CCDS collaborators report the issues to the Genome Reference Consortium, which investigates and makes the necessary corrections.

Access to CCDS data

The CCDS project is available from the NCBI CCDS data set page, which provides FTP download links and a query interface to acquire information about CCDS sequences and locations. CCDS reports can be obtained by using the query interface, which is located at the top of the CCDS data set page. Users can select various types of identifiers such as CCDS ID, gene ID, gene symbol, nucleotide ID and protein ID to search for specific CCDS information. The CCDS reports (Figure 1) are presented in a table format, providing links to specific resources, such as a history report or Entrez Gene, or to re-query the CCDS data set. The sequence identifiers table presents transcript information in VEGA, Ensembl and Blink. The chromosome location table includes the genomic coordinates for each individual exon of the specific coding sequence. This table also provides links to several different genome browsers, which allow you to visualise the structure of the coding region. The exact nucleotide and protein sequences of the specific coding sequence are also displayed in the CCDS sequence data section.

Figure 1. The CCDS data set screenshot showing the report for Itm2a protein (CCDS 30349).

Current applications

The CCDS dataset is an integral part of the GENCODE gene annotation project and it is used as a standard for high-quality coding exon definition in various research fields, including clinical studies, large-scale epigenomic studies, exome projects and exon array design. Due to the consensus annotation of CCDS exons by the independent annotation groups, exome projects in particular have regarded CCDS coding exons as reliable targets for downstream studies (e.g., for single nucleotide variant detection), and these exons have been used as coding region targets in commercially available exome kits.

CCDS release history

The CCDS data set size has continued to increase with both the computational genome annotation updates, which integrate new data sets submitted to the International Nucleotide Sequence Database Collaboration (INSDC), and ongoing curation activities that supplement or improve upon that annotation. Table 2 summarises the key statistics for each CCDS build where Public CCDS IDs are all those that were not under review or pending an update or withdrawal at the time of the current release date.

Table 2. Summary statistics for past CCDS releases.

Release  Species       Assembly name  Public CCDS ID count  Gene ID count  Current release date
1        Homo sapiens  NCBI35         13,740                12,950         Mar 14, 2007
2        Mus musculus  MGSCv36        13,218                13,012         Nov 28, 2007
3        Homo sapiens  NCBI36         17,494                15,805         May 1, 2008
4        Mus musculus  MGSCv37        17,082                16,888         Jan 24, 2011
5        Homo sapiens  NCBI36         19,393                17,053         Sep 2, 2009
6        Homo sapiens  GRCh37         22,912                18,174         Apr 20, 2011
7        Mus musculus  MGSCv37        21,874                19,507         Aug 14, 2012
8        Homo sapiens  GRCh37.p2      25,354                18,407         Sep 6, 2011
9        Homo sapiens  GRCh37.p5      26,254                18,474         Oct 25, 2012
10       Mus musculus  GRCm38         22,934                19,945         Aug 5, 2013
11       Homo sapiens  GRCh37.p9      27,377                18,535         Apr 29, 2013
12       Homo sapiens  GRCh37.p10     27,655                18,607         Oct 24, 2013
13       Mus musculus  GRCm38.p1      23,010                19,990         Apr 7, 2014
14       Homo sapiens  GRCh37.p13     28,649                18,673         Nov 29, 2013
15       Homo sapiens  GRCh37.p13     28,897                18,681         Aug 7, 2014
16       Mus musculus  GRCm38.p2      23,835                20,079         Sep 10, 2014
17       Homo sapiens  GRCh38         30,461                18,800         Sep 10, 2014
18       Homo sapiens  GRCh38.p2      31,371                18,826         May 12, 2015
19       Mus musculus  GRCm38.p3      24,834                20,215         July 30, 2015
20       Homo sapiens  GRCh38.p7      32,524                18,892         Sep 8, 2016
21       Mus musculus  GRCm38.p4      25,757                20,354         Dec 8, 2016
22       Homo sapiens  GRCh38.p12     33,397                19,033         Jun 14, 2018
23       Mus musculus  GRCm38.p6      27,219                20,486         Oct 24, 2019
24       Homo sapiens  GRCh38.p14     35,608                19,107         Oct 26, 2022

The complete set of release statistics can be found at the official CCDS website on their Releases & Statistics page.

Future prospects

Long-term goals include the addition of attributes that indicate where transcript annotation is also identical (including the UTRs) and to indicate splice variants with different UTRs that have the same CCDS ID. It is also anticipated that as more complete and high-quality genome sequence data become available for other organisms, annotations from these organisms may be in scope for CCDS representation.

The CCDS set will become more complete as the independent curation groups agree on cases where they initially differ, as additional experimental validation of weakly supported genes occurs, and as automatic annotation methods continue to improve. Communication among the CCDS collaborating groups is ongoing and will resolve differences and identify refinements between CCDS update cycles. Human updates are expected to occur roughly every 6 months and mouse releases yearly.

See also

- GENCODE
- Human Genome
- Mouse Genome Informatics
- RefSeq
- Ensembl

References

Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ, Hart E, Suner MM, Landrum MJ, Aken B, Ayling S, Baertsch R, Fernandez-Banet J, Cherry JL, Curwen V, Dicuccio M, Kellis M, Lee J, Lin MF, Schuster M, Shkeda A, Amid C, Brown G, Dukhanina O, Frankish A, Hart J, Maidak BL, Mudge J, Murphy MR, Murphy T, Rajan J, Rajput B, Riddick LD, Snow C, Steward C, Webb D, Weber JA, Wilming L, Wu W, Birney E, Haussler D, Hubbard T, Ostell J, Durbin R, Lipman D (2009). "The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes". Genome Res. 19 (7): 1316–23. doi:10.1101/gr.080531.108. PMC 2704439. PMID 19498102.

Harte, RA; Farrell, CM; Loveland, JE; Suner, MM; Wilming, L; Aken, B; Barrell, D; Frankish, A; Wallin, C; Searle, S; Diekhans, M; Harrow, J; Pruitt, KD (2012). "Tracking and coordinating an international curation effort for the CCDS project". Database. 2012: bas008. doi:10.1093/database/bas008. PMC 3308164. PMID 22434842.

Farrell, CM; O'Leary, NA; Harte, RA; Loveland, JE; Wilming, LG; Wallin, C; Diekhans, M; Barrell, D; Searle, SM; Aken, B; Hiatt, SM; Frankish, A; Suner, MM; Rajput, B; Steward, CA; Brown, GR; Bennett, R; Murphy, M; Wu, W; Kay, MP; Hart, J; Rajan, J; Weber, J; Snow, C; Riddick, LD; Hunt, T; Webb, D; Thomas, M; Tamez, P; Rangwala, SH; McGarvey, KM; Pujar, S; Shkeda, A; Mudge, JM; Gonzalez, JM; Gilbert, JG; Trevanion, SJ; Baertsch, R; Harrow, JL; Hubbard, T; Ostell, JM; Haussler, D; Pruitt, KD (2014). "Current status and new features of the Consensus Coding Sequence database". Nucleic Acids Res. 42 (D1): D865–D872. doi:10.1093/nar/gkt1059. PMC 3965069. PMID 24217909.

Alberts, B; Johnson, A; Lewis, J; Raff, M; Roberts, K; Walter, P (2002). Molecular Biology of the Cell 5th edn. New York: Garland Science.

Kozak, M (2002). "Pushing the limits of the scanning mechanism for initiation of translation". Gene. 299 (1–2): 1–34. doi:10.1016/S0378-1119(02)01056-9. PMC 7126118. PMID 12459250.

Ingolia, NT; Brar, GA; Rouskin, S; McGeachy, AM; Weissman, JS (2014). "Genome-wide Annotation and Quantitation of Translation by Ribosome Profiling". Curr. Protoc. Mol. Biol. Chapter 4: 4.18.1–4.18.19. doi:10.1002/0471142727.mb0418s103. ISBN 9780471142720. PMC 3775365. PMID 23821443.

Calvo, SE; Pagliarini, DJ; Mootha, VK (2009). "Upstream open reading frames cause widespread reduction of protein expression and are polymorphic among humans". Proc. Natl. Acad. Sci. U.S.A. 106 (18): 7507–12. Bibcode:2009PNAS..106.7507C. doi:10.1073/pnas.0810916106. PMC 2669787. PMID 19372376.

Silva, AL; Pereira, FJC; Morgado, A; Kong, J; Martins, R; Faustino, P; Liebhaber, SA; Romao, L (2006). "The canonical UPF1-dependent nonsense-mediated mRNA decay is inhibited in transcripts carrying a short open reading frame independent of sequence context". RNA. 12 (12): 2160–70. doi:10.1261/rna.201406. PMC 1664719. PMID 17077274.

Prakash, Tulika; Sharma, Vineet K.; Adati, Naoki; Ozawa, Ritsuko; Kumar, Naveen; Nishida, Yuichiro; Fujikake, Takayoshi; Takeda, Tadayuki; Taylor, Todd D.; Michalak, Pawel (12 October 2010). "Expression of Conjoined Genes: Another Mechanism for Gene Regulation in Eukaryotes". PLOS ONE. 5 (10): e13284. Bibcode:2010PLoSO...513284P. doi:10.1371/journal.pone.0013284. PMC 2953495. PMID 20967262.

Maglott, D.; Ostell, J.; Pruitt, K. D.; Tatusova, T. (28 November 2010). "Entrez Gene: gene-centered information at NCBI". Nucleic Acids Res. 39 (Database): D52–D57. doi:10.1093/nar/gkq1237. PMC 3013746. PMID 21115458.

Harrow, J.; Frankish, A.; Gonzalez, J. M.; Tapanari, E.; Diekhans, M.; Kokocinski, F.; Aken, B. L.; Barrell, D.; Zadissa, A.; Searle, S.; Barnes, I.; Bignell, A.; Boychenko, V.; Hunt, T.; Kay, M.; Mukherjee, G.; Rajan, J.; Despacio-Reyes, G.; Saunders, G.; Steward, C.; Harte, R.; Lin, M.; Howald, C.; Tanzer, A.; Derrien, T.; Chrast, J.; Walters, N.; Balasubramanian, S.; Pei, B.; Tress, M.; Rodriguez, J. M.; Ezkurdia, I.; van Baren, J.; Brent, M.; Haussler, D.; Kellis, M.; Valencia, A.; Reymond, A.; Gerstein, M.; Guigo, R.; Hubbard, T. J. (5 September 2012). "GENCODE: The reference human genome annotation for The ENCODE Project". Genome Res. 22 (9): 1760–1774. doi:10.1101/gr.135350.111. PMC 3431492. PMID 22955987.

Parla, Jennifer S; Iossifov, Ivan; Grabill, Ian; Spector, Mona S; Kramer, Melissa; McCombie, W Richard (2011). "A comparative analysis of exome capture". Genome Biol. 12 (9): R97. doi:10.1186/gb-2011-12-9-r97. PMC 3308060. PMID 21958622.

External links

- CCDS home page
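The strong Kozak signal that the CCDS project defines as (A/G)NNAUGG can be checked mechanically. The following is a minimal Python sketch and not CCDS project code: the function name `is_strong_kozak` and the 0-based `aug_index` argument are assumptions made for illustration. It takes position -3 to be the base three nucleotides upstream of the A of AUG, and position +4 to be the base immediately after the G of AUG.

```python
def is_strong_kozak(mrna: str, aug_index: int) -> bool:
    """Return True if the AUG starting at aug_index has an (A/G)NNAUGG context.

    mrna      -- transcript sequence (RNA or DNA letters, any case)
    aug_index -- 0-based index of the A of the candidate start codon
                 (hypothetical parameter, chosen for this sketch)
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA spelling as well

    # There must be room for position -3 upstream and +4 downstream.
    if aug_index < 3 or aug_index + 3 >= len(seq):
        return False
    if seq[aug_index:aug_index + 3] != "AUG":
        return False

    minus3 = seq[aug_index - 3]  # position -3: must be A or G
    plus4 = seq[aug_index + 3]   # position +4: must be G
    return minus3 in "AG" and plus4 == "G"


if __name__ == "__main__":
    # Vertebrate Kozak consensus GCCGCCACCAUGG, AUG at index 9.
    print(is_strong_kozak("GCCGCCACCAUGG", 9))
    # U at position -3 and C at position +4.
    print(is_strong_kozak("CCCUUUAUGC", 6))
```

The check deliberately ignores positions -2 and -1, which are N in the pattern quoted above.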
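The EJC model rule for NMD screening (a stop codon more than 50 nt upstream of the last exon-exon junction marks an NMD candidate) can be sketched as below. This is an illustrative sketch, not the CCDS collaborators' actual screening method: the function name, the transcript-coordinate representation, and the choice to measure the distance from the end of the stop codon are all assumptions.

```python
def is_nmd_candidate(exon_lengths, stop_codon_end, threshold=50):
    """Apply the >50 nt EJC rule in transcript coordinates.

    exon_lengths   -- exon lengths in nt, in 5' to 3' transcript order
    stop_codon_end -- 0-based transcript position just past the stop codon
                      (assumed convention for this sketch)
    threshold      -- distance in nt; the text above gives 50
    """
    if len(exon_lengths) < 2:
        return False  # a single-exon transcript has no exon-exon junction

    # The last junction sits where the final exon begins.
    last_junction = sum(exon_lengths[:-1])

    # Positive distance means the stop codon lies upstream of the junction.
    distance = last_junction - stop_codon_end
    return distance > threshold


if __name__ == "__main__":
    exons = [200, 150, 300]  # last junction at transcript position 350
    print(is_nmd_candidate(exons, stop_codon_end=290))  # 60 nt upstream
    print(is_nmd_candidate(exons, stop_codon_end=320))  # 30 nt upstream
```

A transcript flagged by such a rule would still be subject to the two exceptions listed for NMD candidates, which require curator judgment rather than a coordinate check.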
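The definition of "consensus" (agreement at the start codon, stop codon, and splice junctions, found by comparing annotations for matching genomic coordinates) can be illustrated with a small comparison. This is a sketch under stated assumptions, not the CCDS build pipeline: the `CdsAnnotation` type and `is_consensus` function are hypothetical, and a CDS is modelled simply as a set of coding segments on one chromosome strand.

```python
from typing import NamedTuple, Tuple


class CdsAnnotation(NamedTuple):
    """Hypothetical record for one annotated coding sequence."""
    chrom: str
    strand: str
    # Coding segments as (start, end) genomic coordinates.
    segments: Tuple[Tuple[int, int], ...]


def is_consensus(a: CdsAnnotation, b: CdsAnnotation) -> bool:
    """True when two annotations have identical CDS coordinates.

    Identical coding segments imply the same start codon, the same
    stop codon, and the same splice junctions.
    """
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and sorted(a.segments) == sorted(b.segments)
    )


if __name__ == "__main__":
    ncbi = CdsAnnotation("chrX", "+", ((100, 250), (400, 520), (700, 810)))
    ensembl = CdsAnnotation("chrX", "+", ((100, 250), (400, 520), (700, 810)))
    shifted = CdsAnnotation("chrX", "+", ((100, 250), (403, 520), (700, 810)))
    print(is_consensus(ncbi, ensembl))
    print(is_consensus(ncbi, shifted))
```

Matching coordinates is only the first condition in the definition; a matching pair must also meet the quality assurance benchmarks before it receives a CCDS ID.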
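Two of the QA tests listed in Table 1, "Too short" (typically <100 amino acids) and "Internal stop", lend themselves to a short illustration. The sketch below is a simplification and not the CCDS QA code: it works on a CDS nucleotide string rather than on the genomic sequence, uses the standard stop codons, and the function name `qa_flags` and flag labels are invented for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def qa_flags(cds: str, min_aa: int = 100) -> list:
    """Return QA flags for a coding sequence given as a DNA string.

    min_aa -- minimum protein length; 100 follows the 'typically <100
              amino acids' wording of Table 1.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]

    # Drop the terminal stop codon, if present, before measuring length.
    if codons and codons[-1] in STOP_CODONS:
        coding = codons[:-1]
    else:
        coding = codons

    flags = []
    if len(coding) < min_aa:
        flags.append("too_short")
    if any(codon in STOP_CODONS for codon in coding):
        flags.append("internal_stop")
    return flags


if __name__ == "__main__":
    print(qa_flags("ATG" + "GCT" * 120 + "TAA"))
    print(qa_flags("ATG" + "GCT" * 10 + "TAA"))
    print(qa_flags("ATG" + "GCT" * 60 + "TGA" + "GCT" * 60 + "TAA"))
```

As the text notes, annotations that fail QA tests go to manual checking, so flags like these would prompt review rather than automatic rejection.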

