Phylogenetic Assignment of Named Global Outbreak Lineages

Initial release: 30 April 2020
Stable release: 4.3.1 (26 July 2023)
Repository: github.com/cov-lineages/pangolin
Written in: Python
License: GNU General Public License v3.0
Website: pangolin.cog-uk.io

The Phylogenetic Assignment of Named Global Outbreak Lineages (PANGOLIN) is a software tool developed by Dr. Áine O'Toole and members of the Andrew Rambaut laboratory, with an associated web application developed by the Centre for Genomic Pathogen Surveillance in South Cambridgeshire. Its purpose is to implement a dynamic nomenclature (known as the Pango nomenclature) to classify genetic lineages of SARS-CoV-2, the virus that causes COVID-19. A user with a full genome sequence of a SARS-CoV-2 sample can submit that sequence to the tool; the sequence is then compared with other genome sequences and assigned the most likely lineage (Pango lineage). Single or multiple runs are possible, and the tool can return further information about the known history of the assigned lineage. It also interfaces with Microreact to show a time sequence of the locations where sequenced samples of the same lineage have been reported. This latter feature draws on publicly available genomes obtained from the COVID-19 Genomics UK Consortium and from those submitted to GISAID. The tool is named after the pangolin.

Context

PANGOLIN is a key component underpinning the Pango nomenclature system.

Description

As described in Rambaut et al. (2020), a Pango lineage is a cluster of sequences associated with an epidemiological event, for instance an introduction of the virus into a distinct geographic area with evidence of onward spread. Lineages are designed to capture the emerging edge of the pandemic, at a fine-grained resolution suitable for genomic epidemiological surveillance and outbreak investigation.

Both the tool and the Pango nomenclature system have been used extensively during the COVID-19 pandemic.

Lineage designation

Distinct from the PANGOLIN tool, Pango lineages are regularly and manually curated based on the current globally circulating diversity. A large phylogenetic tree is constructed from an alignment of publicly available SARS-CoV-2 genomes, and sub-clusters of sequences in this tree are manually examined and cross-referenced against epidemiological information to designate new lineages. Lineages can also be designated by data producers, and lineage suggestions can be submitted to the Pango team via a GitHub issue request.

Model training

These manually curated lineage designations, together with the associated genome sequences, are the input to the machine-learning model training. The model, covering both training and assignment, has been termed 'pangoLEARN'. The current version of pangoLEARN uses a classification tree, based on the scikit-learn implementation of a decision tree classifier.

Lineage assignation

Originally, PANGOLIN used a maximum-likelihood-based algorithm to assign each query SARS-CoV-2 sequence its most likely lineage. Since the release of version 2.0 in July 2020, however, it has used the 'pangoLEARN' machine-learning-based algorithm to assign lineages to new SARS-CoV-2 genomes. This approach is fast and can assign large numbers of SARS-CoV-2 genomes in a relatively short time.

Availability

PANGOLIN is available as a command-line tool, downloadable from Conda and from a GitHub repository, and as a web application with a drag-and-drop graphical user interface. As of January 2021, the PANGOLIN web application had assigned lineages to more than 512,000 unique SARS-CoV-2 sequences.

Creators and developers

PANGOLIN was created by Áine O'Toole and the Rambaut lab and released on 5 April 2020. The main developers of PANGOLIN are Áine O'Toole and Emily Scher; many others have contributed to various aspects of the tool, including Ben Jackson, J.T. McCrone, Verity Hill, and Rachel Colquhoun of the Rambaut Lab.

The PANGOLIN web application was developed by the Centre for Genomic Pathogen Surveillance, namely Anthony Underwood, Ben Taylor, Corin Yeats, Khalil Abu-Dahab, and David Aanensen.

See also

Colloquial names of COVID-19 variants
Variants of SARS-CoV-2
Nextstrain
INSDC

References

1. "Release 4.3.1". 26 July 2023. Retrieved 1 August 2023.
2. O'Toole, Áine; Scher, Emily; Underwood, Anthony; Jackson, Ben; Hill, Verity; McCrone, John T; Colquhoun, Rachel; Ruis, Chris; Abu-Dahab, Khalil; Taylor, Ben; Yeats, Corin; Du Plessis, Louis; Maloney, Daniel; Medd, Nathan; Attwood, Stephen W; Aanensen, David M; Holmes, Edward C; Pybus, Oliver G; Rambaut, Andrew (5 July 2021). "Assignment of Epidemiological Lineages in an Emerging Pandemic Using the Pangolin Tool". Virus Evolution. 7 (2): veab064. doi:10.1093/ve/veab064. PMC 8344591. PMID 34527285.
3. "Real-Time Epidemiology for COVID-19". www.pathogensurveillance.net. Retrieved 22 January 2021.
4. Rambaut, A.; Holmes, E.C.; O'Toole, Á.; et al. (2020). "A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology". Nature Microbiology. 5 (11): 1403–1407. doi:10.1038/s41564-020-0770-5. PMC 7610519. PMID 32669681. S2CID 220544096.
5. "Pangolin web application release". virological.org. May 2020. Retrieved 18 February 2021.
6. Rambaut, Andrew; Holmes, Edward C.; O'Toole, Áine; Hill, Verity; McCrone, John T.; Ruis, Christopher; Du Plessis, Louis; Pybus, Oliver G. (15 July 2020). "Addendum: A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology". Nature Microbiology. 6 (3): 415. doi:10.1038/s41564-021-00872-5. PMC 7845574. PMID 33514928.
7. Pipes, Lenore; Wang, Hongru; Huelsenbeck, John P; Nielsen, Rasmus (9 December 2020). Malik, Harmit (ed.). "Assessing Uncertainty in the Rooting of the SARS-CoV-2 Phylogeny". Molecular Biology and Evolution. 38 (4). Oxford University Press (OUP): 1537–1543. doi:10.1093/molbev/msaa316. ISSN 0737-4038. PMC 7798932. PMID 33295605. Retrieved 22 January 2021.
8. Jacob, Jobin John; Vasudevan, Karthick; Pragasam, Agila Kumari; Gunasekaran, Karthik; Kang, Gagandeep; Veeraraghavan, Balaji; Mutreja, Ankur (22 December 2020). "Evolutionary tracking of SARS-CoV-2 genetic variants highlights intricate balance of stabilizing and destabilizing mutations". bioRxiv. doi:10.1101/2020.12.22.423920. "Phylogenetic Assignment of Named Global Outbreak LINeages tool (PANGOLIN) has been the most widely used tool for lineage assignment to newly emerging variants."
9. "pangoLEARN Store of the trained model for PANGOLIN to access". GitHub: cov-lineages/pangoLEARN. Retrieved 13 February 2021.
10. "PANGO lineages". cov-lineages.org. Retrieved 4 March 2021.
11. "sklearn.tree.DecisionTreeClassifier". scikit-learn.org. Retrieved 13 February 2021.
12. "cov-lineages/pangolin". GitHub: cov-lineages/pangolin. Retrieved 13 February 2021.
13. "pangoLEARN PANGOLIN 2.0: pangoLEARN description". cov-lineages.org. Retrieved 19 November 2021. "The model was trained using ~60,000 SARS-CoV-2 sequences from GISAID... training this model takes approximately 30 minutes on our hardware"
14. "Pangolin COVID-19 Lineage Assigner". pangolin.cog-uk.io. Retrieved 13 February 2021.

External links

pangolin on GitHub
Official website
pango.network: information on the Pango system rules, governance committee, and lineage designation committee

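pangoLEARN is described as a classification tree built on scikit-learn's decision tree classifier. It is trained on manually curated lineage designations and is then used to assign lineages to new genomes. The sketch below illustrates that train-then-assign flow using scikit-learn's `DecisionTreeClassifier`. Everything else in it is an assumption made for illustration, not pangoLEARN's actual design or data: the one-hot encoding of aligned bases, the hypothetical helpers `one_hot`, `encode`, `train_lineage_classifier` and `assign_lineages`, and the toy 10-site "genomes" with their labels.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature encoding (not pangoLEARN's actual one): each alignment column
# becomes five 0/1 features, for A, C, G, T and "other" (N, gap or ambiguity code).
BASES = "ACGT"


def one_hot(seq):
    """Encode one aligned sequence as a flat 0/1 vector of length 5 * len(seq)."""
    features = np.zeros((len(seq), len(BASES) + 1), dtype=np.uint8)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)  # -1 for anything other than A, C, G or T
        features[i, j if j >= 0 else len(BASES)] = 1
    return features.ravel()


def encode(seqs):
    """Stack encoded sequences into a matrix; all must share one alignment length."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to a common length")
    return np.vstack([one_hot(s) for s in seqs])


def train_lineage_classifier(aligned_seqs, lineages):
    """Fit a decision tree mapping encoded genomes to curated lineage labels."""
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(encode(aligned_seqs), lineages)
    return clf


def assign_lineages(clf, query_seqs):
    """Predict a lineage label for each query (queries must match the training alignment)."""
    return [str(label) for label in clf.predict(encode(query_seqs))]


# Toy training set: hypothetical aligned sequences with curated lineage labels.
training = {
    "ACGTACGTAA": "A",
    "ACGTACGTAT": "A",
    "ACCTACGTAA": "B",
    "ACCTACGTTA": "B.1",
    "ACCTACGTTT": "B.1",
}
model = train_lineage_classifier(list(training), list(training.values()))
print(assign_lineages(model, ["ACCTACGTTA", "ACGTACGTAA"]))  # ['B.1', 'A']
```

With default settings the tree grows until its leaves are pure, so it reproduces its training labels exactly. A real pipeline would hold out some designated sequences to measure how well the classifier generalises to new genomes.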

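The tool accepts either a single sequence or many sequences per run. The sketch below assumes that a batch of query genomes arrives as a multi-record FASTA file, which is the input format of the command-line tool, and shows one way to split it into named sequences before assignment. `read_fasta` and the sample names are hypothetical and are not part of PANGOLIN's code. The sketch only parses records; alignment and lineage assignment are left to later steps.

```python
import io
from typing import Dict, List, Optional, TextIO


def _store(records: Dict[str, str], name: Optional[str], chunks: List[str]) -> None:
    """Add a finished record to `records`, refusing duplicate names."""
    if name is None:
        return
    if name in records:
        raise ValueError(f"duplicate record name: {name}")
    records[name] = "".join(chunks).upper()


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse a multi-record FASTA stream into {record name: sequence}.

    Wrapped sequence lines are joined; the name is the first word of the header.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            _store(records, name, chunks)  # close the previous record
            header = line[1:].strip()
            if not header:
                raise ValueError("FASTA header without a name")
            name, chunks = header.split()[0], []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)
    _store(records, name, chunks)  # close the last record
    return records


# Hypothetical two-sample batch; the first sequence is wrapped over two lines.
batch = io.StringIO(
    ">sample_1 collected 2021-01-04\nACGTACGT\nAA\n"
    ">sample_2\nACCTACGTTA\n"
)
queries = read_fasta(batch)
print(queries)  # {'sample_1': 'ACGTACGTAA', 'sample_2': 'ACCTACGTTA'}
```

Rejecting duplicate names ensures that no query in a batch is silently overwritten, which matters when results are reported per sample.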

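The Microreact view shows where and when sequenced samples of a given lineage were reported. The sketch below is only a rough illustration of the kind of aggregation that could sit behind such a time sequence; it is not Microreact's or PANGOLIN's code. It counts hypothetical sample reports of one lineage per ISO week and location. The record layout `(lineage, location, collection date)`, the function name `weekly_counts` and the example data are all assumptions. Lineage names are matched exactly, so sublineages are not merged into their parents.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Tuple

# Hypothetical record layout: (lineage, location, collection date).
Report = Tuple[str, str, date]


def weekly_counts(reports: Iterable[Report], lineage: str) -> List[Tuple[str, str, int]]:
    """Count reports of `lineage` per (ISO year-week, location), in time order."""
    counts: Counter = Counter()
    for name, location, when in reports:
        if name != lineage:  # exact match only; sublineages are counted separately
            continue
        year, week, _ = when.isocalendar()
        counts[(f"{year}-W{week:02d}", location)] += 1
    return [(wk, loc, n) for (wk, loc), n in sorted(counts.items())]


reports = [
    ("B.1", "Wales", date(2021, 1, 4)),
    ("B.1", "Wales", date(2021, 1, 6)),
    ("B.1", "Scotland", date(2021, 1, 12)),
    ("A", "Wales", date(2021, 1, 5)),
]
print(weekly_counts(reports, "B.1"))  # [('2021-W01', 'Wales', 2), ('2021-W02', 'Scotland', 1)]
```

ISO weeks are used so that consecutive reports fall into evenly sized bins. The zero-padded week number keeps the string keys sorting in chronological order within a year.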
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.