17:
67:
was established in 1982 for the collection, management, storage, and distribution of DNA sequence data, in response to the increasing availability of DNA sequences. With the growing volume of genetic data, biotechnology companies have been able to use human DNA sequences to develop protein and antibody
88:
became important for deciphering the enormous collection of genomic data. They are commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover, and selection. The following are commonly used genetic algorithms:
154:, and many more. By mining for enzymes, researchers can determine the classes of compounds that BGCs encode and compare target gene clusters with known gene clusters. To verify the relationship between BGCs and natural products, the target BGCs can be expressed in a suitable host through the use of
166:
Genetic data have accumulated in databases, and researchers can use algorithms to decipher these data in order to discover new processes, targets, and products. The following are databases and tools:
68:
drugs through genome mining since 1992. In the late 1990s, many companies, such as Amgen, Immunex, and Genentech, were able to develop drugs that progressed to the clinical stage by adopting genome mining. Since the
147:
138:(BGCs) encoded in the microorganism. By adopting genome mining, the BGCs that produce the target natural product can be predicted. Some important enzymes responsible for the formation of natural products are
1144:
Gomez-Escribano JP, Bibb MJ (February 2014). "Heterologous expression of natural product biosynthetic gene clusters in
Streptomyces coelicolor: from genome mining to manipulation of biosynthetic pathways".
96:
PRISM (Prediction
Informatics for Secondary Metabolites) is a combinatorial approach to chemical structure prediction for genetically encoded nonribosomal peptides and type I and II polyketides.
27:
describes the exploitation of genomic information for the discovery of biosynthetic pathways of natural products and their possible interactions. It depends on computational technology and
191:
MIBiG (Minimum
Information about a Biosynthetic Gene cluster specification) provides a standard for annotations and metadata on biosynthetic gene clusters and their molecular products.
715:"antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences"
182:
AntiSMASH-DB allows the sequences of newly sequenced BGCs to be compared against those of previously predicted and experimentally characterized ones.
122:
Genome mining is applied to the discovery of natural products, where it facilitates the characterization of novel molecules and biosynthetic pathways.
958:
Rutledge PJ, Challis GL (August 2015). "Discovery of microbial natural products by activation of silent biosynthetic gene clusters".
538:"Genome sequence of an industrial microorganism Streptomyces avermitilis: deducing the ability of producing secondary metabolites"
1558:
194:
Interactive tree of life (iTOL) is a web-based tool for the display, manipulation and annotation of phylogenetic trees.
1058:
Hoffmeister D, Keller NP (April 2007). "Natural products of filamentous fungi: enzymes, genes, and their regulation".
778:"Comprehensive prediction of secondary metabolite structure and biological activity from microbial genome sequences"
314:"Mini review: Genome mining approaches for the identification of secondary metabolite biosynthetic gene clusters in
1505:"Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees"
76:. Subsequently, many of these genomes have been carefully studied to identify new genes and biosynthetic pathways.
776:
Skinnider MA, Johnston CW, Gunabalasingam M, Merwin NJ, Kieliszek AM, MacLellan RJ, et al. (November 2020).
93:
AntiSMASH (Antibiotics and
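Secondary Metabolite Analysis Shell) is a pipeline for the identification, annotation, and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences.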
1440:
Kautsar SA, Blin K, Shaw S, Navarro-Muñoz JC, Terlouw BR, van der Hooft JJ, et al. (January 2020).
874:
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (October 1990). "Basic local alignment search tool".
143:
110:
104:
597:"Identification of Thiotetronic Acid Antibiotic Biosynthetic Pathways by Target-directed Genome Mining"
59:
In the mid- to late 1980s, researchers increasingly focused on genetic studies with advancing
1014:
789:
549:
412:
69:
1188:
Sayers EW, Cavanaugh M, Clark K, Pruitt KD, Schoch CL, Sherry ST, Karsch-Mizrachi I (January 2021).
713:
Medema MH, Blin K, Cimermancic P, de Jager V, Zakrzewski P, Fischbach MA, et al. (July 2011).
177:
44:
1170:
983:
1095:"Genome mining for natural product biosynthetic gene clusters in the Subsection V cyanobacteria"
595:
Tang X, Li J, Millán-Aguiñaga N, Zhang JJ, O'Neill EC, Ugalde JA, et al. (December 2015).
263:
Hannigan GD, Prihoda D, Palicka A, Soukup J, Klempir O, Rampula L, et al. (October 2019).
1568:
1534:
1471:
1408:
1377:
Ichikawa N, Sasagawa M, Yamamoto M, Komaki H, Yoshida Y, Yamazaki S, Fujita N (January 2013).
1345:
1282:
1251:
Palaniappan K, Chen IA, Chu K, Ratner A, Seshadri R, Kyrpides NC, et al. (January 2020).
1219:
1162:
1126:
1075:
1040:
975:
940:
891:
856:
815:
744:
678:
626:
577:
536:
Omura S, Ikeda H, Ishikawa J, Hanamoto A, Takahashi C, Shinose M, et al. (October 2001).
518:
477:
428:
385:
347:
294:
245:
155:
85:
403:
Bains W, Smith GC (December 1988). "A novel method for nucleic acid sequence determination".
1003:"Genome mining of biosynthetic and chemotherapeutic gene clusters in Streptomyces bacteria"
1253:"IMG-ABC v.5.0: an update to the IMG/Atlas of Biosynthetic Gene Clusters Knowledgebase"
As large quantities of genomic sequence data began to accumulate in public databases,
72:
was completed in the early 2000s, researchers have been sequencing the genomes of many
1552:
572:
537:
73:
1174:
851:
834:
664:
1563:
987:
135:
1093:
Micallef ML, D'Agostino PM, Sharma D, Viswanathan R, Moffitt MC (September 2015).
699:
265:"A deep learning genome-mining strategy for biosynthetic gene cluster prediction"
113:(Basic local alignment search tool) is an approach for rapid sequence comparison.
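The idea of rapid sequence comparison can be illustrated with a toy seed-and-extend search: index short exact words (k-mers) of one sequence, look up the words of the other, and extend each exact hit without gaps. This is only a minimal sketch under simplifying assumptions (exact-match seeds, ungapped extension, no scoring matrix or statistics), not the BLAST implementation. The function names `kmer_index`, `extend_seed`, and `find_local_matches` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def kmer_index(seq, k):
    """Map every k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def extend_seed(query, subject, q_pos, s_pos, k):
    """Extend an exact k-mer hit left and right while characters agree (ungapped)."""
    left = 0
    while (q_pos - left > 0 and s_pos - left > 0
           and query[q_pos - left - 1] == subject[s_pos - left - 1]):
        left += 1
    right = k
    while (q_pos + right < len(query) and s_pos + right < len(subject)
           and query[q_pos + right] == subject[s_pos + right]):
        right += 1
    # Return (start in query, start in subject, length of the exact match).
    return q_pos - left, s_pos - left, left + right


def find_local_matches(query, subject, k=4):
    """Return de-duplicated ungapped exact matches, longest first."""
    index = kmer_index(subject, k)
    hits = set()
    for q_pos in range(len(query) - k + 1):
        for s_pos in index.get(query[q_pos:q_pos + k], []):
            hits.add(extend_seed(query, subject, q_pos, s_pos, k))
    return sorted(hits, key=lambda h: -h[2])
```

For example, `find_local_matches("AAACGTGGG", "TTTACGTCCC")` reports the shared word `ACGT` as one hit with its start position in each sequence and its length. Indexing the words first avoids comparing every position of one sequence against every position of the other, which is the main reason word-based heuristics are fast.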
1237:
1026:
926:
801:
612:
542:
Proceedings of the
National Academy of Sciences of the United States of America
333:
1158:
1111:
139:
368:
Challis GL (May 2008). "Genome mining for novel natural product discovery".
DoBISCUIT is a database of secondary metabolite biosynthetic gene clusters.
1442:"MIBiG 2.0: a repository for biosynthetic gene clusters of known function"
1426:
1394:
1379:"DoBISCUIT: a database of secondary metabolite biosynthetic gene clusters"
1331:
895:
432:
31:
tools. The mining process relies on a huge amount of data (represented by
1520:
1457:
1268:
730:
280:
36:
971:
513:
496:
230:
171:
64:
649:"Data structures and compression algorithms for genomic sequence data"
642:
640:
381:
43:, the data can be used to generate new knowledge in several areas of
1071:
1300:
835:"Confirmation of data mining based predictions of protein function"
100:
15:
911:"Mining genomes to illuminate the specialized chemistry of life"
1314:
Kautsar SA, Blin K, Shaw S, Weber T, Medema MH (January 2021).
20:
Genome mining is associated with bioinformatics investigations.
312:
Lee N, Hwang S, Kim J, Cho S, Palsson B, Cho BK (2020-01-01).
214:"Genome Mining as New Challenge in Natural Products Discovery"
99:
SIM (statistically based sequence similarity) methods, such as
212:
Albarano L, Esposito R, Ruocco N, Costantini M (April 2020).
762:
1316:"BiG-FAM: the biosynthetic gene cluster families database"
1001:
Belknap KC, Park CJ, Barth BM, Andam CP (February 2020).
1489:
185:
BiG-FAM is a database of biosynthetic gene cluster families.
497:"The evolution of genome mining in microbes - a review"
1147:
Journal of
Industrial Microbiology & Biotechnology
148:
ribosomally and post-translationally modified peptides
322:Computational and Structural Biotechnology Journal
909:Medema MH, de Rond T, Moore BS (September 2021).
174:database provides genomic datasets for analysis.
694:
692:
495:Ziemert N, Alanjary M, Weber T (August 2016).
647:Brandon MC, Wallace DC, Baldi P (July 2009).
8:
452:Annual Review of Genomics and Human Genetics
448:"Patents in genomics and human genetics"
204:
35:and annotations) accessible in genomic
833:King RD, Wise PH, Clare A (May 2004).
446:Cook-Deegan R, Heaney C (2010-09-01).
7:
363:
361:
464:10.1146/annurev-genom-082509-141811
14:
134:is regulated by the biosynthetic
1503:Letunic I, Bork P (July 2016).
725:(Web Server issue): W339–W346.
405:Journal of Theoretical Biology
370:Journal of Medicinal Chemistry
1:
1389:(Database issue): D408–D414.
888:10.1016/S0022-2836(05)80360-2
852:10.1093/bioinformatics/bth047
665:10.1093/bioinformatics/btp319
425:10.1016/S0022-5193(88)80246-7
107:, infer orthologous homology.
960:Nature Reviews. Microbiology
876:Journal of Molecular Biology
47:, such as discovering novel
1585:
1027:10.1038/s41598-020-58904-9
927:10.1038/s41576-021-00363-7
802:10.1038/s41467-020-19986-1
613:10.1021/acschembio.5b00658
334:10.1016/j.csbj.2020.06.024
39:. By applying data mining
1159:10.1007/s10295-013-1348-5
1112:10.1186/s12864-015-1855-z
126:Natural product discovery
915:Nature Reviews. Genetics
1060:Natural Product Reports
501:Natural Product Reports
61:sequencing technologies
1509:Nucleic Acids Research
1446:Nucleic Acids Research
1383:Nucleic Acids Research
1320:Nucleic Acids Research
1257:Nucleic Acids Research
1194:Nucleic Acids Research
719:Nucleic Acids Research
563:10.1073/pnas.211433198
269:Nucleic Acids Research
21:
782:Nature Communications
765:. Adapsyn Bioscience.
144:non-ribosomal peptide
19:
1206:10.1093/nar/gkaa1023
601:ACS Chemical Biology
70:Human Genome Project
1559:Medicinal chemistry
1395:10.1093/nar/gks1177
1332:10.1093/nar/gkaa812
1019:2020NatSR..10.2003B
972:10.1038/nrmicro3496
794:2020NatCo..11.6058S
554:2001PNAS...9812215O
548:(21): 12215–12220.
417:1988JThBi.135..303B
178:UCSC Genome Browser
162:Databases and tools
45:medicinal chemistry
1521:10.1093/nar/gkw290
1458:10.1093/nar/gkz882
1269:10.1093/nar/gkz932
1007:Scientific Reports
731:10.1093/nar/gkr466
514:10.1039/C6NP00025H
281:10.1093/nar/gkz654
231:10.3390/md18040199
146:synthases (NRPS),
130:The production of
86:genetic algorithms
22:
1515:(W1): W242–W245.
1452:(D1): D454–D458.
1326:(D1): D490–D497.
1263:(D1): D422–D430.
659:(14): 1731–1738.
607:(12): 2841–2849.
382:10.1021/jm700948z
156:molecular cloning
142:synthases (PKS),
1072:10.1039/B603084J
1055:
1049:
1048:
1038:
998:
992:
991:
955:
949:
948:
938:
906:
900:
899:
871:
865:
864:
854:
845:(7): 1110–1118.
376:(9): 2618–2628.
365:
356:
355:
345:
309:
303:
302:
292:
260:
254:
253:
243:
233:
209:
132:natural products
65:GenBank database
49:natural products
1200:(D1): D92–D96.
507:(8): 988–1005.
1153:(2): 425–431.
1136:
1085:
1066:(2): 393–416.
1050:
993:
966:(8): 509–523.
950:
921:(9): 553–571.
901:
882:(3): 403–410.
866:
839:Bioinformatics
825:
768:
754:
705:
700:"AntiSMASH-DB"
688:
653:Bioinformatics
636:
587:
528:
487:
458:(1): 383–425.
438:
411:(3): 303–307.
395:
357:
304:
255:
203:
202:
200:
197:
196:
195:
192:
189:
186:
183:
180:
175:
163:
160:
127:
124:
119:
116:
115:
114:
108:
97:
94:
81:
78:
74:microorganisms
56:
53:
29:bioinformatics
13:
10:
9:
6:
4:
3:
2:
328:: 1548–1556.
150:(RiPPs), and
149:
145:
141:
137:
136:gene clusters
33:DNA sequences
30:
26:
25:Genome mining
18:
1512:
1508:
1498:
1484:
1449:
1445:
1435:
1421:
1386:
1382:
1372:
1358:
1323:
1319:
1309:
1295:
1260:
1256:
1246:
1232:
1197:
1193:
1183:
1150:
1146:
1139:
1102:
1099:BMC Genomics
1098:
1088:
1063:
1059:
1053:
1010:
1006:
996:
963:
959:
953:
918:
914:
904:
879:
875:
869:
842:
838:
828:
785:
781:
771:
757:
722:
718:
708:
656:
652:
604:
600:
590:
545:
541:
531:
504:
500:
490:
455:
451:
441:
408:
404:
398:
373:
369:
325:
321:
316:Streptomyces
315:
307:
275:(18): e110.
272:
268:
258:
221:
218:Marine Drugs
217:
207:
165:
129:
121:
118:Applications
83:
58:
24:
23:
1364:"DoBISCUIT"
1013:(1): 2003.
788:(1): 6058.
1553:Categories
1105:(1): 669.
224:(4): 199.
199:References
152:terpenoids
140:polyketide
80:Algorithms
41:algorithms
1301:"BIG-FAM"
1238:"IMG-ABC"
1190:"GenBank"
105:PSI-BLAST
37:databases
1569:Genomics
1539:27095192
1476:31612915
1413:23185043
1350:33010170
1287:31665416
1224:33196830
1175:15215660
1167:24096958
1131:26335778
1080:17390002
1045:32029878
980:26119570
945:34083778
861:14764546
820:33247171
749:21672958
683:19447783
631:26458099
582:11572948
523:27272205
482:20590431
390:18393407
352:32637051
299:31400112
250:32283638
1530:4987883
1467:7145714
1427:"MIBiG"
1404:3531092
1341:7778980
1278:7145673
1215:7778897
1122:4558948
1036:7005152
1015:Bibcode
988:6474118
936:8364890
896:2231712
811:7699628
790:Bibcode
763:"PRISM"
740:3125804
674:2705231
622:4758359
550:Bibcode
473:2935940
433:3256722
413:Bibcode
343:7327026
290:6765103
241:7230286
172:GenBank
55:History
1537:
1527:
1490:"iTOL"
63:. The
1171:S2CID
984:S2CID
573:59794
111:BLAST
101:FASTA
1535:PMID
1472:PMID
1409:PMID
1346:PMID
1283:PMID
1220:PMID
1163:PMID
1127:PMID
1076:PMID
1041:PMID
976:PMID
941:PMID
892:PMID
857:PMID
816:PMID
745:PMID
679:PMID
627:PMID
578:PMID
519:PMID
478:PMID
429:PMID
386:PMID
348:PMID
295:PMID
246:PMID
1564:DNA
1525:PMC
1517:doi
1462:PMC
1454:doi
1399:PMC
1391:doi
1336:PMC
1328:doi
1273:PMC
1265:doi
1210:PMC
1202:doi
1155:doi
1117:PMC
1107:doi
1068:doi
1031:PMC
1023:doi
968:doi
931:PMC
923:doi
884:doi
880:215
847:doi
806:PMC
798:doi
735:PMC
727:doi
669:PMC
661:doi
617:PMC
609:doi
568:PMC
558:doi
509:doi
468:PMC
460:doi
421:doi
409:135
378:doi
338:PMC
330:doi
285:PMC
277:doi
236:PMC
226:doi
103:or
1555::
1533:.
1523:.
1513:44
1511:.
1507:.
1470:.
1460:.
1450:48
1448:.
1444:.
1407:.
1397:.
1387:41
1385:.
1381:.
1344:.
1334:.
1324:49
1322:.
1318:.
1281:.
1271:.
1261:48
1259:.
1255:.
1218:.
1208:.
1198:49
1196:.
1192:.
1169:.
1161:.
1151:41
1149:.
1125:.
1115:.
1103:16
1101:.
1097:.
1074:.
1064:24
1062:.
1039:.
1029:.
1021:.
1011:10
1009:.
1005:.
982:.
974:.
964:13
962:.
939:.
929:.
919:22
917:.
913:.
890:.
878:.
855:.
843:20
841:.
837:.
814:.
804:.
796:.
786:11
784:.
780:.
743:.
733:.
723:39
721:.
717:.
691:^
677:.
667:.
657:25
655:.
651:.
639:^
625:.
615:.
605:10
603:.
599:.
576:.
566:.
556:.
546:98
544:.
540:.
517:.
505:33
503:.
499:.
476:.
466:.
456:11
454:.
450:.
427:.
419:.
407:.
384:.
374:51
372:.
360:^
346:.
336:.
326:18
324:.
320:.
293:.
283:.
273:47
271:.
267:.
244:.
234:.
222:18
220:.
216:.
158:.
51:.
1541:.
1519::
1492:.
1478:.
1456::
1429:.
1415:.
1393::
1366:.
1352:.
1330::
1303:.
1289:.
1267::
1240:.
1226:.
1204::
1177:.
1157::
1133:.
1109::
1082:.
1070::
1047:.
1025::
1017::
990:.
970::
947:.
925::
898:.
886::
863:.
849::
822:.
800::
792::
751:.
729::
702:.
685:.
663::
633:.
611::
584:.
560::
552::
525:.
511::
484:.
462::
435:.
423::
415::
392:.
380::
354:.
332::
318:"
301:.
279::
252:.
228::