Combrex

COMBREX: COMputational BRidges to EXperiments
Description: functional annotation of prokaryotic genomes
Organisms: Prokaryotes
Research center: Boston University
Authors: Richard J Roberts
Primary citation: PMID 21097892
Release date: 2010
Website: http://combrex.bu.edu

COMBREX is a multifaceted project that includes a database of gene annotations, functional predictions and recommendations based on Active Learning principles associated with millions of genes in prokaryotic genomes.

About

COMBREX is a multifaceted project that aims to bring together the computational and experimental communities of biologists, with the goal of improving our understanding of microbial gene function and accelerating its annotation. The COMBREX project was co-founded by Simon Kasif, Richard Roberts and Martin Steffen as an international consortium with headquarters at Boston University and over 100 experimental and computational collaborators. The project was inspired by a call for community action published in PLoS Biology by Richard J. Roberts.

Content

A Database of genes and functions

This evolving database consists of experimentally determined and computationally predicted functions for more than three million microbial genes. Searching for a gene or genes of interest may be an end in itself, or it may be a first step toward contributing information to or seeking a grant from COMBREX. The database presently consists of genes from over 1000 completely sequenced bacterial and archaeal genomes, supplemented with a number of individual genes whose biochemical function has been experimentally determined. The genes are organized into sequence-similar, and likely isofunctional, groups determined by NCBI, referred to as Protein Clusters.

A color-coding system is used to identify which genes have experimentally determined functions, which have computationally predicted functions, and which have no known or predicted function. By necessity, "predicted functions" may encompass a broad range of specificity, and one of the project's longer-range goals is to quantify this specificity. (For example, the predicted function "valine decarboxylase" is significantly more specific, and more readily verifiable, than "lyase", or even "carboxy-lyase".)

Identification of genes whose products have been experimentally verified is also not a trivial task, and so COMBREX has embarked on a project to create a comprehensive, manually curated set of all such genes, referred to as the Gold Standard Gene Database. This curated set is at present unique to the COMBREX database, and genes belonging to it are color-coded with a gold symbol.

Predictions of Gene Function

The COMBREX database serves as a venue for computational biologists to publicize their most informative gene function predictions. A major effort within the bioinformatics field has been the computational prediction of gene function. There have been significant advances in this field over the last decade or so, but many of these efforts have not realized their full potential to advance biological knowledge, because predictions are rarely experimentally tested and predicted functions for individual genes made by competing methods are rarely directly compared.

The COMBREX database, besides drawing information from familiar sources such as NCBI and UniProtKB, also displays gene function predictions submitted by individual laboratories. Such predictions may be generated on a large scale using computational algorithms, or may be made for individual genes by experimental or computational biologists well acquainted with a particular protein family or biochemical pathway. Thus, predictions made by different methods may be easily compared, contrasted, and examined by experimental biologists. This side-by-side display of function predictions from many sources is the heart of the interaction between computational and experimental communities that COMBREX hopes to foster.

Recommendation and prioritization of experiments based on Active Learning principles

COMBREX uses simple principles as well as more sophisticated Active Learning methodologies to recommend the most informative experiments. These are experiments that are most likely to generate the most informative (in the mathematical sense of maximizing information gain) predictions for the largest number of proteins in the database. The most basic recommendations provide a ranking of all proteins in a gene cluster in terms of their distance to other proteins. In the simplest case, proteins near the center of a cluster are judged to be most informative because their distance to the other proteins in the cluster is relatively small. As a result, functional annotation of a "center" of a cluster is likely to result in the most accurate predictions for the other proteins in the cluster. In evolutionary terms these "cluster centers" are closest to the evolutionary ancestor of all the proteins in the cluster. Active Learning generalizes this intuition to produce recommendations for additional experiments that are likely to either produce accurate predictions or identify proteins that are not annotated correctly.

In addition to evolutionary analysis and Active Learning, COMBREX also points to other criteria that might be considered when prioritizing experiments. Such criteria include whether there is a structure available, conservation of the bacterial gene in the human genome (e.g. domain sharing), availability of computational or experimental evidence of gene function, phenotypic considerations (such as presence in a pathogen or relation to antibiotic resistance, pathogenicity or virulence) and others.

Grants for the Biochemical Characterization of genes

One of the missions of COMBREX is to issue small monetary grants for the experimental validation of specific gene predictions. The experimental determination of biochemical function for specific gene products serves to validate (or invalidate) the computational predictions made a priori. Thus, this experimental effort serves three goals: (1) it brings together directly the scientists who make gene function predictions and those who test them, (2) it evaluates computational methods based on how accurate their predictions are so they can be improved, and (3) it broadens the landscape of experimentally validated genes, improving our overall understanding of biology and of sequence-structure-function relationships.

The experimental investigation of the biochemical function of a single gene or small number of genes often falls outside the purview of large funding agencies. COMBREX is set up to issue small grants for exactly this type of work, and such grants are particularly suited for laboratories already familiar with the types of assays required for the intended experiments.

Goals

Improved gene annotation

One of the current problems with gene and genome annotation is a lack of transparency with respect to source. It is often difficult to determine which functions have been determined experimentally and which are predicted computationally. Furthermore, for computationally predicted functions, the method used to make the prediction and the strength of the evidence are rarely stated. COMBREX has taken the first steps toward a more transparent system of annotation by (1) color-coding genes to distinguish observed from predicted functions, and (2) for many functions predicted by sequence similarity, identifying the experimentally validated "source gene" on which the prediction was based.

COMBREX is working towards a more completely traceable annotation system, in which every stated functional annotation is either experimentally determined, or is a prediction explicitly linked through a chain of evidence to an ultimate source of information. These sources will in many cases be experimentally validated genes, but in some cases will be annotations from existing databases whose sources are themselves not immediately apparent.

COMBREX is the first database that attempts to "computationally" identify the link to the experimental source of an annotation using homology. Other databases distinguish two types of evidence, e.g. inferred directly from experiments or inferred computationally. However, the inference typically cannot be traced to the experimental source of the annotation. COMBREX cannot guarantee that the "traces" it provides are accurate at this point, but it enables biologists to make this determination directly by examining the link.

This system of identifying source genes and functions, and evidential links, will enable a dynamic system of annotation that is automatically updated as experimental evidence for new genes is determined, and as new predictive methods are developed. Such a dynamic system of gene functional annotation may help overcome the relatively high frequency of unannotated and misannotated genes that results from the static system used in many public databases. Furthermore, it will illuminate those genes whose biochemical functions are truly unknown, as opposed to those that are simply insufficiently annotated.

Improved predictive accuracy

Making gene functional predictions transparent is important, but equally important is making them as accurate as possible. Predictions need to be commensurate with the strength of evidence for them, such that they are as specific as the evidence will allow. Those that are not specific enough do not lend themselves to experimental testing, and those that are too specific for the underlying evidence run a high risk of being inaccurate. COMBREX is actively working on developing algorithms for functional prediction that can identify genes with novel or interesting functions, and whose results can sit beside the high-quality predictions received from collaborating computational groups. COMBREX's relatively conservative BLAST-based propagation of gene function represents a simple first step towards this goal.

Targeted experimental validation

Through its funding decisions, COMBREX can help broaden as well as deepen our understanding of biochemical gene function by encouraging experimental investigation of specific genes. The choice of which genes to validate is an important one: little overall new knowledge is gained by validating closely similar relatives of isofunctional genes, and validation experiments for genes with no specific predicted functions are unlikely to succeed. Furthermore, the landscape of what is already known is uneven, with many validated examples of some functions and few or no examples of others.

COMBREX wishes to develop a new, integrative model of research in which experiments are prioritized to close the largest gaps in our overall predictive understanding of gene function. Such a model favors the validation of genes that provide relatively large increases in knowledge, for example because their validated function results in a large number of new predictions for other genes. At an early stage COMBREX will introduce lists of "high priority" genes, which may be identified as being of significant predictive or biomedical value, and to which COMBREX members may nominate candidates. As a longer-term goal, COMBREX is working towards the use of machine learning techniques such as active learning to optimize the selection of such genes.

New technologies

COMBREX encourages the development of new technologies and cost-effective assays for gene function determination. The experimental validation effort described above amounts to a massively parallel application of low-throughput experiments via many small-scale grants. High-throughput assays that can analyze many gene products in parallel may result in the determination of function for many genes simultaneously, and may help make large strides in our overall understanding of gene function.

References

Roberts, Richard J; Chang Yi-Chien; Hu Zhenjun; Rachlin John N; Anton Brian P; Pokrzywa Revonda M; Choi Han-Pil; Faller Lina L; Guleria Jyotsna; Housman Genevieve; Klitgord Niels; Mazumdar Varun; McGettrick Mark G; Osmani Lais; Swaminathan Rajeswari; Tao Kevin R; Letovsky Stan; Vitkup Dennis; Segrè Daniel; Salzberg Steven L; Delisi Charles; Steffen Martin; Kasif Simon (Jan 2011). "COMBREX: a project to accelerate the functional annotation of prokaryotic genomes". Nucleic Acids Res. 39 (Database issue). England: D11–4. doi:10.1093/nar/gkq1168. PMC 3013729. PMID 21097892.

Anton, B.; et al. (2013). "The COMBREX project: design, methodology, and initial results". PLOS Biol. 11 (8): e1001638. doi:10.1371/journal.pbio.1001638. PMC 3754883. PMID 24013487.

Roberts, Richard (2004). "Identifying protein function--a call for community action". PLOS Biol. 2 (3): E42. doi:10.1371/journal.pbio.0020042. PMC 368155. PMID 15024411.

External links

http://combrex.bu.edu
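
As an illustration of the simplest recommendation described in the section on Active Learning (ranking the proteins in a cluster by their distance to the other proteins, so that "cluster centers" come first), the following is a minimal sketch in Python. It is not COMBREX's actual implementation: the function name rank_by_centrality, the use of mean pairwise distance as the centrality score, and the toy distance values are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def rank_by_centrality(
    distances: Dict[str, Dict[str, float]]
) -> List[Tuple[str, float]]:
    """Rank proteins by mean distance to the other cluster members.

    `distances` is a hypothetical symmetric pairwise distance table
    (for example, derived from sequence similarity). Proteins with the
    smallest mean distance are treated as the "cluster centers".
    """
    ranking = []
    for protein, row in distances.items():
        # Ignore the zero self-distance when averaging.
        others = [d for other, d in row.items() if other != protein]
        mean_distance = sum(others) / len(others) if others else 0.0
        ranking.append((protein, mean_distance))
    # Smallest mean distance first; break ties by name for determinism.
    return sorted(ranking, key=lambda item: (item[1], item[0]))


# Toy cluster of four proteins with made-up distances (assumption).
toy_cluster = {
    "protA": {"protA": 0.0, "protB": 0.2, "protC": 0.3, "protD": 0.9},
    "protB": {"protA": 0.2, "protB": 0.0, "protC": 0.1, "protD": 0.7},
    "protC": {"protA": 0.3, "protB": 0.1, "protC": 0.0, "protD": 0.5},
    "protD": {"protA": 0.9, "protB": 0.7, "protC": 0.5, "protD": 0.0},
}

ranking = rank_by_centrality(toy_cluster)
for name, score in ranking:
    print(f"{name}: mean distance {score:.3f}")
```

In this toy cluster the protein with the smallest mean distance is listed first and would be the first candidate for experimental characterization, while the outlier is listed last. A full Active Learning approach would replace the mean-distance score with an estimate of expected information gain.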


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.