Phrap

Phrap is a widely used program for DNA sequence assembly. It is part of the Phred-Phrap-Consed package.

History

Phrap was originally developed by Prof. Phil Green for the assembly of cosmids in large-scale cosmid shotgun sequencing within the Human Genome Project. Phrap has been widely used for many different sequence assembly projects, including bacterial genome assemblies and EST assemblies.

Phrap was written as a command line program for easy integration into automated data workflows in genome sequencing centers. For users who want to use Phrap from a graphical interface, the commercial programs MacVector (for Mac OS X only) and CodonCode Aligner (for Mac OS X and Microsoft Windows) are available.

Methods

A detailed (albeit partially outdated) description of the Phrap algorithms can be found in the Phrap documentation. A recurring thread within the Phrap algorithms is the use of Phred quality scores. Phrap used quality scores to mitigate a problem that other assembly programs had struggled with at the beginning of the Human Genome Project: correctly assembling frequent imperfect repeats, in particular Alu sequences. Phrap uses quality scores to tell whether observed differences in repeated regions are more likely to be due to random ambiguities in the sequencing process or to the sequences coming from different copies of the Alu repeat. Typically, Phrap had no problems differentiating between the different Alu copies in a cosmid and correctly assembling the cosmids (or, later, BACs).

The logic is simple: a base call with a high probability of being correct should never be aligned with another high-quality but different base. However, Phrap does not rule out such alignments entirely, and the gap and alignment penalties that cross_match uses while looking for local alignments are not always optimal for typical sequencing errors or for a search for overlapping (contiguous) sequences. (Affine gap penalties are helpful for homology searches but not usually for aligning sequencing errors.) Phrap attempts to classify chimeras, vector sequences and low-quality end regions all in a single alignment and will sometimes make mistakes. Furthermore, Phrap internally runs more than one round of assembly building, and later rounds are less stringent, in the manner of a greedy algorithm.

These design choices were helpful in the 1990s, when the program was originally written (at Washington University in St. Louis), but are less so now. Phrap appears more error-prone than newer assemblers such as Euler, and it cannot use mate-pair information directly to guide assembly or to assemble past perfect repeats. Phrap is not free software, so it has not been extended and enhanced in the way that less restricted, open-source sequence assembly software has.

Quality based consensus sequences

Another use of Phred quality scores by Phrap that contributed to the program's success was the determination of consensus sequences using sequence qualities. In effect, Phrap automated a step that was a major bottleneck in the early phases of the Human Genome Project: determining the correct consensus sequence at all positions where the assembled sequences had discrepant bases. This approach had been suggested by Bonfield and Staden in 1995, and was implemented and further optimized in Phrap. Basically, at any consensus position with discrepant bases, Phrap examines the quality scores of the aligned sequences to find the highest-quality sequence. In the process, Phrap takes confirmation of local sequence by other reads into account, after considering direction and sequencing chemistry.

The mathematics of this approach were rather simple, since Phred quality scores are logarithmically linked to error probabilities: a quality score Q corresponds to an error probability of 10^(-Q/10), so Q20 means a 1 in 100 chance that the base call is wrong and Q30 a 1 in 1,000 chance. Adding quality scores therefore corresponds to multiplying error probabilities, which means that the quality scores of confirming reads can simply be added, as long as their error distributions are sufficiently independent. To satisfy this independence criterion, reads must typically be in different directions, since the peak patterns that cause base-calling errors are often identical when a region is sequenced several times in the same direction.

If a consensus base is covered by both high-quality sequence and (discrepant) low-quality sequence, Phrap's selection of the higher-quality sequence will in most cases be correct. Phrap then assigns the confirmed base quality to the consensus sequence base. This makes it easy (a) to find consensus regions that are not covered by high-quality sequence (which will also have low quality), and (b) to quickly calculate a reasonably accurate estimate of the error rate of the consensus sequence. This information can then be used to direct finishing efforts, for example re-sequencing of problem regions.

The combination of accurate, base-specific quality scores and a quality-based consensus sequence was a critical element in the success of the Human Genome Project. Phred and Phrap, and similar programs that picked up on the ideas pioneered by these two programs, enabled the assembly of large parts of the human genome (and many other genomes) at an accuracy substantially higher (less than 1 error in 10,000 bases) than the typical accuracy of carefully hand-edited sequences that had previously been submitted to the GenBank database.

References

Bonfield JK, Staden R (1995): The application of numerical estimates of base calling accuracy to DNA sequencing projects. Nucleic Acids Res. 1995 Apr 25;23(8):1406-10. PMID 7753633.
Krawetz SA (1989): Sequence errors described in GenBank: a means to determine the accuracy of DNA sequence interpretation. Nucleic Acids Res. 1989 May 25;17(10):3951-7.

External links

Phrap homepage

Other Software

Phred
Consed
DNA Baser Command Line Tool

Categories: Bioinformatics software | Computational science
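The additive use of Phred quality scores described under "Quality based consensus sequences" can be illustrated with a short Python sketch. It is not Phrap's actual consensus algorithm, which also weighs sequencing chemistry and is described in the Phrap documentation. The sketch assumes the standard Phred relation Q = -10 log10(P) and a hypothetical simplification of the independence rule: reads on the same strand are not counted as independent confirmation, so only the best quality per base and strand is kept before summing across strands. The function names `phred_to_error`, `error_to_phred` and `consensus_call` are illustrative only.

```python
import math
from collections import defaultdict


def phred_to_error(q: float) -> float:
    """Error probability implied by a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def consensus_call(observations):
    """Choose a consensus base from (base, quality, strand) tuples.

    Hypothetical simplification, not Phrap's algorithm: reads on the same
    strand are treated as non-independent, so only the best quality per
    (base, strand) pair counts. The per-strand maxima are then summed,
    which corresponds to multiplying independent error probabilities.
    """
    best = defaultdict(int)  # (base, strand) -> highest quality seen
    for base, qual, strand in observations:
        best[(base, strand)] = max(best[(base, strand)], qual)

    support = defaultdict(int)  # base -> quality summed across strands
    for (base, _strand), qual in best.items():
        support[base] += qual

    # Highest summed quality wins (ties go to the base observed first);
    # the summed value is reported as the consensus base quality.
    winner = max(support, key=support.get)
    return winner, support[winner]


if __name__ == "__main__":
    # Q30 and Q20 from opposite strands: 0.001 * 0.01 = 1e-5, i.e. Q50.
    print(round(error_to_phred(phred_to_error(30) * phred_to_error(20)), 6))  # 50.0
    reads = [
        ("A", 30, "+"),  # forward read, Q30
        ("A", 25, "+"),  # second forward read: same strand, not added
        ("A", 20, "-"),  # reverse read confirms the A: 30 + 20
        ("G", 12, "-"),  # discrepant low-quality call
    ]
    print(consensus_call(reads))  # ('A', 50)
```

Keeping only the per-strand maximum is one crude way to respect the point that same-direction reads tend to share error patterns; a real assembler would use a more careful model of which reads count as independent.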

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.