
Prediction by partial matching


Prediction by partial matching (PPM) is an adaptive statistical data compression technique based on context modeling and prediction. PPM models use a set of previous symbols in the uncompressed symbol stream to predict the next symbol in the stream. PPM algorithms can also be used to cluster data into predicted groupings in cluster analysis.

Theory

Predictions are usually reduced to symbol rankings. Each symbol (a letter, bit or any other amount of data) is ranked before it is compressed, and the ranking system determines the corresponding codeword (and therefore the compression rate). In many compression algorithms, the ranking is equivalent to probability mass function estimation. Given the previous letters (or given a context), each symbol is assigned a probability. For instance, in arithmetic coding the symbols are ranked by their probabilities to appear after previous symbols, and the whole sequence is compressed into a single fraction that is computed according to these probabilities.

The number of previous symbols, n, determines the order of the PPM model, which is denoted as PPM(n). Unbounded variants where the context has no length limitations also exist and are denoted as PPM*. If no prediction can be made based on all n context symbols, a prediction is attempted with n − 1 symbols. This process is repeated until a match is found or no more symbols remain in context. At that point a fixed prediction is made.

Much of the work in optimizing a PPM model is handling inputs that have not already occurred in the input stream. The obvious way to handle them is to create a "never-seen" symbol which triggers the escape sequence. But what probability should be assigned to a symbol that has never been seen? This is called the zero-frequency problem. One variant uses the Laplace estimator, which assigns the "never-seen" symbol a fixed pseudocount of one. A variant called PPMd increments the pseudocount of the "never-seen" symbol every time the "never-seen" symbol is used. (In other words, PPMd estimates the probability of a new symbol as the ratio of the number of unique symbols to the total number of symbols observed.)

Implementation

PPM compression implementations vary greatly in other details. The actual symbol selection is usually recorded using arithmetic coding, though it is also possible to use Huffman encoding or even some type of dictionary coding technique. The underlying model used in most PPM algorithms can also be extended to predict multiple symbols. It is also possible to use non-Markov modeling to either replace or supplement Markov modeling. The symbol size is usually static, typically a single byte, which makes generic handling of any file format easy.

Published research on this family of algorithms can be found as far back as the mid-1980s. Software implementations were not popular until the early 1990s because PPM algorithms require a significant amount of RAM. Recent PPM implementations are among the best-performing lossless compression programs for natural language text.

PPMd is a public domain implementation of PPMII (PPM with information inheritance) by Dmitry Shkarin which has undergone several incompatible revisions. It is used in the RAR file format by default. It is also available in the 7z and zip file formats. Attempts to improve PPM algorithms led to the PAQ series of data compression algorithms.

A PPM algorithm, rather than being used for compression, is used to increase the efficiency of user input in the alternate input method program Dasher.

See also

Language model
n-gram

Sources

Cleary, J.; Witten, I. (April 1984). "Data Compression Using Adaptive Coding and Partial String Matching". IEEE Trans. Commun. 32 (4): 396–402. doi:10.1109/TCOM.1984.1096090.
Moffat, A. (November 1990). "Implementing the PPM data compression scheme". IEEE Trans. Commun. 38 (11): 1917–1921. doi:10.1109/26.61469.
Schürmann, T.; Grassberger, P. (September 1996). "Entropy estimation of symbol sequences". Chaos. 6 (3): 414–427. arXiv:cond-mat/0203436. Bibcode:1996Chaos...6..414S. doi:10.1063/1.166191. PMID 12780271.
Cleary, J. G.; Teahan, W. J.; Witten, I. H. (1997). "Unbounded length contexts for PPM". The Computer Journal. 40 (2/3): 67–75. doi:10.1093/comjnl/40.2_and_3.67.

External links

Suite of PPM compressors with benchmarks
BICOM, a bijective PPM compressor (archived 2004-04-15 at the Wayback Machine)
"Arithmetic Coding + Statistical Modeling = Data Compression", Part 2
PPMd compressor by Dmitri Shkarin (in Russian)
"BMF, PPMd: everything about compression of data, images and video" (in Russian, at compression.ru). NOTE: requires manually setting the "Cyrillic (Windows)" encoding in browser.
PPM compression in C++ by René Puschinger
Probability estimation for PPM by W. J. Teahan (original source from archive.org)
Solving the problems of context modeling by C. Bloom
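The context-fallback scheme described above (try the longest context, escape to progressively shorter ones, and finish with a fixed prediction) can be sketched in a few lines. This is a minimal illustration, not any published PPM variant: the class name, the Laplace-style escape estimate, and the uniform byte-alphabet fallback are all choices made for the example.

```python
from collections import Counter, defaultdict

class PPMModel:
    """Minimal PPM(order) predictor: try the longest matching context,
    escape to shorter contexts on a miss, and fall back to a fixed
    uniform prediction when even the empty context fails."""

    def __init__(self, order=2):
        self.order = order
        # One table per context length k: k-symbol context -> symbol counts
        self.tables = [defaultdict(Counter) for _ in range(order + 1)]

    def update(self, history, symbol):
        """Record `symbol` under every context length 0..order."""
        for k in range(self.order + 1):
            if k <= len(history):
                ctx = tuple(history[len(history) - k:])
                self.tables[k][ctx][symbol] += 1

    def predict(self, history, symbol):
        """Probability assigned to `symbol`, with a Laplace-style escape:
        one pseudocount per context is reserved for "never seen"."""
        p = 1.0
        for k in range(min(self.order, len(history)), -1, -1):
            ctx = tuple(history[len(history) - k:])
            counts = self.tables[k][ctx]
            total = sum(counts.values())
            if symbol in counts:
                return p * counts[symbol] / (total + 1)
            p *= 1.0 / (total + 1)   # escape, then shorten the context
        return p / 256               # fixed prediction over a byte alphabet

# Predict-then-update over a short byte string (toy input):
model = PPMModel(order=2)
data = b"abracadabra"
for i, s in enumerate(data):
    p = model.predict(data[:i], s)   # probability a coder would use
    model.update(data[:i], s)
```

The product of the per-symbol probabilities is what an arithmetic coder would turn into the final code length; repeated substrings receive steadily higher probabilities as the tables fill.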
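The two escape estimators named in the Theory discussion can be compared directly. A small sketch, assuming the article's paraphrase of PPMd (new-symbol probability equals unique symbols divided by total symbols observed); `escape_probability` is a hypothetical helper, not code from any PPM implementation.

```python
from collections import Counter

def escape_probability(seen, method="ppmd"):
    """Probability of meeting a brand-new symbol in a given context.
    "laplace": fixed pseudocount of one for the never-seen symbol.
    "ppmd":    ratio of unique symbols to total symbols observed
               (the article's paraphrase of PPMd's estimator)."""
    counts = Counter(seen)
    total = sum(counts.values())
    unique = len(counts)
    if method == "laplace":
        return 1 / (total + 1)
    return unique / total

# "abracadabra" has 11 symbols, 5 of them distinct:
# laplace -> 1/12, ppmd -> 5/11
```

Note how the PPMd-style estimate adapts: a context that keeps producing novel symbols keeps a high escape probability, while the Laplace estimate decays as more symbols are seen regardless of their variety.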
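The claim that arithmetic coding compresses a whole sequence into a single fraction can be made concrete with a toy interval-narrowing routine. This is a sketch assuming a fixed, known symbol distribution; real coders use integer ranges and handle precision and termination.

```python
def narrow_interval(sequence, probabilities):
    """Each symbol narrows [low, high) by its probability slice, so the
    whole sequence collapses to a single subinterval; any fraction
    inside it identifies the sequence."""
    low, high = 0.0, 1.0
    for s in sequence:
        span = high - low
        cum = 0.0
        for sym, p in probabilities:
            if sym == s:
                low, high = low + span * cum, low + span * (cum + p)
                break
            cum += p
    return low, high

# With P(a) = P(b) = 0.5, "ab" maps to the interval [0.25, 0.5)
```

In PPM the per-symbol probabilities would come from the context model rather than a fixed table, but the narrowing step is the same.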


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
