Knowledge (XXG)

Hunt–Szymanski algorithm

Source 📝

460: 449: 666:
The steps that the above algorithm would perform to determine the length of the longest common subsequence for both sequences are shown in the diagram. The algorithm correctly reports that the longest common subsequence of the two sequences is two elements long.
212: 1016: 661: 569: 104:
The description of the algorithm appeared as a technical report by Hunt and McIlroy in 1976. The following year, a variant of the algorithm was finally published in a joint paper by Hunt and Szymanski.
590: 498: 922: 444:{\displaystyle P_{ij}={\begin{cases}0&{\text{ if }}\ i=0{\text{ or }}j=0\\1+P_{i-1,j-1}&{\text{ if }}A_{i}=B_{j}\\\max(P_{i-1,j},P_{i,j-1})&{\text{ if }}A_{i}\neq B_{j}\end{cases}}} 1059:
Black dots represent candidates that would have to be considered by the simple algorithm and the black lines are connections that create common subsequences of length 3.
826: 1316:
Szymanski, T. G. (1975) A special case of the maximal common subsequence problem. Technical Report TR-170, Computer Science Lab., Princeton University.
1068:-candidates that are considered by the Hunt–Szymanski algorithm and the red line is the connection that creates a common subsequence of length 3. 585: 493: 1041:-candidates are marked on the grid. A common subsequence can be created by joining marked coordinates of the grid such that any increase in 36:
which compares a pair of files each represented as a sequence of lines. To this day, variations of this algorithm are found in incremental
124:. The solution is modified so that there are lower time and space requirements for the algorithm when it is working with typical inputs. 1082: 29: 1253: 113:
The Hunt–Szymanski algorithm is a modification to a basic solution for the longest common subsequence problem which has complexity
1384: 1087: 1394: 832: 89:
The algorithm was proposed by Harold S. Stone as a generalization of a special case solved by Thomas G. Szymanski.
1335: 1025:-candidates reduces the amount of time and space needed to find the longest common subsequence of two sequences. 45: 1389: 459: 37: 1077: 237: 1113:. Department of Mathematics and Computer Science, University of Southern Denmark. January 12, 2017. 1360: 1215: 1165: 1137: 1126:"New tabulation and sparse dynamic programming based techniques for sequence similarity problems" 791: 1327: 717:. The Hunt–Szymanski algorithm modifies this algorithm to have a worst-case time complexity of 1352: 1249: 1207: 1157: 1107: 463:
A table showing the recursive steps that the basic longest common subsequence algorithm takes
1344: 1199: 1147: 17: 93:
refined the idea, implemented the first version of the candidate-listing algorithm used by
1273: 98: 761:
The Hunt–Szymanski algorithm only considers what the authors call essential matches, or
1241: 1237: 690: 1378: 90: 41: 1219: 1169: 1364: 1303: 1015: 1184: 1035:-candidates, a grid with each sequence's contents on each axis is created. The 1233: 1152: 1125: 1356: 1277: 1211: 1161: 1348: 1203: 656:{\displaystyle {\begin{aligned}B_{1}=a\\B_{2}=c\\B_{3}=b\end{aligned}}} 564:{\displaystyle {\begin{aligned}A_{1}=a\\A_{2}=b\\A_{3}=c\end{aligned}}} 1185:"Bounds on the Complexity of the Longest Common Subsequence Problem" 1142: 1014: 675:
The above algorithm has worst-case time and space complexities of
458: 743:, though it regularly beats the worst case with typical inputs. 94: 33: 1029:
To create the longest common subsequence from a collection of
182:
be the length of the longest common subsequence for the first
1328:"A fast algorithm for computing longest common subsequences" 437: 32:. It was one of the first non-heuristic algorithms used in 835: 794: 588: 496: 215: 1304:"Sequence Comparison: Some Theory and Some Practice" 917:{\displaystyle P_{ij}>max(P_{i-1,j},P_{i,j-1})} 916: 820: 655: 563: 443: 348: 51:The worst-case complexity for this algorithm is 1278:"An Algorithm for Differential File Comparison" 1056:This is illustrated in the adjacent diagram. 8: 1326:Hunt, James W; Szymanski, Thomas G. (1977). 1183:Aho, A.; Hirschberg, D.; Ullman, J. (1976). 970:There are no common subsequences of length 927:The second point implies two properties of 97:and embedded it into an older framework of 1151: 1141: 893: 868: 840: 834: 812: 799: 793: 637: 617: 597: 589: 587: 545: 525: 505: 497: 495: 428: 415: 406: 383: 358: 338: 325: 316: 290: 262: 245: 232: 220: 214: 128:Basic longest common subsequence solution 937:There is a common subsequence of length 1099: 1267: 1265: 1263: 1261: 1108:"The Hunt-Szymanski Algorithm for LCS" 1047:is accompanied by an increase in  711:is the number of elements in sequence 699:is the number of elements in sequence 7: 170:th element of the second sequence. 1285:Computing Science Technical Report 1083:Longest common subsequence problem 152:th element of the first sequence. 30:longest common subsequence problem 14: 773:-candidates are pairs of indices 1246:Data Structures and Algorithms. 1019:A diagram that shows how using 911: 861: 401: 351: 1: 1306:. Universidade de São Paulo. 1302:Imre Simon (April 2, 1988). 1130:Discrete Applied Mathematics 821:{\displaystyle A_{i}=B_{j}} 1411: 1124:Grabowski, Szymon (2016). 1336:Communications of the ACM 1153:10.1016/j.dam.2015.10.040 579:contains three elements: 487:contains three elements: 1088:Wagner–Fischer algorithm 732:and space complexity of 22:Hunt–Szymanski algorithm 467:Consider the sequences 46:molecular phylogenetics 38:version control systems 28:, is a solution to the 1248:Addison-Wesley, 1983. 1026: 918: 822: 657: 565: 464: 445: 26:Hunt–McIlroy algorithm 1385:Algorithms on strings 1349:10.1145/359581.359603 1204:10.1145/321921.321922 1018: 994:elements of sequence 982:elements of sequence 961:elements of sequence 949:elements of sequence 919: 823: 658: 566: 462: 446: 1291:. Bell Laboratories. 1078:Levenshtein distance 833: 792: 586: 494: 213: 81:is rather expected. 1395:Dynamic programming 1274:McIlroy, M. Douglas 1232:See Section 5.6 of 1062:Red dots represent 976:for any fewer than 48:research software. 1192:Journal of the ACM 1027: 914: 818: 653: 651: 561: 559: 465: 441: 436: 66:, but in practice 747:Essential matches 409: 319: 265: 252: 248: 1402: 1369: 1368: 1332: 1323: 1317: 1314: 1308: 1307: 1299: 1293: 1292: 1282: 1272:Hunt, James W.; 1269: 1256: 1230: 1224: 1223: 1189: 1180: 1174: 1173: 1155: 1145: 1121: 1115: 1114: 1112: 1104: 1067: 1052: 1046: 1040: 1034: 1024: 1010: 999: 993: 987: 981: 975: 966: 960: 954: 948: 942: 932: 923: 921: 920: 915: 910: 909: 885: 884: 848: 847: 827: 825: 824: 819: 817: 816: 804: 803: 784: 772: 766: 756: 742: 731: 716: 710: 704: 698: 685: 662: 660: 659: 654: 652: 642: 641: 622: 621: 602: 601: 578: 570: 568: 567: 562: 560: 550: 549: 530: 529: 510: 509: 486: 478: 472: 450: 448: 447: 442: 440: 439: 433: 432: 420: 419: 410: 407: 400: 399: 375: 374: 343: 342: 330: 329: 320: 317: 313: 312: 266: 263: 250: 249: 246: 228: 227: 205: 199: 193: 187: 181: 169: 163: 151: 145: 123: 80: 65: 24:, also known as 18:computer science 1410: 1409: 1405: 1404: 1403: 1401: 1400: 1399: 1375: 1374: 1373: 1372: 1330: 1325: 1324: 1320: 1315: 1311: 1301: 1300: 1296: 1280: 1271: 1270: 1259: 1238:Hopcroft, J. E. 1231: 1227: 1187: 1182: 1181: 1177: 1123: 1122: 1118: 1110: 1106: 1105: 1101: 1096: 1074: 1063: 1048: 1042: 1036: 1030: 1020: 1013: 1006: 995: 989: 983: 977: 971: 962: 956: 950: 944: 938: 928: 889: 864: 836: 831: 830: 808: 795: 790: 789: 774: 768: 762: 759: 752: 749: 733: 718: 712: 706: 700: 694: 676: 673: 650: 649: 633: 630: 629: 613: 610: 609: 593: 584: 583: 574: 558: 557: 541: 538: 537: 521: 518: 517: 501: 492: 491: 482: 474: 468: 457: 435: 434: 424: 411: 404: 379: 354: 345: 344: 334: 321: 314: 286: 277: 276: 243: 233: 216: 211: 210: 201: 195: 189: 183: 179: 174: 165: 161: 156: 147: 143: 138: 135: 130: 114: 111: 99:Douglas McIlroy 87: 67: 52: 12: 11: 5: 1408: 1406: 1398: 1397: 1392: 1387: 1377: 1376: 1371: 1370: 1343:(5): 350–353. 1318: 1309: 1294: 1257: 1225: 1175: 1116: 1098: 1097: 1095: 1092: 1091: 1090: 1085: 1080: 1073: 1070: 1012: 1003: 1002: 1001: 968: 955:and the first 925: 924: 913: 908: 905: 902: 899: 896: 892: 888: 883: 880: 877: 874: 871: 867: 863: 860: 857: 854: 851: 846: 843: 839: 828: 815: 811: 807: 802: 798: 758: 750: 748: 745: 691:big O notation 672: 669: 664: 663: 648: 645: 640: 636: 632: 631: 628: 625: 620: 616: 612: 611: 608: 605: 600: 596: 592: 591: 572: 571: 556: 553: 548: 544: 540: 539: 536: 533: 528: 524: 520: 519: 516: 513: 508: 504: 500: 499: 456: 453: 452: 451: 438: 431: 427: 423: 418: 414: 408: if  405: 403: 398: 395: 392: 389: 386: 382: 378: 373: 370: 367: 364: 361: 357: 353: 350: 347: 346: 341: 337: 333: 328: 324: 318: if  315: 311: 308: 305: 302: 299: 296: 293: 289: 285: 282: 279: 278: 275: 272: 269: 264: or  261: 258: 255: 247: if  244: 242: 239: 238: 236: 231: 226: 223: 219: 194:and the first 177: 159: 141: 134: 131: 129: 126: 110: 107: 86: 83: 13: 10: 9: 6: 4: 3: 2: 1407: 1396: 1393: 1391: 1390:Combinatorics 1388: 1386: 1383: 1382: 1380: 1366: 1362: 1358: 1354: 1350: 1346: 1342: 1338: 1337: 1329: 1322: 1319: 1313: 1310: 1305: 1298: 1295: 1290: 1286: 1279: 1276:(June 1976). 1275: 1268: 1266: 1264: 1262: 1258: 1255: 1254:0-201-00023-7 1251: 1247: 1243: 1242:Ullman, J. D. 1239: 1235: 1229: 1226: 1221: 1217: 1213: 1209: 1205: 1201: 1197: 1193: 1186: 1179: 1176: 1171: 1167: 1163: 1159: 1154: 1149: 1144: 1139: 1136:(C): 96–103. 1135: 1131: 1127: 1120: 1117: 1109: 1103: 1100: 1093: 1089: 1086: 1084: 1081: 1079: 1076: 1075: 1071: 1069: 1066: 1060: 1057: 1054: 1051: 1045: 1039: 1033: 1023: 1017: 1009: 1004: 998: 992: 986: 980: 974: 969: 965: 959: 953: 947: 943:in the first 941: 936: 935: 934: 933:-candidates: 931: 906: 903: 900: 897: 894: 890: 886: 881: 878: 875: 872: 869: 865: 858: 855: 852: 849: 844: 841: 837: 829: 813: 809: 805: 800: 796: 788: 787: 786: 782: 778: 771: 767:-candidates. 765: 755: 751: 746: 744: 740: 736: 729: 725: 721: 715: 709: 703: 697: 692: 689: 683: 679: 670: 668: 646: 643: 638: 634: 626: 623: 618: 614: 606: 603: 598: 594: 582: 581: 580: 577: 554: 551: 546: 542: 534: 531: 526: 522: 514: 511: 506: 502: 490: 489: 488: 485: 480: 477: 471: 461: 454: 429: 425: 421: 416: 412: 396: 393: 390: 387: 384: 380: 376: 371: 368: 365: 362: 359: 355: 339: 335: 331: 326: 322: 309: 306: 303: 300: 297: 294: 291: 287: 283: 280: 273: 270: 267: 259: 256: 253: 240: 234: 229: 224: 221: 217: 209: 208: 207: 204: 198: 192: 186: 180: 171: 168: 162: 153: 150: 144: 132: 127: 125: 121: 117: 108: 106: 102: 100: 96: 92: 91:James W. Hunt 84: 82: 78: 74: 70: 63: 59: 55: 49: 47: 43: 39: 35: 31: 27: 23: 19: 1340: 1334: 1321: 1312: 1297: 1288: 1284: 1245: 1228: 1195: 1191: 1178: 1133: 1129: 1119: 1102: 1064: 1061: 1058: 1055: 1049: 1043: 1037: 1031: 1028: 1021: 1007: 996: 990: 984: 978: 972: 963: 957: 951: 945: 939: 929: 926: 780: 776: 769: 763: 760: 753: 738: 734: 727: 723: 719: 713: 707: 701: 695: 687: 681: 677: 674: 665: 575: 573: 483: 481: 475: 469: 466: 202: 196: 190: 188:elements of 184: 175: 172: 166: 157: 154: 148: 139: 136: 119: 115: 112: 103: 88: 76: 72: 68: 61: 57: 53: 50: 42:wiki engines 25: 21: 15: 1198:(1): 1–12. 1011:-candidates 1005:Connecting 785:such that: 757:-candidates 1379:Categories 1234:Aho, A. V. 1094:References 671:Complexity 1357:0001-0782 1212:0004-5411 1162:0166-218X 1143:1312.2217 904:− 873:− 693:), where 422:≠ 394:− 363:− 307:− 295:− 200:elements 133:Algorithm 109:Algorithm 1220:10957346 1170:14005194 1072:See also 1365:3226080 455:Example 164:be the 146:be the 85:History 1363:  1355:  1252:  1218:  1210:  1168:  1160:  251:  44:, and 20:, the 1361:S2CID 1331:(PDF) 1281:(PDF) 1216:S2CID 1188:(PDF) 1166:S2CID 1138:arXiv 1111:(PDF) 1353:ISSN 1250:ISBN 1208:ISSN 1158:ISSN 850:> 726:log 705:and 473:and 173:Let 155:Let 137:Let 95:diff 75:log 60:log 34:diff 1345:doi 1200:doi 1148:doi 1134:212 988:or 688:see 349:max 16:In 1381:: 1359:. 1351:. 1341:20 1339:. 1333:. 1289:41 1287:. 1283:. 1260:^ 1244:, 1240:, 1236:, 1214:. 1206:. 1196:23 1194:. 1190:. 1164:. 1156:. 1146:. 1132:. 1128:. 1053:. 779:, 739:mn 724:mn 682:mn 479:. 206:. 178:ij 101:. 40:, 1367:. 1347:: 1222:. 1202:: 1172:. 1150:: 1140:: 1065:k 1050:j 1044:i 1038:k 1032:k 1022:k 1008:k 1000:. 997:B 991:j 985:A 979:i 973:k 967:. 964:B 958:j 952:A 946:i 940:k 930:k 912:) 907:1 901:j 898:, 895:i 891:P 887:, 882:j 879:, 876:1 870:i 866:P 862:( 859:x 856:a 853:m 845:j 842:i 838:P 814:j 810:B 806:= 801:i 797:A 783:) 781:j 777:i 775:( 770:k 764:k 754:k 741:) 737:( 735:O 730:) 728:m 722:( 720:O 714:B 708:n 702:A 696:m 686:( 684:) 680:( 678:O 647:b 644:= 639:3 635:B 627:c 624:= 619:2 615:B 607:a 604:= 599:1 595:B 576:B 555:c 552:= 547:3 543:A 535:b 532:= 527:2 523:A 515:a 512:= 507:1 503:A 484:A 476:B 470:A 430:j 426:B 417:i 413:A 402:) 397:1 391:j 388:, 385:i 381:P 377:, 372:j 369:, 366:1 360:i 356:P 352:( 340:j 336:B 332:= 327:i 323:A 310:1 304:j 301:, 298:1 292:i 288:P 284:+ 281:1 274:0 271:= 268:j 260:0 257:= 254:i 241:0 235:{ 230:= 225:j 222:i 218:P 203:B 197:j 191:A 185:i 176:P 167:j 160:j 158:B 149:i 142:i 140:A 122:) 120:n 118:( 116:O 79:) 77:n 73:n 71:( 69:O 64:) 62:n 58:n 56:( 54:O

Index

computer science
longest common subsequence problem
diff
version control systems
wiki engines
molecular phylogenetics
James W. Hunt
diff
Douglas McIlroy

big O notation

Levenshtein distance
Longest common subsequence problem
Wagner–Fischer algorithm
"The Hunt-Szymanski Algorithm for LCS"
"New tabulation and sparse dynamic programming based techniques for sequence similarity problems"
arXiv
1312.2217
doi
10.1016/j.dam.2015.10.040
ISSN
0166-218X
S2CID
14005194
"Bounds on the Complexity of the Longest Common Subsequence Problem"
doi
10.1145/321921.321922
ISSN
0004-5411

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.