Knowledge

Matching wildcards

Source 📝

635: 123: 630:{\displaystyle {\begin{aligned}m_{00}&=(p_{0}=t_{0})\\m_{0j}&=(p_{j-1}={\text{‘*’}})\land m_{0,j-1}\\m_{i0}&={\text{false}}\\m_{ij}&={\begin{cases}m_{i-1,j-1}&{\text{for}}\;p_{i-1}=t_{j-1}\lor p_{i-1}={\text{‘?’}}\\m_{i,j-1}\lor m_{i-1,j}&{\text{for}}\;p_{i-1}={\text{‘*’}}\\{\text{false}}&{\text{for}}\;p_{i-1}\neq t_{j-1}\end{cases}}&&\quad {\text{for}}\;1\leq i\leq |p|,1\leq j\leq |t|.\end{aligned}}} 809:
in the text. Lars Mathiesen changes the return to three classes, match, no-match, and ABORT (no match possible at all for asterisk recursion.) The ABORT value is returned when the text is consumed too early or when another asterisk match has failed, guaranteeing a linear performance with respect to
813:
Asterisk advancement in recursion. This wildmatch tweak is relatively more minor. It applies to when the recursion wants to match "*X" on "abcX": when an asterisk is followed by a literal like "X", it is obvious that only the last comparison with equal lengths would have a chance of producing a
830:
The recursive algorithms are in general easier to reason about, and with the ABORT modification they perform acceptably in terms of worst-case complexity. On strings without * they take linear-to-string-size time to match since there is a fixed one-to-one relation.
875:
The iterative functions above implement backtracking by saving an old set of pattern/text pointers, and reverting to it should a match fails. According to Kurt, since only one successful match is required, only one such set needs to be saved.
826:
of the indexes, following the dynamic programming formulation of the problem. The "ABORT" technique is applicable to it as well. On typical patterns (as tested by Cantatore) it is slower than the greedy-call implementations.
690:
Pattern matching with wildcards, an unanchored string search with the equivalent of both wildcards defined. Has an exponential runtime unless a length-bound is given in the pattern matching with flexible wildcards
810:
the number of asterisks. (The overall complexity is additionally quadratic to the number of characters left to match.) Git/Rsync's wildmatch ABORT also covers invalid inputs. The new INN uwildmat does the same.
776:
The general form of these algorithms are the same. On recursion the algorithm slices the input into substrings, and considers a match to have happened when ONE of the substrings return a positive match. For
891:
are less used in practice due to lack of backreference support, wildcard matching in general does not come with a similarly rich set of features. (In fact, many of the algorithms above only has support for
128: 704:, but the technique was criticized on grounds of performance and reliability considerations. Non-recursive algorithms for matching wildcards have gained favor in light of these considerations. 707:
Among both recursive and non-recursive algorithms, strategies for performing the pattern matching operation vary widely, as evidenced among the variety of example algorithms referenced below.
711:
development and performance optimization techniques have been demonstrably brought to bear on certain algorithms, particularly those developed by critics of the recursive algorithms.
797:. They usually differ by less-important things like support for features and by more important factors such as minor but highly effective optimizations. Some of them include: 1557: 47:
command-line or text editor or file manager, as well as the interfaces for some search engines and databases. Wildcard matching is a subset of the problem of matching
801:
The ABORT signal against over-recursion (Lars Mathiesen 1991). While it is correct to naively recurse by the entire rest of the strings (pattern and text) on
1152: 822:
Martin Richter's algorithm is an exception to this pattern, although the overall operation is equivalent. On * it recurses into increasing
805:
and making sure that ONE of the substrings return a positive match, the running time becomes exponential for rejecting a match with many
1572: 843: 1160:
SOFSEM 2007: Theory and Practice of Computer Science, 33rd Conference on Current Trends in Theory and Practice of Computer Science
1567: 1547: 1447: 1432: 1085: 1111: 1552: 996: 1354: 888: 1010: 1562: 1292: 1171: 904:-based nrgrep algorithm provides a more streamlined implementation with emphasis on efficient suffixes. See also 1577: 1254: 1466: 1273: 90:), but on Windows programmers tend to only discuss a simplified syntax supported by the native C runtime: 36: 1327: 1215:
Wu, Xindong; Qiang, Ji-Peng; Xie, Fei (12 September 2014). "Pattern Matching with Flexible Wildcards".
1090: 976: 900:.) The Russ Cox implementation of Thompson NFA can be trivially modified for such. Gustavo Navarro's 668: 314: 1368: 683:
Pattern matching with don't cares or gaps, an unanchored string search with only the equivalent of
48: 992: 1523: 1232: 1163: 1137: 937: 932: 927: 905: 884: 880: 32: 814:
match. This is seen earlier in uwildmat in 2000 and more implicitly in van Rossum's fnmatch for
109:
This article mainly discusses the Windows formulation of the problem, unless otherwise stated.
1061: 44: 1188:
Clifford, Peter; Clifford, Raphaël (January 2007). "Simple deterministic wildcard matching".
1515: 1224: 1197: 922: 917: 761: 20: 117:
Stated in zero-based indices, the wildcard-matching problem can be defined recursively as:
1406: 52: 957: 1451: 1277: 1115: 1541: 1500: 1385: 1014: 961: 663:
characters respectively. This is the formulation used by Richter's algorithm and the
1236: 1167: 1527: 1470: 724: 40: 1028: 1228: 1201: 667:
algorithm found in Cantatore's collection. This description is similar to the
1485: 1032: 731: 708: 701: 87: 1047: 752: 735: 1332: 1258: 105:
matches arbitrary many (including zero) occurrences of any character.
1519: 1312: 879:
In addition, the problem of wildcard matching can be converted into
839:
The following are developed by critics of the recursive algorithms:
767: 849:
Alessandro Cantatore's collection of wildcard matching algorithms
723:
when there is more suffix to match against. This is a form of
887:. Although non-recursive regular expression matchers such as 553: 1433:"Matching Wildcards: An Improved Algorithm for Big Data" 857:
MATCH("da*da*da*", "daaadabadmanda")
700:
Early algorithms for matching wildcards often relied on
679:
Directly related problems in computer science include:
31:) is useful in comparing text strings that may contain 16:
Algorithm to compare text strings using wildcard syntax
852:
Dogan Kurt's iterative matcher and slower NFA matcher.
126: 1501:"NR-grep: a fast and flexible pattern-matching tool" 1486:"Regular Expression Matching Can Be Simple And Fast" 897: 893: 868: 856: 815: 794: 790: 786: 782: 778: 741:
Filip's algorithm and Vignesh Murugesan's algorithm
684: 102: 98: 86:The pattern can be based on any common syntax (see 629: 727:, also done by some regular expression matchers. 101:matches exactly one occurrence of any character. 1151:Iliopoulos, Costas S.; Rahman, M. Sohel (2007). 1448:"Recursive solutions for glob pattern matching" 1153:"Pattern Matching Algorithms with Don't Cares" 8: 1380: 1378: 1131: 1129: 1127: 1125: 844:Kirk J. Krauss's wildcard-matching algorithm 764:'s BSD libc fnmatch, also part of Apple libc 719:The recursion generally happens on matching 63:A wildcard matcher tests a wildcard pattern 1401: 1399: 1112:"Wildcard Matching Recursive Algorithm C++" 901: 1371:. Beren Minor's Mirrors. 21 November 2019. 1248: 1246: 1217:Journal of Computer Science and Technology 566: 514: 473: 352: 35:. Common uses of these algorithms include 906:regular expression § Implementations 744:Martin Richter's algorithm (identical to 615: 607: 587: 579: 561: 538: 519: 509: 502: 493: 478: 468: 448: 423: 410: 395: 376: 357: 347: 321: 309: 293: 280: 264: 238: 223: 208: 185: 168: 155: 135: 127: 125: 1079: 1077: 1075: 993:"MS-DOS and Windows Wildcard Characters" 867:Jack Handy's incorrect algorithm (fails 779:dowild("*X", "abcX") 1306: 1304: 1302: 949: 783:dowild("X", "abcX") 1326:van Rossum, Guido (20 November 2019). 1105: 1103: 1101: 787:dowild("X", "bcX") 647:is the result of matching the pattern 1558:History of human–computer interaction 1499:Navarro, Gonzalo (10 November 2001). 1048:"Welcome to Regular-Expressions.info" 1011:"Apache Lucene - Query Parser Syntax" 869:MATCH("*?", "xx") 791:dowild("X", "cX") 7: 1467:"Wildcard string compare (globbing)" 795:dowild("X", "X") 855:Siler's incorrect algorithm (fails 748:and related to the 7-zip algorithm) 1086:"Matching Wildcards: An Algorithm" 14: 1508:Software: Practice and Experience 978:UNIX Shell Programming QuickStart 1328:"freebsd/lib/libc/gen/fnmatch.c" 94:No escape characters are defined 1407:"uwildmat.c in trunk/lib – INN" 1274:"Compare strings with wildcard" 758:and multibyte character sets): 560: 1190:Information Processing Letters 1138:"Wildcard matching algorithms" 1136:Cantatore, Alessandro (2003). 616: 608: 588: 580: 228: 201: 174: 148: 75:match, returns true only when 1: 1357:. opensource.apple.com. 1999. 1293:"WildCard Matching algorithm" 1162:. Harrachov, Czech Republic. 1313:"Wildcard Matching Methods" 1291:Murugesan, Vignesh (2014). 997:Microsoft Developer Network 1594: 1435:. Develop for Performance. 1050:. RegularExpressions.info. 1017:2.9.4 Documentation. 2006. 755:implementations (supports 738:algorithm (sh-like syntax) 1573:User interface techniques 1229:10.1007/s11390-014-1464-3 1202:10.1016/j.ipl.2006.08.002 885:text-replacement approach 781:, it would greedily call 835:Non-recursive algorithms 79:matches the entirety of 67:against an input string 1046:Goyvaerts, Jan (2018). 975:Quigley, Ellie (2005). 889:Thompson's construction 883:matching using a naive 37:command-line interfaces 1386:"git/git: wildmatch.c" 863:The following is not: 631: 1568:Software architecture 1548:Computer file formats 1431:Krauss, Kirk (2018). 1084:Krauss, Kirk (2008). 958:"Wildcard characters" 632: 1465:Handy, Jack (2005). 1369:"fnmatch_internal.c" 1062:"Wildcard Expansion" 715:Recursive algorithms 669:Levenshtein distance 124: 1253:Salz, Rich (1991). 49:regular expressions 23:, an algorithm for 1553:Computing commands 1091:Dr. Dobb's Journal 1066:docs.microsoft.com 938:List of algorithms 933:Wildcard character 928:Glob (programming) 881:regular expression 627: 625: 552: 25:matching wildcards 1514:(13): 1265–1312. 1110:Deadlock (2015). 651:against the text 564: 512: 505: 496: 495:‘*’ 471: 413: 412:‘?’ 350: 283: 226: 225:‘*’ 71:. It performs an 45:Microsoft Windows 1585: 1563:Pattern matching 1532: 1531: 1505: 1496: 1490: 1489: 1481: 1475: 1474: 1462: 1456: 1455: 1443: 1437: 1436: 1428: 1422: 1421: 1419: 1417: 1403: 1394: 1393: 1382: 1373: 1372: 1365: 1359: 1358: 1351: 1345: 1344: 1342: 1340: 1323: 1317: 1316: 1308: 1297: 1296: 1288: 1282: 1281: 1269: 1263: 1262: 1250: 1241: 1240: 1212: 1206: 1205: 1185: 1179: 1178: 1176: 1170:. Archived from 1157: 1148: 1142: 1141: 1133: 1120: 1119: 1107: 1096: 1095: 1081: 1070: 1069: 1058: 1052: 1051: 1043: 1037: 1036: 1025: 1019: 1018: 1007: 1001: 1000: 989: 983: 982: 972: 966: 965: 954: 923:Pattern calculus 918:Pattern matching 903: 899: 895: 870: 858: 817: 808: 804: 796: 792: 788: 784: 780: 762:Guido van Rossum 757: 722: 686: 675:Related problems 636: 634: 633: 628: 626: 619: 611: 591: 583: 565: 562: 558: 556: 555: 549: 548: 530: 529: 513: 510: 506: 503: 497: 494: 489: 488: 472: 469: 465: 464: 440: 439: 414: 411: 406: 405: 387: 386: 368: 367: 351: 348: 344: 343: 301: 300: 284: 281: 272: 271: 255: 254: 227: 224: 219: 218: 193: 192: 173: 172: 160: 159: 140: 139: 104: 100: 21:computer science 1593: 1592: 1588: 1587: 1586: 1584: 1583: 1582: 1578:User interfaces 1538: 1537: 1536: 1535: 1520:10.1002/spe.411 1503: 1498: 1497: 1493: 1483: 1482: 1478: 1464: 1463: 1459: 1445: 1444: 1440: 1430: 1429: 1425: 1415: 1413: 1405: 1404: 1397: 1384: 1383: 1376: 1367: 1366: 1362: 1353: 1352: 1348: 1338: 1336: 1325: 1324: 1320: 1310: 1309: 1300: 1290: 1289: 1285: 1271: 1270: 1266: 1252: 1251: 1244: 1214: 1213: 1209: 1187: 1186: 1182: 1174: 1155: 1150: 1149: 1145: 1135: 1134: 1123: 1109: 1108: 1099: 1083: 1082: 1073: 1060: 1059: 1055: 1045: 1044: 1040: 1029:"SQL Wildcards" 1027: 1026: 1022: 1009: 1008: 1004: 991: 990: 986: 981:. InformIT.com. 974: 973: 969: 956: 955: 951: 946: 914: 837: 806: 802: 756: 720: 717: 698: 677: 645: 624: 623: 557: 551: 550: 534: 515: 507: 499: 498: 474: 466: 444: 419: 416: 415: 391: 372: 353: 345: 317: 310: 302: 289: 286: 285: 273: 260: 257: 256: 234: 204: 194: 181: 178: 177: 164: 151: 141: 131: 122: 121: 115: 61: 53:string matching 33:wildcard syntax 27:(also known as 17: 12: 11: 5: 1591: 1589: 1581: 1580: 1575: 1570: 1565: 1560: 1555: 1550: 1540: 1539: 1534: 1533: 1491: 1476: 1457: 1452:Stack Overflow 1446:Siler (2013). 1438: 1423: 1395: 1374: 1360: 1346: 1318: 1298: 1283: 1278:Stack Overflow 1272:Filip (2014). 1264: 1242: 1223:(5): 740–750. 1207: 1180: 1177:on 2019-12-17. 1143: 1121: 1116:Stack Overflow 1097: 1071: 1053: 1038: 1020: 1002: 984: 967: 948: 947: 945: 942: 941: 940: 935: 930: 925: 920: 913: 910: 873: 872: 861: 860: 853: 850: 847: 836: 833: 820: 819: 811: 774: 773: 772: 771: 765: 749: 742: 739: 716: 713: 697: 694: 693: 692: 688: 676: 673: 643: 638: 637: 622: 618: 614: 610: 606: 603: 600: 597: 594: 590: 586: 582: 578: 575: 572: 569: 559: 554: 547: 544: 541: 537: 533: 528: 525: 522: 518: 508: 501: 500: 492: 487: 484: 481: 477: 467: 463: 460: 457: 454: 451: 447: 443: 438: 435: 432: 429: 426: 422: 418: 417: 409: 404: 401: 398: 394: 390: 385: 382: 379: 375: 371: 366: 363: 360: 356: 346: 342: 339: 336: 333: 330: 327: 324: 320: 316: 315: 313: 308: 305: 303: 299: 296: 292: 288: 287: 279: 276: 274: 270: 267: 263: 259: 258: 253: 250: 247: 244: 241: 237: 233: 230: 222: 217: 214: 211: 207: 203: 200: 197: 195: 191: 188: 184: 180: 179: 176: 171: 167: 163: 158: 154: 150: 147: 144: 142: 138: 134: 130: 129: 114: 111: 107: 106: 95: 60: 57: 15: 13: 10: 9: 6: 4: 3: 2: 1590: 1579: 1576: 1574: 1571: 1569: 1566: 1564: 1561: 1559: 1556: 1554: 1551: 1549: 1546: 1545: 1543: 1529: 1525: 1521: 1517: 1513: 1509: 1502: 1495: 1492: 1487: 1480: 1477: 1472: 1468: 1461: 1458: 1453: 1449: 1442: 1439: 1434: 1427: 1424: 1412: 1411:inn.eyrie.org 1408: 1402: 1400: 1396: 1392:. 2020-01-20. 1391: 1387: 1381: 1379: 1375: 1370: 1364: 1361: 1356: 1350: 1347: 1335: 1334: 1329: 1322: 1319: 1314: 1311:Kurt, Dogan. 1307: 1305: 1303: 1299: 1294: 1287: 1284: 1279: 1275: 1268: 1265: 1260: 1256: 1249: 1247: 1243: 1238: 1234: 1230: 1226: 1222: 1218: 1211: 1208: 1203: 1199: 1195: 1191: 1184: 1181: 1173: 1169: 1165: 1161: 1154: 1147: 1144: 1139: 1132: 1130: 1128: 1126: 1122: 1117: 1113: 1106: 1104: 1102: 1098: 1093: 1092: 1087: 1080: 1078: 1076: 1072: 1067: 1063: 1057: 1054: 1049: 1042: 1039: 1034: 1030: 1024: 1021: 1016: 1015:Apache Lucene 1012: 1006: 1003: 998: 994: 988: 985: 980: 979: 971: 968: 963: 962:ScienceDirect 959: 953: 950: 943: 939: 936: 934: 931: 929: 926: 924: 921: 919: 916: 915: 911: 909: 907: 890: 886: 882: 877: 866: 865: 864: 854: 851: 848: 846:, used by IBM 845: 842: 841: 840: 834: 832: 828: 825: 812: 800: 799: 798: 769: 766: 763: 760: 759: 754: 750: 747: 743: 740: 737: 733: 730: 729: 728: 726: 714: 712: 710: 705: 703: 695: 689: 682: 681: 680: 674: 672: 670: 666: 662: 658: 655:truncated at 654: 650: 646: 620: 612: 604: 601: 598: 595: 592: 584: 576: 573: 570: 567: 545: 542: 539: 535: 531: 526: 523: 520: 516: 490: 485: 482: 479: 475: 461: 458: 455: 452: 449: 445: 441: 436: 433: 430: 427: 424: 420: 407: 402: 399: 396: 392: 388: 383: 380: 377: 373: 369: 364: 361: 358: 354: 340: 337: 334: 331: 328: 325: 322: 318: 311: 306: 304: 297: 294: 290: 277: 275: 268: 265: 261: 251: 248: 245: 242: 239: 235: 231: 220: 215: 212: 209: 205: 198: 196: 189: 186: 182: 169: 165: 161: 156: 152: 145: 143: 136: 132: 120: 119: 118: 112: 110: 96: 93: 92: 91: 89: 84: 82: 78: 74: 70: 66: 58: 56: 54: 50: 46: 42: 38: 34: 30: 26: 22: 1511: 1507: 1494: 1479: 1471:Code Project 1460: 1441: 1426: 1414:. Retrieved 1410: 1389: 1363: 1349: 1337:. Retrieved 1331: 1321: 1286: 1267: 1220: 1216: 1210: 1196:(2): 53–54. 1193: 1189: 1183: 1172:the original 1159: 1146: 1089: 1065: 1056: 1041: 1023: 1005: 987: 977: 970: 952: 878: 874: 862: 838: 829: 823: 821: 816:FNM_PATHNAME 775: 745: 725:backtracking 718: 706: 699: 678: 664: 660: 656: 652: 648: 641: 639: 116: 108: 85: 80: 76: 72: 68: 64: 62: 55:in general. 41:Bourne shell 28: 24: 18: 1484:Cox, Ross. 1416:27 November 1355:"fnmatch.c" 1339:21 November 1255:"wildmat.c" 97:Wildcards: 59:The problem 39:, e.g. the 1542:Categories 944:References 751:C library 113:Definition 1033:W3Schools 732:Rich Salz 709:Test case 702:recursion 605:≤ 599:≤ 577:≤ 571:≤ 543:− 532:≠ 524:− 483:− 453:− 442:∨ 434:− 400:− 389:∨ 381:− 362:− 338:− 326:− 249:− 232:∧ 213:− 1237:16824910 1168:14538871 999:Library. 912:See also 746:Snippets 691:variant. 687:defined. 665:Snippets 88:globbing 73:anchored 29:globbing 1528:3175806 1035:. 2018. 964:. 2018. 770:fnmatch 753:fnmatch 736:wildmat 696:History 1526:  1390:GitHub 1333:GitHub 1259:GitHub 1235:  1166:  824:either 640:where 1524:S2CID 1504:(PDF) 1233:S2CID 1175:(PDF) 1164:S2CID 1156:(PDF) 768:Glibc 504:false 282:false 1418:2019 1341:2019 896:and 793:and 659:and 51:and 1516:doi 1225:doi 1198:doi 1194:101 902:BDM 563:for 511:for 470:for 349:for 43:or 19:In 1544:: 1522:. 1512:31 1510:. 1506:. 1469:. 1450:. 1409:. 1398:^ 1388:. 1377:^ 1330:. 1301:^ 1276:. 1257:. 1245:^ 1231:. 1221:29 1219:. 1192:. 1158:. 1124:^ 1114:. 1100:^ 1088:. 1074:^ 1064:. 1031:. 1013:. 995:. 960:. 908:. 789:, 785:, 734:' 671:. 644:ij 137:00 83:. 1530:. 1518:: 1488:. 1473:. 1454:. 1420:. 1343:. 1315:. 1295:. 1280:. 1261:. 1239:. 1227:: 1204:. 1200:: 1140:. 1118:. 1094:. 1068:. 898:* 894:? 871:) 859:) 818:. 807:* 803:* 721:* 685:? 661:j 657:i 653:t 649:p 642:m 621:. 617:| 613:t 609:| 602:j 596:1 593:, 589:| 585:p 581:| 574:i 568:1 546:1 540:j 536:t 527:1 521:i 517:p 491:= 486:1 480:i 476:p 462:j 459:, 456:1 450:i 446:m 437:1 431:j 428:, 425:i 421:m 408:= 403:1 397:i 393:p 384:1 378:j 374:t 370:= 365:1 359:i 355:p 341:1 335:j 332:, 329:1 323:i 319:m 312:{ 307:= 298:j 295:i 291:m 278:= 269:0 266:i 262:m 252:1 246:j 243:, 240:0 236:m 229:) 221:= 216:1 210:j 206:p 202:( 199:= 190:j 187:0 183:m 175:) 170:0 166:t 162:= 157:0 153:p 149:( 146:= 133:m 103:* 99:? 81:s 77:p 69:s 65:p

Index

computer science
wildcard syntax
command-line interfaces
Bourne shell
Microsoft Windows
regular expressions
string matching
globbing
Levenshtein distance
recursion
Test case
backtracking
Rich Salz
wildmat
fnmatch
Guido van Rossum
Glibc
Kirk J. Krauss's wildcard-matching algorithm
regular expression
text-replacement approach
Thompson's construction
regular expression § Implementations
Pattern matching
Pattern calculus
Glob (programming)
Wildcard character
List of algorithms
"Wildcard characters"
ScienceDirect
UNIX Shell Programming QuickStart

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.