Knowledge

Metaphone

Metaphone is a phonetic algorithm for indexing words by their English pronunciation, published by Lawrence Philips in 1990 ("Hanging on the Metaphone", Computer Language, Vol. 7, No. 12, December 1990). It fundamentally improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and names that sound similar. As with Soundex, similar-sounding words should share the same keys. Metaphone is available as a built-in operator in a number of systems.

Philips later produced a new version of the algorithm, which he named Double Metaphone. Unlike the original algorithm, whose application is limited to English, this version takes into account spelling peculiarities of a number of other languages. In 2009 Philips released a third version, called Metaphone 3, which achieves an accuracy of approximately 99% for English words, non-English words familiar to Americans, and first names and family names commonly found in the United States. It was developed according to modern engineering standards against a test harness of prepared correct encodings.

Procedure

Original Metaphone codes use the 16 consonant symbols 0BFHJKLMNPRSTWXY. The '0' represents "th" (as an ASCII approximation of Θ), 'X' represents "sh" or "ch", and the others represent their usual English pronunciations. The vowels AEIOU are also used, but only at the beginning of the code. The following rules summarize most of the original implementation:

Drop duplicate adjacent letters, except for C.
If the word begins with 'KN', 'GN', 'PN', 'AE', or 'WR', drop the first letter.
Drop 'B' if after 'M' at the end of the word.
'C' transforms to 'X' if followed by 'IA' or 'H' (unless, in the latter case, it is part of '-SCH-', in which case it transforms to 'K'). 'C' transforms to 'S' if followed by 'I', 'E', or 'Y'. Otherwise, 'C' transforms to 'K'.
'D' transforms to 'J' if followed by 'GE', 'GY', or 'GI'. Otherwise, 'D' transforms to 'T'.
Drop 'G' if followed by 'H' and 'H' is not at the end or before a vowel. Drop 'G' if followed by 'N' or 'NED' and is at the end.
'G' transforms to 'J' if before 'I', 'E', or 'Y', and it is not in 'GG'. Otherwise, 'G' transforms to 'K'.
Drop 'H' if after a vowel and not before a vowel.
'CK' transforms to 'K'.
'PH' transforms to 'F'.
'Q' transforms to 'K'.
'S' transforms to 'X' if followed by 'H', 'IO', or 'IA'.
'T' transforms to 'X' if followed by 'IA' or 'IO'. 'TH' transforms to '0'. Drop 'T' if followed by 'CH'.
'V' transforms to 'F'.
'WH' transforms to 'W' if at the beginning. Drop 'W' if not followed by a vowel.
'X' transforms to 'S' if at the beginning. Otherwise, 'X' transforms to 'KS'.
Drop 'Y' if not followed by a vowel.
'Z' transforms to 'S'.
Drop all vowels except at the beginning.

This summary does not constitute a complete description of the original Metaphone algorithm, and the algorithm cannot be coded correctly from it. Original Metaphone contained many errors and was superseded by Double Metaphone, and in turn Double Metaphone and original Metaphone were superseded by Metaphone 3, which corrects thousands of miscodings that will be produced by the first two versions.

To implement Metaphone without purchasing a (source code) copy of Metaphone 3, the reference implementation of Double Metaphone can be used. Alternatively, version 2.1.3 of Metaphone 3, an earlier 2009 version without a number of encoding corrections made in the current version, 2.5.4, has been made available under the terms of the BSD License via the OpenRefine project.

Double Metaphone

The Double Metaphone phonetic encoding algorithm is the second generation of this algorithm. Its implementation was described in the June 2000 issue of C/C++ Users Journal. It makes a number of fundamental design improvements over the original Metaphone algorithm.

It is called "Double" because it can return both a primary and a secondary code for a string; this accounts for some ambiguous cases as well as for multiple variants of surnames with common ancestry. For example, encoding the name "Smith" yields a primary code of SM0 and a secondary code of XMT, while the name "Schmidt" yields a primary code of XMT and a secondary code of SMT; both have XMT in common.

Double Metaphone tries to account for myriad irregularities in English of Slavic, Germanic, Celtic, Greek, French, Italian, Spanish, Chinese, and other origins. Thus it uses a much more complex ruleset for coding than its predecessor; for example, it tests for approximately 100 different contexts of the use of the letter C alone.

Metaphone 3

A professional version was released in October 2009, developed by the same author, Lawrence Philips. It is a commercial product sold as source code. Metaphone 3 further improves phonetic encoding of words in the English language, non-English words familiar to Americans, and first names and family names commonly found in the United States. It improves encoding for proper names in particular to a considerable extent. The author claims that in general it improves accuracy for all words from the approximately 89% of Double Metaphone to 98%.

Developers can also now set switches in code to cause the algorithm to encode Metaphone keys 1) taking non-initial vowels into account, as well as 2) encoding voiced and unvoiced consonants differently. This allows the result set to be more closely focused if the developer finds that the search results include too many words that don't resemble the search term closely enough.

Metaphone 3 is sold as C++, Java, C#, PHP, Perl, and PL/SQL source, Ruby and Python wrappers accessing a Java jar, and also Metaphone 3 for Spanish and German pronunciation available as Java and C# source. The latest revision of the Metaphone 3 algorithm is v2.5.4, released March 2015. The Metaphone3 Java source code for an earlier version, 2.1.3, lacking a large number of encoding corrections made in the current version, 2.5.4, was included as part of the OpenRefine project and is publicly viewable.

Common misconceptions

There are some misconceptions about the Metaphone algorithms that should be addressed. The following statements are true:

All of them are designed to address regular, "dictionary" words, not just names.
Metaphone algorithms do not produce phonetic representations of the input words and names; rather, the output is an intentionally approximate phonetic representation, according to this standard:
words that start with a vowel sound will have an 'A', representing any vowel, as the first character of the encoding (in Double Metaphone and Metaphone 3; original Metaphone just preserves the actual vowel),
vowels after an initial vowel sound will be disregarded and not encoded, and
voiced/unvoiced consonant pairs will be mapped to the same encoding. (Examples of voiced/unvoiced consonant pairs are D/T, B/P, Z/S, G/K, etc.)

This approximate encoding is necessary to account for the way English speakers vary their pronunciations and misspell or otherwise vary words and names they are trying to spell. Vowels, of course, are notoriously highly variable. British speakers often complain that Americans seem to pronounce 'T's the same as 'D'. Consider, also, that English speakers often pronounce 'Z' where 'S' is spelled, almost always when a noun ending in a voiced consonant or a liquid is pluralized, for example "seasons", "beams", "examples", etc. Not encoding vowels after an initial vowel sound will help to group words where a vowel and a consonant may be transposed in the misspelling or alternative pronunciation.

Metaphone of other languages

Metaphone is useful for English variants and other languages, having been preferred to Soundex in several Indo-European languages. On the other hand, rough phonetic encoding causes language dependency (or, in a language variant, average language-speaker dependency), mainly for non-English variants.

Perhaps the first example of stable adaptation of non-English Metaphone was Brazilian Portuguese: it originated around 2008 as a database solution in the municipality of Várzea Paulista, Brazil, and evolved into the current metaphone-ptbr algorithm.

See also

Caverphone
New York State Identification and Intelligence System
Match Rating Approach
Approximate string matching

External links

The Double Metaphone Search Algorithm, by Lawrence Philips, June 1, 2000, Dr Dobb's (original article)
Metaphone algorithms for other languages:
Brazilian Portuguese in C: Metaphone for Brazilian Portuguese, in C with PHP and PostgreSQL port
Brazilian Portuguese in Java: Metaphone for Brazilian Portuguese, in Java
Spanish Metaphone in Python
Double Metaphone algorithm for Bangla
Double Metaphone algorithm for Amharic
Russian Metaphone in Ruby
Double Metaphone and Metaphone in JavaScript

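Because Double Metaphone can return both a primary and a secondary code, one plausible way to use it (an assumption here, not something the text prescribes) is to treat two names as candidates for a match when they share at least one code. The sketch below uses only the codes quoted in the text for "Smith" and "Schmidt"; it does not call any real Double Metaphone library, and the helper names shared_codes and build_index are hypothetical.

```python
from collections import defaultdict

# (primary, secondary) codes exactly as quoted in the text.
CODES = {
    "Smith": ("SM0", "XMT"),
    "Schmidt": ("XMT", "SMT"),
}


def shared_codes(name_a, name_b, codes=CODES):
    """Return the set of codes the two names have in common."""
    return set(codes[name_a]) & set(codes[name_b])


def build_index(codes=CODES):
    """Map every code (primary or secondary) to the names that produce it."""
    index = defaultdict(set)
    for name, pair in codes.items():
        for code in pair:
            index[code].add(name)
    return index


if __name__ == "__main__":
    print(shared_codes("Smith", "Schmidt"))
    print(sorted(build_index()["XMT"]))
```

Indexing both codes means a lookup on either code of a query name finds every stored name that shares it, which is how the two surnames above end up grouped under XMT.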

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.