Knowledge (XXG)

Naive Bayes spam filtering

Source 📝

1117:" in spam email, but will seldom see it in other email. The filter doesn't know these probabilities in advance, and must first be trained so it can build them up. To train the filter, the user must manually indicate whether a new email is spam or not. For all words in each training email, the filter will adjust the probabilities that each word will appear in spam or legitimate email in its database. For instance, Bayesian spam filters will typically have learned a very high spam probability for the words "Viagra" and "refinance", but a very low spam probability for words seen only in legitimate email, such as the names of friends and family members. 3394:
which would contain the sensitive words like «Viagra». However, since many mail clients disable the display of linked pictures for security reasons, the spammer sending links to distant pictures might reach fewer targets. Also, a picture's size in bytes is bigger than the equivalent text's size, so the spammer needs more bandwidth to send messages directly including pictures. Some filters are more inclined to decide that a message is spam if it has mostly graphical contents. A solution used by
3127:. More generally, some bayesian filtering filters simply ignore all the words which have a spamicity next to 0.5, as they contribute little to a good decision. The words taken into consideration are those whose spamicity is next to 0.0 (distinctive signs of legitimate messages), or next to 1.0 (distinctive signs of spam). A method can be for example to keep only those ten words, in the examined message, which have the greatest 1888:. This condition is not generally satisfied (for example, in natural languages like English the probability of finding an adjective is affected by the probability of having a noun), but it is a useful idealization, especially since the statistical correlations between individual words are usually not known. On this basis, one can derive the following formula from Bayes' theorem: 3308: 1563: 36: 3145:(sequences of words) instead of isolated natural languages words. For example, with a "context window" of four words, they compute the spamicity of "Viagra is good for", instead of computing the spamicities of "Viagra", "is", "good", and "for". This method gives more sensitivity to context and eliminates the Bayesian noise better, at the expense of a bigger database. 3389:
Words that normally appear in large quantities in spam may also be transformed by spammers. For example, «Viagra» would be replaced with «Viaagra» or «V!agra» in the spam message. The recipient of the message can still read the changed words, but each of these words is met more rarely by the Bayesian
3393:
Another technique used to try to defeat Bayesian spam filters is to replace text with pictures, either directly included or linked. The whole text of the message, or some part of it, is replaced with a picture where the same text is "drawn". The spam filter is usually unable to analyze this picture,
1871:
is approximated to the frequency of messages containing "replica" in the messages identified as ham during the learning phase. For these approximations to make sense, the set of learned messages needs to be big and representative enough. It is also advisable that the learned set of messages conforms
1053:
Bayesian algorithms were used for email filtering as early as 1996. Although naive Bayesian filters did not become popular until later, multiple programs were released in 1998 to address the growing problem of unwanted email. The first scholarly publication on Bayesian spam filtering was by Sahami
3385:
tactics include insertion of random innocuous words that are not normally associated with spam, thereby decreasing the email's spam score, making it more likely to slip past a Bayesian spam filter. However, with (for example) Paul Graham's scheme only the most significant probabilities are used, so
3153:
There are other ways of combining individual probabilities for different words than using the "naive" approach. These methods differ from it on the assumptions they make on the statistical properties of the input data. These different hypotheses result in radically different formulas for combining
3611:
Sharpen your pencils, this is the mathematical background (such as it is).* The paper that started the ball rolling: Paul Graham's A Plan for Spam.* Gary Robinson has an interesting essay suggesting some improvements to Graham's original approach.* Gary Robinson's Linux Journal article discussed
3360:
The spam that a user receives is often related to the online user's activities. For example, a user may have been subscribed to an online newsletter that the user considers to be spam. This online newsletter is likely to contain words that are common to all newsletters, such as the name of the
1193:". Most people who are used to receiving e-mail know that this message is likely to be spam, more precisely a proposal to sell counterfeit copies of well-known brands of watches. The spam detection software, however, does not "know" such facts; all it can do is compute probabilities. 1875:
Of course, determining whether a message is spam or ham based only on the presence of the word "replica" is error-prone, which is why bayesian spam software tries to consider several words and combine their spamicities to determine a message's overall probability of being spam.
3381:, a technique used by spammers in an attempt to degrade the effectiveness of spam filters that rely on Bayesian filtering. A spammer practicing Bayesian poisoning will send out emails with large amounts of legitimate text (gathered from legitimate news or literary sources). 2779:
More generally, the words that were encountered only a few times during the learning phase cause a problem, because it would be an error to trust blindly the information they provide. A simple solution is to simply avoid taking such unreliable words into account as well.
2775:
In the case a word has never been met during the learning phase, both the numerator and the denominator are equal to zero, both in the general formula and in the spamicity formula. The software can decide to discard such words for which there is no information available.
3364:
The legitimate e-mails a user receives will tend to be different. For example, in a corporate environment, the company name and the names of clients or customers will be mentioned often. The filter will assign a lower spam probability to emails containing those names.
1124:) are used to compute the probability that an email with a particular set of words in it belongs to either category. Each word in the email contributes to the email's spam probability, or only the most interesting words. This contribution is called the 2058: 3368:
The word probabilities are unique to each user and can evolve over time with corrective training whenever the filter incorrectly classifies an email. As a result, Bayesian spam filtering accuracy after training is often superior to pre-defined rules.
3414:
While Bayesian filtering is used widely to identify spam email, the technique can classify (or "cluster") almost any sort of data. It has uses in science, medicine, and engineering. One example is a general purpose classification program called
2422: 1356: 1146:
The initial training can usually be refined when wrong judgements from the software are identified (false positives or false negatives). That allows the software to dynamically adapt to the ever-evolving nature of spam.
2901: 2558: 3833: 1789: 3269: 3390:
filter, which hinders its learning process. As a general rule, this spamming technique does not work very well, because the derived words end up recognized by the filter just like the normal ones.
1677:
The filters that use this hypothesis are said to be "not biased", meaning that they have no prejudice regarding the incoming email. This assumption permits simplifying the general formula to:
916: 2765: 2711: 2662: 954: 3925:
Zheng, J.; Tang, Yongchuan (2005). "One Generalization of the Naive Bayes to Fuzzy Sets and the Design of the Fuzzy Naive Bayes Classifier". In Mira, Jose; Álvarez, Jose R (eds.).
1672: 3664: 1894: 1132:. Then, the email's spam probability is computed over all words in the email, and if the total exceeds a certain threshold (say 95%), the filter will mark the email as a spam. 911: 2949: 901: 742: 3871:. Lyon, France: Software and Knowledge Engineering Laboratory Institute of Informatics and Telecommunications National Centre for Scientific Research “Demokritos”: 1–13. 2281: 2230: 2157: 1205: 3067: 1869: 1832: 1539: 1469: 1399: 3006: 1884:
Most bayesian spam filtering algorithms are based on formulas that are strictly valid (from a probabilistic standpoint) only if the words present in the message are
1500: 1430: 1154:(pre-defined rules about the contents, looking at the message's envelope, etc.), resulting in even higher filtering accuracy, sometimes at the cost of adaptiveness. 3113: 2186: 2113: 949: 3028: 2971: 2084: 1834:
used in this formula is approximated to the frequency of messages containing "replica" in the messages identified as spam during the learning phase. Similarly,
906: 757: 488: 1031:
Naive Bayes classifiers work by correlating the use of tokens (typically words, or sometimes other things), with spam and non-spam e-mails and then using
3649: 989: 792: 3361:
newsletter and its originating email address. A Bayesian spam filter will eventually assign a higher probability based on the user's specific patterns.
3811: 3863:
Androutsopoulos, Ion; Paliouras, Georgios; Karkaletsis, Vangelis; Sakkis, Georgios; Spyropoulos, Constantine D.; Stamatopoulos, Panagiotis (2000).
868: 417: 3123:"Neutral" words like "the", "a", "some", or "is" (in English), or their equivalents in other languages, can be ignored. These are also known as 4155: 1177:
a second time, to compute the probability that the message is spam, taking into consideration all of its words (or a relevant subset of them);
3942: 3909: 3567: 2797: 2433: 1139:
technique, email marked as spam can then be automatically moved to a "Junk" email folder, or even deleted outright. Some software implement
1045:
spam detection rates that are generally acceptable to users. It is one of the oldest ways of doing spam filtering, with roots in the 1990s.
3518: 2783:
Applying again Bayes' theorem, and assuming the classification between spam and ham of the emails containing a given word ("replica") is a
926: 689: 224: 3489: 1101:, oft cited as a Bayesian filter, is not intended to use a Bayes filter in production, but includes the ″unigram″ feature for reference. 4195: 3671: 944: 3698: 4170: 3983: 777: 752: 701: 3864: 3628: 3483: 3347: 1683: 1602: 825: 820: 473: 3171: 3867:; Rajman, M; Zaragoza, H (eds.). "Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach". 3279: 3138:
Some software products take into account the fact that a given word appears several times in the examined message, others don't.
483: 121: 3325: 1580: 3599: 878: 982: 642: 463: 3419:
which was originally used to classify stars according to spectral characteristics that were otherwise too subtle to notice.
1794:
This is functionally equivalent to asking, "what percentage of occurrences of the word 'replica' appear in spam messages?"
3546: 3403: 853: 555: 331: 3752: 3329: 3318: 1584: 1573: 1041:
is a baseline technique for dealing with spam that can tailor itself to the email needs of individual users and give low
4356: 4150: 4032: 810: 747: 657: 635: 478: 468: 1872:
to the 50% hypothesis about repartition between spam and ham, i.e. that the datasets of spam and ham are of same size.
1113:
of occurring in spam email and in legitimate email. For instance, most email users will frequently encounter the word "
3726: 961: 873: 858: 319: 141: 54: 1174:
a first time, to compute the probability that the message is spam, knowing that a given word appears in this message;
848: 921: 598: 493: 281: 214: 174: 3791: 2255:
is lower than the threshold, the message is considered as likely ham, otherwise it is considered as likely spam.
975: 581: 349: 219: 2722: 2670: 2566: 4165: 4062: 4027: 3443: 3386:
that padding the text out with non-spam-related words does not affect the detection probability significantly.
3286: 3158: 2244: 1885: 603: 523: 446: 364: 194: 156: 151: 111: 106: 3416: 2053:{\displaystyle p={\frac {p_{1}p_{2}\cdots p_{N}}{p_{1}p_{2}\cdots p_{N}+(1-p_{1})(1-p_{2})\cdots (1-p_{N})}}} 1797:
This quantity is called "spamicity" (or "spaminess") of the word "replica", and can be computed. The number
1025: 550: 399: 299: 126: 3807: 1621: 4000: 3976: 2240: 1002: 730: 706: 608: 369: 344: 304: 116: 49: 3929:. Lecture Notes in Computer Science. Vol. 3562. Berlin: Springer, Berlin, Heidelberg. p. 281. 1143:
mechanisms that define a time frame during which the user is allowed to review the software's decision.
1125: 684: 506: 458: 314: 229: 101: 4205: 4175: 4142: 3882: 3428: 2951:
is the corrected probability for the message to be spam, knowing that it contains a given word ;
2268: 1093:, make use of Bayesian spam filtering techniques, and the functionality is sometimes embedded within 1090: 1009: 613: 563: 44: 4180: 4022: 3836:; Jin, Q; Jiang, X; Park, J (eds.). "A Modified Minimum Risk Bayes and It's Application in Spam". 3448: 2417:{\displaystyle {\frac {1}{p}}-1={\frac {(1-p_{1})(1-p_{2})\dots (1-p_{N})}{p_{1}p_{2}\dots p_{N}}}} 1121: 1057:
Variants of the basic technique have been implemented in a number of research works and commercial
716: 652: 623: 528: 354: 287: 273: 259: 234: 184: 136: 96: 3869:
4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-2000)
3571: 1615:
Statistics show that the current probability of any message being spam is 80%, at the very least:
1351:{\displaystyle \Pr(S|W)={\frac {\Pr(W|S)\cdot \Pr(S)}{\Pr(W|S)\cdot \Pr(S)+\Pr(W|H)\cdot \Pr(H)}}} 4351: 3872: 3433: 3378: 1062: 1017: 694: 618: 404: 199: 3511: 2912: 2191: 2118: 4346: 4160: 4103: 3969: 3948: 3938: 3905: 3643: 3479: 3469: 3035: 2788: 1837: 1800: 1546: 1507: 1437: 1367: 1197: 1167: 1129: 1058: 1032: 787: 630: 543: 339: 309: 254: 249: 204: 146: 4301: 4248: 3930: 3845: 3775:"Bayesian Noise Reduction: Contextual Symmetry Logic Utilizing Pattern Consistency Analysis" 3695: 2982: 2251:
is typically compared to a given threshold to decide whether the message is spam or not. If
1476: 1406: 1098: 1066: 1013: 815: 568: 518: 428: 412: 382: 244: 239: 189: 179: 77: 3086: 2164: 2091: 1074: 4313: 4283: 4017: 3795: 3702: 3624: 3438: 2784: 843: 647: 513: 453: 3083:
is equal to zero (and where the spamicity is not defined), and evaluates in this case to
3886: 4323: 4293: 4243: 4185: 4108: 4098: 4042: 3128: 3013: 2956: 2069: 1136: 1042: 863: 394: 131: 17: 3927:
Artificial Intelligence and Knowledge Engineering Applications: A Bioinspired Approach
3076:
This corrected probability is used instead of the spamicity in the combining formula.
1401:
is the probability that a message is a spam, knowing that the word "replica" is in it;
4340: 4258: 4223: 4123: 4057: 3714: 2232:
that the second word (for example "watches") appears, given that the message is spam;
1151: 782: 711: 593: 324: 209: 3961: 2159:
that the first word (for example "replica") appears, given that the message is spam;
4306: 4263: 4233: 4083: 3595: 1163: 1078: 3849: 2275:
can be computed in the log domain by rewriting the original equation as follows:
4268: 4078: 3382: 1150:
Some spam filters combine the results of both Bayesian spam filtering and other
1110: 1094: 1070: 588: 82: 3542: 3377:
Depending on the implementation, Bayesian spam filtering may be susceptible to
4238: 4215: 4088: 4047: 4009: 3748: 3124: 1140: 1086: 1021: 1006: 737: 433: 359: 3952: 4273: 4253: 4128: 3475: 2239:
Spam filtering software based on this formula is sometimes referred to as a
1082: 896: 677: 3718: 3808:"Gmail uses Google's innovative technology to keep spam out of your inbox" 3030:
is the number of occurrences of this word during the learning phase ;
4318: 4228: 4113: 4093: 3992: 1502:
is the overall probability that any given message is not spam (is "ham");
3934: 1185:
Computing the probability that a message containing a given word is spam
4200: 4118: 4052: 2896:{\displaystyle \Pr '(S|W)={\frac {s\cdot \Pr(S)+n\cdot \Pr(S|W)}{s+n}}} 1190: 672: 3788: 3583:
Gary Robinson's f(x) and combining algorithms, as used in SpamAssassin
2553:{\displaystyle \ln \left({\frac {1}{p}}-1\right)=\sum _{i=1}^{N}\left} 2259:
Other expression of the formula for combining individual probabilities
3395: 1114: 423: 3285:
Individual probabilities can be combined with the techniques of the
2716:
Hence the alternate formula for computing the combined probability:
1471:
is the probability that the word "replica" appears in spam messages;
1196:
The formula used by the software to determine that, is derived from
3877: 3774: 1541:
is the probability that the word "replica" appears in ham messages.
1065:
implement Bayesian spam filtering. Users can also install separate
4278: 4037: 3904:. London; Berlin: Springer- Verlag Heidelberg Berlin. p. 70. 3399: 667: 662: 389: 3902:
The NaĂŻve Bayes Model for Unsupervised Word Sense Disambiguation
3832:
Zhu, Z.; Jia, Z; Xiao, H; Zhang, G; Liang, H.; Wang, P. (2014).
3965: 1170:. Bayes' theorem is used several times in the context of spam: 3301: 1784:{\displaystyle \Pr(S|W)={\frac {\Pr(W|S)}{\Pr(W|S)+\Pr(W|H)}}} 1556: 29: 3406:
on every mid to large size image, analyzing the text inside.
3264:{\displaystyle p=C^{-1}(-2\ln(p_{1}p_{2}\cdots p_{N}),2N)\,} 3157:
For example, assuming the individual probabilities follow a
3008:
is the probability of any incoming message to be spam ;
2977:
we give to background information about incoming spam ;
1035:
to calculate a probability that an email is or is not spam.
1432:
is the overall probability that any given message is spam;
3517:. AAAI'98 Workshop on Learning for Text Categorization. 2267:
is not directly computed using the above formula due to
955:
List of datasets in computer vision and image processing
3510:
M. Sahami; S. Dumais; D. Heckerman; E. Horvitz (1998).
3451:
mail client with native implementation of Bayes filters
2791:, some programs decide to use a corrected probability: 1189:
Let's suppose the suspected message contains the word "
1120:
After training, the word probabilities (also known as
3328:. Please help to ensure that disputed statements are 3174: 3089: 3038: 3016: 2985: 2959: 2915: 2800: 2725: 2673: 2569: 2436: 2284: 2194: 2167: 2121: 2094: 2072: 1897: 1840: 1803: 1686: 1624: 1583:. Please help to ensure that disputed statements are 1510: 1479: 1440: 1409: 1370: 1208: 2086:
is the probability that the suspect message is spam;
4292: 4214: 4141: 4071: 4008: 3999: 3263: 3107: 3061: 3022: 3000: 2965: 2943: 2895: 2759: 2705: 2656: 2552: 2416: 2224: 2180: 2151: 2107: 2078: 2052: 1863: 1826: 1783: 1666: 1533: 1494: 1463: 1424: 1393: 1350: 3039: 2986: 2917: 2859: 2838: 2802: 1841: 1804: 1758: 1735: 1713: 1687: 1646: 1625: 1511: 1480: 1441: 1410: 1371: 1333: 1310: 1295: 1272: 1258: 1235: 1209: 1180:sometimes a third time, to deal with rare words. 3165:degrees of freedom, one could use the formula: 3079:This formula can be extended to the case where 3665:"State of Spam, a Monthly Report - Report #33" 3512:"A Bayesian approach to filtering junk e-mail" 950:List of datasets for machine-learning research 3977: 2247:assumptions between the features. The result 983: 8: 3749:"SpamProbe - Bayesian Spam Filtering Tweaks" 3719:"A statistical approach to the spam problem" 3570:. Ubuntu manuals. 2010-09-18. Archived from 43:It has been suggested that this article be 4005: 3984: 3970: 3962: 3410:General applications of Bayesian filtering 2760:{\displaystyle p={\frac {1}{1+e^{\eta }}}} 2706:{\displaystyle {\frac {1}{p}}-1=e^{\eta }} 2657:{\displaystyle \eta =\sum _{i=1}^{N}\left} 990: 976: 68: 3876: 3348:Learn how and when to remove this message 3260: 3239: 3226: 3216: 3185: 3173: 3088: 3048: 3037: 3015: 2984: 2958: 2930: 2914: 2868: 2829: 2815: 2799: 2748: 2732: 2724: 2697: 2674: 2672: 2643: 2621: 2591: 2580: 2568: 2539: 2517: 2487: 2476: 2448: 2435: 2405: 2392: 2382: 2367: 2342: 2320: 2304: 2285: 2283: 2211: 2205: 2193: 2172: 2166: 2138: 2132: 2120: 2099: 2093: 2071: 2038: 2013: 1991: 1969: 1956: 1946: 1934: 1921: 1911: 1904: 1896: 1850: 1839: 1813: 1802: 1767: 1744: 1722: 1710: 1696: 1685: 1623: 1603:Learn how and when to remove this message 1520: 1509: 1478: 1450: 1439: 1408: 1380: 1369: 1319: 1281: 1244: 1232: 1218: 1207: 3663:Dylan Mors & Dermot Harnett (2009). 3324:Relevant discussion may be found on the 1579:Relevant discussion may be found on the 3838:Lecture Notes in Electrical Engineering 3460: 76: 3648:: CS1 maint: archived copy as title ( 3641: 3471:Spam: A Shadow History of the Internet 3602:from the original on 6 September 2010 1667:{\displaystyle \Pr(S)=0.8;\Pr(H)=0.2} 7: 3612:using the chi squared distribution. 3404:OCR (Optical Character Recognition) 3280:inverse of the chi-squared function 945:Glossary of artificial intelligence 27:Technique for filtering spam e-mail 4171:Distributed Checksum Clearinghouse 3844:. Dordrecht: Springer: 2155–2159. 3696:Introduction to Bayesian Filtering 2243:, as "naive" refers to the strong 1880:Combining individual probabilities 25: 4156:Challenge–response spam filtering 3993:Unsolicited digital communication 3598:. SpamBayes project. 2010-09-18. 1109:Particular words have particular 3306: 1561: 34: 3900:Hristea, Florentina T. (2013). 3814:from the original on 2015-09-13 3755:from the original on 2012-03-01 3729:from the original on 2010-10-22 3631:from the original on 2016-10-07 3549:from the original on 2012-10-23 3524:from the original on 2007-09-27 3492:from the original on 2019-03-23 1545:(For a full demonstration, see 1024:, an approach commonly used in 3773:Jonathan A. Zdziarski (2004). 3545:. MozillaZine. November 2009. 3402:email system is to perform an 3257: 3245: 3209: 3194: 3154:the individual probabilities. 3102: 3096: 3069:is the spamicity of this word. 3056: 3049: 3042: 2995: 2989: 2938: 2931: 2924: 2876: 2869: 2862: 2847: 2841: 2823: 2816: 2809: 2627: 2608: 2523: 2504: 2373: 2354: 2348: 2329: 2326: 2307: 2219: 2212: 2198: 2146: 2139: 2125: 2044: 2025: 2019: 2000: 1997: 1978: 1858: 1851: 1844: 1821: 1814: 1807: 1775: 1768: 1761: 1752: 1745: 1738: 1730: 1723: 1716: 1704: 1697: 1690: 1655: 1649: 1634: 1628: 1528: 1521: 1514: 1489: 1483: 1458: 1451: 1444: 1419: 1413: 1388: 1381: 1374: 1342: 1336: 1327: 1320: 1313: 1304: 1298: 1289: 1282: 1275: 1267: 1261: 1252: 1245: 1238: 1226: 1219: 1212: 365:Relevance vector machine (RVM) 1: 3850:10.1007/978-94-007-7618-0_261 854:Computational learning theory 418:Expectation–maximization (EM) 1547:Bayes' theorem#Extended form 811:Coefficient of determination 658:Convolutional neural network 370:Support vector machine (SVM) 3141:Some software products use 2427:Taking logs on both sides: 1061:products. Many modern mail 962:Outline of machine learning 859:Empirical risk minimization 60:Proposed since August 2024. 4373: 4191:Naive Bayes spam filtering 2944:{\displaystyle \Pr '(S|W)} 2225:{\displaystyle p(W_{2}|S)} 2152:{\displaystyle p(W_{1}|S)} 1039:Naive Bayes spam filtering 599:Feedforward neural network 350:Artificial neural networks 582:Artificial neural network 4166:Disposable email address 4028:Directory harvest attack 3444:Markovian discrimination 3287:Markovian discrimination 3159:chi-squared distribution 3062:{\displaystyle \Pr(S|W)} 2269:floating-point underflow 1864:{\displaystyle \Pr(W|H)} 1827:{\displaystyle \Pr(W|S)} 1553:The spamliness of a word 1534:{\displaystyle \Pr(W|H)} 1464:{\displaystyle \Pr(W|S)} 1394:{\displaystyle \Pr(S|W)} 1067:email filtering programs 891:Journals and conferences 838:Mathematical foundations 748:Temporal difference (TD) 604:Recurrent neural network 524:Conditional random field 447:Dimensionality reduction 195:Dimensionality reduction 157:Quantum machine learning 152:Neuromorphic engineering 112:Self-supervised learning 107:Semi-supervised learning 3131: |0.5 âˆ’  2771:Dealing with rare words 1158:Mathematical foundation 1073:email filters, such as 1003:Naive Bayes classifiers 300:Apprenticeship learning 18:Bayesian spam filtering 3468:Brunton, Finn (2013). 3265: 3109: 3063: 3024: 3002: 3001:{\displaystyle \Pr(S)} 2967: 2945: 2897: 2761: 2707: 2658: 2596: 2554: 2492: 2418: 2241:naive Bayes classifier 2226: 2182: 2153: 2109: 2080: 2054: 1865: 1828: 1785: 1668: 1535: 1496: 1495:{\displaystyle \Pr(H)} 1465: 1426: 1425:{\displaystyle \Pr(S)} 1395: 1352: 1128:and is computed using 849:Bias–variance tradeoff 731:Reinforcement learning 707:Spiking neural network 117:Reinforcement learning 50:Naive Bayes classifier 3747:Brian Burton (2003). 3266: 3110: 3108:{\displaystyle Pr(S)} 3064: 3025: 3003: 2968: 2946: 2898: 2762: 2708: 2659: 2576: 2555: 2472: 2419: 2227: 2183: 2181:{\displaystyle p_{2}} 2154: 2110: 2108:{\displaystyle p_{1}} 2081: 2055: 1866: 1829: 1786: 1669: 1536: 1497: 1466: 1427: 1396: 1353: 1126:posterior probability 1020:features to identify 1016:. They typically use 685:Neural radiance field 507:Structured prediction 230:Structured prediction 102:Unsupervised learning 4176:Email authentication 3787:Paul Graham (2002), 3596:"Background Reading" 3574:on 29 September 2010 3543:"Junk Mail Controls" 3429:Anti-spam techniques 3317:factual accuracy is 3172: 3087: 3036: 3014: 2983: 2957: 2913: 2798: 2723: 2671: 2567: 2434: 2282: 2192: 2165: 2119: 2092: 2070: 1895: 1838: 1801: 1684: 1622: 1572:factual accuracy is 1508: 1477: 1438: 1407: 1368: 1206: 1122:likelihood functions 874:Statistical learning 772:Learning with humans 564:Local outlier factor 4357:Bayesian estimation 4023:Bulk email software 3935:10.1007/11499305_29 3887:2000cs........9009A 3449:Mozilla Thunderbird 2188:is the probability 2115:is the probability 1026:text classification 717:Electrochemical RAM 624:reservoir computing 355:Logistic regression 274:Supervised learning 260:Multimodal learning 235:Feature engineering 180:Generative modeling 142:Rule-based learning 137:Curriculum learning 97:Supervised learning 72:Part of a series on 3794:2004-04-04 at the 3701:2012-02-06 at the 3694:Process Software, 3434:Bayesian poisoning 3379:Bayesian poisoning 3261: 3105: 3059: 3020: 2998: 2963: 2941: 2893: 2757: 2703: 2654: 2550: 2414: 2222: 2178: 2149: 2105: 2076: 2050: 1886:independent events 1861: 1824: 1781: 1664: 1531: 1492: 1461: 1422: 1391: 1348: 285: • 200:Density estimation 4334: 4333: 4161:Context filtering 4137: 4136: 3944:978-3-540-26319-7 3911:978-3-642-33692-8 3358: 3357: 3350: 3073:(Demonstration:) 3023:{\displaystyle n} 2966:{\displaystyle s} 2891: 2789:beta distribution 2755: 2682: 2456: 2412: 2293: 2079:{\displaystyle p} 2048: 1779: 1613: 1612: 1605: 1346: 1097:software itself. 1000: 999: 805:Model diagnostics 788:Human-in-the-loop 631:Boltzmann machine 544:Anomaly detection 340:Linear regression 255:Ontology learning 250:Grammar induction 225:Semantic analysis 220:Association rules 205:Anomaly detection 147:Neuro-symbolic AI 67: 66: 62: 16:(Redirected from 4364: 4302:Advance-fee scam 4249:Keyword stuffing 4006: 3986: 3979: 3972: 3963: 3957: 3956: 3922: 3916: 3915: 3897: 3891: 3890: 3880: 3860: 3854: 3853: 3829: 3823: 3822: 3820: 3819: 3804: 3798: 3785: 3779: 3778: 3770: 3764: 3763: 3761: 3760: 3744: 3738: 3737: 3735: 3734: 3711: 3705: 3692: 3686: 3685: 3683: 3682: 3676: 3670:. Archived from 3669: 3660: 3654: 3653: 3647: 3639: 3637: 3636: 3621: 3615: 3614: 3608: 3607: 3592: 3586: 3585: 3580: 3579: 3564: 3558: 3557: 3555: 3554: 3539: 3533: 3532: 3530: 3529: 3523: 3516: 3507: 3501: 3500: 3498: 3497: 3465: 3353: 3346: 3342: 3339: 3333: 3330:reliably sourced 3310: 3309: 3302: 3270: 3268: 3267: 3262: 3244: 3243: 3231: 3230: 3221: 3220: 3193: 3192: 3119:Other heuristics 3114: 3112: 3111: 3106: 3068: 3066: 3065: 3060: 3052: 3029: 3027: 3026: 3021: 3007: 3005: 3004: 2999: 2972: 2970: 2969: 2964: 2950: 2948: 2947: 2942: 2934: 2923: 2902: 2900: 2899: 2894: 2892: 2890: 2879: 2872: 2830: 2819: 2808: 2766: 2764: 2763: 2758: 2756: 2754: 2753: 2752: 2733: 2712: 2710: 2709: 2704: 2702: 2701: 2683: 2675: 2663: 2661: 2660: 2655: 2653: 2649: 2648: 2647: 2626: 2625: 2595: 2590: 2559: 2557: 2556: 2551: 2549: 2545: 2544: 2543: 2522: 2521: 2491: 2486: 2468: 2464: 2457: 2449: 2423: 2421: 2420: 2415: 2413: 2411: 2410: 2409: 2397: 2396: 2387: 2386: 2376: 2372: 2371: 2347: 2346: 2325: 2324: 2305: 2294: 2286: 2231: 2229: 2228: 2223: 2215: 2210: 2209: 2187: 2185: 2184: 2179: 2177: 2176: 2158: 2156: 2155: 2150: 2142: 2137: 2136: 2114: 2112: 2111: 2106: 2104: 2103: 2085: 2083: 2082: 2077: 2059: 2057: 2056: 2051: 2049: 2047: 2043: 2042: 2018: 2017: 1996: 1995: 1974: 1973: 1961: 1960: 1951: 1950: 1940: 1939: 1938: 1926: 1925: 1916: 1915: 1905: 1870: 1868: 1867: 1862: 1854: 1833: 1831: 1830: 1825: 1817: 1790: 1788: 1787: 1782: 1780: 1778: 1771: 1748: 1733: 1726: 1711: 1700: 1673: 1671: 1670: 1665: 1608: 1601: 1597: 1594: 1588: 1585:reliably sourced 1565: 1564: 1557: 1540: 1538: 1537: 1532: 1524: 1501: 1499: 1498: 1493: 1470: 1468: 1467: 1462: 1454: 1431: 1429: 1428: 1423: 1400: 1398: 1397: 1392: 1384: 1357: 1355: 1354: 1349: 1347: 1345: 1323: 1285: 1270: 1248: 1233: 1222: 1135:As in any other 1054:et al. in 1998. 1014:e-mail filtering 992: 985: 978: 939:Related articles 816:Confusion matrix 569:Isolation forest 514:Graphical models 293: 292: 245:Learning to rank 240:Feature learning 78:Machine learning 69: 58: 38: 37: 30: 21: 4372: 4371: 4367: 4366: 4365: 4363: 4362: 4361: 4337: 4336: 4335: 4330: 4314:Make Money Fast 4288: 4284:URL redirection 4210: 4133: 4067: 4018:Address munging 3995: 3990: 3960: 3945: 3924: 3923: 3919: 3912: 3899: 3898: 3894: 3862: 3861: 3857: 3831: 3830: 3826: 3817: 3815: 3806: 3805: 3801: 3796:Wayback Machine 3789:A Plan for Spam 3786: 3782: 3772: 3771: 3767: 3758: 3756: 3746: 3745: 3741: 3732: 3730: 3713: 3712: 3708: 3703:Wayback Machine 3693: 3689: 3680: 3678: 3674: 3667: 3662: 3661: 3657: 3640: 3634: 3632: 3625:"Archived copy" 3623: 3622: 3618: 3605: 3603: 3594: 3593: 3589: 3577: 3575: 3566: 3565: 3561: 3552: 3550: 3541: 3540: 3536: 3527: 3525: 3521: 3514: 3509: 3508: 3504: 3495: 3493: 3486: 3478:. p. 136. 3467: 3466: 3462: 3458: 3439:Email filtering 3425: 3412: 3375: 3354: 3343: 3337: 3334: 3323: 3315:This section's 3311: 3307: 3300: 3295: 3235: 3222: 3212: 3181: 3170: 3169: 3151: 3121: 3085: 3084: 3034: 3033: 3012: 3011: 2981: 2980: 2955: 2954: 2916: 2911: 2910: 2880: 2831: 2801: 2796: 2795: 2785:random variable 2773: 2744: 2737: 2721: 2720: 2693: 2669: 2668: 2639: 2617: 2601: 2597: 2565: 2564: 2535: 2513: 2497: 2493: 2447: 2443: 2432: 2431: 2401: 2388: 2378: 2377: 2363: 2338: 2316: 2306: 2280: 2279: 2261: 2201: 2190: 2189: 2168: 2163: 2162: 2128: 2117: 2116: 2095: 2090: 2089: 2068: 2067: 2034: 2009: 1987: 1965: 1952: 1942: 1941: 1930: 1917: 1907: 1906: 1893: 1892: 1882: 1836: 1835: 1799: 1798: 1734: 1712: 1682: 1681: 1620: 1619: 1609: 1598: 1592: 1589: 1578: 1570:This section's 1566: 1562: 1555: 1506: 1505: 1475: 1474: 1436: 1435: 1405: 1404: 1366: 1365: 1271: 1234: 1204: 1203: 1187: 1160: 1107: 1051: 996: 967: 966: 940: 932: 931: 892: 884: 883: 844:Kernel machines 839: 831: 830: 806: 798: 797: 778:Active learning 773: 765: 764: 733: 723: 722: 648:Diffusion model 584: 574: 573: 546: 536: 535: 509: 499: 498: 454:Factor analysis 449: 439: 438: 422: 385: 375: 374: 295: 294: 278: 277: 276: 265: 264: 170: 162: 161: 127:Online learning 92: 80: 63: 39: 35: 28: 23: 22: 15: 12: 11: 5: 4370: 4368: 4360: 4359: 4354: 4349: 4339: 4338: 4332: 4331: 4329: 4328: 4327: 4326: 4316: 4311: 4310: 4309: 4298: 4296: 4294:Internet fraud 4290: 4289: 4287: 4286: 4281: 4276: 4271: 4266: 4261: 4256: 4251: 4246: 4244:Google bombing 4241: 4236: 4231: 4226: 4220: 4218: 4212: 4211: 4209: 4208: 4203: 4198: 4193: 4188: 4186:List poisoning 4183: 4178: 4173: 4168: 4163: 4158: 4153: 4147: 4145: 4139: 4138: 4135: 4134: 4132: 4131: 4126: 4121: 4116: 4111: 4106: 4101: 4096: 4091: 4086: 4081: 4075: 4073: 4069: 4068: 4066: 4065: 4060: 4055: 4050: 4045: 4043:Email spoofing 4040: 4035: 4030: 4025: 4020: 4014: 4012: 4003: 3997: 3996: 3991: 3989: 3988: 3981: 3974: 3966: 3959: 3958: 3943: 3917: 3910: 3892: 3855: 3824: 3799: 3780: 3765: 3739: 3706: 3687: 3655: 3616: 3587: 3568:"Installation" 3559: 3534: 3502: 3484: 3459: 3457: 3454: 3453: 3452: 3446: 3441: 3436: 3431: 3424: 3421: 3411: 3408: 3374: 3371: 3356: 3355: 3314: 3312: 3305: 3299: 3296: 3294: 3291: 3272: 3271: 3259: 3256: 3253: 3250: 3247: 3242: 3238: 3234: 3229: 3225: 3219: 3215: 3211: 3208: 3205: 3202: 3199: 3196: 3191: 3188: 3184: 3180: 3177: 3150: 3147: 3129:absolute value 3120: 3117: 3104: 3101: 3098: 3095: 3092: 3071: 3070: 3058: 3055: 3051: 3047: 3044: 3041: 3031: 3019: 3009: 2997: 2994: 2991: 2988: 2978: 2962: 2952: 2940: 2937: 2933: 2929: 2926: 2922: 2919: 2904: 2903: 2889: 2886: 2883: 2878: 2875: 2871: 2867: 2864: 2861: 2858: 2855: 2852: 2849: 2846: 2843: 2840: 2837: 2834: 2828: 2825: 2822: 2818: 2814: 2811: 2807: 2804: 2772: 2769: 2768: 2767: 2751: 2747: 2743: 2740: 2736: 2731: 2728: 2714: 2713: 2700: 2696: 2692: 2689: 2686: 2681: 2678: 2652: 2646: 2642: 2638: 2635: 2632: 2629: 2624: 2620: 2616: 2613: 2610: 2607: 2604: 2600: 2594: 2589: 2586: 2583: 2579: 2575: 2572: 2561: 2560: 2548: 2542: 2538: 2534: 2531: 2528: 2525: 2520: 2516: 2512: 2509: 2506: 2503: 2500: 2496: 2490: 2485: 2482: 2479: 2475: 2471: 2467: 2463: 2460: 2455: 2452: 2446: 2442: 2439: 2425: 2424: 2408: 2404: 2400: 2395: 2391: 2385: 2381: 2375: 2370: 2366: 2362: 2359: 2356: 2353: 2350: 2345: 2341: 2337: 2334: 2331: 2328: 2323: 2319: 2315: 2312: 2309: 2303: 2300: 2297: 2292: 2289: 2260: 2257: 2237: 2236: 2233: 2221: 2218: 2214: 2208: 2204: 2200: 2197: 2175: 2171: 2160: 2148: 2145: 2141: 2135: 2131: 2127: 2124: 2102: 2098: 2087: 2075: 2061: 2060: 2046: 2041: 2037: 2033: 2030: 2027: 2024: 2021: 2016: 2012: 2008: 2005: 2002: 1999: 1994: 1990: 1986: 1983: 1980: 1977: 1972: 1968: 1964: 1959: 1955: 1949: 1945: 1937: 1933: 1929: 1924: 1920: 1914: 1910: 1903: 1900: 1881: 1878: 1860: 1857: 1853: 1849: 1846: 1843: 1823: 1820: 1816: 1812: 1809: 1806: 1792: 1791: 1777: 1774: 1770: 1766: 1763: 1760: 1757: 1754: 1751: 1747: 1743: 1740: 1737: 1732: 1729: 1725: 1721: 1718: 1715: 1709: 1706: 1703: 1699: 1695: 1692: 1689: 1675: 1674: 1663: 1660: 1657: 1654: 1651: 1648: 1645: 1642: 1639: 1636: 1633: 1630: 1627: 1611: 1610: 1569: 1567: 1560: 1554: 1551: 1543: 1542: 1530: 1527: 1523: 1519: 1516: 1513: 1503: 1491: 1488: 1485: 1482: 1472: 1460: 1457: 1453: 1449: 1446: 1443: 1433: 1421: 1418: 1415: 1412: 1402: 1390: 1387: 1383: 1379: 1376: 1373: 1359: 1358: 1344: 1341: 1338: 1335: 1332: 1329: 1326: 1322: 1318: 1315: 1312: 1309: 1306: 1303: 1300: 1297: 1294: 1291: 1288: 1284: 1280: 1277: 1274: 1269: 1266: 1263: 1260: 1257: 1254: 1251: 1247: 1243: 1240: 1237: 1231: 1228: 1225: 1221: 1217: 1214: 1211: 1198:Bayes' theorem 1186: 1183: 1182: 1181: 1178: 1175: 1168:Bayes' theorem 1159: 1156: 1137:spam filtering 1130:Bayes' theorem 1106: 1103: 1050: 1047: 1043:false positive 1033:Bayes' theorem 1005:are a popular 998: 997: 995: 994: 987: 980: 972: 969: 968: 965: 964: 959: 958: 957: 947: 941: 938: 937: 934: 933: 930: 929: 924: 919: 914: 909: 904: 899: 893: 890: 889: 886: 885: 882: 881: 876: 871: 866: 864:Occam learning 861: 856: 851: 846: 840: 837: 836: 833: 832: 829: 828: 823: 821:Learning curve 818: 813: 807: 804: 803: 800: 799: 796: 795: 790: 785: 780: 774: 771: 770: 767: 766: 763: 762: 761: 760: 750: 745: 740: 734: 729: 728: 725: 724: 721: 720: 714: 709: 704: 699: 698: 697: 687: 682: 681: 680: 675: 670: 665: 655: 650: 645: 640: 639: 638: 628: 627: 626: 621: 616: 611: 601: 596: 591: 585: 580: 579: 576: 575: 572: 571: 566: 561: 553: 547: 542: 541: 538: 537: 534: 533: 532: 531: 526: 521: 510: 505: 504: 501: 500: 497: 496: 491: 486: 481: 476: 471: 466: 461: 456: 450: 445: 444: 441: 440: 437: 436: 431: 426: 420: 415: 410: 402: 397: 392: 386: 381: 380: 377: 376: 373: 372: 367: 362: 357: 352: 347: 342: 337: 329: 328: 327: 322: 317: 307: 305:Decision trees 302: 296: 282:classification 272: 271: 270: 267: 266: 263: 262: 257: 252: 247: 242: 237: 232: 227: 222: 217: 212: 207: 202: 197: 192: 187: 182: 177: 175:Classification 171: 168: 167: 164: 163: 160: 159: 154: 149: 144: 139: 134: 132:Batch learning 129: 124: 119: 114: 109: 104: 99: 93: 90: 89: 86: 85: 74: 73: 65: 64: 42: 40: 33: 26: 24: 14: 13: 10: 9: 6: 4: 3: 2: 4369: 4358: 4355: 4353: 4350: 4348: 4345: 4344: 4342: 4325: 4322: 4321: 4320: 4317: 4315: 4312: 4308: 4305: 4304: 4303: 4300: 4299: 4297: 4295: 4291: 4285: 4282: 4280: 4277: 4275: 4272: 4270: 4267: 4265: 4262: 4260: 4259:Referrer spam 4257: 4255: 4252: 4250: 4247: 4245: 4242: 4240: 4237: 4235: 4232: 4230: 4227: 4225: 4222: 4221: 4219: 4217: 4213: 4207: 4204: 4202: 4199: 4197: 4194: 4192: 4189: 4187: 4184: 4182: 4179: 4177: 4174: 4172: 4169: 4167: 4164: 4162: 4159: 4157: 4154: 4152: 4149: 4148: 4146: 4144: 4140: 4130: 4127: 4125: 4124:Telemarketing 4122: 4120: 4117: 4115: 4112: 4110: 4107: 4105: 4102: 4100: 4097: 4095: 4092: 4090: 4087: 4085: 4082: 4080: 4077: 4076: 4074: 4070: 4064: 4061: 4059: 4058:Pink contract 4056: 4054: 4051: 4049: 4046: 4044: 4041: 4039: 4036: 4034: 4031: 4029: 4026: 4024: 4021: 4019: 4016: 4015: 4013: 4011: 4007: 4004: 4002: 3998: 3994: 3987: 3982: 3980: 3975: 3973: 3968: 3967: 3964: 3954: 3950: 3946: 3940: 3936: 3932: 3928: 3921: 3918: 3913: 3907: 3903: 3896: 3893: 3888: 3884: 3879: 3874: 3870: 3866: 3859: 3856: 3851: 3847: 3843: 3839: 3835: 3828: 3825: 3813: 3809: 3803: 3800: 3797: 3793: 3790: 3784: 3781: 3776: 3769: 3766: 3754: 3750: 3743: 3740: 3728: 3724: 3723:Linux Journal 3720: 3716: 3715:Gary Robinson 3710: 3707: 3704: 3700: 3697: 3691: 3688: 3677:on 2009-10-07 3673: 3666: 3659: 3656: 3651: 3645: 3630: 3626: 3620: 3617: 3613: 3601: 3597: 3591: 3588: 3584: 3573: 3569: 3563: 3560: 3548: 3544: 3538: 3535: 3520: 3513: 3506: 3503: 3491: 3487: 3485:9780262018876 3481: 3477: 3473: 3472: 3464: 3461: 3455: 3450: 3447: 3445: 3442: 3440: 3437: 3435: 3432: 3430: 3427: 3426: 3422: 3420: 3418: 3409: 3407: 3405: 3401: 3397: 3391: 3387: 3384: 3380: 3373:Disadvantages 3372: 3370: 3366: 3362: 3352: 3349: 3341: 3331: 3327: 3321: 3320: 3313: 3304: 3303: 3297: 3292: 3290: 3288: 3283: 3281: 3277: 3254: 3251: 3248: 3240: 3236: 3232: 3227: 3223: 3217: 3213: 3206: 3203: 3200: 3197: 3189: 3186: 3182: 3178: 3175: 3168: 3167: 3166: 3164: 3160: 3155: 3149:Mixed methods 3148: 3146: 3144: 3139: 3136: 3134: 3130: 3126: 3118: 3116: 3099: 3093: 3090: 3082: 3077: 3074: 3053: 3045: 3032: 3017: 3010: 2992: 2979: 2976: 2960: 2953: 2935: 2927: 2920: 2909: 2908: 2907: 2887: 2884: 2881: 2873: 2865: 2856: 2853: 2850: 2844: 2835: 2832: 2826: 2820: 2812: 2805: 2794: 2793: 2792: 2790: 2786: 2781: 2777: 2770: 2749: 2745: 2741: 2738: 2734: 2729: 2726: 2719: 2718: 2717: 2698: 2694: 2690: 2687: 2684: 2679: 2676: 2667: 2666: 2665: 2664:. Therefore, 2650: 2644: 2640: 2636: 2633: 2630: 2622: 2618: 2614: 2611: 2605: 2602: 2598: 2592: 2587: 2584: 2581: 2577: 2573: 2570: 2546: 2540: 2536: 2532: 2529: 2526: 2518: 2514: 2510: 2507: 2501: 2498: 2494: 2488: 2483: 2480: 2477: 2473: 2469: 2465: 2461: 2458: 2453: 2450: 2444: 2440: 2437: 2430: 2429: 2428: 2406: 2402: 2398: 2393: 2389: 2383: 2379: 2368: 2364: 2360: 2357: 2351: 2343: 2339: 2335: 2332: 2321: 2317: 2313: 2310: 2301: 2298: 2295: 2290: 2287: 2278: 2277: 2276: 2274: 2270: 2266: 2258: 2256: 2254: 2250: 2246: 2242: 2234: 2216: 2206: 2202: 2195: 2173: 2169: 2161: 2143: 2133: 2129: 2122: 2100: 2096: 2088: 2073: 2066: 2065: 2064: 2039: 2035: 2031: 2028: 2022: 2014: 2010: 2006: 2003: 1992: 1988: 1984: 1981: 1975: 1970: 1966: 1962: 1957: 1953: 1947: 1943: 1935: 1931: 1927: 1922: 1918: 1912: 1908: 1901: 1898: 1891: 1890: 1889: 1887: 1879: 1877: 1873: 1855: 1847: 1818: 1810: 1795: 1772: 1764: 1755: 1749: 1741: 1727: 1719: 1707: 1701: 1693: 1680: 1679: 1678: 1661: 1658: 1652: 1643: 1640: 1637: 1631: 1618: 1617: 1616: 1607: 1604: 1596: 1586: 1582: 1576: 1575: 1568: 1559: 1558: 1552: 1550: 1548: 1525: 1517: 1504: 1486: 1473: 1455: 1447: 1434: 1416: 1403: 1385: 1377: 1364: 1363: 1362: 1339: 1330: 1324: 1316: 1307: 1301: 1292: 1286: 1278: 1264: 1255: 1249: 1241: 1229: 1223: 1215: 1202: 1201: 1200: 1199: 1194: 1192: 1184: 1179: 1176: 1173: 1172: 1171: 1169: 1165: 1164:email filters 1157: 1155: 1153: 1148: 1144: 1142: 1138: 1133: 1131: 1127: 1123: 1118: 1116: 1112: 1111:probabilities 1104: 1102: 1100: 1096: 1092: 1088: 1084: 1080: 1076: 1072: 1068: 1064: 1060: 1055: 1048: 1046: 1044: 1040: 1036: 1034: 1029: 1027: 1023: 1019: 1015: 1011: 1008: 1004: 993: 988: 986: 981: 979: 974: 973: 971: 970: 963: 960: 956: 953: 952: 951: 948: 946: 943: 942: 936: 935: 928: 925: 923: 920: 918: 915: 913: 910: 908: 905: 903: 900: 898: 895: 894: 888: 887: 880: 877: 875: 872: 870: 867: 865: 862: 860: 857: 855: 852: 850: 847: 845: 842: 841: 835: 834: 827: 824: 822: 819: 817: 814: 812: 809: 808: 802: 801: 794: 791: 789: 786: 784: 783:Crowdsourcing 781: 779: 776: 775: 769: 768: 759: 756: 755: 754: 751: 749: 746: 744: 741: 739: 736: 735: 732: 727: 726: 718: 715: 713: 712:Memtransistor 710: 708: 705: 703: 700: 696: 693: 692: 691: 688: 686: 683: 679: 676: 674: 671: 669: 666: 664: 661: 660: 659: 656: 654: 651: 649: 646: 644: 641: 637: 634: 633: 632: 629: 625: 622: 620: 617: 615: 612: 610: 607: 606: 605: 602: 600: 597: 595: 594:Deep learning 592: 590: 587: 586: 583: 578: 577: 570: 567: 565: 562: 560: 558: 554: 552: 549: 548: 545: 540: 539: 530: 529:Hidden Markov 527: 525: 522: 520: 517: 516: 515: 512: 511: 508: 503: 502: 495: 492: 490: 487: 485: 482: 480: 477: 475: 472: 470: 467: 465: 462: 460: 457: 455: 452: 451: 448: 443: 442: 435: 432: 430: 427: 425: 421: 419: 416: 414: 411: 409: 407: 403: 401: 398: 396: 393: 391: 388: 387: 384: 379: 378: 371: 368: 366: 363: 361: 358: 356: 353: 351: 348: 346: 343: 341: 338: 336: 334: 330: 326: 325:Random forest 323: 321: 318: 316: 313: 312: 311: 308: 306: 303: 301: 298: 297: 290: 289: 284: 283: 275: 269: 268: 261: 258: 256: 253: 251: 248: 246: 243: 241: 238: 236: 233: 231: 228: 226: 223: 221: 218: 216: 213: 211: 210:Data cleaning 208: 206: 203: 201: 198: 196: 193: 191: 188: 186: 183: 181: 178: 176: 173: 172: 166: 165: 158: 155: 153: 150: 148: 145: 143: 140: 138: 135: 133: 130: 128: 125: 123: 122:Meta-learning 120: 118: 115: 113: 110: 108: 105: 103: 100: 98: 95: 94: 88: 87: 84: 79: 75: 71: 70: 61: 56: 52: 51: 46: 41: 32: 31: 19: 4307:Lottery scam 4264:Scraper site 4234:Doorway page 4190: 4104:Mobile phone 4084:Cold calling 3926: 3920: 3901: 3895: 3868: 3865:Gallinari, P 3858: 3841: 3837: 3827: 3816:. Retrieved 3802: 3783: 3768: 3757:. Retrieved 3742: 3731:. Retrieved 3722: 3709: 3690: 3679:. Retrieved 3672:the original 3658: 3633:. Retrieved 3619: 3610: 3604:. Retrieved 3590: 3582: 3576:. Retrieved 3572:the original 3562: 3551:. Retrieved 3537: 3526:. Retrieved 3505: 3494:. Retrieved 3470: 3463: 3413: 3392: 3388: 3376: 3367: 3363: 3359: 3344: 3335: 3316: 3284: 3275: 3273: 3162: 3156: 3152: 3142: 3140: 3137: 3132: 3122: 3080: 3078: 3075: 3072: 2974: 2905: 2782: 2778: 2774: 2715: 2562: 2426: 2272: 2264: 2262: 2252: 2248: 2245:independence 2238: 2062: 1883: 1874: 1796: 1793: 1676: 1614: 1599: 1590: 1571: 1544: 1360: 1195: 1188: 1161: 1149: 1145: 1134: 1119: 1108: 1079:SpamAssassin 1056: 1052: 1038: 1037: 1030: 1018:bag-of-words 1001: 869:PAC learning 556: 405: 400:Hierarchical 332: 286: 280: 59: 48: 4269:Social spam 4181:Greylisting 4151:Client-side 4079:Auto dialer 2271:. Instead, 1095:mail server 1071:Server-side 1007:statistical 753:Multi-agent 690:Transformer 589:Autoencoder 345:Naive Bayes 83:data mining 4341:Categories 4274:Spam blogs 4239:Forum spam 4216:Spamdexing 4089:Flyposting 4048:Image spam 4010:Email spam 3878:cs/0009009 3818:2015-09-05 3759:2009-01-19 3733:2007-07-19 3681:2009-12-30 3635:2016-07-09 3606:2010-09-18 3578:2010-09-18 3553:2010-01-16 3528:2007-08-15 3496:2017-09-13 3456:References 3298:Advantages 3293:Discussion 3125:Stop words 1152:heuristics 1141:quarantine 1087:Bogofilter 1022:email spam 738:Q-learning 636:Restricted 434:Mean shift 383:Clustering 360:Perceptron 288:regression 190:Clustering 185:Regression 4352:Anti-spam 4254:Link farm 4224:Blog spam 4143:Anti-spam 4109:Newsgroup 4099:Messaging 4001:Protocols 3953:0302-9743 3476:MIT Press 3417:AutoClass 3326:talk page 3233:⋯ 3207:⁡ 3198:− 3187:− 2857:⋅ 2836:⋅ 2750:η 2699:η 2685:− 2637:⁡ 2631:− 2615:− 2606:⁡ 2578:∑ 2571:η 2533:⁡ 2527:− 2511:− 2502:⁡ 2474:∑ 2459:− 2441:⁡ 2399:… 2361:− 2352:… 2336:− 2314:− 2296:− 2032:− 2023:⋯ 2007:− 1985:− 1963:⋯ 1928:⋯ 1581:talk page 1331:⋅ 1293:⋅ 1256:⋅ 1162:Bayesian 1083:SpamBayes 1010:technique 897:ECML PKDD 879:VC theory 826:ROC curve 758:Self-play 678:DeepDream 519:Bayes net 310:Ensembles 91:Paradigms 4347:Spamming 4319:Phishing 4229:Cloaking 4206:Spamhaus 4114:Robocall 4094:Junk fax 3812:Archived 3792:Archived 3753:Archived 3727:Archived 3717:(2003). 3699:Archived 3644:cite web 3629:Archived 3600:Archived 3547:Archived 3519:Archived 3490:Archived 3423:See also 3338:May 2013 3319:disputed 3143:patterns 2975:strength 2921:′ 2806:′ 2263:Usually 1593:May 2024 1574:disputed 1166:utilize 1059:software 320:Boosting 169:Problems 4201:SpamCop 4119:Spambot 4063:Spambot 4053:Joe job 3883:Bibcode 3398:in its 3383:Spammer 3278:is the 2973:is the 2906:where: 2063:where: 1361:where: 1191:replica 1105:Process 1063:clients 1049:History 902:NeurIPS 719:(ECRAM) 673:AlexNet 315:Bagging 55:Discuss 3951:  3941:  3908:  3482:  3396:Google 3274:where 3161:with 2 2235:etc... 1115:Viagra 1099:CRM114 1089:, and 695:Vision 551:RANSAC 429:OPTICS 424:DBSCAN 408:-means 215:AutoML 45:merged 4324:Voice 4279:Sping 4196:SORBS 4072:Other 4038:DNSWL 4033:DNSBL 3873:arXiv 3834:Li, S 3675:(PDF) 3668:(PDF) 3522:(PDF) 3515:(PDF) 3400:Gmail 3289:too. 2787:with 1075:DSPAM 917:IJCAI 743:SARSA 702:Mamba 668:LeNet 663:U-Net 489:t-SNE 413:Fuzzy 390:BIRCH 47:into 4129:VoIP 3949:ISSN 3939:ISBN 3906:ISBN 3650:link 3480:ISBN 2563:Let 1091:ASSP 927:JMLR 912:ICLR 907:ICML 793:RLHF 609:LSTM 395:CURE 81:and 3931:doi 3846:doi 3842:269 3135:|. 1662:0.2 1641:0.8 1549:.) 1012:of 653:SOM 643:GAN 619:ESN 614:GRU 559:-NN 494:SDL 484:PGD 479:PCA 474:NMF 469:LDA 464:ICA 459:CCA 335:-NN 53:. ( 4343:: 3947:. 3937:. 3881:. 3840:. 3810:. 3751:. 3725:. 3721:. 3646:}} 3642:{{ 3627:. 3609:. 3581:. 3488:. 3474:. 3282:. 3204:ln 3133:pI 3115:. 3040:Pr 2987:Pr 2918:Pr 2860:Pr 2839:Pr 2803:Pr 2634:ln 2603:ln 2530:ln 2499:ln 2438:ln 1842:Pr 1805:Pr 1759:Pr 1736:Pr 1714:Pr 1688:Pr 1647:Pr 1626:Pr 1512:Pr 1481:Pr 1442:Pr 1411:Pr 1372:Pr 1334:Pr 1311:Pr 1296:Pr 1273:Pr 1259:Pr 1236:Pr 1210:Pr 1085:, 1081:, 1077:, 1069:. 1028:. 922:ML 3985:e 3978:t 3971:v 3955:. 3933:: 3914:. 3889:. 3885:: 3875:: 3852:. 3848:: 3821:. 3777:. 3762:. 3736:. 3684:. 3652:) 3638:. 3556:. 3531:. 3499:. 3351:) 3345:( 3340:) 3336:( 3332:. 3322:. 3276:C 3258:) 3255:N 3252:2 3249:, 3246:) 3241:N 3237:p 3228:2 3224:p 3218:1 3214:p 3210:( 3201:2 3195:( 3190:1 3183:C 3179:= 3176:p 3163:N 3103:) 3100:S 3097:( 3094:r 3091:P 3081:n 3057:) 3054:W 3050:| 3046:S 3043:( 3018:n 2996:) 2993:S 2990:( 2961:s 2939:) 2936:W 2932:| 2928:S 2925:( 2888:n 2885:+ 2882:s 2877:) 2874:W 2870:| 2866:S 2863:( 2854:n 2851:+ 2848:) 2845:S 2842:( 2833:s 2827:= 2824:) 2821:W 2817:| 2813:S 2810:( 2746:e 2742:+ 2739:1 2735:1 2730:= 2727:p 2695:e 2691:= 2688:1 2680:p 2677:1 2651:] 2645:i 2641:p 2628:) 2623:i 2619:p 2612:1 2609:( 2599:[ 2593:N 2588:1 2585:= 2582:i 2574:= 2547:] 2541:i 2537:p 2524:) 2519:i 2515:p 2508:1 2505:( 2495:[ 2489:N 2484:1 2481:= 2478:i 2470:= 2466:) 2462:1 2454:p 2451:1 2445:( 2407:N 2403:p 2394:2 2390:p 2384:1 2380:p 2374:) 2369:N 2365:p 2358:1 2355:( 2349:) 2344:2 2340:p 2333:1 2330:( 2327:) 2322:1 2318:p 2311:1 2308:( 2302:= 2299:1 2291:p 2288:1 2273:p 2265:p 2253:p 2249:p 2220:) 2217:S 2213:| 2207:2 2203:W 2199:( 2196:p 2174:2 2170:p 2147:) 2144:S 2140:| 2134:1 2130:W 2126:( 2123:p 2101:1 2097:p 2074:p 2045:) 2040:N 2036:p 2029:1 2026:( 2020:) 2015:2 2011:p 2004:1 2001:( 1998:) 1993:1 1989:p 1982:1 1979:( 1976:+ 1971:N 1967:p 1958:2 1954:p 1948:1 1944:p 1936:N 1932:p 1923:2 1919:p 1913:1 1909:p 1902:= 1899:p 1859:) 1856:H 1852:| 1848:W 1845:( 1822:) 1819:S 1815:| 1811:W 1808:( 1776:) 1773:H 1769:| 1765:W 1762:( 1756:+ 1753:) 1750:S 1746:| 1742:W 1739:( 1731:) 1728:S 1724:| 1720:W 1717:( 1708:= 1705:) 1702:W 1698:| 1694:S 1691:( 1659:= 1656:) 1653:H 1650:( 1644:; 1638:= 1635:) 1632:S 1629:( 1606:) 1600:( 1595:) 1591:( 1587:. 1577:. 1529:) 1526:H 1522:| 1518:W 1515:( 1490:) 1487:H 1484:( 1459:) 1456:S 1452:| 1448:W 1445:( 1420:) 1417:S 1414:( 1389:) 1386:W 1382:| 1378:S 1375:( 1343:) 1340:H 1337:( 1328:) 1325:H 1321:| 1317:W 1314:( 1308:+ 1305:) 1302:S 1299:( 1290:) 1287:S 1283:| 1279:W 1276:( 1268:) 1265:S 1262:( 1253:) 1250:S 1246:| 1242:W 1239:( 1230:= 1227:) 1224:W 1220:| 1216:S 1213:( 991:e 984:t 977:v 557:k 406:k 333:k 291:) 279:( 57:) 20:)

Index

Bayesian spam filtering
merged
Naive Bayes classifier
Discuss
Machine learning
data mining
Supervised learning
Unsupervised learning
Semi-supervised learning
Self-supervised learning
Reinforcement learning
Meta-learning
Online learning
Batch learning
Curriculum learning
Rule-based learning
Neuro-symbolic AI
Neuromorphic engineering
Quantum machine learning
Classification
Generative modeling
Regression
Clustering
Dimensionality reduction
Density estimation
Anomaly detection
Data cleaning
AutoML
Association rules
Semantic analysis

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.

↑