(2001), the UK exception only allows content mining for non-commercial purposes. UK copyright law also does not allow this provision to be overridden by contractual terms and conditions. Since 2020, Switzerland has also regulated data mining by allowing it in the research field under certain conditions laid down by art. 24d of the Swiss Copyright Act. This new article entered into force on 1 April 2020.
, "'[I]n practice, HIPAA may not offer any greater protection than the longstanding regulations in the research arena,' says the AAHC. More importantly, the rule's goal of protection through informed consent is approach a level of incomprehensibility to average individuals." This underscores the necessity for data anonymity in data aggregation and mining practices.
Association rule learning (dependency modeling) – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
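The market basket idea can be sketched in a few lines of Python. This is a minimal illustration, not a production algorithm such as Apriori: the function name `pair_rules`, the `min_support` threshold, and the sample baskets are all assumptions made up for the example. Support is the fraction of baskets containing both items; confidence is the fraction of baskets containing the first item that also contain the second.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    Hypothetical helper for illustration: only pairs of items are considered.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is not bought together often enough
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Made-up purchase data.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "butter"],
]
rules = pair_rules(baskets)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket containing milk also contains butter, so the rule "milk -> butter" has confidence 1.0 even though its support is only 0.5.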
1722:, but a result of the preparation of data before—and for the purposes of—the analysis. The threat to an individual's privacy comes into play when the data, once compiled, cause the data miner, or anyone who has access to the newly compiled data set, to be able to identify specific individuals, especially when the data were originally anonymous.
1877:
facilitated stakeholder discussion on text and data mining in 2013, under the title of Licences for Europe. The focus on the solution to this legal issue, such as licensing rather than limitations and exceptions, led representatives of universities, researchers, libraries, civil society groups and
1511:
The final step of knowledge discovery from data is to verify that the patterns produced by the data mining algorithms occur in the wider data set. Not all patterns found by the algorithms are necessarily valid. It is common for data mining algorithms to find patterns in the training set which are not
1798:
In the United Kingdom in particular, there have been cases of corporations using data mining to target certain groups of customers, forcing them to pay unfairly high prices. These groups tend to be people of lower socio-economic status who are not savvy to the ways they can be exploited in
1631:-based language developed by the Data Mining Group (DMG) and supported as exchange format by many data mining applications. As the name suggests, it only covers prediction models, a particular data mining task of high importance to business applications. However, extensions to cover (for example)
1404:
Before data mining algorithms can be used, a target data set must be assembled. As data mining can only uncover patterns actually present in the data, the target data set must be large enough to contain these patterns while remaining concise enough to be mined within an acceptable time limit. A
1178:
refer to the use of data mining methods to sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used in creating new hypotheses to test against the
If the learned patterns do not meet the desired standards, it is necessary to re-evaluate and change the pre-processing and data mining steps. If the learned patterns do meet the desired standards, then the final step is to interpret the learned patterns and turn them into knowledge.
1520:
of data on which the data mining algorithm was not trained. The learned patterns are applied to this test set, and the resulting output is compared to the desired output. For example, a data mining algorithm trying to distinguish "spam" from "legitimate" e-mails would be trained on a
, upholds the legality of content mining in America, and other fair use countries such as Israel, Taiwan and South Korea. As content mining is transformative, that is, it does not supplant the original work, it is viewed as being lawful under fair use. For example, as part of the
1290:
have grown in size and complexity, direct "hands-on" data analysis has increasingly been augmented with indirect, automated data processing, aided by other discoveries in computer science, especially in the field of machine learning, such as
1757:" data sets can potentially contain enough information to allow identification of individuals, as occurred when journalists were able to find several individuals based on a set of search histories that were inadvertently released by AOL.
1717:
involves combining data together (possibly from various sources) in a way that facilitates analysis (but that also might make identification of private, individual-level data deducible or otherwise apparent). This is not data mining
the presiding judge on the case ruled that Google's digitization project of in-copyright books was lawful, in part because of the transformative uses that the digitization project displayed—one being text and data mining.
1396:. However, 3–4 times as many people reported using CRISP-DM. Several teams of researchers have published reviews of data mining process models, and Azevedo and Santos conducted a comparison of CRISP-DM and SEMMA in 2008.
1001:
with an overall goal of extracting information (with intelligent methods) from a data set and transforming the information into a comprehensible structure for further use. Data mining is the analysis step of the
1487:
through a bot operated by statistician Tyler Vigen, apparently showing a close link between the winning word in a spelling bee competition and the number of people in the United States killed by venomous
1850:) without the permission of the copyright owner is not legal. Where a database is pure data in Europe, it may be that there is no copyright—but database rights may exist, so data mining becomes subject to
1323:
by exploiting the way data is stored and indexed in databases to execute the actual learning and discovery algorithms more efficiently, allowing such methods to be applied to ever-larger data sets.
Anomaly detection (outlier/change/deviation detection) – The identification of unusual data records that might be interesting, or of data errors that require further investigation because they fall outside the standard range.
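One simple way to flag records that fall outside the standard range is a z-score test. The sketch below assumes roughly bell-shaped numeric data; the function name `zscore_outliers`, the threshold of 2.0, and the sample readings are illustrative assumptions, and real anomaly detection often uses more robust methods.

```python
from statistics import mean, stdev


def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` sample standard
    deviations away from the mean (hypothetical helper for illustration)."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Made-up sensor readings with one suspicious value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
print(zscore_outliers(readings))
```

A flagged value is only a candidate: whether it is an interesting record or a data error still has to be decided by further investigation.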
1148:. Neither the data collection, data preparation, nor result interpretation and reporting is part of the data mining step, although they do belong to the overall KDD process as additional steps.
1216:
appeared around 1990 in the database community, with generally positive connotations. For a short time in the 1980s, the phrase "database mining"™ was used, but since it was trademarked by HNC, a
1768:, the patrons of Walgreens filed a lawsuit against the company in 2011 for selling prescription information to data mining companies who in turn provided the data to pharmaceutical companies.
standard (JDM 1.0). Development on successors to these processes (CRISP-DM 2.0 and JDM 2.0) was active in 2006 but has stalled since. JDM 2.0 was withdrawn without reaching a final draft.
been trained. The accuracy of the patterns can then be measured from how many e-mails they correctly classify. Several statistical methods may be used to evaluate the algorithm, such as
2096:: The UIMA (Unstructured Information Management Architecture) is a component framework for analyzing unstructured content such as text, audio and video – originally developed by IBM.
1815:(HIPAA). The HIPAA requires individuals to give their "informed consent" regarding information they provide and its intended present and future uses. According to an article in
1764:
leading to the provider violates Fair Information Practices. This indiscretion can cause financial, emotional, or bodily harm to the indicated individual. In one instance of
2586:
1826:(FERPA) applies only to the specific areas that each such law addresses. The use of data mining by the majority of businesses in the U.S. is not controlled by any legislation.
1159:, regardless of the amount of data. In contrast, data mining uses machine learning and statistical models to uncover clandestine or hidden patterns in a large volume of data.
1508:, but the same problem can arise at different phases of the process and thus a train/test split—when applicable at all—may not be sufficient to prevent this from happening.
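The gap between training and test performance can be illustrated with a deliberately naive spam filter. Everything here is an assumption made up for the sketch: the `train`, `classify`, and `accuracy` helpers, the rule "a word seen only in training spam marks a message as spam", and the tiny hand-written data sets.

```python
def train(messages):
    """Learn the words that occur in spam but never in legitimate training mail."""
    spam_words, ham_words = set(), set()
    for text, label in messages:
        target = spam_words if label == "spam" else ham_words
        target.update(text.lower().split())
    return spam_words - ham_words


def classify(model, text):
    """Label a message as spam if it contains any learned spam-only word."""
    return "spam" if model & set(text.lower().split()) else "legitimate"


def accuracy(model, messages):
    """Fraction of messages whose predicted label matches the true label."""
    correct = sum(classify(model, text) == label for text, label in messages)
    return correct / len(messages)


# Made-up training and held-out test data.
training_set = [
    ("win a free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("lunch on monday", "legitimate"),
]
test_set = [
    ("claim your free prize", "spam"),
    ("agenda for the offsite", "legitimate"),
    ("monday lunch money", "legitimate"),
]

model = train(training_set)
print("training accuracy:", accuracy(model, training_set))
print("test accuracy:", accuracy(model, test_set))
```

The learned rule fits the training mail perfectly but misclassifies the legitimate test message that mentions "money", so the held-out accuracy is lower. A pattern such as "money implies spam" held in the training set without holding in general.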
in 1983. Lovell indicates that the practice "masquerades under a variety of aliases, ranging from 'experimentation' (positive) to 'fishing' or 'snooping' (negative)."
3283:
2128:: an integrated software application for data mining, business intelligence, and modeling that implements the Learning and Intelligent OptimizatioN (LION) approach.
1609:
Classification – the task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".
1358:
1286:(1800s). The proliferation, ubiquity and increasing power of computer technology have dramatically increased data collection, storage, and manipulation ability. As
1866:. The UK was the second country in the world to do so after Japan, which introduced an exception in 2009 for data mining. However, due to the restriction of the
Data mining can unintentionally be misused, producing results that appear to be significant but which do not actually predict future behavior and cannot be
(1990s). Data mining is the process of applying these methods with the intention of uncovering hidden patterns in large data sets. It bridges the gap from
1144:. For example, the data mining step might identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a
1140:. These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis or, for example, in machine learning and
Clustering – the task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.
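As a minimal sketch of discovering groups without known labels, the following implements k-means on one-dimensional data. The function name `kmeans_1d`, the fixed starting centers, and the sample points are assumptions chosen to keep the example small and deterministic; k-means is only one of many clustering methods.

```python
def kmeans_1d(points, centers, iterations=10):
    """Cluster 1-D points around the given starting centers.

    Each round assigns every point to its nearest center, then moves each
    center to the mean of the points assigned to it.
    """
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Made-up data with two visible groups.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

No labels are supplied: the two groups emerge only from the distances between the points.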
1553:). Since 1989, this ACM SIG has hosted an annual international conference and published its proceedings, and since 1999 it has published a biannual
Poncelet, Pascal; Masseglia, Florent; and Teisseire, Maguelonne (editors) (October 2007); "Data Mining Patterns: New Methods and Applications",
on a new sample of data, therefore bearing little use. This is sometimes caused by investigating too many hypotheses and not performing proper
1783:, developed between 1998 and 2000, currently effectively expose European users to privacy exploitation by U.S. companies. As a consequence of
While the term "data mining" itself may have no ethical implications, it is often associated with the mining of information in relation to
Regression – attempts to find a function that models the data with the least error, that is, for estimating the relationships among data or datasets.
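For the simplest case, a straight line fitted by ordinary least squares, the function with the least squared error has a closed form. The sketch below is an illustration under that assumption; the name `fit_line` and the sample data are made up for the example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With noisy data the same formulas return the line that minimizes the sum of squared vertical distances to the points.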
The following applications are available under free/open-source licenses. Public access to application source code is also available.
1116:
or automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records (
and data mining is that data analysis is used to test models and hypotheses on the dataset, e.g., analyzing the effectiveness of a
Polls conducted in 2002, 2004, 2007 and 2014 show that the CRISP-DM methodology is the leading methodology used by data miners.
Thus, data mining should have been more appropriately named "knowledge mining from data," which is unfortunately somewhat long
2166:: platform for automation of engineering simulation and analysis, multidisciplinary optimization and data mining provided by
Günnemann, Stephan; Kremer, Hardy; Seidl, Thomas (2011). "An extension of the PMML standard to subspace clustering models".
communities. However, the term data mining became more popular in the business and press communities. Currently, the terms
has rather strong privacy laws, and efforts are underway to further strengthen the rights of the consumers. However, the
1791:, there has been increased discussion to revoke this agreement, as in particular the data will be fully exposed to the
of sample e-mails. Once trained, the learned patterns would be applied to the test set of e-mails on which it had
data sets before data mining. The target set is then cleaned. Data cleaning removes the observations containing
There have been some efforts to define standards for the data mining process, for example, the 1999 European
2004:): A suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the
3695: ... issued a decision that invalidated Safe Harbor (effective immediately), as currently implemented.
Summarization – providing a more compact representation of the data set, including visualization and report generation.
1984:: cross-platform tool for regression and classification problems based on a Genetic Programming variant.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
The ways in which data mining can be used can in some cases and contexts raise questions regarding
or a simplified process such as (1) Pre-processing, (2) Data Mining, and (3) Results Validation.
, this led the UK government to amend its copyright law in 2014 to allow content mining as a
1220:–based company, to pitch their Database Mining Workstation; researchers consequently turned to
Data mining requires data preparation which uncovers information or patterns which compromise
can be found throughout business, medicine, science, finance, construction, and surveillance.
1968:: The Konstanz Information Miner, a user-friendly and comprehensive data analytics framework.
coined the term "knowledge discovery in databases" for the first workshop on the same topic
Cabena, Peter; Hadjnian, Pablo; Stadler, Rolf; Verhees, Jaap; Zanasi, Alessandro (1997);
3230:. In Proceedings of the IADIS European Conference on Data Mining 2008, pp 182–185.
3203:. Volume 21 Issue 1, March 2006, pp 1–24, Cambridge University Press, New York,
hypothesis. The term "data mining" was used in a similarly critical way by economist
3674:. Washington, D.C. Congressional Research Service. p. 6. R44257. Archived from
2150:: suite of multilingual text and entity analytics products that enable data mining.
and artificial intelligence (which usually provide the mathematical background) to
1006:" process, or KDD. Aside from the raw analysis step, it also involves database and
3350:, International Conferences on Knowledge Discovery and Data Mining, ACM, New York.
1278:
has occurred for centuries. Early methods of identifying patterns in data include
to refer to what they considered the bad practice of analyzing data without an
3474:"Data Mining and Domestic Security: Connecting the Dots to Make Sense of Data"
2681:"The Elements of Statistical Learning: Data Mining, Inference, and Prediction"
2065:: An open-source machine learning library for the Python programming language;
1847:
1549:'s (ACM) Special Interest Group (SIG) on Knowledge Discovery and Data Mining (
3860:"Judge grants summary judgment in favor of Google Books – a fair use victory"
For more information about extracting information out of data (as opposed to
1753:
anonymous, so that individuals may not readily be identified. However, even "
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
3742:, Biotech Business Week, retrieved 17 November 2009 from LexisNexis Academic
Think Before You Dig: Privacy Implications of Data Mining & Aggregation
High Performance Data Mining: Scaling Algorithms, Applications and Systems
2917:
Machine Learning Forensics for Law Enforcement, Security, and Intelligence
1990:: a collection of ready-to-use machine learning algorithms written in the
1795:, and attempts to reach an agreement with the United States have failed.
1739:
Who will be able to mine the data and use the data and their derivatives.
The purpose of the data collection and any (known) data mining projects.
3830:"Text and Data Mining:Its importance and the need for change in Europe"
3381:
Proceedings of the 2011 workshop on Predictive markup language modeling
2038:: Data mining and statistics software under the GNU Project similar to
Data mining is used wherever there is digital data available. Notable
The following applications are available under proprietary licenses.
1974:: a real-time big data stream mining with concept drift tool in the
1682:. In particular, data mining government or commercial data sets for
UK Researchers Given Data Mining Right Under New UK Copyright Laws.
2102:: A suite of machine learning software applications written in the
In the United States, privacy concerns have been addressed by the
It exists, however, in many variations on this theme, such as the
2222:: Visualisation-oriented data mining software, also for teaching.
27:
Process of extracting and discovering patterns in large data sets
3692:
3190:"A survey of Knowledge Discovery and Data Mining process models"
2890:
Charemza, Wojciech W.; Deadman, Derek F. (1992). "Data Mining".
Handbook of Statistical Analysis & Data Mining Applications
3709:"UK companies targeted for using big data to exploit customers"
3313:"Google Scholar: Top publications - Data Mining & Analysis"
2945:"Lesson: Data Mining, and Knowledge Discovery: An Introduction"
973:
is the process of extracting and discovering patterns in large
4000:
Web Data Mining: Exploring Hyperlinks, Contents and Usage Data
2203:
1628:
4059:
Tan, Pang-Ning; Steinbach, Michael; and Kumar, Vipin (2005);
3624:"Big data's impact on privacy, security and consumer welfare"
3160:"What main methodology are you using for data mining (2014)?"
3130:"What main methodology are you using for data mining (2007)?"
3100:"What main methodology are you using for data mining (2004)?"
3070:"What main methodology are you using for data mining (2002)?"
1619:
For exchanging the extracted models—in particular for use in
1392:
The only other data mining standard named in these polls was
1057:
and is frequently applied to any form of large-scale data or
4103:
Data Mining: Practical Machine Learning Tools and Techniques
4073:
Theodoridis, Sergios; and Koutroumbas, Konstantinos (2009);
4017:
Murphy, Chris (16 May 2011). "Is Data Mining Free Speech?".
3800:"Licences for Europe – Structured Stakeholder Dialogue 2013"
3672:"U.S.–E.U. Data Privacy: From Safe Harbor to Privacy Shield"
3740:
BIOMEDICINE; HIPAA Privacy Rule Impedes Biomedical Research
3284:"Microsoft Academic Search: Top conferences in data mining"
1822:
U.S. information privacy legislation such as HIPAA and the
1251:
1187:
In the 1960s, statisticians and economists used terms like
1136:). This usually involves using database techniques such as
2919:. Boca Raton, FL: CRC Press (Taylor & Francis Group).
2134:: data and text mining software by Megaputer Intelligence.
1882:
publishers to leave the stakeholder dialogue in May 2013.
1073:, analysis, and statistics) as well as any application of
1026:
considerations, post-processing of discovered structures,
3422:"The Promise and Pitfalls of Data Mining: Ethical Issues"
3240:
Hawkins, Douglas M (2004). "The problem of overfitting".
2469:
Automatic number plate recognition in the United Kingdom
2055:
computing, data mining, and graphics. It is part of the
Discovering Data Mining: From Concept to Implementation
3451:"The End of Illegal Domestic Spying? Don't Count on It"
2830:
Olson, D. L. (2007). Data mining in business services.
1709:
obligations. A common way for this to occur is through
3036:
Data Mining: Concepts, Models, Methods, and Algorithms
2747:"From Data Mining to Knowledge Discovery in Databases"
1917:
Free open-source data mining software and applications
1742:
The status of security surrounding access to the data.
3242:
Journal of Chemical Information and Computer Sciences
2813:
OKAIRP 2005 Fall Conference, Arizona State University
1560:
Computer science conferences on data mining include:
3916:
Data mining: an overview from a database perspective
2644:"Encyclopædia Britannica: Definition of Data Mining"
2587:
International Journal of Data Warehousing and Mining
2252:
for creating & productionising custom ML models.
3525:"A Framework for Mining Instant Messaging Services"
1934:: A chemical structure miner and web search engine.
1813:
Health Insurance Portability and Accountability Act
1811:via the passage of regulatory controls such as the
3950:Guo, Yike; and Grossman, Robert (editors) (1999);
3716:
3670:Weiss, Martin A.; Archick, Kristin (19 May 2016).
3033:
1912:Category:Data mining and machine learning software
1569:Conference on Information and Knowledge Management
1545:The premier professional body in the field is the
1433:Data mining involves six common classes of tasks:
1045:and knowledge from large amounts of data, not the
4027:Nisbet, Robert; Elder, John; Miner, Gary (2009);
3927:Knowledge and data Engineering, IEEE Transactions
2111:Proprietary data-mining software and applications
1598:International Conference on Very Large Data Bases
1583:Conference on Knowledge Discovery and Data Mining
4101:; Frank, Eibe; Hall, Mark A. (30 January 2011).
1512:present in the general data set. This is called
4087:Weiss, Sholom M.; and Indurkhya, Nitin (1998);
3738:Biotech Business Week Editors (June 30, 2008);
1928:: Text and search results clustering framework.
1846:, the mining of in-copyright works (such as by
1725:It is recommended to be aware of the following
1610:Cross Industry Standard Process for Data Mining
1359:Cross-industry standard process for data mining
1940:: A university research project with advanced
1333:knowledge discovery in databases (KDD) process
918:List of datasets for machine-learning research
2212:Data Miner: data mining software provided by
1635:have been proposed independently of the DMG.
1413:. Pre-processing is essential to analyze the
951:
8:
3221:KDD, SEMMA and CRISP-DM: a parallel overview
2489:Quantitative structure–activity relationship
1588:Data mining topics are also present in many
2894:. Aldershot: Edward Elgar. pp. 14–31.
3834:Association of European Research Libraries
3478:Columbia Science and Technology Law Review
2855:Lovell, Michael C. (1983). "Data Mining".
1516:. To overcome this, the evaluation uses a
1854:owners' rights that are protected by the
1824:Family Educational Rights and Privacy Act
1250:and this term became more popular in the
1022:considerations, interestingness metrics,
977:involving methods at the intersection of
2238:: automated custom ML models managed by
1112:The actual data mining task is the semi-
3866:. Antonelli Law Ltd. 19 November 2013.
3572:, NASCIO Research Brief, September 2004
2976:"Data mining: past, present and future"
2708:; Kamber, Micheline; Pei, Jian (2011).
2597:
2192:: data mining software provided by the
1274:The manual extraction of patterns from
1101:—or, when referring to actual methods,
3932:Feldman, Ronen; Sanger, James (2007);
3602:AOL search data identified individuals
2892:New Directions in Econometric Practice
2857:The Review of Economics and Statistics
1893:, and in particular its provision for
1500:. A simple version of this problem in
1041:because the goal is the extraction of
3870:from the original on 29 November 2014
3140:from the original on 17 November 2012
2176:Omics Explorer: data mining software.
1361:(CRISP-DM) which defines six phases:
1335:is commonly defined with the stages:
3965:Data mining: concepts and techniques
3472:Taipale, Kim A. (15 December 2003).
3431:. American Statistical Association.
3429:ASA Section on Government Statistics
3110:from the original on 8 February 2017
3080:from the original on 16 January 2017
2783:Data mining: concepts and techniques
2710:Data Mining: Concepts and Techniques
2024:: A component-based data mining and
1590:data management/database conferences
2955:from the original on 30 August 2012
2228:: data mining software provided by
2202:: data mining software provided by
2140:: data mining software provided by
1762:personally identifiable information
1749:Data may also be modified so as to
1547:Association for Computing Machinery
3963:, Micheline Kamber, and Jian Pei.
3810:from the original on 23 March 2013
3170:from the original on 1 August 2016
1745:How collected data can be updated.
2712:(3rd ed.). Morgan Kaufmann.
2122:KnowledgeSTUDIO: data mining tool
3584:"Don't Build a Database of Ruin"
3438:from the original on 2022-10-09.
3201:The Knowledge Engineering Review
3188:Lukasz Kurgan and Petr Musilek:
2980:The Knowledge Engineering Review
2274:Anomaly/outlier/change detection
2086:framework with wide support for
1781:U.S.–E.U. Safe Harbor Principles
1625:Predictive Model Markup Language
1085:. Often the more general terms (
1075:computer decision support system
1004:knowledge discovery in databases
4077:, 4th Edition, Academic Press,
3781:from the original on 2021-12-16
3652:from the original on 2018-06-19
3537:from the original on 2022-10-09
3323:from the original on 2023-02-10
3014:from the original on 2023-07-02
2759:from the original on 2022-10-09
2650:from the original on 2011-02-05
2624:from the original on 2013-10-14
2554:Profiling (information science)
1858:. On the recommendation of the
1698:, has raised privacy concerns.
1483:An example of data produced by
1203:in an article published in the
5180:Network performance evaluation
4886:Business intelligence software
4765:Extract, load, transform (ELT)
4760:Extract, transform, load (ETL)
4128:, Mahwah, NJ: Lawrence Erlbaum
3219:Azevedo, A. and Santos, M. F.
2248:: managed service provided by
2028:software suite written in the
1962:and language engineering tool.
1886:Situation in the United States
1803:Situation in the United States
1789:global surveillance disclosure
1760:The inadvertent revelation of
1557:titled "SIGKDD Explorations".
1498:statistical hypothesis testing
333:Relevance vector machine (RVM)
1:
5551:Multimedia information system
5536:Geographic information system
5526:Enterprise information system
5115:Computer systems organization
4834:Decision support system (DSS)
4047:Information Science Reference
3707:Parker, George (2018-09-30).
3449:Pitts, Chip (15 March 2007).
2642:Clifton, Christopher (2010).
2329:Multilinear subspace learning
2051:and software environment for
1972:Massive Online Analysis (MOA)
1868:Information Society Directive
1592:such as the ICDE Conference,
1081:(e.g., machine learning) and
822:Computational learning theory
386:Expectation–maximization (EM)
5910:Computational social science
5498:Theoretical computer science
5311:Software development process
5087:Electronic design automation
5072:Very Large Scale Integration
4860:Data Mining Extensions (DMX)
4157:Knowledge Discovery Software
3643:10.1016/j.telpol.2014.10.002
3612:, SecurityFocus, August 2006
2974:Coenen, Frans (2011-02-07).
2781:; Kamber, Micheline (2001).
2462:Category:Applied data mining
2186:and data mining experiments.
1651:Category:Applied data mining
1612:(CRISP-DM 1.0) and the 2004
1405:common source for data is a
1037:The term "data mining" is a
779:Coefficient of determination
626:Convolutional neural network
338:Support vector machine (SVM)
5733:Natural language processing
5521:Information storage systems
4621:Ensemble modeling patterns
4591:Single version of the truth
4126:The Handbook of Data Mining
4061:Introduction to Data Mining
3032:Kantardzic, Mehmed (2003).
2949:Introduction to Data Mining
2138:Microsoft Analysis Services
1960:natural language processing
1692:Total Information Awareness
1663:Privacy concerns and ethics
1224:. Other terms used include
930:Outline of machine learning
827:Empirical risk minimization
6050:
5649:Human–computer interaction
5619:Intrusion detection system
5531:Social information systems
5516:Database management system
4975:Comparison of OLAP servers
3956:Kluwer Academic Publishers
3938:Cambridge University Press
3764:Retrieved 14 November 2014
2941:Piatetsky-Shapiro, Gregory
2745:; Smyth, Padhraic (1996).
2743:Piatetsky-Shapiro, Gregory
2459:
2453:
2324:Learning classifier system
2156:: data mining software by
1909:
1736:How the data will be used.
1648:
1642:
1353:Interpretation/evaluation.
1266:are used interchangeably.
1206:Review of Economic Studies
567:Feedforward neural network
318:Artificial neural networks
29:
5978:
5915:Computational engineering
5890:Computational mathematics
5036:
4923:
4912:
4844:Data warehouse automation
4807:
4796:
4534:
4529:Creating a data warehouse
4523:
4229:
3631:Telecommunications Policy
3420:Seltzer, William (2005).
3288:Microsoft Academic Search
3209:10.1017/S0269888906000737
3040:. John Wiley & Sons.
2992:10.1017/S0269888910000378
2844:10.1007/s11628-006-0014-7
2428:Exploratory data analysis
2418:Domain driven data mining
2279:Association rule learning
2082:programming language and
1690:purposes, such as in the
1671:(ethical and otherwise).
1623:—the key standard is the
1444:Association rule learning
1244:Gregory Piatetsky-Shapiro
1179:larger data populations.
1134:sequential pattern mining
550:Artificial neural network
5925:Computational healthcare
5920:Differentiable computing
5839:Graphics processing unit
5258:Domain-specific language
5127:Computational complexity
4183:Data Mining Tool Vendors
4105:(3 ed.). Elsevier.
3967:. Morgan Kaufmann, 2006.
3934:The Text Mining Handbook
3691:On October 6, 2015, the
2611:"Data Mining Curriculum"
2549:Named-entity recognition
2484:National Security Agency
2349:Structured data analysis
2002:Natural Language Toolkit
1864:limitation and exception
1793:National Security Agency
859:Journals and conferences
806:Mathematical foundations
716:Temporal difference (TD)
572:Recurrent neural network
492:Conditional random field
415:Dimensionality reduction
163:Dimensionality reduction
125:Quantum machine learning
120:Neuromorphic engineering
80:Self-supervised learning
75:Semi-supervised learning
5900:Computational chemistry
5834:Photograph manipulation
5725:Artificial intelligence
5541:Decision support system
4870:XML for Analysis (XMLA)
3588:Harvard Business Review
3389:10.1145/2023598.2023605
2943:; Parker, Gary (2011).
2569:Surveillance capitalism
2544:Information integration
2479:Educational data mining
2456:Examples of data mining
2413:Decision support system
2354:Support vector machines
1948:methods written in the
1799:digital market places.
1657:examples of data mining
1645:Examples of data mining
1313:support vector machines
1151:The difference between
1146:decision support system
1130:association rule mining
1109:—are more appropriate.
1103:artificial intelligence
1079:artificial intelligence
268:Apprenticeship learning
5965:Educational technology
5796:Reinforcement learning
5546:Process control system
5444:Computational geometry
5434:Algorithmic efficiency
5429:Analysis of algorithms
5077:Systems on Chip (SoCs)
4802:Using a data warehouse
4657:Operational data store
4089:Predictive Data Mining
2539:Information extraction
1899:Google Book settlement
1489:
1365:Business understanding
1230:information harvesting
1059:information processing
817:Bias–variance tradeoff
699:Reinforcement learning
675:Spiking neural network
85:Reinforcement learning
5935:Electronic publishing
5905:Computational biology
5895:Computational physics
5791:Unsupervised learning
5705:Distributed computing
5581:Information retrieval
5488:Mathematical analysis
5478:Mathematical software
5361:Theory of computation
5326:Software construction
5316:Requirements analysis
5194:Software organization
5122:Computer architecture
5092:Hardware acceleration
5057:Printed circuit board
4819:Business intelligence
3757:June 9, 2014, at the
3622:Kshetri, Nir (2014).
2398:Business intelligence
2236:Google Cloud Platform
2182:: An environment for
2106:programming language.
1978:programming language.
1852:intellectual property
1817:Biotech Business Week
1482:
1234:information discovery
1083:business intelligence
653:Neural radiance field
475:Structured prediction
198:Structured prediction
70:Unsupervised learning
5695:Concurrent computing
5667:Ubiquitous computing
5639:Application security
5634:Information security
5463:Discrete mathematics
5439:Randomized algorithm
5391:Computability theory
5369:Model of computation
5341:Software maintenance
5336:Software engineering
5298:Software development
5248:Programming language
5243:Programming paradigm
5160:Network architecture
4635:Focal point modeling
4607:Column-oriented DBMS
4556:Dimensional modeling
4395:Protection (privacy)
3455:Washington Spectator
2915:Mena, Jesús (2011).
2534:Electronic discovery
2450:Application examples
2433:Predictive analytics
2383:Behavior informatics
2364:Time series analysis
2190:SAS Enterprise Miner
2084:scientific computing
2049:programming language
1729:data are collected:
1627:(PMML), which is an
1621:predictive analytics
1239:knowledge extraction
1142:predictive analytics
1120:), unusual records (
989:. Data mining is an
842:Statistical learning
740:Learning with humans
532:Local outlier factor
5970:Document management
5960:Operations research
5885:Enterprise software
5801:Multi-task learning
5786:Supervised learning
5508:Information systems
5331:Software deployment
5288:Software repository
5142:Real-time computing
4940:Information factory
4713:Early-arriving fact
4630:Data vault modeling
4581:Reverse star schema
4075:Pattern Recognition
3910:M.S. Chen, J. Han,
3840:on 29 November 2014
3804:European Commission
3360:SIGKDD Explorations
2564:Social media mining
2529:Data transformation
2371:Application domains
2339:Regression analysis
1875:European Commission
1835:Situation in Europe
1772:Situation in Europe
1633:subspace clustering
1321:database management
1284:regression analysis
1264:knowledge discovery
1012:data pre-processing
685:Electrochemical RAM
592:reservoir computing
323:Logistic regression
242:Supervised learning
228:Multimodal learning
203:Feature engineering
148:Generative modeling
110:Rule-based learning
105:Curriculum learning
65:Supervised learning
40:Part of a series on
5753:Search methodology
5700:Parallel computing
5657:Interaction design
5566:Computing platform
5493:Numerical analysis
5483:Information theory
5268:Software framework
5231:Software notations
5170:Network components
5067:Integrated circuit
4891:Reporting software
4119:Free Weka software
3975:Tibshirani, Robert
3921:2016-03-03 at the
3608:2010-01-06 at the
3568:2008-12-17 at the
3504:on 5 November 2014
3365:2010-07-29 at the
3346:2010-04-30 at the
3226:2013-01-09 at the
3195:2013-05-26 at the
2818:2014-02-01 at the
2673:Tibshirani, Robert
2474:Customer analytics
2314:Genetic algorithms
2158:Oracle Corporation
2154:Oracle Data Mining
1856:Database Directive
1841:European copyright
1490:
1475:Results validation
1368:Data understanding
1317:applied statistics
1301:genetic algorithms
1162:The related terms
1157:marketing campaign
168:Density estimation
6016:
6015:
5945:Electronic voting
5875:Quantum Computing
5868:Applied computing
5854:Image compression
5624:Hardware security
5614:Security services
5571:Digital marketing
5351:Open-source model
5263:Modeling language
5175:Network scheduler
4687:Sixth normal form
4483:
4482:
4475:Wrangling/munging
4325:Format management
4124:Ye, Nong (2003);
4112:978-0-12-374856-0
4083:978-1-59749-272-0
4055:978-1-59904-162-9
4041:978-0-12-374765-5
3946:978-0-521-83657-9
3929:on 8 (6), 866–883
3637:(11): 1134–1145.
3398:978-1-4503-0837-3
3254:10.1021/ci0342472
3047:978-0-471-22852-3
2926:978-1-4398-6069-4
2796:978-1-55860-489-6
2719:978-0-12-381479-1
2498:Mass surveillance
2304:Ensemble learning
2284:Bayesian networks
1946:outlier detection
1860:Hargreaves review
1766:privacy violation
1684:national security
1594:SIGMOD Conference
1438:Anomaly detection
1122:anomaly detection
991:interdisciplinary
968:
967:
773:Model diagnostics
756:Human-in-the-loop
599:Boltzmann machine
512:Anomaly detection
308:Linear regression
223:Ontology learning
218:Grammar induction
193:Semantic analysis
188:Association rules
173:Anomaly detection
115:Neuro-symbolic AI
5806:Cross-validation
5778:Machine learning
5662:Social computing
5629:Network security
5424:Algorithm design
5346:Programming team
5306:Control variable
5283:Software library
5221:Software quality
5216:Operating system
5165:Network protocol
5030:Computer science
4576:Snowflake schema
3979:Friedman, Jerome
3836:. Archived from
3715:. Archived from
3500:. Archived from
3469:
3463:
3462:
3457:. Archived from
3369:, ACM, New York.
3290:. Archived from
2832:Service Business
2683:. Archived from
2677:Friedman, Jerome
2524:Data integration
2319:Intention mining
2294:Cluster analysis
2246:Amazon SageMaker
2184:machine learning
2088:machine learning
2078:library for the
2026:machine learning
1942:cluster analysis
1891:US copyright law
1715:Data aggregation
1711:data aggregation
1678:, legality, and
1614:Java Data Mining
1555:academic journal
1502:machine learning
1371:Data preparation
1297:cluster analysis
1256:machine learning
1226:data archaeology
1118:cluster analysis
1107:machine learning
1051:) of data itself
995:computer science
987:database systems
979:machine learning
960:
953:
946:
907:Related articles
784:Confusion matrix
537:Isolation forest
482:Graphical models
261:
260:
213:Learning to rank
208:Feature learning
46:Machine learning
37:
21:
6034:Formal sciences
6019:
6018:
6017:
6012:
6003:
5974:
5955:Word processing
5863:
5849:Virtual reality
5810:
5772:
5743:Computer vision
5719:
5715:Multiprocessing
5681:
5643:
5609:Security hacker
5585:
5561:Digital library
5502:
5453:Mathematics of
5448:
5410:
5386:Automata theory
5381:Formal language
5355:
5321:Software design
5292:
5225:
5211:Virtual machine
5189:
5185:Network service
5146:
5137:Embedded system
5110:
5043:
5032:
5027:
4997:
4984:
4963:
4919:
4900:
4874:
4848:
4803:
4784:
4748:
4744:Slowly changing
4734:Dimension table
4722:
4696:
4673:Data dictionary
4661:
4625:Anchor modeling
4595:
4530:
4519:
4517:Data warehouses
4514:
4484:
4479:
4455:Synchronization
4225:
4220:
4181:
4155:
4136:
4131:
4113:
4097:
4093:Morgan Kaufmann
4020:InformationWeek
4016:
3923:Wayback Machine
3888:
3886:Further reading
3883:
3873:
3871:
3858:
3857:
3853:
3843:
3841:
3828:
3827:
3823:
3813:
3811:
3798:
3797:
3793:
3784:
3782:
3773:
3772:
3768:
3759:Wayback Machine
3750:
3746:
3737:
3733:
3724:
3722:
3713:Financial Times
3706:
3705:
3701:
3684:
3682:
3681:on 9 April 2020
3678:
3669:
3668:
3664:
3655:
3653:
3649:
3626:
3621:
3620:
3616:
3610:Wayback Machine
3599:
3595:
3581:
3580:
3576:
3570:Wayback Machine
3559:
3550:
3540:
3538:
3534:
3527:
3522:
3521:
3517:
3507:
3505:
3471:
3470:
3466:
3448:
3447:
3443:
3435:
3424:
3419:
3418:
3414:
3399:
3378:
3377:
3373:
3367:Wayback Machine
3358:
3354:
3348:Wayback Machine
3339:
3335:
3326:
3324:
3311:
3310:
3306:
3297:
3295:
3282:
3281:
3277:
3239:
3238:
3234:
3228:Wayback Machine
3218:
3214:
3197:Wayback Machine
3187:
3183:
3173:
3171:
3158:
3157:
3153:
3143:
3141:
3128:
3127:
3123:
3113:
3111:
3098:
3097:
3093:
3083:
3081:
3068:
3067:
3063:
3048:
3031:
3030:
3026:
3017:
3015:
2973:
2972:
2968:
2958:
2956:
2939:
2938:
2934:
2927:
2914:
2913:
2909:
2902:
2889:
2888:
2884:
2869:10.2307/1924403
2854:
2853:
2849:
2829:
2825:
2820:Wayback Machine
2811:
2807:
2797:
2787:Morgan Kaufmann
2777:
2776:
2772:
2762:
2760:
2756:
2749:
2737:
2736:
2727:
2720:
2704:
2703:
2699:
2690:
2688:
2667:
2666:
2662:
2653:
2651:
2641:
2640:
2636:
2627:
2625:
2609:
2608:
2599:
2595:
2581:Other resources
2578:
2507:
2464:
2458:
2447:
2368:
2344:Sequence mining
2334:Neural networks
2309:Factor analysis
2259:
2230:Hewlett-Packard
2113:
2016:neural networks
1932:Chemicalize.org
1919:
1914:
1908:
1888:
1837:
1832:
1805:
1774:
1703:confidentiality
1688:law enforcement
1665:
1653:
1647:
1641:
1606:
1565:CIKM Conference
1543:
1477:
1431:
1421:and those with
1402:
1329:
1293:neural networks
1272:
1185:
1138:spatial indices
1053:. It also is a
1032:online updating
1008:data management
964:
935:
934:
908:
900:
899:
860:
852:
851:
812:Kernel machines
807:
799:
798:
774:
766:
765:
746:Active learning
741:
733:
732:
701:
691:
690:
616:Diffusion model
552:
542:
541:
514:
504:
503:
477:
467:
466:
422:Factor analysis
417:
407:
406:
390:
353:
343:
342:
263:
262:
246:
245:
244:
233:
232:
138:
130:
129:
95:Online learning
60:
48:
35:
28:
23:
22:
15:
12:
11:
5:
5859:Solid modeling
5758:Control method
5755:
5750:
5745:
5740:
5735:
5729:
5727:
5721:
5720:
5718:
5717:
5712:
5710:Multithreading
5707:
5702:
5697:
5691:
5689:
5683:
5682:
5680:
5679:
5674:
5669:
5664:
5659:
5653:
5651:
5645:
5644:
5642:
5641:
5636:
5631:
5626:
5621:
5616:
5611:
5606:
5604:Formal methods
5576:World Wide Web
4952:Enterprise bus
4380:Pre-processing
4134:External links
4132:
4130:
4129:
4122:
4111:
4099:Witten, Ian H.
4095:
4085:
4071:
4057:
4043:
4033:Academic Press
4025:
4014:
3998:(2007, 2011);
3993:
3971:Hastie, Trevor
3968:
3958:
3948:
3930:
3908:
3889:
3887:
3884:
3882:
3881:
3851:
3821:
3791:
3766:
3744:
3731:
3699:
3662:
3614:
3593:
3574:
3548:
3515:
3464:
3461:on 2007-11-28.
3441:
3412:
3397:
3383:. p. 48.
3371:
3352:
3333:
3317:Google Scholar
3304:
3275:
3232:
3212:
3181:
3151:
3121:
3091:
3061:
3046:
3024:
2966:
2951:. KD Nuggets.
2932:
2925:
2907:
2900:
2882:
2847:
2838:(3), 181–193.
2823:
2805:
2795:
2770:
2725:
2718:
2697:
2669:Hastie, Trevor
2660:
2634:
2620:. 2006-04-30.
2596:
2594:
2591:
2590:
2589:
2583:
2582:
2577:
2576:
2571:
2566:
2561:
2556:
2551:
2546:
2541:
2536:
2531:
2526:
2520:
2512:
2511:
2510:Related topics
2506:
2505:
2491:
2486:
2481:
2476:
2471:
2465:
2454:Main article:
2452:
2451:
2446:
2445:
2440:
2438:Real-time data
2435:
2430:
2425:
2423:Drug discovery
2420:
2415:
2410:
2408:Data warehouse
2405:
2400:
2395:
2393:Bioinformatics
2390:
2385:
2380:
2374:
2373:
2372:
2367:
2366:
2361:
2356:
2351:
2346:
2341:
2336:
2331:
2326:
2321:
2316:
2311:
2306:
2301:
2299:Decision trees
2296:
2291:
2289:Classification
2286:
2281:
2276:
2271:
2265:
2264:
2263:
2258:
2255:
2254:
2253:
2243:
2233:
2223:
2217:
2207:
2197:
2187:
2177:
2171:
2161:
2151:
2145:
2135:
2129:
2123:
2112:
2109:
2108:
2107:
2097:
2091:
2066:
2060:
2042:
2033:
2019:
2009:
1995:
1985:
1979:
1969:
1963:
1953:
1935:
1929:
1918:
1915:
1907:
1904:
1887:
1884:
1836:
1833:
1831:
1828:
1804:
1801:
1785:Edward Snowden
1773:
1770:
1747:
1746:
1743:
1740:
1737:
1734:
1694:Program or in
1664:
1661:
1643:Main article:
1640:
1637:
1605:
1602:
1586:
1585:
1579:KDD Conference
1576:
1571:
1542:
1539:
1476:
1473:
1472:
1471:
1465:
1459:
1456:Classification
1453:
1447:
1441:
1430:
1427:
1411:data warehouse
1401:
1400:Pre-processing
1398:
1384:
1383:
1378:
1375:
1372:
1369:
1366:
1355:
1354:
1351:
1346:
1345:Transformation
1343:
1342:Pre-processing
1340:
1328:
1325:
1309:decision rules
1305:decision trees
1280:Bayes' theorem
1271:
1268:
1201:Michael Lovell
832:Occam learning
829:
824:
819:
814:
808:
805:
804:
801:
800:
797:
796:
791:
789:Learning curve
273:Decision trees
270:
264:
250:classification
143:Classification
100:Batch learning
32:cryptocurrency
5844:Mixed reality
5677:Accessibility
5675:
5673:
5672:Visualization
5132:Dependability
4947:Ralph Kimball
4692:Surrogate key
4690:
4688:
4685:
4683:
4680:
4678:
4674:
4671:
4670:
4668:
4664:
4658:
4655:
4653:
4650:
4648:
4645:
4643:
4640:
4636:
4633:
4631:
4628:
4626:
4623:
4622:
4620:
4618:
4615:
4613:
4610:
4608:
4605:
4604:
4602:
4598:
4592:
4589:
4587:
4584:
4582:
4579:
4577:
4574:
4572:
4569:
4567:
4564:
4562:
4559:
4557:
4554:
4552:
4549:
4547:
4544:
4543:
4541:
4537:
4533:
4526:
4522:
4518:
4511:
4506:
4504:
4499:
4497:
4492:
4491:
4488:
4476:
4473:
4471:
4468:
4466:
4463:
4461:
4458:
4456:
4453:
4451:
4448:
4446:
4443:
4441:
4438:
4436:
4433:
4431:
4428:
4426:
4423:
4421:
4418:
4416:
4413:
4411:
4408:
4406:
4403:
4401:
4398:
4396:
4393:
4391:
4388:
4386:
4383:
4381:
4378:
4376:
4373:
4371:
4368:
4366:
4363:
4361:
4358:
4356:
4353:
4351:
4348:
4346:
4343:
4341:
4338:
4336:
4333:
4331:
4328:
4326:
4323:
4321:
4318:
4314:
4311:
4309:
4306:
4304:
4301:
4300:
4299:
4295:
4292:
4290:
4287:
4285:
4282:
4280:
4277:
4275:
4272:
4270:
4267:
4265:
4262:
4260:
4257:
4255:
4252:
4250:
4247:
4245:
4242:
4240:
4237:
4235:
4232:
4231:
4228:
4224:
4217:
4212:
4210:
4205:
4203:
4198:
4197:
4194:
4188:
4184:
4179:
4175:
4174:
4169:
4164:
4162:
4158:
4153:
4149:
4148:
4143:
4138:
4137:
4133:
4127:
4123:
4120:
4114:
4108:
4104:
4100:
4096:
4094:
4090:
4086:
4084:
4080:
4076:
4072:
4070:
4069:0-321-32136-7
4066:
4062:
4058:
4056:
4052:
4048:
4044:
4042:
4038:
4034:
4030:
4026:
4022:
4021:
4015:
4013:
4012:3-540-37881-2
4009:
4005:
4001:
3997:
3994:
3992:
3991:0-387-95284-5
3988:
3984:
3980:
3976:
3972:
3969:
3966:
3962:
3959:
3957:
3953:
3949:
3947:
3943:
3939:
3935:
3931:
3928:
3924:
3920:
3917:
3913:
3909:
3907:
3906:0-13-743980-6
3903:
3899:
3898:Prentice Hall
3721:on 2022-12-10
3719:
3714:
3710:
3703:
3700:
3696:
3694:
3677:
3673:
3666:
3663:
3648:
3644:
3640:
3636:
3632:
3625:
3618:
3615:
3611:
3607:
3604:
3603:
3597:
3594:
3589:
3585:
3578:
3575:
3571:
3567:
3564:
3563:
3557:
3555:
3553:
3549:
3533:
3526:
3523:Resig, John.
3519:
3516:
3503:
3499:
3495:
3491:
3487:
3483:
3479:
3475:
3468:
3465:
3460:
3456:
3452:
3445:
3442:
3434:
3430:
3423:
3416:
3413:
3408:
3404:
3400:
3394:
3390:
3386:
3382:
3375:
3372:
3368:
3364:
3361:
3356:
3353:
3349:
3345:
3342:
3337:
3334:
3322:
3318:
3314:
3308:
3305:
3294:on 2014-11-19
3293:
3289:
3285:
3279:
3276:
3271:
3267:
3263:
3259:
3255:
3251:
3247:
3243:
3236:
3233:
3229:
3225:
3222:
3216:
3213:
3210:
3206:
3202:
3198:
3194:
3191:
3185:
3182:
3169:
3165:
3161:
3155:
3152:
3139:
3135:
3131:
3125:
3122:
3109:
3105:
3101:
3095:
3092:
3079:
3075:
3071:
3065:
3062:
3057:
3053:
3049:
3043:
3038:
3037:
3028:
3025:
3013:
3009:
3005:
3001:
2997:
2993:
2989:
2985:
2981:
2977:
2970:
2967:
2954:
2950:
2946:
2942:
2936:
2933:
2928:
2922:
2918:
2911:
2908:
2903:
2901:1-85278-461-X
2897:
2893:
2886:
2883:
2878:
2874:
2870:
2866:
2862:
2858:
2851:
2848:
2845:
2841:
2837:
2833:
2827:
2824:
2821:
2817:
2814:
2809:
2806:
2802:
2798:
2792:
2789:. p. 5.
2788:
2784:
2780:
2774:
2771:
2755:
2748:
2744:
2740:
2739:Fayyad, Usama
2734:
2732:
2730:
2726:
2721:
2715:
2711:
2707:
2701:
2698:
2687:on 2009-11-10
2686:
2682:
2678:
2674:
2670:
2664:
2661:
2649:
2645:
2638:
2635:
2623:
2619:
2616:
2612:
2606:
2604:
2602:
2598:
2592:
2588:
2585:
2584:
2580:
2579:
2575:
2572:
2570:
2567:
2565:
2562:
2560:
2559:Psychometrics
2557:
2555:
2552:
2550:
2547:
2545:
2542:
2540:
2537:
2535:
2532:
2530:
2527:
2525:
2522:
2521:
2519:
2517:
2509:
2508:
2503:
2499:
2495:
2492:
2490:
2487:
2485:
2482:
2480:
2477:
2475:
2472:
2470:
2467:
2466:
2463:
2457:
2449:
2448:
2444:
2441:
2439:
2436:
2434:
2431:
2429:
2426:
2424:
2421:
2419:
2416:
2414:
2411:
2409:
2406:
2404:
2403:Data analysis
2401:
2399:
2396:
2394:
2391:
2389:
2386:
2384:
2381:
2379:
2376:
2375:
2370:
2369:
2365:
2362:
2360:
2357:
2355:
2352:
2350:
2347:
2345:
2342:
2340:
2337:
2335:
2332:
2330:
2327:
2325:
2322:
2320:
2317:
2315:
2312:
2310:
2307:
2305:
2302:
2300:
2297:
2295:
2292:
2290:
2287:
2285:
2282:
2280:
2277:
2275:
2272:
2270:
2267:
2266:
2261:
2260:
2256:
2251:
2247:
2244:
2241:
2237:
2234:
2231:
2227:
2224:
2221:
2218:
2215:
2211:
2208:
2205:
2201:
2198:
2195:
2194:SAS Institute
2191:
2188:
2185:
2181:
2178:
2175:
2172:
2169:
2165:
2162:
2159:
2155:
2152:
2149:
2146:
2143:
2139:
2136:
2133:
2130:
2127:
2124:
2121:
2118:
2117:
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.