
Data mining


Owing to the restrictions of the Information Society Directive (2001), the UK exception only allows content mining for non-commercial purposes. UK copyright law also does not allow this provision to be overridden by contractual terms and conditions. Since 2020, Switzerland has also regulated data mining, allowing it in the research field under certain conditions laid down by art. 24d of the Swiss Copyright Act. This new article entered into force on 1 April 2020.
According to an article in Biotech Business Week, "'[I]n practice, HIPAA may not offer any greater protection than the longstanding regulations in the research arena,' says the AAHC. More importantly, the rule's goal of protection through informed consent is approaching a level of incomprehensibility to average individuals." This underscores the necessity for data anonymity in data aggregation and mining practices.
Such aggregation is a result of the preparation of data before, and for the purposes of, the analysis. The threat to an individual's privacy comes into play when the data, once compiled, cause the data miner, or anyone who has access to the newly compiled data set, to be able to identify specific individuals, especially when the data were originally anonymous.
Association rule learning (dependency modeling) – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
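Market basket analysis can be sketched by counting item pairs. The following is a minimal sketch, assuming each transaction is a Python set of item names; the baskets and the thresholds `min_support` and `min_confidence` are made-up illustrations. It computes the two basic measures of association rule learning: support (the fraction of baskets containing both items) and confidence (how often buying the first item goes with buying the second). Full algorithms such as Apriori also prune candidates and handle itemsets larger than pairs; this sketch does not.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) rules for item pairs.

    support    = fraction of transactions containing both items
    confidence = count(A and B) / count(A)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        item_counts.update(basket)
        # sorted() maps each unordered pair to one canonical (a, b) key
        pair_counts.update(combinations(sorted(basket), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # a frequent pair can yield a rule in each direction
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # strongest rules first, ties broken alphabetically
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

# Hypothetical baskets, not real data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
for lhs, rhs, sup, conf in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

Rules are kept per direction because confidence is asymmetric. In these baskets "beer -> diapers" has confidence 1.0, while "diapers -> beer" is lower.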
The European Commission facilitated stakeholder discussion on text and data mining in 2013, under the title of Licences for Europe. The focus on the solution to this legal issue, such as licensing rather than limitations and exceptions, led representatives of universities, researchers, libraries, civil society groups and open access publishers to leave the stakeholder dialogue in May 2013.
The final step of knowledge discovery from data is to verify that the patterns produced by the data mining algorithms occur in the wider data set. Not all patterns found by the algorithms are necessarily valid. It is common for data mining algorithms to find patterns in the training set which are not
In the United Kingdom in particular, there have been cases of corporations using data mining to target certain groups of customers, forcing them to pay unfairly high prices. These groups tend to be people of lower socio-economic status who are not savvy to the ways they can be exploited in digital market places.
PMML (Predictive Model Markup Language) is an XML-based language developed by the Data Mining Group (DMG) and supported as an exchange format by many data mining applications. As the name suggests, it only covers prediction models, a particular data mining task of high importance to business applications. However, extensions to cover (for example) subspace clustering have been proposed independently of the DMG.
Before data mining algorithms can be used, a target data set must be assembled. As data mining can only uncover patterns actually present in the data, the target data set must be large enough to contain these patterns while remaining concise enough to be mined within an acceptable time limit. A
refer to the use of data mining methods to sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used in creating new hypotheses to test against the
If the learned patterns do not meet the desired standards, it is necessary to re-evaluate and change the pre-processing and data mining steps. If the learned patterns do meet the desired standards, then the final step is to interpret the learned patterns and turn them into knowledge.
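Checking learned patterns against a test set can be illustrated with a holdout evaluation. This is a minimal sketch under stated assumptions. Each e-mail is reduced to one hypothetical feature, the number of times the word "free" appears. The "model" is a threshold chosen using the training set only. Accuracy is then measured on separate test e-mails that the model never saw. Real spam filters use far richer features and models. The point here is only the train/test separation.

```python
def predict(x, threshold):
    """Classify one e-mail from its feature value."""
    return "spam" if x >= threshold else "legitimate"

def accuracy(data, threshold):
    """Fraction of (feature, label) pairs classified correctly."""
    correct = sum(predict(x, threshold) == label for x, label in data)
    return correct / len(data)

def train_threshold(train):
    """Learn the threshold that classifies the training set best.

    Only the training set is consulted, so the test set stays unseen.
    """
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in train}):
        acc = accuracy(train, t)
        if acc > best_acc:  # strict '>' keeps the smallest best threshold
            best_t, best_acc = t, acc
    return best_t

# Hypothetical data: feature = how often "free" appears in the e-mail.
train = [(0, "legitimate"), (1, "legitimate"), (0, "legitimate"),
         (3, "spam"), (5, "spam"), (2, "spam")]
test = [(1, "legitimate"), (4, "spam"), (0, "legitimate"), (2, "legitimate")]

threshold = train_threshold(train)
print("learned threshold:", threshold)
print("training accuracy:", accuracy(train, threshold))
print("test accuracy:", accuracy(test, threshold))
```

Here the learned rule classifies every training e-mail correctly (accuracy 1.0) but only three of the four test e-mails (0.75). This is the kind of gap that the held-out set is meant to expose.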
of data on which the data mining algorithm was not trained. The learned patterns are applied to this test set, and the resulting output is compared to the desired output. For example, a data mining algorithm trying to distinguish "spam" from "legitimate" e-mails would be trained on a
US copyright law, and in particular its provision for fair use, upholds the legality of content mining in America and in other fair use countries such as Israel, Taiwan and South Korea. As content mining is transformative, that is, it does not supplant the original work, it is viewed as being lawful under fair use.
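As data sets have grown in size and complexity, direct "hands-on" data analysis has increasingly been augmented with indirect, automated data processing, aided by other discoveries in computer science, especially in the field of machine learning, such as neural networks, cluster analysis, genetic algorithms (1950s), decision trees and decision rules (1960s), and support vector machines (1990s).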
Even "anonymized" data sets can potentially contain enough information to allow identification of individuals, as occurred when journalists were able to find several individuals based on a set of search histories that were inadvertently released by AOL.
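One common way to quantify this re-identification risk is k-anonymity. The text does not name this measure; it is introduced here as an illustration. Records are grouped by their quasi-identifiers, which here are assumed to be ZIP code, birth year and sex, and the size of the smallest group is reported. A record that is alone in its group (k = 1) can be singled out even though no name is stored. The records below are made up, and the helper names are hypothetical.

```python
from collections import Counter

def _qi_key(record, quasi_identifiers):
    """Tuple of the record's values on the quasi-identifier columns."""
    return tuple(record[col] for col in quasi_identifiers)

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing all quasi-identifier values."""
    groups = Counter(_qi_key(r, quasi_identifiers) for r in records)
    return min(groups.values())

def unique_records(records, quasi_identifiers):
    """Records that are alone in their group (k = 1) and thus easy to single out."""
    groups = Counter(_qi_key(r, quasi_identifiers) for r in records)
    return [r for r in records if groups[_qi_key(r, quasi_identifiers)] == 1]

# Hypothetical "anonymized" search log: no names, but quasi-identifiers remain.
records = [
    {"zip": "10001", "birth_year": 1980, "sex": "F", "query": "flu symptoms"},
    {"zip": "10001", "birth_year": 1980, "sex": "F", "query": "cheap flights"},
    {"zip": "10002", "birth_year": 1975, "sex": "M", "query": "dog trainer"},
]
qi = ["zip", "birth_year", "sex"]
print("k =", k_anonymity(records, qi))
for r in unique_records(records, qi):
    print("can be singled out:", r)
```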
Data aggregation involves combining data together (possibly from various sources) in a way that facilitates analysis (but that also might make identification of private, individual-level data deducible or otherwise apparent). This is not data mining per se.
As part of the Google Book settlement, the presiding judge on the case ruled that Google's digitization project of in-copyright books was lawful, in part because of the transformative uses that the digitization project displayed, one being text and data mining.
However, 3–4 times as many people reported using CRISP-DM. Several teams of researchers have published reviews of data mining process models, and Azevedo and Santos conducted a comparison of CRISP-DM and SEMMA in 2008.
with an overall goal of extracting information (with intelligent methods) from a data set and transforming the information into a comprehensible structure for further use. Data mining is the analysis step of the
An example of data produced by data dredging through a bot operated by statistician Tyler Vigen, apparently showing a close link between the best word winning a spelling bee competition and the number of people in the United States killed by venomous spiders.
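Under European copyright and database laws, the mining of in-copyright works (such as by web mining) without the permission of the copyright owner is not legal. Where a database is pure data in Europe, it may be that there is no copyright, but database rights may exist, so data mining becomes subject to intellectual property owners' rights.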
Data mining exploits the way data is stored and indexed in databases to execute the actual learning and discovery algorithms more efficiently, allowing such methods to be applied to ever-larger data sets.
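As a small illustration of pushing the counting work into the database instead of loading every record into the analysis program, the sketch below uses Python's built-in `sqlite3` module. It assumes a hypothetical `purchases(basket_id, item)` table. A self-join with `GROUP BY` lets the database engine count how often two items appear in the same basket. The index on `basket_id` gives the engine an efficient way to find rows of the same basket. Whether the index is actually used is left to SQLite's query planner.

```python
import sqlite3

def co_occurrence_counts(conn):
    """Count, inside the database, how many baskets contain each item pair."""
    query = """
        SELECT a.item, b.item, COUNT(*) AS baskets
        FROM purchases AS a
        JOIN purchases AS b
          ON a.basket_id = b.basket_id AND a.item < b.item  -- each pair once
        GROUP BY a.item, b.item
        ORDER BY baskets DESC, a.item, b.item
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (basket_id INTEGER, item TEXT)")
# Index on the join key so rows of the same basket can be looked up directly.
conn.execute("CREATE INDEX idx_purchases_basket ON purchases (basket_id)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, "bread"), (1, "milk"),
     (2, "bread"), (2, "butter"), (2, "milk"),
     (3, "butter"), (3, "milk")],
)
for item_a, item_b, n in co_occurrence_counts(conn):
    print(item_a, item_b, n)
```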
Neither the data collection, data preparation, nor result interpretation and reporting is part of the data mining step, although they do belong to the overall KDD process as additional steps.
Anomaly detection (outlier/change/deviation detection) – The identification of unusual data records that might be interesting, or of data errors that require further investigation due to being out of standard range.
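A minimal sketch of the anomaly detection task uses the z-score rule, which is one simple choice among many. The cut-off of 2.0 standard deviations and the sensor readings are illustrative assumptions. Each value's distance from the mean is measured in standard deviations, and values beyond the cut-off are flagged for further investigation. A flagged value may turn out to be an interesting event or a data error.

```python
from statistics import mean, pstdev

def zscore_outliers(values, cutoff=2.0):
    """Return (index, value, z) for values more than `cutoff` std devs from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    flagged = []
    for i, v in enumerate(values):
        z = (v - mu) / sigma
        if abs(z) > cutoff:
            flagged.append((i, v, z))
    return flagged

readings = [20.1, 19.8, 20.4, 20.0, 19.9, 20.2, 35.7, 20.3]  # hypothetical
for i, v, z in zscore_outliers(readings):
    print(f"record {i}: value {v} (z = {z:.2f})")
```

The outlier itself inflates the mean and standard deviation, which can mask it in small samples. Robust variants based on the median are often preferred for that reason.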
The term data mining appeared around 1990 in the database community, with generally positive connotations. For a short time in the 1980s, the phrase "database mining"™ was used, but since it was trademarked by HNC, researchers consequently turned to data mining.
In one instance of privacy violation, the patrons of Walgreens filed a lawsuit against the company in 2011 for selling prescription information to data mining companies who in turn provided the data to pharmaceutical companies.
A further standardization effort was the 2004 Java Data Mining standard (JDM 1.0). Development on successors to these processes (CRISP-DM 2.0 and JDM 2.0) was active in 2006 but has stalled since. JDM 2.0 was withdrawn without reaching a final draft.
been trained. The accuracy of the patterns can then be measured from how many e-mails they correctly classify. Several statistical methods may be used to evaluate the algorithm, such as ROC curves.
UIMA: The UIMA (Unstructured Information Management Architecture) is a component framework for analyzing unstructured content such as text, audio and video – originally developed by IBM.
The Health Insurance Portability and Accountability Act (HIPAA) requires individuals to give their "informed consent" regarding information they provide and its intended present and future uses.
The inadvertent revelation of personally identifiable information leading back to the provider violates Fair Information Practices. This indiscretion can cause financial, emotional, or bodily harm to the indicated individual.
Legislation such as the Family Educational Rights and Privacy Act (FERPA) applies only to the specific areas that each such law addresses. The use of data mining by the majority of businesses in the U.S. is not controlled by any legislation.
Such hypothesis-driven data analysis is independent of the amount of data. In contrast, data mining uses machine learning and statistical models to uncover clandestine or hidden patterns in a large volume of data.
The same problem can, however, arise at different phases of the process, so a train/test split, when applicable at all, may not be sufficient to prevent overfitting.
In his 1983 article, Lovell indicates that the practice "masquerades under a variety of aliases, ranging from 'experimentation' (positive) to 'fishing' or 'snooping' (negative)".
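The danger of "fishing" through many hypotheses can be shown with a small simulation. This is a sketch under stated assumptions. The target and all 200 candidate "predictors" are pure random noise. Each predictor is tested by its Pearson correlation with the target. The cut-off |r| > 2/√n is a rough stand-in for a 5% significance test, not an exact test. Some predictors will typically look "significant" even though none has any real relationship to the target, and such findings would not hold up on a new sample.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def dredge(n_samples=50, n_candidates=200, seed=1):
    """Count pure-noise predictors that pass a rough 5% correlation cut-off."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    cutoff = 2 / math.sqrt(n_samples)  # approximate critical |r| at the 5% level
    hits = 0
    for _ in range(n_candidates):
        noise = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson(noise, target)) > cutoff:
            hits += 1
    return hits

print(dredge(), "of 200 random predictors look 'significant'")
```

At a 5% level, roughly one in twenty unrelated predictors is expected to pass by chance. This is why a result found by searching many hypotheses needs correction for multiple comparisons or confirmation on fresh data.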
LIONsolver: an integrated software application for data mining, business intelligence, and modeling that implements the Learning and Intelligent OptimizatioN (LION) approach.
Classification – the task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".
The proliferation, ubiquity and increasing power of computer technology have dramatically increased data collection, storage, and manipulation ability.
The UK was the second country in the world to introduce such a copyright exception for content mining, after Japan, which introduced an exception in 2009 for data mining.
Data mining can unintentionally be misused, producing results that appear to be significant but which do not actually predict future behavior and cannot be
Data mining is the process of applying these methods with the intention of uncovering hidden patterns in large data sets.
These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis or, for example, in machine learning and predictive analytics.
For example, the data mining step might identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a decision support system.
Since 1989, the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) has hosted an annual international conference and published its proceedings, and since 1999 it has published a biannual academic journal.
Clustering – the task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.
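Clustering can be sketched with k-means (Lloyd's algorithm). It is one of many clustering methods and is chosen here only because it is short. Each point is assigned to the nearest of k centroids, each centroid is moved to the mean of its points, and the two steps repeat until the assignments stop changing. The 2-D points and the initial centroids below are assumptions for illustration. Practical implementations also choose starting centroids carefully and restart several times, since k-means only finds a local optimum.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]  # hypothetical
centroids, labels = kmeans(points, [(0, 0), (10, 10)])
print("labels:", labels)
print("centroids:", centroids)
```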
Poncelet, Pascal; Masseglia, Florent; and Teisseire, Maguelonne (editors) (October 2007); "Data Mining Patterns: New Methods and Applications",
on a new sample of data, therefore bearing little use. This is sometimes caused by investigating too many hypotheses and not performing proper
The U.S.–E.U. Safe Harbor Principles, developed between 1998 and 2000, currently effectively expose European users to privacy exploitation by U.S. companies.
While the term "data mining" itself may have no ethical implications, it is often associated with the mining of information in relation to
Regression – attempts to find a function that models the data with the least error, that is, for estimating the relationships among data or datasets.
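For the regression task, the simplest case is fitting a straight line y = a + b·x by ordinary least squares. This minimizes the sum of squared errors and has a closed-form solution: the slope is the covariance of x and y divided by the variance of x. The following is a minimal sketch for one input variable, and the observations are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical observations
a, b = fit_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
```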
The following applications are available under free/open-source licenses. Public access to application source code is also available.
or automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records (
and data mining is that data analysis is used to test models and hypotheses on the dataset, e.g., analyzing the effectiveness of a
Polls conducted in 2002, 2004, 2007 and 2014 show that the CRISP-DM methodology is the leading methodology used by data miners.
Thus, data mining should have been more appropriately named "knowledge mining from data," which is unfortunately somewhat long
pSeven: platform for automation of engineering simulation and analysis, multidisciplinary optimization and data mining provided by DATADVANCE.
Günnemann, Stephan; Kremer, Hardy; Seidl, Thomas (2011). "An extension of the PMML standard to subspace clustering models".
communities. However, the term data mining became more popular in the business and press communities. Currently, the terms
5821: 5798: 5528: 5518: 4939: 4500: 3995: 3291: 2328: 2029: 2005: 1971: 1867: 1125: 821: 523: 299: 1779:
Europe has rather strong privacy laws, and efforts are underway to further strengthen the rights of the consumers.
5902: 5490: 5398: 5310: 5086: 5071: 4864: 4565: 2461: 1981: 1650: 778: 715: 625: 603: 446: 436: 3915: 3362: 3312: 1791:, there has been increased discussion to revoke this agreement, as in particular the data will be fully exposed to the 5990: 5725: 5230: 4590: 3189: 2621: 2137: 2103: 1975: 1959: 1949: 1691: 929: 841: 826: 287: 109: 3859: 816: 3605: 1479: 5962: 5611: 4974: 4890: 4585: 4550: 3955: 3937: 2323: 2288: 2219: 2079: 1455: 1205: 889: 566: 461: 249: 182: 142: 1525:
of sample e-mails. Once trained, the learned patterns would be applied to the test set of e-mails on which it had
5980: 5907: 5882: 5745: 5393: 5006: 4843: 4743: 4156: 3287: 2944: 2940: 2742: 2427: 2417: 2333: 2278: 2015: 1443: 1292: 1243: 1133: 943: 549: 317: 187: 3343: 1417:
data sets before data mining. The target set is then cleaned. Data cleaning removes the observations containing
6026: 5831: 5664: 5257: 5126: 4764: 4759: 4717: 4493: 4297: 4293: 2548: 2483: 2068: 2044: 2001: 1792: 571: 491: 414: 332: 162: 124: 119: 79: 74: 1608:
There have been some efforts to define standards for the data mining process, for example, the 1999 European
5892: 5826: 5717: 5533: 5200: 4833: 4222: 4118: 2568: 2543: 2478: 2455: 2412: 2353: 2099: 2004:): A suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the 1668: 1656: 1644: 1467: 1414: 1312: 1145: 1129: 1078: 1074: 518: 367: 267: 94: 4182: 3695: ... issued a decision that invalidated Safe Harbor (effective immediately), as currently implemented. 5957: 5788: 5669: 5436: 5426: 5421: 4859: 4656: 4634: 4324: 2538: 1304: 1066: 698: 674: 576: 337: 312: 272: 84: 3799: 3458: 2812: 1470:– providing a more compact representation of the data set, including visualization and report generation. 5927: 5897: 5887: 5783: 5697: 5573: 5513: 5480: 5470: 5360: 5325: 5315: 5252: 5121: 5096: 5091: 5056: 4951: 4818: 4560: 2397: 2235: 1984:: cross-platform tool for regression and classification problems based on a Genetic Programming variant. 1851: 1082: 1019: 652: 474: 426: 282: 197: 69: 3421: 1859: 1574:
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
3560: 2746: 6021: 5687: 5659: 5631: 5626: 5455: 5431: 5383: 5368: 5350: 5340: 5335: 5297: 5247: 5242: 5159: 5105: 4823: 4738: 4606: 4555: 4454: 3774: 3583: 2533: 2432: 2382: 2363: 2083: 2048: 1620: 1238: 1196: 1141: 581: 531: 5952: 5877: 5793: 5778: 5543: 5330: 5287: 5282: 5179: 5169: 5141: 4774: 4712: 4629: 4580: 4394: 4379: 4307: 3099: 2563: 2528: 2338: 1874: 1674:
The ways in which data mining can be used can in some cases and contexts raise questions regarding
1632: 1461: 1320: 1283: 1011: 1003: 684: 620: 591: 496: 322: 255: 241: 227: 202: 152: 104: 64: 3717: 3129: 5917: 5816: 5692: 5649: 5558: 5500: 5485: 5475: 5267: 5066: 4206: 3978: 3402: 3265: 3003: 2872: 2676: 2643: 2473: 2313: 2157: 2153: 1855: 1843: 1754: 1710: 1386:
or a simplified process such as (1) Pre-processing, (2) Data Mining, and (3) Results Validation.
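The simplified three-stage process can be sketched end to end. Everything here is an assumption for illustration:

- Records are (hours_studied, exam_score) pairs.
- Pre-processing drops records with missing values or implausible values, treating them as noise.
- The data mining step fits a one-variable least-squares line.
- Results validation reports the mean absolute error on held-out records.

```python
def preprocess(records, lo=0.0, hi=100.0):
    """Pre-processing: drop records with missing or implausible values."""
    clean = []
    for hours, score in records:
        if hours is None or score is None:
            continue  # missing data
        if hours < 0 or not (lo <= score <= hi):
            continue  # treated as noise, e.g. a data-entry error
        clean.append((hours, score))
    return clean

def mine(train):
    """Data mining step: least-squares line score = intercept + slope * hours."""
    xs = [h for h, _ in train]
    ys = [s for _, s in train]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def validate(model, test):
    """Results validation: mean absolute error on records not used for fitting."""
    intercept, slope = model
    errors = [abs(intercept + slope * h - s) for h, s in test]
    return sum(errors) / len(errors)

# Hypothetical raw data: one record has a missing value, one an impossible score.
raw = [(1, 52), (2, 58), (None, 61), (3, 65), (4, 71), (5, 250), (6, 83), (7, 88)]
clean = preprocess(raw)
train, test = clean[::2], clean[1::2]  # simple alternating holdout split
model = mine(train)
print(f"kept {len(clean)} of {len(raw)} records")
print(f"score = {model[0]:.1f} + {model[1]:.1f} * hours")
print(f"validation MAE = {validate(model, test):.2f}")
```

Keeping the three stages as separate functions mirrors the process: if validation is unsatisfactory, the pre-processing and mining steps can be revised independently and the pipeline re-run.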
1316: 1300: 1156: 662: 586: 372: 167: 1862:, this led to the UK government to amend its copyright law in 2014 to allow content mining as a 1220:–based company, to pitch their Database Mining Workstation; researchers consequently turned to 5937: 5867: 5846: 5808: 5616: 5583: 5563: 5262: 5174: 5048: 4686: 4384: 4374: 4238: 4106: 4078: 4064: 4050: 4036: 4007: 3986: 3974: 3941: 3901: 3493: 3485: 3392: 3257: 3051: 3041: 2995: 2920: 2895: 2790: 2713: 2672: 2497: 2303: 2273: 2021: 1945: 1765: 1701:
Data mining requires data preparation which uncovers information or patterns which compromise
1683: 1659:
can be found throughout business, medicine, science, finance, construction, and surveillance.
1437: 1418: 1380: 1279: 1121: 1015: 990: 755: 598: 511: 307: 277: 222: 217: 172: 114: 3497: 1968:: The Konstanz Information Miner, a user-friendly and comprehensive data analytics framework. 5770: 5654: 5621: 5416: 5345: 5234: 5220: 5215: 5164: 5151: 5076: 5029: 4869: 4575: 4334: 4283: 4268: 4248: 4233: 3638: 3384: 3249: 3204: 2987: 2864: 2839: 2523: 2318: 2293: 2283: 2245: 2183: 2087: 2025: 1941: 1890: 1714: 1613: 1554: 1501: 1449: 1296: 1255: 1246:
coined the term "knowledge discovery in databases" for the first workshop on the same topic
1117: 1031: 994: 978: 783: 536: 486: 396: 380: 350: 212: 207: 157: 147: 45: 5841: 5735: 5707: 5601: 5553: 5538: 5523: 5378: 5373: 5320: 5210: 5184: 5136: 5081: 4769: 4733: 4672: 4624: 4464: 4399: 4389: 4359: 4302: 4273: 4263: 4172: 4146: 4092: 4019: 4003: 3922: 3758: 3623: 3609: 3569: 3366: 3347: 3227: 3196: 2819: 2786: 2343: 2308: 2229: 1931: 1702: 1687: 1564: 1493: 1062: 1058: 1007: 986: 811: 615: 481: 421: 3892:
Cabena, Peter; Hadjnian, Pablo; Stadler, Rolf; Verhees, Jaap; Zanasi, Alessandro (1997);
3675: 5947: 5851: 5750: 5596: 5568: 4516: 4474: 4469: 4434: 4414: 4409: 4364: 4339: 4258: 4032: 3316: 3230:. In Proceedings of the IADIS European Conference on Data Mining 2008, pp 182–185. 3220: 3069: 3034: 2437: 2422: 2407: 2392: 2189: 1784: 1776: 1578: 1410: 1308: 1200: 1070: 831: 362: 99: 31: 3501: 6015: 5836: 5131: 4946: 4691: 4439: 4429: 4404: 4278: 4243: 4199: 4167: 4141: 4098: 3970: 3897: 3524: 3203:. Volume 21 Issue 1, March 2006, pp 1–24, Cambridge University Press, New York, 2668: 2558: 2402: 2298: 2193: 2075: 1484: 1199:
hypothesis. The term "data mining" was used in a similarly critical way by economist Michael Lovell.
1174: 1164: 1152: 1137: 1091: 1046: 750: 679: 561: 292: 177: 3674:. Washington, D.C. Congressional Research Service. p. 6. R44257. Archived from 3406: 3269: 2684: 5932: 5591: 4958: 4895: 4779: 4449: 4444: 4424: 4419: 4349: 4344: 4319: 4312: 4288: 3911: 3007: 2738: 2573: 2501: 2493: 2268: 2199: 2150:: suite of multilingual text and entity analytics products that enable data mining. 2062: 1706: 1522: 1422: 1319:
Data mining bridges the gap from applied statistics and artificial intelligence (which usually provide the mathematical background) to database management.
1006:" process, or KDD. Aside from the raw analysis step, it also involves database and 3350:, International Conferences on Knowledge Discovery and Data Mining, ACM, New York. 1278:
has occurred for centuries. Early methods of identifying patterns in data include Bayes' theorem (1700s) and regression analysis (1800s).
3642: 5922: 5460: 4570: 4329: 2358: 2131: 2072: 2056: 2052: 1879: 1808: 1513: 1505: 556: 1195:
to refer to what they considered the bad practice of analyzing data without an
5942: 5872: 5465: 5205: 5061: 4934: 4707: 3960: 3474:"Data Mining and Domestic Security: Connecting the Dots to Make Sense of Data" 3208: 2991: 2843: 2778: 2705: 2681:"The Elements of Statistical Learning: Data Mining, Inference, and Prediction" 2442: 2249: 2209: 2179: 2167: 2125: 2065:: An open-source machine learning library for the Python programming language; 1847: 1549:'s (ACM) Special Interest Group (SIG) on Knowledge Discovery and Data Mining ( 1113: 998: 982: 705: 401: 327: 30:"Web mining" redirects here. For web browser-based cryptocurrency mining, see 3860:"Judge grants summary judgment in favor of Google Books – a fair use victory" 2999: 2514:
For more information about extracting information out of data (as opposed to
anonymous, so that individuals may not readily be identified. However, even "
5447: 5408: 4838: 4681: 4616: 4459: 4354: 3983:
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
3742:, Biotech Business Week, retrieved 17 November 2009 from LexisNexis Academic 3489: 3388: 3359: 3163: 3133: 3103: 3073: 3055: 2377: 2141: 1406: 1217: 1097: 864: 645: 3562:
Think Before You Dig: Privacy Implications of Data Mining & Aggregation
3261: 2610: 1247: 3952:
High Performance Data Mining: Scaling Algorithms, Applications and Systems
2917:
Machine Learning Forensics for Law Enforcement, Security, and Intelligence
1990:: a collection of ready-to-use machine learning algorithms written in the 17: 5508: 4676: 4611: 4545: 4253: 3600: 2387: 2213: 1894: 1795:, and attempts to reach an agreement with the United States have failed. 1739:
Who will be able to mine the data and use the data and their derivatives.
1517: 1287: 1054: 1038: 974: 1733:
The purpose of the data collection and any (known) data mining projects.
3830:"Text and Data Mining:Its importance and the need for change in Europe" 3381:
Proceedings of the 2011 workshop on Predictive markup language modeling
2876: 2225: 2173: 2038:: Data mining and statistics software under the GNU Project similar to 1925: 1675: 1042: 640: 3253: 1655:
Data mining is used wherever there is digital data available. Notable
4186: 4160: 3340: 2617: 2239: 2163: 2147: 2119: 2115:
The following applications are available under proprietary licenses.
2011: 1987: 1695: 1679: 1593: 1550: 391: 2868: 1974:: a real-time big data stream mining with concept drift tool in the 1682:. In particular, data mining government or commercial data sets for 4998: 3752:
UK Researchers Given Data Mining Right Under New UK Copyright Laws.
2102:: A suite of machine learning software applications written in the 1991: 4651: 4646: 4641: 4485: 1965: 1807:
In the United States, privacy concerns have been addressed by the
1478: 1393: 1357:
It exists, however, in many variations on this theme, such as the
635: 630: 357: 2222:: Visualisation-oriented data mining software, also for teaching. 27:
Process of extracting and discovering patterns in large data sets
3692: 3190:"A survey of Knowledge Discovery and Data Mining process models" 2890:
Charemza, Wojciech W.; Deadman, Derek F. (1992). "Data Mining".
2093: 2039: 2035: 1997: 1937: 1275: 5002: 4916: 4800: 4527: 4489: 4195: 4029:
Handbook of Statistical Analysis & Data Mining Applications
3709:"UK companies targeted for using big data to exploit customers" 3313:"Google Scholar: Top publications - Data Mining & Analysis" 2945:"Lesson: Data Mining, and Knowledge Discovery: An Introduction" 973:
is the process of extracting and discovering patterns in large
4000:
Web Data Mining: Exploring Hyperlinks, Contents and Usage Data
2203: 1628: 4059:
Tan, Pang-Ning; Steinbach, Michael; and Kumar, Vipin (2005);
3624:"Big data's impact on privacy, security and consumer welfare" 3160:"What main methodology are you using for data mining (2014)?" 3130:"What main methodology are you using for data mining (2007)?" 3100:"What main methodology are you using for data mining (2004)?" 3070:"What main methodology are you using for data mining (2002)?" 1619:
For exchanging the extracted models—in particular for use in
1392:
The only other data mining standard named in these polls was SEMMA.
1057:
and is frequently applied to any form of large-scale data or
4103:
Data Mining: Practical Machine Learning Tools and Techniques
4073:
Theodoridis, Sergios; and Koutroumbas, Konstantinos (2009);
4017:
Murphy, Chris (16 May 2011). "Is Data Mining Free Speech?".
3800:"Licences for Europe – Structured Stakeholder Dialogue 2013" 3672:"U.S.–E.U. Data Privacy: From Safe Harbor to Privacy Shield" 3740:
BIOMEDICINE; HIPAA Privacy Rule Impedes Biomedical Research
3284:"Microsoft Academic Search: Top conferences in data mining" 1822:
U.S. information privacy legislation such as HIPAA and the
1251: 1187:
In the 1960s, statisticians and economists used terms like
1136:). This usually involves using database techniques such as 2919:. Boca Raton, FL: CRC Press (Taylor & Francis Group). 2134:: data and text mining software by Megaputer Intelligence. 1882:
publishers to leave the stakeholder dialogue in May 2013.
1073:, analysis, and statistics) as well as any application of 1026:
considerations, post-processing of discovered structures,
3422:"The Promise and Pitfalls of Data Mining: Ethical Issues" 3240:
Hawkins, Douglas M (2004). "The problem of overfitting".
2469:
Automatic number plate recognition in the United Kingdom
2055:
computing, data mining, and graphics. It is part of the
923:
List of datasets in computer vision and image processing
4191: 3894:
Discovering Data Mining: From Concept to Implementation
3451:"The End of Illegal Domestic Spying? Don't Count on It" 2830:
Olson, D. L. (2007). Data mining in business services.
1709:
obligations. A common way for this to occur is through
3036:
Data Mining: Concepts, Models, Methods, and Algorithms
2747:"From Data Mining to Knowledge Discovery in Databases" 1917:
Free open-source data mining software and applications
1742:
The status of security surrounding access to the data.
3242:
Journal of Chemical Information and Computer Sciences
2813:
OKAIRP 2005 Fall Conference, Arizona State University
1560:
Computer science conferences on data mining include:
3916:
Data mining: an overview from a database perspective
2644:"Encyclopædia Britannica: Definition of Data Mining" 2587:
International Journal of Data Warehousing and Mining
2252:
for creating & productionising custom ML models.
5860: 5807: 5769: 5716: 5678: 5640: 5582: 5499: 5445: 5407: 5359: 5296: 5229: 5193: 5150: 5114: 5047: 4967: 4927: 4878: 4852: 4811: 4752: 4726: 4700: 4665: 4599: 4538: 3525:"A Framework for Mining Instant Messaging Services" 1934:: A chemical structure miner and web search engine. 1813:
Health Insurance Portability and Accountability Act
1811:via the passage of regulatory controls such as the 3950:Guo, Yike; and Grossman, Robert (editors) (1999); 3716: 3670:Weiss, Martin A.; Archick, Kristin (19 May 2016). 3033: 1912:Category:Data mining and machine learning software 1569:Conference on Information and Knowledge Management 1545:The premier professional body in the field is the 1433:Data mining involves six common classes of tasks: 1045:and knowledge from large amounts of data, not the 4027:Nisbet, Robert; Elder, John; Miner, Gary (2009); 3927:Knowledge and data Engineering, IEEE Transactions 2111:Proprietary data-mining software and applications 1598:International Conference on Very Large Data Bases 1583:Conference on Knowledge Discovery and Data Mining 4101:; Frank, Eibe; Hall, Mark A. (30 January 2011). 1512:present in the general data set. This is called 4087:Weiss, Sholom M.; and Indurkhya, Nitin (1998); 3738:Biotech Business Week Editors (June 30, 2008); 1928:: Text and search results clustering framework. 1846:, the mining of in-copyright works (such as by 1725:It is recommended to be aware of the following 1610:Cross Industry Standard Process for Data Mining 1359:Cross-industry standard process for data mining 1940:: A university research project with advanced 1333:knowledge discovery in databases (KDD) process 918:List of datasets for machine-learning research 5038:Note: This template roughly follows the 2012 5014: 4980:Data warehousing products and their producers 4501: 4207: 2212:Data Miner: data mining software provided by 1635:have been proposed independently of the DMG. 1413:. Pre-processing is essential to analyze the 951: 8: 3221:KDD, SEMMA and CRISP-DM: a parallel overview 2489:Quantitative structure–activity relationship 1588:Data mining topics are also present in many 2894:. Aldershot: Edward Elgar. pp. 14–31. 
5021: 5007: 4999: 4924: 4913: 4808: 4797: 4535: 4524: 4508: 4494: 4486: 4214: 4200: 4192: 3834:Association of European Research Libraries 3478:Columbia Science and Technology Law Review 2855:Lovell, Michael C. (1983). "Data Mining". 1516:. To overcome this, the evaluation uses a 958: 944: 36: 3556: 3554: 3552: 1854:owners' rights that are protected by the 1824:Family Educational Rights and Privacy Act 1250:and this term became more popular in the 1022:considerations, interestingness metrics, 977:involving methods at the intersection of 4176:) is being considered for deletion. See 4150:) is being considered for deletion. See 2733: 2731: 2729: 2238:: automated custom ML models managed by 1112:The actual data mining task is the semi- 3866:. Antonelli Law Ltd. 19 November 2013. 3572:, NASCIO Research Brief, September 2004 2976:"Data mining: past, present and future" 2708:; Kamber, Micheline; Pei, Jian (2011). 2597: 2192:: data mining software provided by the 1274:The manual extraction of patterns from 1101:—or, when referring to actual methods, 44: 5731:Knowledge representation and reasoning 3932:Feldman, Ronen; Sanger, James (2007); 3602:AOL search data identified individuals 2892:New Directions in Econometric Practice 2857:The Review of Economics and Statistics 1893:, and in particular its provision for 1500:. A simple version of this problem in 1041:because the goal is the extraction of 5756:Philosophy of artificial intelligence 3870:from the original on 29 November 2014 3140:from the original on 17 November 2012 2605: 2603: 2601: 2176:Omics Explorer: data mining software. 1361:(CRISP-DM) which defines six phases: 1335:is commonly defined with the stages: 7: 5082:Energy consumption (Green computing) 3965:Data mining: concepts and techniques 3472:Taipale, Kim A. (15 December 2003). 3431:. American Statistical Association. 
159: 156: 154: 151: 149: 146: 144: 141: 140: 134: 133: 126: 123: 121: 118: 116: 113: 111: 108: 106: 103: 101: 98: 96: 93: 91: 90:Meta-learning 88: 86: 83: 81: 78: 76: 73: 71: 68: 66: 63: 62: 56: 55: 52: 47: 43: 39: 38: 33: 19: 5933:Cyberwarfare 5592:Cryptography 5548: 4959:Dan Linstedt 4828: 4385:Preservation 4375:Philanthropy 4369: 4239:Augmentation 4171: 4145: 4125: 4102: 4088: 4074: 4060: 4046: 4028: 4018: 3999: 3985:, Springer, 3982: 3964: 3951: 3933: 3926: 3893: 3872:. Retrieved 3864:Lexology.com 3863: 3854: 3842:. Retrieved 3838:the original 3833: 3824: 3812:. Retrieved 3803: 3794: 3783:. Retrieved 3769: 3762:Out-Law.com. 3761: 3747: 3739: 3734: 3723:. Retrieved 3718:the original 3712: 3702: 3690: 3683:. Retrieved 3676:the original 3665: 3654:. Retrieved 3634: 3630: 3617: 3601: 3596: 3587: 3577: 3561: 3539:. Retrieved 3518: 3506:. Retrieved 3502:the original 3481: 3477: 3467: 3459:the original 3454: 3444: 3428: 3415: 3380: 3374: 3355: 3336: 3325:. Retrieved 3307: 3296:. Retrieved 3292:the original 3278: 3245: 3241: 3235: 3215: 3200: 3184: 3172:. Retrieved 3154: 3142:. Retrieved 3124: 3112:. Retrieved 3094: 3082:. Retrieved 3064: 3035: 3027: 3016:. Retrieved 2986:(1): 25–29. 2983: 2979: 2969: 2957:. Retrieved 2948: 2935: 2916: 2910: 2891: 2885: 2860: 2856: 2850: 2835: 2831: 2826: 2808: 2800: 2782: 2773: 2761:. Retrieved 2709: 2700: 2689:. Retrieved 2685:the original 2663: 2652:. Retrieved 2637: 2626:. 
Retrieved 2574:Web scraping 2518:data), see: 2515: 2513: 2502:Stellar Wind 2494:Surveillance 2269:Agent mining 2200:SPSS Modeler 2114: 2063:scikit-learn 1920: 1889: 1872: 1838: 1821: 1816: 1806: 1797: 1775: 1759: 1750: 1748: 1726: 1724: 1719: 1700: 1673: 1666: 1654: 1639:Notable uses 1618: 1607: 1587: 1559: 1544: 1535: 1526: 1523:training set 1510: 1504:is known as 1491: 1432: 1423:missing data 1415:multivariate 1403: 1391: 1388: 1385: 1356: 1348: 1332: 1330: 1282:(1700s) and 1273: 1263: 1259: 1237: 1233: 1229: 1225: 1221: 1213: 1211: 1204: 1192: 1189:data fishing 1188: 1186: 1173: 1170:data fishing 1169: 1163: 1161: 1150: 1126:dependencies 1111: 1106: 1102: 1096: 1090: 1086: 1077:, including 1048: 1047:extraction ( 1036: 993:subfield of 970: 969: 837:PAC learning 524: 373: 368:Hierarchical 300: 254: 248: 50: 6022:Data mining 5943:Video games 5923:Digital art 5680:Concurrency 5549:Data mining 5461:Probability 5201:Interpreter 4896:Spreadsheet 4829:Data mining 4571:Star schema 4445:Stewardship 4335:Integration 4284:Degradation 4269:Compression 4249:Archaeology 4234:Acquisition 4166:‹ The 4140:‹ The 4035:/Elsevier, 3961:Han, Jiawei 3874:14 November 3844:14 November 3814:14 November 3582:Ohm, Paul. 3341:Proceedings 3248:(1): 1–12. 3174:29 December 3144:29 December 3114:29 December 3084:29 December 2863:(1): 1–12. 2779:Han, Jiawei 2763:17 December 2706:Han, Jaiwei 2359:Text mining 2132:PolyAnalyst 2090:algorithms. 
2073:open-source 2057:GNU Project 2053:statistical 1880:open access 1809:US Congress 1514:overfitting 1506:overfitting 1429:Data mining 1349:Data mining 1260:data mining 1222:data mining 1214:data mining 1087:large scale 1071:warehousing 971:Data mining 721:Multi-agent 658:Transformer 557:Autoencoder 313:Naive Bayes 51:data mining 6016:Categories 6001:Glossaries 5873:E-commerce 5466:Statistics 5409:Algorithms 5206:Middleware 5062:Peripheral 4935:Bill Inmon 4739:Degenerate 4708:Fact table 4465:Validation 4400:Publishing 4390:Processing 4360:Management 4274:Corruption 4264:Collection 4117:(See also 3785:2021-12-16 3725:2022-12-04 3656:2018-04-20 3327:2022-06-11 3298:2014-06-13 3018:2021-09-04 2691:2012-08-07 2654:2010-12-09 2628:2014-01-27 2593:References 2460:See also: 2443:Web mining 2210:STATISTICA 2180:RapidMiner 2168:DATADVANCE 2126:LIONsolver 1910:See also: 1848:web mining 1755:anonymized 1649:See also: 1531:ROC curves 1494:reproduced 1462:Regression 1450:Clustering 1381:Deployment 1377:Evaluation 1270:Background 1248:(KDD-1989) 1067:extraction 1063:collection 1024:complexity 999:statistics 983:statistics 706:Q-learning 604:Restricted 402:Mean shift 351:Clustering 328:Perceptron 256:regression 158:Clustering 153:Regression 18:Datamining 5822:Rendering 5817:Animation 5448:computing 5399:Semantics 5097:Processor 4853:Languages 4839:OLAP cube 4824:Dashboard 4775:Transform 4727:Dimension 4682:Data mart 4617:Data mesh 4586:Aggregate 4551:Dimension 4470:Warehouse 4435:Scrubbing 4415:Retention 4410:Reduction 4365:Migration 4340:Integrity 4308:Transform 4259:Cleansing 3996:Liu, Bing 3164:KDnuggets 3134:KDnuggets 3104:KDnuggets 3074:KDnuggets 3000:0269-8889 2959:30 August 2516:analyzing 2378:Analytics 2142:Microsoft 2032:language. 2008:language. 1994:language. 1952:language. 
1604:Standards 1407:data mart 1339:Selection 1303:(1950s), 1288:data sets 1218:San Diego 1212:The term 1183:Etymology 1114:automatic 1098:analytics 1020:inference 1010:aspects, 975:data sets 865:ECML PKDD 847:VC theory 794:ROC curve 726:Self-play 646:DeepDream 487:Bayes net 278:Ensembles 59:Paradigms 5981:Category 5809:Graphics 5584:Security 5253:Compiler 5152:Networks 5049:Hardware 4968:Products 4812:Concepts 4677:Metadata 4666:Elements 4612:Data hub 4600:Variants 4546:Database 4539:Concepts 4440:Security 4430:Scraping 4405:Recovery 4279:Curation 4244:Analysis 4168:template 4142:template 4004:Springer 3981:(2001); 3919:Archived 3914:(1996) " 3868:Archived 3808:Archived 3779:Archived 3775:"Fedlex" 3755:Archived 3647:Archived 3606:Archived 3566:Archived 3541:16 March 3532:Archived 3508:21 April 3490:45263753 3433:Archived 3407:14967969 3363:Archived 3344:Archived 3321:Archived 3270:12440383 3262:14741005 3224:Archived 3193:Archived 3168:Archived 3166:. 2014. 3138:Archived 3136:. 2007. 3108:Archived 3106:. 2004. 3078:Archived 3076:. 2002. 3056:50055336 3012:Archived 2953:Archived 2816:Archived 2754:Archived 2679:(2009). 2648:Archived 2622:Archived 2388:Big data 2257:See also 2214:StatSoft 2018:library. 1906:Software 1895:fair use 1541:Research 1518:test set 1374:Modeling 1197:a-priori 1055:buzzword 1043:patterns 1039:misnomer 288:Boosting 137:Problems 5991:Outline 4918:Related 4770:Extract 4753:Filling 4718:Measure 4450:Storage 4425:Science 4420:Quality 4350:Lineage 4345:Library 4320:Farming 4303:Extract 4289:Editing 4170:below ( 4144:below ( 3912:P.S. Yu 3685:9 April 3008:6487637 2877:1924403 2500:(e.g., 2262:Methods 2226:Vertica 2220:Tanagra 2174:Qlucore 2014:: Open 1926:Carrot2 1707:privacy 1676:privacy 1488:spiders 1327:Process 1242:, etc. 
1124:), and 870:NeurIPS 687:(ECRAM) 641:AlexNet 283:Bagging 4928:People 4370:Mining 4330:Fusion 4187:Curlie 4173:Curlie 4161:Curlie 4147:Curlie 4109:  4081:  4067:  4053:  4039:  4010:  3989:  3944:  3904:  3498:546782 3496:  3488:  3405:  3395:  3268:  3260:  3054:  3044:  3006:  2998:  2923:  2898:  2875:  2793:  2716:  2618:SIGKDD 2250:Amazon 2240:Google 2164:PSeven 2148:NetOwl 2120:Angoss 2030:Python 2022:Orange 2012:OpenNN 2006:Python 1988:mlpack 1839:Under 1777:Europe 1751:become 1727:before 1720:per se 1696:ADVISE 1680:ethics 1567:– ACM 1551:SIGKDD 1172:, and 1049:mining 1030:, and 985:, and 663:Vision 519:RANSAC 397:OPTICS 392:DBSCAN 376:-means 183:AutoML 5394:Logic 5235:tools 4879:Tools 4652:ROLAP 4647:MOLAP 4642:HOLAP 4023:: 12. 3679:(PDF) 3650:(PDF) 3627:(PDF) 3535:(PDF) 3528:(PDF) 3484:(2). 3436:(PDF) 3425:(PDF) 3403:S2CID 3266:S2CID 3004:S2CID 2873:JSTOR 2757:(PDF) 2750:(PDF) 2071:: An 2069:Torch 1966:KNIME 1419:noise 1394:SEMMA 1016:model 885:IJCAI 711:SARSA 670:Mamba 636:LeNet 631:U-Net 457:t-SNE 381:Fuzzy 358:BIRCH 5233:and 5106:Form 5102:Size 4780:Load 4701:Fact 4566:OLAP 4561:Fact 4460:Type 4355:Loss 4313:Load 4223:Data 4107:ISBN 4079:ISBN 4065:ISBN 4051:ISBN 4037:ISBN 4008:ISBN 3987:ISBN 3977:and 3942:ISBN 3902:ISBN 3876:2014 3846:2014 3816:2014 3693:CJEU 3687:2020 3543:2018 3510:2004 3494:SSRN 3486:OCLC 3393:ISBN 3258:PMID 3176:2023 3146:2023 3116:2023 3086:2023 3052:OCLC 3042:ISBN 2996:ISSN 2961:2012 2921:ISBN 2896:ISBN 2791:ISBN 2765:2008 2714:ISBN 2104:Java 2100:Weka 2094:UIMA 2047:: A 2040:SPSS 2036:PSPP 1998:NLTK 1982:MEPX 1976:Java 1958:: a 1956:GATE 1950:Java 1944:and 1938:ELKI 1873:The 1705:and 1596:and 1331:The 1307:and 1276:data 1262:and 1254:and 1105:and 1095:and 1018:and 997:and 895:JMLR 880:ICLR 875:ICML 761:RLHF 577:LSTM 363:CURE 49:and 4298:ELT 4294:ETL 4254:Big 4185:at 4159:at 3925:". 
3639:doi 3385:doi 3250:doi 3205:doi 2988:doi 2865:doi 2840:doi 2615:ACM 2204:IBM 2080:Lua 1992:C++ 1787:'s 1686:or 1629:XML 1527:not 1409:or 1191:or 621:SOM 611:GAN 587:ESN 582:GRU 527:-NN 462:SDL 452:PGD 447:PCA 442:NMF 437:LDA 432:ICA 427:CCA 303:-NN 6018:: 5104:/ 4091:, 4063:, 4049:, 4031:, 4006:, 4002:, 3973:, 3954:, 3940:, 3936:, 3900:, 3896:, 3862:. 3832:. 3806:. 3802:. 3777:. 3711:. 3689:. 3645:. 3635:38 3633:. 3629:. 3586:. 3551:^ 3530:. 3492:. 3480:. 3476:. 3453:. 3427:. 3401:. 3391:. 3319:. 3315:. 3286:. 3264:. 3256:. 3246:44 3244:. 3199:. 3162:. 3132:. 3102:. 3072:. 3050:. 3010:. 3002:. 2994:. 2984:26 2982:. 2978:. 2947:. 2871:. 2861:65 2859:. 2834:, 2799:. 2785:. 2752:. 2741:; 2728:^ 2675:; 2671:; 2646:. 2613:. 2600:^ 2496:/ 1713:. 1600:. 1533:. 1425:. 1299:, 1295:, 1252:AI 1236:, 1232:, 1228:, 1168:, 1132:, 1089:) 1069:, 1065:, 1034:. 1014:, 981:, 890:ML 5042:. 5022:e 5015:t 5008:v 4675:/ 4509:e 4502:t 4495:v 4296:/ 4215:e 4208:t 4201:v 4121:) 4115:. 3878:. 3848:. 3818:. 3788:. 3728:. 3659:. 3641:: 3590:. 3545:. 3512:. 3482:5 3409:. 3387:: 3330:. 3301:. 3272:. 3252:: 3207:: 3178:. 3148:. 3118:. 3088:. 3058:. 3021:. 2990:: 2963:. 2929:. 2904:. 2879:. 2867:: 2842:: 2836:1 2767:. 2722:. 2694:. 2657:. 2631:. 2504:) 2242:. 2232:. 2216:. 2206:. 2196:. 2170:. 2160:. 2144:. 2059:. 2045:R 2000:( 1128:( 1061:( 1002:" 959:e 952:t 945:v 525:k 374:k 301:k 259:) 247:( 34:. 20:)
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
