
Exploratory data analysis

In statistics, exploratory data analysis (EDA) is an approach of analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling, and in this respect it contrasts with traditional hypothesis testing. Exploratory data analysis has been promoted by John Tukey since 1970 to encourage statisticians to explore the data, and possibly formulate hypotheses that could lead to new data collection and experiments. EDA is different from initial data analysis (IDA), which focuses more narrowly on checking assumptions required for model fitting and hypothesis testing, and handling missing values and making transformations of variables as needed. EDA encompasses IDA.

Overview

Tukey defined data analysis in 1961 as: "Procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data."

Exploratory data analysis is a technique for investigating a data set and summarizing its main characteristics. A main advantage of EDA is that it provides visualizations of the data after the analysis has been conducted.

Development

Tukey's championing of EDA encouraged the development of statistical computing packages, especially S at Bell Labs. The S programming language inspired the systems S-PLUS and R. This family of statistical-computing environments featured vastly improved dynamic visualization capabilities, which allowed statisticians to identify outliers, trends and patterns in data that merited further study.

Tukey's EDA was related to two other developments in statistical theory: robust statistics and nonparametric statistics, both of which tried to reduce the sensitivity of statistical inferences to errors in formulating statistical models. Tukey promoted the use of the five-number summary of numerical data (the two extremes, maximum and minimum, together with the median and the quartiles) because the median and quartiles, being functions of the empirical distribution, are defined for all distributions, unlike the mean and standard deviation. Moreover, the quartiles and median are more robust to skewed or heavy-tailed distributions than traditional summaries (the mean and standard deviation). The packages S, S-PLUS, and R included routines using resampling statistics, such as Quenouille and Tukey's jackknife and Efron's bootstrap, which are nonparametric and robust (for many problems).

Exploratory data analysis, robust statistics, nonparametric statistics, and the development of statistical programming languages facilitated statisticians' work on scientific and engineering problems. Such problems, which concerned Bell Labs, included the fabrication of semiconductors and the understanding of communications networks. These statistical developments, all championed by Tukey, were designed to complement the analytic theory of testing statistical hypotheses, particularly the Laplacian tradition's emphasis on exponential families.

Figure: Data science process flowchart.

John W. Tukey wrote the book Exploratory Data Analysis in 1977. Tukey held that too much emphasis in statistics was placed on statistical hypothesis testing (confirmatory data analysis) and that more emphasis needed to be placed on using data to suggest hypotheses to test. In particular, he held that confusing the two types of analyses and employing them on the same set of data can lead to systematic bias owing to the issues inherent in testing hypotheses suggested by the data.

The objectives of EDA are to:
Enable unexpected discoveries in the data
Suggest hypotheses about the causes of observed phenomena
Assess assumptions on which statistical inference will be based
Support the selection of appropriate statistical tools and techniques
Provide a basis for further data collection through surveys or experiments

Many EDA techniques have been adopted into data mining. They are also being taught to young students as a way to introduce them to statistical thinking.

Techniques and tools

There are a number of tools that are useful for EDA, but EDA is characterized more by the attitude taken than by particular techniques.

Typical graphical techniques used in EDA are:
Box plot
Histogram
Multi-vari chart
Run chart
Pareto chart
Scatter plot (2D/3D)
Stem-and-leaf plot
Parallel coordinates
Odds ratio
Targeted projection pursuit
Heat map
Bar chart
Horizon graph
Glyph-based visualization methods such as PhenoPlot and Chernoff faces
Projection methods such as grand tour, guided tour and manual tour
Interactive versions of these plots

Dimensionality reduction:
Multidimensional scaling
Principal component analysis (PCA)
Multilinear PCA
Nonlinear dimensionality reduction (NLDR)
Iconography of correlations

Typical quantitative techniques are:
Median polish
Trimean
Ordination

History

Many EDA ideas can be traced back to earlier authors, for example:
Francis Galton emphasized order statistics and quantiles.
Arthur Lyon Bowley used precursors of the stemplot and five-number summary. Bowley actually used a "seven-figure summary", including the extremes, deciles and quartiles, along with the median. In his Elementary Manual of Statistics (3rd edn., 1920), p. 62, he defines "the maximum and minimum, median, quartiles and two deciles" as the "seven positions".
Andrew Ehrenberg articulated a philosophy of data reduction (see his book of the same name).

The Open University course Statistics in Society (MDST 242) took the above ideas and merged them with Gottfried Noether's work, which introduced statistical inference via coin-tossing and the median test.

Example

Findings from EDA are orthogonal to the primary analysis task: exploring the data can reveal features that the primary analysis was not designed to look for. To illustrate, consider an example from Cook et al. where the analysis task is to find the variables which best predict the tip that a dining party will give to the waiter. The variables available in the data collected for this task are: the tip amount, total bill, payer gender, smoking/non-smoking section, time of day, day of the week, and size of the party. The primary analysis task is approached by fitting a regression model where the tip rate is the response variable. The fitted model is

(tip rate) = 0.18 - 0.01 × (party size)

which says that as the size of the dining party increases by one person (leading to a higher bill), the tip rate will decrease by 0.01 (one percentage point), on average.

However, exploring the data reveals other interesting features not described by this model.

Figure: Histogram of tip amounts where the bins cover $1 increments. The distribution of values is skewed right and unimodal, as is common in distributions of small, non-negative quantities.

Figure: Histogram of tip amounts where the bins cover $0.10 increments. An interesting phenomenon is visible: peaks occur at the whole-dollar and half-dollar amounts, which is caused by customers picking round numbers as tips. This behavior is common to other types of purchases too, like gasoline.

Figure: Scatterplot of tips vs. bill. Points below the line correspond to tips that are lower than expected (for that bill amount), and points above the line are higher than expected. We might expect to see a tight, positive linear association, but instead see variation that increases with tip amount. In particular, there are more points far away from the line in the lower right than in the upper left, indicating that more customers are very cheap than very generous.

Figure: Scatterplot of tips vs. bill separated by payer gender and smoking section status. Smoking parties have a lot more variability in the tips that they give. Males tend to pay the (few) higher bills, and the female non-smokers tend to be very consistent tippers (with three conspicuous exceptions shown in the sample).

What is learned from the plots is different from what is illustrated by the regression model, even though the experiment was not designed to investigate any of these other trends. The patterns found by exploring the data suggest hypotheses about tipping that may not have been anticipated in advance. These could lead to interesting follow-up experiments in which the hypotheses are formally stated and tested by collecting new data.

Software

JMP, an EDA package from SAS Institute.
KNIME (Konstanz Information Miner), an open-source data exploration platform based on Eclipse.
Minitab, an EDA and general statistics package widely used in industrial and corporate settings.
Orange, an open-source data mining and machine learning software suite.
Python, an open-source programming language widely used in data mining and machine learning.
R, an open-source programming language for statistical computing and graphics; together with Python, one of the most popular languages for data science.
TinkerPlots, an EDA software for upper elementary and middle school students.
Weka, an open-source data mining package that includes visualization and EDA tools such as targeted projection pursuit.

See also

Anscombe's quartet, on importance of exploration
Data dredging
Predictive analytics
Structured data analysis (statistics)
Configural frequency analysis
Descriptive statistics

References

Chatfield, C. (1995). Problem Solving: A Statistician's Guide (2nd ed.). Chapman and Hall. ISBN 978-0412606304.
Baillie, Mark; Le Cessie, Saskia; Schmidt, Carsten Oliver; Lusa, Lara; Huebner, Marianne; Topic Group "Initial Data Analysis" of the STRATOS Initiative (2022). "Ten simple rules for initial data analysis". PLOS Computational Biology. 18 (2): e1009819. Bibcode:2022PLSCB..18E9819B. doi:10.1371/journal.pcbi.1009819. PMC 8870512. PMID 35202399.
Tukey, John. "The Future of Data Analysis". July 1961.
Becker, Richard A. A Brief History of S. Murray Hill, New Jersey: AT&T Bell Laboratories. Archived from the original (PS) on 2015-07-23, retrieved 2015-07-23. "... we wanted to be able to interact with our data, using Exploratory Data Analysis (Tukey, 1971) techniques."
Morgenthaler, Stephan; Fernholz, Luisa T. (2000). "Conversation with John W. Tukey and Elizabeth Tukey, Luisa T. Fernholz and Stephan Morgenthaler". Statistical Science. 15 (1): 79–94. doi:10.1214/ss/1009212675.
Tukey, John W. (1977). Exploratory Data Analysis. Pearson. ISBN 978-0201076165.
Behrens. "Principles and Procedures of Exploratory Data Analysis". American Psychological Association, 1997.
Konold, C. (1999). "Statistics goes to school". Contemporary Psychology. 44 (1): 81–82. doi:10.1037/001949.
Tukey, John W. (1980). "We need both exploratory and confirmatory". The American Statistician. 34 (1): 23–25. doi:10.1080/00031305.1980.10482706.
Sailem, Heba Z.; Sero, Julia E.; Bakal, Chris (2015-01-08). "Visualizing cellular imaging data using PhenoPlot". Nature Communications. 6 (1): 5825. Bibcode:2015NatCo...6.5825S. doi:10.1038/ncomms6825. ISSN 2041-1723. PMC 4354266. PMID 25569359.
Elementary Manual of Statistics (3rd edn., 1920). https://archive.org/details/cu31924013702968/page/n5
Cook, D. and Swayne, D.F. (with A. Buja, D. Temple Lang, H. Hofmann, H. Wickham, M. Lawrence) (2007). "Interactive and Dynamic Graphics for Data Analysis: With R and GGobi". Springer. ISBN 978-0387717616.

Bibliography

Andrienko, N & Andrienko, G (2005). Exploratory Analysis of Spatial and Temporal Data. A Systematic Approach. Springer. ISBN 3-540-25994-5.
Cook, D. and Swayne, D.F. (with A. Buja, D. Temple Lang, H. Hofmann, H. Wickham, M. Lawrence) (2007-12-12). Interactive and Dynamic Graphics for Data Analysis: With R and GGobi. Springer. ISBN 9780387717616.
Hoaglin, D C; Mosteller, F & Tukey, John Wilder (Eds) (1985). Exploring Data Tables, Trends and Shapes. Wiley. ISBN 978-0-471-09776-1.
Hoaglin, D C; Mosteller, F & Tukey, John Wilder (Eds) (1983). Understanding Robust and Exploratory Data Analysis. Wiley. ISBN 978-0-471-09777-8.
Inselberg, Alfred (2009). Parallel Coordinates: Visual Multidimensional Geometry and its Applications. London, New York: Springer. ISBN 978-0-387-68628-8.
Leinhardt, G.; Leinhardt, S. (1980). "Exploratory Data Analysis: New Tools for the Analysis of Empirical Data". Review of Research in Education, Vol. 8, pp. 85–157.
Martinez, W. L.; Martinez, A. R. & Solka, J. (2010). Exploratory Data Analysis with MATLAB, second edition. Chapman & Hall/CRC. ISBN 9781439812204.
Theus, M.; Urbanek, S. (2008). Interactive Graphics for Data Analysis: Principles and Examples. CRC Press, Boca Raton, FL. ISBN 978-1-58488-594-8.
Tucker, L; MacCallum, R. (1993). Exploratory Factor Analysis.
Tukey, John Wilder (1977). Exploratory Data Analysis. Addison-Wesley. ISBN 978-0-201-07616-5.
Velleman, P. F.; Hoaglin, D. C. (1981). Applications, Basics and Computing of Exploratory Data Analysis. Duxbury Press. ISBN 978-0-87150-409-8.
Young, F. W.; Valero-Mora, P.; Friendly, M. (2006). Visual Statistics: Seeing your data with Dynamic Interactive Graphics. Wiley. ISBN 978-0-471-68160-1.
Jambu, M. (1991). Exploratory and Multivariate Data Analysis. Academic Press. ISBN 0123800900.
DuToit, S. H. C.; Steyn, A. G. W.; Stumpf, R. H. (1986). Graphical Exploratory Data Analysis. Springer. ISBN 978-1-4612-9371-2.

External links

Carnegie Mellon University, free online course on Probability and Statistics, with a module on EDA
Exploratory data analysis chapter: engineering statistics handbook
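Here is a minimal sketch of the five-number summary that Tukey promoted. The Python below computes the minimum, lower hinge, median, upper hinge and maximum. It assumes Tukey's hinge convention for the quartiles: the medians of the lower and upper halves of the sorted data, with the median counted in both halves when the number of values is odd. Other software may use a different quartile rule and report slightly different values. The function names `tukey_hinges` and `five_number_summary` and the sample values are hypothetical.

```python
from statistics import mean, median


def tukey_hinges(values):
    """Return (lower hinge, upper hinge).

    Assumed convention: split the sorted data into a lower and an upper
    half; when the count is odd, the median belongs to both halves.
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2  # size of each half
    return median(xs[:half]), median(xs[n - half:])


def five_number_summary(values):
    """Return (minimum, lower hinge, median, upper hinge, maximum)."""
    xs = sorted(values)
    lower, upper = tukey_hinges(xs)
    return (xs[0], lower, median(xs), upper, xs[-1])


# Hypothetical data with one extreme value.
data = [1, 2, 3, 4, 100]
print(five_number_summary(data))  # (1, 2, 3, 4, 100)
print(mean(data))                 # 22, pulled far from the median by one value
```

Replacing 100 with 1000 would move the mean to 202 but leave the median and both hinges unchanged. That is the kind of robustness to extreme values the summary is meant to provide.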
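The jackknife and bootstrap that S, S-PLUS and R offered as resampling routines can each be sketched in a few lines. The Python below is an illustrative sketch, not the routines those packages ship. `jackknife_se` applies the standard leave-one-out formula for a standard error. `bootstrap_interval` returns a simple percentile interval from resamples drawn with replacement. Both function names, the default of 2000 resamples and the fixed random seed are assumptions made for this example.

```python
import math
import random
from statistics import mean, median, stdev


def jackknife_se(values, statistic):
    """Jackknife standard error: recompute the statistic with each value left out."""
    n = len(values)
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out))


def bootstrap_interval(values, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval from resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    estimates = sorted(statistic(rng.choices(values, k=n)) for _ in range(n_resamples))
    cut = int(n_resamples * alpha / 2)  # estimates trimmed from each tail
    return estimates[cut], estimates[n_resamples - 1 - cut]


# Hypothetical sample.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(jackknife_se(sample, mean))
print(stdev(sample) / math.sqrt(len(sample)))  # agrees with the line above up to rounding
print(bootstrap_interval(sample, median))
```

For the mean, the jackknife standard error reduces to the familiar s/√n, which makes a convenient sanity check. For a non-smooth statistic such as the median, the jackknife is known to be unreliable, so the sketch uses the bootstrap there.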
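Box plots commonly mark individual points that fall beyond Tukey's fences. The fences lie 1.5 times the hinge spread below the lower hinge and above the upper hinge. This sketch assumes the hinges are the medians of the lower and upper halves of the sorted data, with the median shared when the count is odd. The helper names `hinges` and `tukey_outliers` and the tip amounts are hypothetical.

```python
from statistics import median


def hinges(values):
    """Lower and upper hinges: medians of the lower and upper halves (median shared if n is odd)."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2
    return median(xs[:half]), median(xs[n - half:])


def tukey_outliers(values, k=1.5):
    """Return the values lying outside [lower - k*spread, upper + k*spread]."""
    lower, upper = hinges(values)
    spread = upper - lower  # hinge spread, close to the interquartile range
    low_fence, high_fence = lower - k * spread, upper + k * spread
    return [x for x in values if x < low_fence or x > high_fence]


# Hypothetical tip amounts in dollars.
tips = [1.0, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 10.0]
print(hinges(tips))          # (2.25, 3.75)
print(tukey_outliers(tips))  # [10.0]
```

A box-plot routine would draw the whiskers to the most extreme points inside the fences and plot the returned values individually.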
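A stem-and-leaf plot (the "stemplot" whose precursors Bowley used) can be produced as plain text. This sketch assumes non-negative whole numbers and uses the tens digit as the stem and the units digit as the leaf. It prints empty stems so that gaps in the data stay visible. The helper name `stem_and_leaf` and the bill amounts are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf lines for non-negative integers (stem = tens, leaf = units)."""
    leaves_by_stem = defaultdict(list)
    for v in sorted(values):
        leaves_by_stem[v // 10].append(v % 10)
    lines = []
    # Walk every stem in range, including empty ones, so gaps show up.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines


# Hypothetical bill amounts rounded to whole dollars.
bills = [12, 15, 21, 24, 24, 38, 52]
for line in stem_and_leaf(bills):
    print(line)
#  1 | 25
#  2 | 144
#  3 | 8
#  4 |
#  5 | 2
```

Unlike a histogram, the display keeps every individual value readable while still showing the shape of the distribution.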
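Median polish fits a two-way table as overall + row effect + column effect + residual by repeatedly sweeping out row and column medians. The sketch below follows that alternating row/column scheme. It stops on a fixed iteration cap or when the sum of absolute residuals stops changing. The cap, the tolerance and the function name `median_polish` are assumptions, and missing cells are not handled.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-9):
    """Decompose table[i][j] as overall + row[i] + col[j] + resid[i][j].

    Every step keeps that identity exact, so the fit always reproduces the table.
    """
    resid = [[float(v) for v in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    prev_total = None
    for _ in range(max_iter):
        # Row sweep: move each row's median into its row effect.
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [v - m for v in resid[i]]
            row_eff[i] += m
        # Move the median of the column effects into the overall term.
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Column sweep: move each column's median into its column effect.
        for j in range(n_cols):
            m = median([resid[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                resid[i][j] -= m
            col_eff[j] += m
        # Move the median of the row effects into the overall term.
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
        # Stop when the residuals vanish or stop shrinking.
        total = sum(abs(v) for row in resid for v in row)
        if total == 0 or (prev_total is not None and abs(prev_total - total) <= tol * total):
            break
        prev_total = total
    return overall, row_eff, col_eff, resid


# A hypothetical, purely additive table: the polish recovers it exactly.
overall, rows, cols, resid = median_polish([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(overall, rows, cols)  # 5.0 [-3.0, 0.0, 3.0] [-1.0, 0.0, 1.0]
```

On real tables the residuals are not zero. Large residuals then point to cells that the additive row-plus-column description does not explain, which is often where exploration should look next.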
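The trimean is a weighted average of the median and the two hinges, TM = (H1 + 2M + H3) / 4. It therefore reflects the middle half of the data while staying resistant to a few extreme values. This sketch assumes H1 and H3 are Tukey's hinges: the medians of the lower and upper halves of the sorted data, with the median shared when the count is odd. The name `trimean` and the inputs are hypothetical.

```python
from statistics import median


def trimean(values):
    """Tukey's trimean (H1 + 2*M + H3) / 4, using hinges for H1 and H3."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2
    h1, m, h3 = median(xs[:half]), median(xs), median(xs[n - half:])
    return (h1 + 2 * m + h3) / 4


print(trimean([1, 2, 3, 4, 100]))   # 3.0
print(trimean([1, 2, 3, 4, 5, 6]))  # 3.5
```

In the first call the extreme value 100 has no effect on the result, whereas the mean of the same data is 22.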
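In the tipping example, the difference between the $1 histogram and the $0.10 histogram comes entirely from the bin width. The sketch below counts values per bin for a chosen width. It works in whole cents to avoid floating-point rounding at the bin edges; that choice is an assumption made here. The tips are a small hypothetical list, not the Cook et al. data, and `histogram_counts` is a hypothetical helper name.

```python
from collections import Counter


def histogram_counts(amounts_in_cents, bin_width_cents):
    """Count amounts per bin; each key is the bin's lower edge in cents."""
    counts = Counter((a // bin_width_cents) * bin_width_cents for a in amounts_in_cents)
    return dict(sorted(counts.items()))


# Hypothetical tips in cents.
tips = [200, 200, 250, 300, 300, 300, 315, 150, 172, 350]
print(histogram_counts(tips, 100))  # {100: 2, 200: 3, 300: 5}
print(histogram_counts(tips, 10))   # {150: 1, 170: 1, 200: 2, 250: 1, 300: 3, 310: 1, 350: 1}
```

With $1 bins only the overall shape is visible. With $0.10 bins the bins at whole-dollar amounts (200, 300) stand out against their neighbours. These are the kind of round-number peaks the finer histogram in the example reveals.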
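The fitted tip-rate model in the example is a straight line in party size. The Python below sketches ordinary least squares with a single predictor, one standard way to fit such a line. The text does not say which estimation details Cook et al. used. The sketch then evaluates the quoted coefficients 0.18 and -0.01. The data points are hypothetical and lie exactly on that line, so the fit should recover it; `fit_line` and `predicted_tip_rate` are hypothetical names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predicted_tip_rate(party_size, intercept=0.18, slope=-0.01):
    """Tip rate predicted by the model quoted in the example."""
    return intercept + slope * party_size


# Hypothetical points lying exactly on the quoted line.
sizes = [1, 2, 3, 4, 5, 6]
rates = [predicted_tip_rate(s) for s in sizes]
intercept, slope = fit_line(sizes, rates)
print(round(intercept, 4), round(slope, 4))  # 0.18 -0.01
print(round(predicted_tip_rate(2), 2), round(predicted_tip_rate(4), 2))  # 0.16 0.14
```

Each extra diner lowers the predicted rate by 0.01, that is, one percentage point. With real tipping data the points scatter around such a line, and the exploratory plots in the example show structure that the line leaves out.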