
Talk:Principal component analysis


their standard deviations to get the correlation matrix would take O(n*n) ops. Since it's often the case that n<<m, this could be more efficient. As far as I know, Mathematica is closed source so we don't actually know what its PCA function does with that argument. It might just compute the z-scores and then use SVD. I'm curious why they do this, so if anyone knows, feel free to comment. Anyway, someone re-formatted the second paragraph but I still think it's misleading. Using the correlation matrix is not an equivalent formulation and that should be made clear. Again, I think that specific applications of PCA (e.g. to z-scores) and their specific pre-processing steps should not be conflated with the algorithm itself. Something to the effect of "The principal components are the eigenvectors of the covariance matrix obtained from the data, or equivalently the singular vectors from SVD of the data matrix." might be clearer and more appropriate for the intro.
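(Editorial aside, not part of the original comment: the equivalence mentioned above — eigenvectors of the correlation matrix versus PCA of z-scored data — is easy to check numerically. The NumPy sketch below is illustrative only; the data, seed, and variable names are made up.)

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))   # arbitrary correlated data

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)          # z-scores
corr = np.corrcoef(X, rowvar=False)                       # correlation matrix of the raw data
cov_z = np.cov(Z, rowvar=False)                           # covariance matrix of the z-scores

print(np.allclose(corr, cov_z))                           # True: the two matrices coincide
print(np.allclose(np.abs(np.linalg.eigh(corr)[1]),
                  np.abs(np.linalg.eigh(cov_z)[1])))      # True: same principal directions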
geometrically, as a line or plane that minimizes squared distances, grounds the problem in terms anyone can understand. It's true that PCA is usually not computed in an explicitly iterative fashion like that (as far as I'm aware), but it's the most precise and accessible explanation I can come up with using so many words. The first k principal components form a basis for the best-fitting k-dimensional subspace, but an orthogonal basis for the best-fitting k-dimensional subspace doesn't necessarily comprise the first k principal components (correct me if I'm mistaken on this). And iteratively constructing/defining a basis should be familiar ground for anyone who has taken a course on linear algebra. Thanks again for your comments. I'm not a mathematician so second opinions are welcome. I'll wait a bit longer before editing the article.
Nevertheless, Pearson used "lines and planes of best fit". This means that the plane of the first k principal components is defined as a k-dimensional plane of best fit. The iterative process used now in the first paragraph is a simple theorem, a consequence of the definition (the mean point belongs to the line of the first component, the line belongs to the plane, etc.). There is one unpleasant incorrectness now: the iterative procedure is not yet formally defined if the eigenspectrum is degenerate (of course, the correction is obvious, but it forces us to discuss minor technical issues when we should be discussing ideas). The "object-oriented" intro ("what") seems to be better than the procedure-oriented one. Thus, if my opinion matters:
applying the usual form of PCA. Rather than a "different method" of PCA, this could be viewed as applying PCA to a specific sort of data or as a pre-processing step unrelated to PCA itself (as opposed to mean-subtraction, which is a necessary step if you want to compute the covariance matrix using a matrix product). My concern is that the intro does not make this distinction clear. PCA is formulated as an optimization problem and using the correlation matrix may not yield the same solution to the maximum-variance/minimum-projection-error objective with respect to an arbitrary collection of data. Conflating the mathematical definition of PCA with a specific use makes the intro less clear.
), but the approximation approach is much clearer and allows multiple nonlinear generalisations. BTW, the history of the "variance" reformulation of the original PCA could be interesting: who destroyed Pearson's approach and created this viral "variance" definition, and why? Concerning your definition: It is correct, but the iterative algorithm of selection of Principal Components should be, perhaps, a consequence of the basic definition: the best approximation of data by k-dimensional linear manifolds (if we follow the original approach). For the centred data set, these k-dimensional linear manifolds form a complete
I've searched around a bit, and it looks like the practice of using a correlation matrix isn't entirely uncommon, but it will not give you the same results with respect to your original data. If anyone has a good source on this, please let me know. The preceding sentence already describes the algorithm as an eigendecomposition or SVD of the covariance or correlation matrix, which are not affected by the variable means anyway, so the sentence was at least redundant. I propose re-writing that sentence as well to make it clear that using the correlation matrix may not yield the same results.
article would introduce the problem in terms of the objective (and thus defining "principal component"), solve for the first PC (thus showing it's an eigenvector), and then move on to alternative derivations, efficient calculation of the principal components via SVD/eigendecomposition, applications, etc. I object to the article starting with the phrase "In statistics", because statistics is but one area where PCA is applied (perhaps we can say how it's applied in the "applications" section), and I feel it's a slight psychological deterrent to the average reader.
matrix, where the "first" eigenvector is very sensitive to small differences. So the results will likely be different, and using the correlation matrix, they might also differ substantially between each run even though you have large samples of the same distribution, whereas the first PC you get from eigendecomposition of the covariance matrix will be fairly consistent. Again, I'm not at all claiming that PCA can't or shouldn't be applied to z-scores. Just that in this case, using a correlation matrix might not give someone the results they expect.
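(Illustrative sketch, added editorially: with independent variables of very different scales, the leading eigenvector of the covariance matrix is stable across samples, while the leading eigenvector of the near-identity correlation matrix is not. The data, scales, and seeds below are made up.)

import numpy as np

for seed in range(3):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(1000, 3)) * np.array([1.0, 1.0, 10.0])    # third variable dominates

    v_cov = np.linalg.eigh(np.cov(X, rowvar=False))[1][:, -1]      # leading eigenvector
    v_corr = np.linalg.eigh(np.corrcoef(X, rowvar=False))[1][:, -1]

    print("cov :", np.round(np.abs(v_cov), 2))    # ~[0, 0, 1] on every run
    print("corr:", np.round(np.abs(v_corr), 2))   # essentially arbitrary, changes per run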
avoid by first stating the objective function, then defining principal components as a solution to that objective. If you want to change it, I insist you define PCA in terms of the least-squares objective, and define "principal component" in the simplest possible terms as a solution to that objective. Pearson described it geometrically in terms of the objective it optimizes, and I feel that this is much clearer than using jargon like "statistical procedure". I also insist that it not start with the phrase "In statistics,"
that can be addressed elsewhere.) Does WP have any official guidelines on how much technical competence we should expect from the reader? At least, I think the intro should be relatively gentle, and that's why I wrote up the construction that's currently in the first paragraph. The intro is not the place to slam the reader with technical jargon, unnecessary comments, stray observations, etc. We do not want to deter the reader. It should be as simple as reasonably possible.
I did make a small omission in that revision, to the maximum variance explanation. In that case it would be easiest to assume the data are mean-centered and talk only about directions, otherwise the variance has no maximum (a "line" can be translated to make the variance infinite even if we parameterize it as having a unit vector for direction). But it's otherwise correct as far as I know. In any case, the minimum-squared-distances presentation is cleaner.
of PCA as maximizing the variance captured, as opposed to minimizing the residual, so didn't think of it as a "least squares" thing, though I agree that's one approach. The basic idea of fitting a line to data seems imperfect, though, as least squares fitting usually starts with an independent variable, which we don't have in this case. Anyway, there are various ways to go about this. I suggest each of you find some sources to compare how they go about it.
orthogonal to the directions of the first i-1 best-fitting lines, with the base case i=1 being unconstrained*. We have to say "directions of the best fitting line" because we are only concerned with their directions. They don't have to pass through the origin, but it's their directions (usually taken as unit vectors) that form the basis of principal components.
repair. It's unnecessarily long and obscures what is otherwise a very simple and elegant concept with jargon and a lack of continuity. If you want to repair it, the intro (and every other section) could use some work, but I think the opening paragraph was just fine the way it was. It was similar to how Pearson introduced PCA and I think he put it best.
(edit: you'd need at least the structure of a normed vector space for PCA to make sense, but my point is that distance is a suitably general and well-understood term). The terms "residual" and "error" are often used in the context of regression, which is different from PCA. They are less general and less likely to be understood by the average reader.
choose the simpler of two equivalent definitions to state in the intro. Nothing is "delayed" by this introduction, I think the reader can bear with it for the whole four sentences before at long last they come to the definition of "Principal component". If you're concerned about brevity, the rest of the article could use a lot of work in that regard.
here. However, readers visit an article to understand the topic of that article. Any material before you reference the topic/title is probably going to be read and carefully considered by the reader. In this sense, it is a good opportunity to communicate important ideas and concepts because it may command a great deal of the reader's attention.
before PCA then you can keep your concept of PCA more pure. But omitting it or denying it is not helpful. I'm not familiar with covariance vs correlation as the way to explain this option, but if that's what sources do, it's fine for us to do so. I'd rather see the difference integrated as a small optional difference though.
components" and I don't think it does the reader any good to force this style. It would actually make a lot more sense to re-title the article "Principal Components" and have "PCA" redirect to the article, since the term "PCA" doesn't seem to have a single concrete definition. Please consider this idea.
"...fitting lines to a set of points in space" is "linear regression". The current entry paragraph talks about all other things which have their own pages in Knowledge, until defining PCA at the very end. This does not fit the Knowledge standard at all. Here are some examples of good entry paragraphs:
I have added four other major uses of PCA from the social sciences and genetics. Having done this, I see the article is too complex and meandering and reflects the input of editors with different interests and priorities. It should be broken up now into two or three articles. I don't think separating
This logic affects all mathematical pages, which would harm Knowledge. Knowledge is an encyclopedia and it is used by students. One must not turn it into tutorial pages. As far as I know, people into PCA know some math (the entire concept is mathematical). But I agree that some clarity in the lede is
The (now deleted by me) statement "Given a set of points in Euclidean space, the first principal component corresponds to a line that passes through the multidimensional mean and minimizes the sum of squares of the distances of the points from the line" is incorrect. Counterexamples are easy to find,
I do appreciate that you're trying to improve. Please explain why you think it's a travesty, after all the years of work by so many other editors. You might also consider responding to my questions, providing sources, etc., as a way to help move the discussion forward, rather than just insisting on
If you see a way to improve it without throwing out the basic normal lead style, please go ahead. But in discussing differences, it can also be very helpful to link sources that do it more like what you're proposing. It shouldn't be hard to converge on a good wording. It's not clear to me what you
I invite you to read the current intro. I'll take a break from editing the article for now but I feel it's much, much clearer and more precise than the intro linked earlier in the talk page. PCA is not inherently a statistical concept per se, it's essentially just a change of basis that satisfies the
If it is equal to maximum variance, then you can simply say "maximum variance" in the definition. We don't need to explain every fundamental concept. If someone does not know what variance is, they can follow the link to the variance page. Otherwise we would need to recite everything from Pythagoras.
Hello, the first sentence of the article was more about "how" than "what", so I created another one concerning "what" PCA is. We should further consider postponing the "how" discussion in my opinion; one does not need to know how to do something until s/he knows what it is and why s/he should use it.
The variance in the third coordinate is very high compared to the others, so for the first principal component, you'd want a vector that lies close to or on that axis. However, because the correlation between different variables will be very small, the correlation matrix will be close to the identity
Centering the data is an essential step in PCA, so that could be mentioned in the intro if someone feels it must. On the other hand, dividing each variable by its standard deviation may give you different principal components, the first of which may not preserve as much variance of the original data.
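(A hedged numerical illustration of that claim, added editorially; the mixing matrix, sample size, and seed are arbitrary. By construction the covariance eigenvector captures at least as much variance in the original units as the direction taken from the correlation matrix.)

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 2)) @ np.array([[5.0, 0.0], [1.0, 0.5]]).T   # correlated, very different scales
Xc = X - X.mean(axis=0)

d_cov = np.linalg.eigh(np.cov(X, rowvar=False))[1][:, -1]        # first PC from the covariance matrix
d_corr = np.linalg.eigh(np.corrcoef(X, rowvar=False))[1][:, -1]  # direction from the correlation matrix

def var_along(d):
    return np.var(Xc @ d, ddof=1)    # variance of the data projected onto d, in original units

print(var_along(d_cov), var_along(d_corr))   # the covariance direction captures more variance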
Thanks. And I've notified you of your 3RR limit at your talk page. It has actually gone somewhere, in that several of us think your approach is wrong and are trying to fix it. I agree with you that it can perhaps be more clear and precise and jargon free, but within the framework of a normal lead
I made a few changes to the intro as they came to mind. It's really the same intro but with slightly better wording, and I hope you'll be convinced that this is a reasonable way to present the topic. Keep in mind that PCA can be defined completely without reference to any concepts from statistics at
Starting with "what" is not only for the stylistic convention, but also for a more healthy approach to a topic. As I mentioned above "one does not need to know how to do something until s/he knows what it is and why s/he should use it". Anyway, our positions are clear and I started to repeat myself.
where I note this topic has long been horribly messed up. It should be fixed, not removed, since normalizing by variance is indeed a very common part of PCA. Yes, it affects the components – it makes them independent of the measurement units, for example – and yes if you think of it as pre-process
Whether you apply PCA to z-scores or any other sort of data has nothing to do with PCA itself. At least make clear the distinction between necessary steps of the algorithm, like mean subtraction, and the sort of data you're using it on. Again, using the correlation matrix may not yield a solution to
Some more edits. Hopefully this is a satisfactory compromise. It's starting to look like a good intro in my opinion. The terms "Principal Component" and PCA are clearly defined and both objectives are briefly covered. I don't think we need to be overly pedantic in the intro. Anyway, there's still a
If there aren't enough linearly independent eigenvectors then any orthogonal basis for the remaining subspace can be taken. This doesn't contradict the definition in the first paragraph, although you're correct if you meant that there may not be a unique solution. This seems like a very minor issue
P.S. I would not be opposed to a very clear and well-organized derivation from maximum variance if we can replace the whole intro (and maybe clean up some of the main article) instead of just the first paragraph. The whole article needs work and it's not my intent to be an obstacle here. I'd rather
Pearson was a statistician, and yet he still chose to introduce the concept geometrically. The least-squares approach highlights the difference between PCA and linear regression, and nicely ties it in with linear algebra while remaining (relatively) accessible to the average undergrad. Ideally this
I'm not saying it was a great lead, which is why I suggested working on a compromise. It's not clear to me why you object to "In statistics" for starters. Is there a better way to characterize the field within which this concept is meaningful? Wasn't Pearson a statistician? And I always thought
of subspaces is practically unique: the first vector belongs to the 1D subspace from the flag, the second vector is in its orthogonal complement in the 2D subspace from the flag, etc. The only uncertainty is in the signs - each vector can be multiplied by -1. Therefore, if we have the k-dimensional
The first PC is a vector with the same direction as a line that best fits the data. This line may not be a vector itself, meaning the line that best fits the data may not pass through the origin. This technicality is not a problem if you say "the direction of a line". Incidentally, this is why the
I don't understand why you claim that "distances" is more appropriate, esp. in light of the sources I showed that have "residuals"; and I don't see how not being a statistician is relevant to how confusing any of these terms might be. I'm not a statistician, and it seems OK to me. Also not clear
I've already explained the reasons I think we should use the term "distance" instead of errors or residuals. Let's wait until they open the discussion on the dispute resolution page so we aren't wasting our time. PCA is an uncomplicated concept that anyone can understand and a great example of how
I edited the intro to include a definition based on the maximum variance objective, and I think it reads a bit more clearly. This and the first paragraph are really just recursive definitions of the principal components, so I don't consider them to be too "procedural" in nature. However, if anyone
It doesn't start with the "how", it starts with a definition of "best fitting line", which is the objective that PCA optimizes. It is immediately clear to most readers what that means. Yes, PCA maximizes the variance of the projected data, but why lead with that? I don't think it's unreasonable to
It is not original research. Do you agree that the first principal component should maximize the variance of the projected data points? PCA is defined as an optimization problem, and if instead of using the covariance matrix, you use the correlation matrix, the transformation you get may not be an
that stripping out the math complexity is contrary to the point of Knowledge (and constitutes a paradigm shift that would necessitate the rewriting of most Wiki pages related to math topics). Knowledge is a collection of human knowledge. Introductions or specialty sections can be used to give readers
Good, that's a convincing demonstration that the ith direction applied to the residual R is the same as the ith direction applied to the original vector. But does he really suggest skipping the computation of R, as a practical matter? Wouldn't it be easier to compute R than to constrain the ith
One final thought for today. On Knowledge it is conventional that the article's topic should be defined/referenced in the first sentence or shortly thereafter. This is understandable; it would be very inconvenient to "delay" the definition any more than a few short sentences, and I've not done so
Given a set of points in two, three, or higher dimensional space, a "best fitting" line can be defined as one that minimizes the average squared distance from a point to the line. The next best-fitting line can be similarly chosen from directions perpendicular to the first. Repeating this process
I give up. It did not hit you that some methods use normalized variance "– and, possibly, normalizing each variable's variance to make it equal to 1; see Z-scores.". The text does not deny that other nonnormalized methods exist, in ML or other. You are bordering on vandalism and will end up being
You mean Bishop's section 12.1.2 on projection error minimization? I don't have access to my copy at work these days, but on Amazon I can see some of the relevant pages. Unfortunately not page 564 where he might support what you're saying. I'm not saying you're wrong, just that looking at the
The term "residual" occurs in the context of statistics. We don't need to call it that here. Calling the objective "average squared distance" is entirely correct, more general in a mathematical sense, and much more likely to be understood by a casual reader. I don't know what you're thinking. In
The intro in the edit you linked describes PCA as a "statistical procedure" (whatever that is) and is practically a circular definition, so I really can't imagine why you think the current intro is more "procedural" in nature than the old one. That's exactly the sort of description I intended to
The definition I gave in the intro is equivalent to the maximum variance formulation, and not equivalent to linear regression. You cannot fit a vertical line with linear regression, for example. I believe it will be clearest to other English speakers the way I've written it, so I do object to any
FYI, the only reason I can think of to use a correlation matrix instead of computing the z-scores beforehand is to save a few division ops. If you have m data points with dimension n, then computing the z-scores would require m*n divisions, but after computing the covariance matrix, dividing out
Given a collection of points in two, three, or higher dimensional space, a "best fitting" line can be defined as one that minimizes the average squared distance from a point to the line. The next best-fitting line can be similarly chosen from directions perpendicular to the first. Repeating this
The "residuals" (which usually refer to scalar values, but I suspect you're talking about the parts that don't lie in the subspace spanned by PCs up to i-1) from projecting onto the i-1 PCs are orthogonal to those i-1 PCs by definition. You can just fit the ith PC under the condition that it be
Finding the ith principal component could be viewed as a constrained optimization problem (and in fact you can use Lagrange multipliers to show that it's an eigenvector. Bishop does this in "Pattern Recognition and Machine Learning"). You're trying to find a best-fitting line whose direction is
PCA is most easily understood as successively fitting lines to a set of points in space. Why start it off with "In statistics,"? It's not exclusively a statistical technique, and you don't always use the principal components for dimensionality reduction either. The article is in serious need of
Thanks for your efforts on this, but I don't think it's important for the reader to study distance from a line before knowing that a principal component is a best-fit direction vector; so I've taken another cut at a normal lead starting with a definition that tries to be clear and neutral with
I'd like to summarize my point and make a few things clear, please correct me if I'm mistaken about any of these points. Computing PCA using a correlation matrix instead of a covariance matrix will give you the same set of principal components that you'd get by standardizing your data and then
Sorry (and I'm actually a mathematician myself). I'd have to say all this complex mathematics is all very nice but it actually belongs in a specialist textbook, not in an encyclopaedia. A layman who wants to know what PCA is, what it does and what its limitations are is going to be completely
This is/was suggested as a pre-processing step in the third paragraph, but it may change the directions of the principal components. Consider vectors and . The standard deviation across the values in the first coordinate is 1, and across the second is 2. Dividing, we get ,, which will have a
Thanks for your comment! My reference on the topic (Bishop) shows how the solution can be derived in each way, from maximum-variance and from minimum MSE projection-distance. Deriving a solution from the maximum variance objective seems a little bit simpler, but I think stating the objective
PS I am trying to help you here. It's disappointing when a simple and elegant concept like PCA is obscured by jargon and careless editing. WP is a great resource and I use it all the time. I don't want anyone to come away from this article confused or with mistaken ideas about what PCA is.
To be honest, I don't think your edit reads as nicely. The phrase "directions (principal components) that align with most of the variation" is not as precise or succinct as either recursive definition. It is awkward to define PCA without first taking a couple sentences to define "principal
There is no need to strip away the math, yet at the same time the article could be much more accessible to laypeople and students. The section on Intuition seems the place to provide a non-jargon laden description. That is not achieved by starting off with "PCA can be thought of as fitting a
The easiest fix is to just take out that optional section that is wrong and doesn't fit. A better fix might be to put that option up front in defining the B matrix (as B = Z instead of what it is now), after which the rest would make sense; that is, the option is between the present
for example, the retired editor who did most of the detailed article writing around that time moved that bit from one place it didn't fit to another. It certainly doesn't fit where it is now, since the eigenvectors are based on the transformed data that they're used to transform.
In the section on how PCA relates to the SVD the SVD is described as X=U Sigma W^T, yet the page for SVD describes it as U Sigma V^T. Is there any reason as to why this page uses W instead of V? If there isn't, then it should be changed to be consistent with the main SVD page.
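(Editorial illustration, not from the discussion: whichever letter is used for the matrix of right singular vectors, its relationship to the principal components of mean-centered data can be checked numerically. The data and seed below are arbitrary.)

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)

U, S, Wt = np.linalg.svd(Xc, full_matrices=False)          # Xc = U @ np.diag(S) @ Wt
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))    # ascending eigenvalues

# squared singular values divided by (N - 1) are the variances along the principal directions
print(np.allclose(sorted(S**2 / (len(Xc) - 1)), sorted(evals)))
# rows of Wt (the article's W^T, the SVD page's V^T) are the principal directions
print(np.allclose(np.abs(Wt[::-1].T), np.abs(evecs), atol=1e-6))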
you have not answered my main discussion point. If you have any issues with the definition sentence you can modify it. If you think that it doesn't only belong to statistics, then you can add anything else that it belongs to. There was not a mention of dimensionality reduction
Though I do not agree with your proposal, to be more constructive, you should actually provide examples of how you see the topic being split. What are the two or three articles you are proposing, what content should go in each. Give a roadmap for your suggested improvements.
that best fits the residual." It's not clear to me what your distances refer to if you're not operating on residuals from the previous. Does this work for distances of original data points (perhaps it does, though it's not obvious to me, not how I was thinking of it).
against that. You need to find an appropriate secondary reliable source (such as a journal article or a news article) that supports your addition, unfortunately, that is not what you have been doing. Instead, your edit seems to be backed by "that is the truth". Read
We first define "best-fitting line", then define our objective as a series of "best-fitting" lines, then state the solution to that objective. This is the most sensible progression in my opinion, and we remind the reader of several important concepts along the way.
That's not what I'm thinking. Residuals are what's left after subtracting off a model, regression or not. There was no mention of independent or dependent variables in the lead I wrote. You have reverted attempted improvements too many times now; please stop that.
objective(s) described in the intro. There's no point in forcing it all into one sentence because it won't be comprehensible by anyone who isn't already familiar with PCA. Please be careful if you edit it. We should aim to communicate, rather than merely document.
It is absolutely necessary to define "best fitting line". Linear regression minimizes squared vertical distances between the points and the line, i.e. (f(x)-y)^2. The first principal component minimizes squared (orthogonal) distances. Perhaps a graphic would help:
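(In lieu of the graphic, an illustrative numerical sketch added editorially: ordinary least squares minimizes vertical residuals, while the first principal component minimizes orthogonal distances, so the two fitted slopes differ on the same cloud. The numbers and seed are made up.)

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=400)
y = 0.8 * x + rng.normal(size=400)
X = np.column_stack([x, y])
Xc = X - X.mean(axis=0)

slope_ols = np.polyfit(x, y, 1)[0]                        # minimizes vertical residuals
d = np.linalg.eigh(np.cov(Xc, rowvar=False))[1][:, -1]    # first principal direction
slope_pca = d[1] / d[0]                                   # minimizes orthogonal distances

print(slope_ols, slope_pca)    # the PCA slope is steeper; the two objectives are not the same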
Thanks again. I replaced the opening paragraph but the article is still in a rough state. It reads more like a rough literature survey than an encyclopedic entry. Many things are repeated in different sections, and the article is littered with stray observations.
But it isn't. It's the base case of PCA's recursive definition. We define the first PC and then we define the ith PC using the first i-1 PCs. This is an extremely common pattern in both computer science and the construction of a basis in linear algebra e.g. the
This is not original research. Two sections above this one, another user remarked that the intro should be re-written and the problem stated mathematically, and I agree with them. Without a precise definition of PCA to work from, we're not getting anywhere.
This condition was described by Karl Pearson himself: "This sphericity of distribution of points in space involves the vanishing of all the correlations between the variables and the equality of all their standard-deviations", in the original 1901 paper.
page is gone from the intro, but think about how a vector decomposes into a linear combination of basis vectors. This is a fundamental concept and I think everyone could do with a reminder. Please fix the intro if you understand what I'm saying.
Could someone please have another go through the article in that relation? There are still occurrences of X^T as the data matrix. Also, if possible, it would be nice to have a reference (textbook) with notation consistent to one used in the article.
Unless there's another field in which "principal component analysis" has an entirely different meaning, I don't see the need to so distinguish PCA in the first sentence of this article. Applications can be addressed in the "applications" section.
I prefer Pearson's definition through data approximation (especially because it allows immediate non-linear generalisations, while the "variance approach" does not). Nevertheless, this definition is better to introduce correctly, as Pearson did,
Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal
If the article were renamed "Principal Components" we would not have to hem and haw about what PCA means in the first sentence. We could take the intro in the revision I linked earlier and just put something like this in front of it: "The
The terms errors and residuals have a specific meaning. The term "distances" is more appropriate for explaining PCA in a general context where your points may not be a statistical sample, and less likely to confuse someone who isn't a
The body of the article is still in need of "de-fragmentation" for lack of a better word. I don't have time to work on it now but as a general game plan, I think editors should try to make the article a bit more compact and coherent.
I've made a start on trying to put some plain language into the article instead of all this far-too-clever mathematics - which I would move to some other strictly technical entry called maybe "Mathematics of Principal Components".
I don't say that it is incorrect, I say that it is redundant. It does not belong to the top of the PCA article. It has its own article already. If you have concerns, please try to fix it instead of reverting altogether. Thank you.
2759:(say, defined in terms of a direction (||direction||=1) and translation in R^n as the set of points {c*direction+translation | c in R}) that best fits the data. For the ith, take (the direction of) a line that best fits the data, 4226: 777:
I understand what you mean, but using the correlation matrix changes the algorithm in a fundamental way, and I think this distinction should at least be made clear to the reader who might not understand the difference otherwise.
Regression is a fundamentally different idea from PCA. I urge you to carefully consider that "redundant" sentence. If it is incorrect, then please explain my mistake. Otherwise I insist that it remain in the article's intro.
It is fallacious to think that the reason linear regression is called linear regression is that it involves fitting lines. Polynomial regression, for example, is an instance of linear regression, so called for good reasons.
linear algebra can be applied to real-world problems. This article is a travesty, so I'm trying to improve its correctness and organization while minimizing the amount of background knowledge it requires from the reader.
have a specific meaning. The term "distances" is more appropriate for explaining PCA in a general context where your points may not be a statistical sample, and less likely to confuse someone who isn't a statistician.
objects, please let me know why. I'd really like to delete the rest of the intro after the first couple paragraphs include a simple proof that the first PC is an eigenvector in its place, but I'll wait for comments.
and basically the same one Bishop uses. If you set it all up like Bishop does and set the vector of partial derivatives equal to zero, then you get that it's an eigenvector of the covariance matrix. So, they're
Again, the rest of the article does need a lot of work, so I don't mind at all if you want to improve other sections. Even the rest of the intro is a bit misleading and will need to be rewritten at some point.
I'll add that I think readability is far more important than rigid stylistic convention. Most people use Knowledge as a learning tool. You don't have to shoehorn the entire definition into the first sentence.
I don't see it as more natural. Can you point out a source that does it that way? Does Bishop? Sorry, I can't see the relevant page. And yes of course residuals are vectors; how would they be scalars?
lot of repetition in the rest of the article that should be removed at some point. It seems like the article has been continually added to without much attempt at maintaining its continuity/readability.
In many physical, statistical, and biological investigations it is desirable to represent a system of points in plane, three, or higher dimensioned space by the "best-fitting" straight line or plane.
" in the lead sentence, especially since you then use the word "vector" for the same kind of object. See if you can simplify that to plain English; it doesn't need to start too formal or precise.
The article starts with a recursive definition of principal components. If you want to phrase it a bit more formally that's fine by me but it is a (reasonably) precise mathematical definition.
for the number of principal components and k to index individual principal components. It uses p for the dimension of the underlying vectors and i to index individual rows in the data matrix.
I think it reads much more clearly than the current paragraph and describes PCA as the solution to a specific objective rather than a "statistical procedure", without being any less precise.
This is original research. It is "principal" not "principle". And of course it can be done with normalized vectors in which case it has the covariance. If you revert I will bring it to ANI.
You are now doing WP:OR. Knowledge is not about this, but about reflecting the professional literature. If you disagree with the literature, please publish and it will make its way to WP.
It does not change anything methodologically since you are dealing with normalized vectors and derive their attributes. People can retranslate from the normalized vector to the raw one.
I'm not sure I understand what you mean. I didn't add any content or cite any new sources in the article. I only removed something that was inconsistent with the usual definition of PCA.
p_i^T v = p_i^T (a_1 p_1 + a_2 p_2 + \dots + a_{i-1} p_{i-1}) + p_i^T R = a_1 p_i^T p_1 + a_2 p_i^T p_2 + \dots + a_{i-1} p_i^T p_{i-1} + p_i^T R = p_i^T R
I pinged you as you are active users who wrote to this talk page previously. Could you please give an input to resolve this discussion? It has been stuck for more than a week. Thank you.
I beg to differ. Both you and Gufosowa seem to think that I'm describing regression here. There is no dependent variable in PCA. The terms in the objective function are not residuals.
I'm not saying I got it to a great state, but I really think we need to lead with a definition, not a tutorial. And please avoid capitalization of things that aren't proper names.
To avoid creating unnecessary confusion for the first-time reader, I propose changing the Introduction to use pronumerals in the same way as in the Details section, ie replace p by
That's fine then. Unless Gufosowa has an issue with it we can probably close the topic on the DRN. I will probably continue to edit the rest of the article when I have time though.
changes . Defining "principal component analysis" requires defining what a "principal component" is, and I've condensed that into as short and accessible an explanation as I could.
of a collection of points in R^n are a sequence of n vectors where the ith element is the direction of a line that best fits the data and is orthogonal to the first i-1 elements"
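(Editorial sketch: the quoted recursive definition can be implemented directly and compared against a standard SVD. This is an inefficient illustration with made-up data, not a proposed implementation; function and variable names are invented.)

import numpy as np

def principal_directions(X):
    """Greedy construction: the i-th direction best fits the centered data among unit
    vectors orthogonal to the first i-1 directions."""
    Xc = X - X.mean(axis=0)
    n = Xc.shape[1]
    dirs = []
    for _ in range(n):
        P = np.eye(n) - sum(np.outer(d, d) for d in dirs)      # projector onto the complement
        w = np.linalg.svd(Xc @ P, full_matrices=False)[2][0]   # best direction in that subspace
        dirs.append(w)
    return np.array(dirs)

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))
greedy = principal_directions(X)
svd_dirs = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)[2]
print(np.allclose(np.abs(greedy), np.abs(svd_dirs)))   # True, up to signs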
all. PCA is a change of basis that satisfies the recursive least-squares objective described in the first paragraph, and I feel that's by far the most natural way to introduce it.
) is much more seminal than the definition of this article. Of course, minimisation of MSE is equivalent to maximisation of the variance of projection (a simple consequence of the
I do not like the current entry paragraph because of these two reasons: 1) It discusses "how" before "what" 2) It delays "PCA" to discuss other things that have their own pages.
I'm not sure what you mean. You can always find a "best-fitting" line. There may be more than one direction to choose from but it's the same up to the order they're selected. (
more of a bird's eye view of the topic, but the richness of information for this topic should not be watered down just because some people are not able to appreciate/use it.
article in Knowledge, we can simply link to it. If the reader does not understand it and find it important, they may follow the link. Hence, the following part is redundant:
Yes, I agree with you that the current description in the lead is more a process than a definition. I'd saying something like how it was before AP295 changed it in
2741:*Edit- Unconstrained in its direction. Bishop actually uses a normality constraint to enforce ||u||=1. Just didn't want that to be a point of confusion/contention. 2709: 2653: 2254: 2128:'s change. We do not have to establish all the background, we assume that the reader knows basic concepts like line, error, dimension, space etc. We already have a 810:
Please see my summary below. It seems we got off on the wrong foot. I'm not claiming that using PCA on standardized data is unsound or methodologically incorrect.
299: 4874: 4479: 3803: 3780: 4341: 2434:"Pattern Recognition and Machine Learning", Bishop calls it the "minimum error" formulation, but there's really no harm at all in just calling them distances. 1851:(this is 'what', not 'how'), and then the iterative procedure can be presented as the simple corollary (with all proper formulations for degenerated spectra).- 1470:
Knowledge is an encyclopedia, it is not a textbook. So the discussion of "what" should precede the discussion of "how". Do you have any objections to that? --
4884: 136: 46: 4899: 329: 241: 685:
Who's "we"? Please re-consider my point. Are there any software packages that compute PCA using the correlation matrix instead of the covariance matrix?
4160: 1841:
Thank you for the invitation to this discussion. I can say that both positions are correct, but (1) I prefer Pearson's definition of principal components through
4854: 4889: 2534: 4864: 4806:
This paragraph exclusively discusses factor analysis, never PCA. Why is it here? (Article already mentions FA earlier, with appropriate links.)
\frac{1}{N} X X^T = \begin{bmatrix} \langle x^2 \rangle + \langle x \rangle^2 & 0 \\ 0 & \langle y^2 \rangle \end{bmatrix}.
"Why" could follow "what", e.g. the sentence beginning with "PCA is mostly used as a tool..." could be the second sentence. What do you think? --
process yields an orthogonal basis in which different individual dimensions of the data are uncorrelated. These basis vectors are called
648: 2485:
why you think points being a "statistical sample" is relevant; you can compute statistics on data no matter where they came from, no?
2919: 1283:
You are right! The Pearson's definition of Principal Components as the best approximation of the data cloud with minimisation of MSE (
208: 169: 915:
That's a bit better, but I still think the intro needs work. I'll leave it alone for now, but you may find this code interesting:
3759:
As at 27 Sept 2021 the introduction uses p for the number of principal components and i to index individual principal components.
1827:
people not make quick and careless edits to the first paragraph though, as I've thought very carefully about how it should read.
1329:
planes of the best approximation then the basis we speak about is uniquely defined (up to multiplications of some vectors by -1).
103: 80: 3699:
Good idea. It still needs work on terminological and variable name consistency, and work on the Z thing per discussions above.
4849: 3461: 3443: 2142: 667:
we clearly use both techniques and the page mentions it. Reported very disruptive editing, 3RR, with original research on WP:AN.
651: 2265:
is concerned that linear regression works numerically while PCA geometrically, we can add it as a keyword in the sentence. --
2054:
respect to whether one prefers the max variance or min distance approach. Please see what you think, do try to improve it.
3068:. If you restrict your selection of the ith principal component to only those unit vectors orthogonal to P's columns, then 4711:
Not so. Most users pump their PCA straight out of statistical packages and have barely heard of eigenvalues or transposes.
4653: 714:
Wolfram's Mathematica does both covariance and correlation, and any book on Z scores used in finance and risk management.
532:
depending on your application, but I did add a few words about what normalizing the input data variance can be good for.
3420:
direction to be orthogonal to all those other directions? How do you do that exactly? Or is it just a gedanken thing?
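(One concrete, illustrative answer, added editorially with made-up data: you can either form the residual R and fit it, or never form R and instead restrict the search to the orthogonal complement of the earlier directions. Numerically the two give the same next component.)

import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 3))
Xc = X - X.mean(axis=0)
p1 = np.linalg.svd(Xc, full_matrices=False)[2][0]      # first principal direction

# (a) deflation: remove the p1 component, then fit the residual R
R = Xc - np.outer(Xc @ p1, p1)
p2_from_residual = np.linalg.svd(R, full_matrices=False)[2][0]

# (b) constraint: search only inside an orthonormal basis B of the complement of p1
B = np.linalg.svd(p1.reshape(1, -1))[2][1:].T          # columns span {p1}-perp
w = np.linalg.svd(Xc @ B, full_matrices=False)[2][0]   # best direction expressed in that basis
p2_constrained = B @ w

print(np.allclose(np.abs(p2_from_residual), np.abs(p2_constrained)))   # True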
2629:
principal component is a direction of a line that minimizes the sum of squared distances and is orthogonal to the first
289: 55: 3661: 2365: 1584: 748: 359: 1677: 2499:
The word "distances" is more appropriate because that is what they are. The concept of distance is defined in any
4776: 4753: 654: 2326: 338: 416: 4831: 386:
representing a variable that was measured, rather than the other way round which was unnecessarily confusing.
632:
for an understanding of what that means. I have reverted your edit again, reinstating your edit constitutes
4738: 4702: 1497: 1215:
In my opinion, the latter is a much clearer way to motivate the problem and convey its geometric intuition.
901: 868: 797: 764: 719: 672: 645: 588: 4772: 4749: 4791: 4270: 3882: 4238: 3850: 3646:"Points in R^n" is there to let the reader know that our sequence is of equal length as the dimension. 2411: 2350:
Since this doesn't seem to be going anywhere, I've made a topic on the dispute resolution noticeboard.
2138: 1325: 1292: 61: 4786:
p-dimensional ellipsoid to the data". Would be great if someone could attempt a plain English version.
4662: 4649: 4305: 194: 3813: 95: 74: 4641: 3649: 3606: 2761:
but only from among those lines whose direction is orthogonal to the direction of the first i-1 lines
2399: 2353: 2148: 2102: 736: 4815: 4596:{\displaystyle {\frac {1}{N}}\sum _{k=1}^{P}\sigma _{k}{\hat {u}}_{k}\sum _{j}({\hat {v}}_{k})_{j},} 2778:
residuals makes so much more sense. Also note that he does the maximum variance formulation first.
2556:
data must be centered if you compute the covariance matrix using an outer product (Gramian matrix).
32: 4720: 4686: 4606: 3808:
I just wanted to check first whether I have misunderstood something. Please let me know if I have.
1288: 4658: 4645: 1558:
Hence, I created a sentence that talks about "what" and gives "PCA" right away (Please compare to
216:
111:
17: 4811: 4767: 4734: 4698: 3832: 3809: 3718: 3704: 3676: 3637: 3561: 3483: 3425: 2817: 2783: 2717: 2576: 2542: 2490: 2454: 2423: 2378: 2315: 2270: 2092: 2059: 1952: 1802: 1773: 1743: 1724: 1662: 1599: 1493: 1475: 1418: 1398: 1366: 1284: 897: 878: 864: 807: 793: 774: 760: 729: 715: 697: 682: 668: 664: 642: 620: 584: 537: 520:
It's still messed up, and I need to fix it, since the last step with the KLT refers to using the
510: 496: 344: 200: 292:
184: 163: 4131: 4102: 4073: 4044: 4438: 2755:
I'll put it another way. For the first principal component, one can take (the direction of) a
1856: 1565: 1562: 1465: 1462: 1334: 1300: 1199:
One more comment: Compare the first sentence of this article with how Pearson introduced PCA:
598:
optimal solution to this objective. See "Pattern Recognition and Machine Learning" by Bishop.
285: 2829: 636:. I do not want an admin to block you, but it is important that you comply with Knowledge's 3038: 2177: 637: 623:
is talking about is that the sources that you are adding may be primary sources. We have a
401: 396:
The article is still pretty horrible, but this may make it a little easier to take forward.
374:
I have changed the introduction of X, so it is now defined to be the standard way round for
340: 315: 2658: 2602: 2203: 733:
the maximum-variance/minimum-projection-error objective with respect to the original data.
430:
Since 2006, the bit on converting input data to z-scores has been confused and messed up.
4835: 4795: 4780: 4757: 4742: 4724: 4706: 4690: 4666: 3836: 3817: 3749: 3745: 3722: 3708: 3694: 3690: 3680: 3665: 3657: 3641: 3597: 3593: 3565: 3551: 3547: 3538: 3520: 3516: 3506: 3502: 3487: 3473: 3469: 3455: 3451: 3429: 3414: 3410: 2821: 2806: 2802: 2787: 2772: 2768: 2750: 2746: 2736: 2732: 2721: 2594: 2590: 2580: 2565: 2561: 2546: 2512: 2508: 2494: 2473: 2469: 2458: 2443: 2439: 2427: 2382: 2361: 2337: 2333: 2319: 2304: 2300: 2289: 2285: 2274: 2115: 2111: 2096: 2081: 2077: 2063: 2047: 2043: 2032: 2028: 1993: 1989: 1974: 1970: 1956: 1942: 1938: 1923: 1919: 1908: 1904: 1879: 1875: 1860: 1836: 1832: 1821: 1817: 1806: 1792: 1788: 1777: 1762: 1758: 1747: 1728: 1712: 1689: 1685: 1666: 1641: 1637: 1622: 1618: 1603: 1539: 1535: 1520: 1516: 1501: 1479: 1442: 1438: 1422: 1402: 1383: 1379: 1370: 1353: 1349: 1338: 1319: 1315: 1304: 1277: 1273: 1243: 1239: 1224: 1220: 1194: 1190: 905: 891: 887: 872: 857: 853: 819: 815: 801: 787: 783: 768: 752: 744: 723: 709: 705: 690: 676: 658: 607: 603: 592: 578: 574: 562: 558: 541: 514: 500: 420: 405: 2688: 2632: 2233: 342: 4787: 4716: 4682: 1594:
It is open for modifications. If you have any oppositions to 1) and 2) please share. --
1232:
yields an orthogonal basis in which individual dimensions of the data are uncorrelated.
4464: 3788: 3765: 4843: 4807: 3828: 3714: 3700: 3672: 3633: 3557: 3479: 3421: 2813: 2793: 2779: 2713: 2572: 2552: 2538: 2533:
AP295 forgot to ping us to the WP:DRN discussion he started about us. Please see at
2528: 2486: 2450: 2419: 2374: 2347: 2343: 2311: 2266: 2129: 2125: 2088: 2055: 2015: 2011: 1980: 1962: 1948: 1929: 1798: 1769: 1739: 1720: 1716: 1672: 1658: 1628: 1609: 1595: 1526: 1507: 1486: 1471: 1429: 1414: 1394: 629: 624: 533: 506: 505:
For now, I took it out, not seeing a good way to fix it or a good reason to keep it.
492: 4221:{\displaystyle \langle x^{2}\rangle +\langle x\rangle ^{2}<\langle y^{2}\rangle } 3671:
I took a stab at saying it in English. Sadly, the article uses dimension p, not n.
1587:
data in a higher dimensional space into a lower dimensional space by maximizing the
2797:
orthogonal to the first i-1 PCs. This is a much more natural way of looking at it.
2500: 2407: 2069: 2019: 1852: 1708: 1330: 1296: 633: 2479:
your idiosyncratic approach to the lead. Re residuals vs distances, you've said
2418:
on p.35, and on the next page they examine the sum of squares of the residuals.
2141:, a "best fitting" line can be defined as one that minimizes the average squared 2406:
is clearly more correct within those definitions, and is in common use, e.g. at
1678:
https://services.math.duke.edu/education/modules2/materials/test/test/errors.gif
1295:
of subspaces. Its orthogonal basis is called the principal components basis. ...
397: 213: 4733:
PCAs if they do not know what an eigenvalue means. The concept is mathematical.
3713:
It looks better, thank you for improving it. Let's close the DRN topic then. --
3478:
That might be a good approach for an "Introduction" section; not for the lead.
2916:
are our first i-1 principal components, then a vector v can be decomposed into
1653:
Starting with the definition of "best fitting line" is what I mean by delaying.
3741: 3686: 3653: 3589: 3543: 3512: 3498: 3465: 3447: 3406: 2798: 2764: 2742: 2728: 2586: 2557: 2504: 2465: 2435: 2357: 2329: 2296: 2281: 2262: 2107: 2073: 2039: 2024: 1985: 1966: 1934: 1915: 1900: 1871: 1828: 1813: 1784: 1754: 1681: 1648: 1633: 1614: 1572: 1546: 1531: 1512: 1449: 1434: 1375: 1345: 1311: 1269: 1235: 1216: 1186: 883: 849: 811: 779: 740: 701: 686: 614: 599: 570: 554: 548:
Dividing each attribute by its standard deviation changes principal components
190: 108: 3588:
How's this? I think it reads pretty well and it's closer to the usual style.
4715:
out the mathematical component would harm either mathematics or Knowledge.
4428:{\displaystyle X=\sum _{k=1}^{P}\sigma _{k}{\hat {u}}_{k}{\hat {v}}_{k}^{T}} 4334:
does not correspond to a line that passes through the multidimensional mean
2230:
best-fitting line can be chosen from directions perpendicular to the first
1738:
was better, though still not great. Let's go back and craft a compromise.
553:
different set of principle components. Does this not defeat the purpose?
3556:
You could do some fixing instead of just complaining and reverting, no?
3511:
I didn't rename the article, but I made a few changes along those lines.
1588: 1559: 1459: 4231: 3847:
e.g. if mean is 0. Another example is in 2d for a cloud of points with
488:
and a version that divides each row of that by its standard deviation.
1324:
Fine. The only comment I have is: the orthonormal basis of a complete
481:{\displaystyle \mathbf {B} =\mathbf {X} -\mathbf {h} \mathbf {u} ^{T}} 2571:
Yes, I understand that. Is there a rewording that you suggest then?
2537:
and enter your summary of the dispute if you like. I just did mine.
2535:
Knowledge:Dispute_resolution_noticeboard#Principal_component_analysis
524:
that is now not defined. It's not clear what's the best way to use
4230: 3755:
Pronumerals inconsistent between Introduction and Details sections
2261:
It should be removed or postponed to the methodology sections. If
3028:{\displaystyle v=(a_{1}p_{1}+a_{2}p_{2}+\dots +a_{i-1}p_{i-1})+R} 389:
I've then gone through the text and made the appropriate changes
896:
I made sure it says that there are two methods, unambiguously.
345: 309: 267: 26: 3460:
Anyway, if you see what I mean, I think we should start from
3035:
Where vector R must be orthogonal to the subspace spanned by
2685:
principal component is a direction orthogonal to the first
2416:
the amount that is unexplained by the pc model—the residual
2449:
think the context is here that's broader than statistics.
759:
blocked if you continue violating encyclopedic standards.
382:
representing measurements from a single sample, and each
370:
X changed to be the standard way round for a data matrix
3842:
Incorrect statement in section 4 Further Considerations
1735: 431: 390: 294: 280: 3952: 4609: 4487: 4467: 4441: 4344: 4308: 4273: 4241: 4163: 4134: 4105: 4076: 4047: 3920: 3885: 3853: 3791: 3768: 3609: 3074: 3041: 2922: 2832: 2691: 2661: 2655:
principal components." where I had "Each subsequent
2635: 2605: 2599:
This sentence OK, but they you have "Each subsequent
2236: 2206: 2180: 2151: 444: 1680:
Maybe I'll make something similar for this article.
491:
Any preferences, or anyone willing to work it over?
212:, a collaborative effort to improve the coverage of 107:, a collaborative effort to improve the coverage of 4802:Applications: Intelligence section doesn't belong. 4631: 4595: 4473: 4453: 4427: 4326: 4294: 4259: 4220: 4149: 4128:one goes through the mean/ center of gravity. But 4120: 4091: 4062: 4033: 3906: 3871: 3797: 3774: 3624: 3442:with eigendecomposition or SVD, as I mentioned in 3398: 3060: 3027: 2908: 2703: 2677: 2647: 2621: 2248: 2222: 2192: 2166: 480: 3570:I've been trying to fix this article since March. 1389: 4870:Knowledge level-5 vital articles in Mathematics 3603:Better. I'd avoid the formal mathy "points in 2135: 1657:Let's hear other Wikipedists' opinions, too. -- 1248:I'd like to replace the first paragraph with: 353:This page has archives. Sections older than 8: 4283: 4274: 4248: 4242: 4215: 4202: 4190: 4183: 4177: 4164: 4017: 4004: 3981: 3974: 3968: 3955: 3895: 3886: 3860: 3854: 3462:this revision - Principal Component Analysis 298:; for the discussion at that location, see 4228:. This example corresponds to the figure. 3647: 2585:If you say so. I already made the change. 2351: 734: 281:Non-linear iterative partial least squares 158: 69: 4623: 4612: 4611: 4608: 4584: 4574: 4563: 4562: 4552: 4542: 4531: 4530: 4523: 4513: 4502: 4488: 4486: 4466: 4440: 4419: 4414: 4403: 4402: 4395: 4384: 4383: 4376: 4366: 4355: 4343: 4313: 4312: 4307: 4272: 4240: 4209: 4193: 4171: 4162: 4136: 4135: 4133: 4107: 4106: 4104: 4078: 4077: 4075: 4049: 4048: 4046: 4011: 3984: 3962: 3947: 3938: 3921: 3919: 3884: 3852: 3790: 3767: 3616: 3612: 3611: 3608: 3387: 3382: 3366: 3361: 3342: 3332: 3327: 3311: 3292: 3282: 3277: 3267: 3254: 3244: 3239: 3229: 3213: 3208: 3186: 3170: 3151: 3141: 3128: 3118: 3105: 3100: 3084: 3079: 3073: 3046: 3040: 3004: 2988: 2969: 2959: 2946: 2936: 2921: 2891: 2872: 2859: 2837: 2831: 2690: 2666: 2660: 2634: 2610: 2604: 2235: 2211: 2205: 2179: 2158: 2154: 2153: 2150: 472: 467: 461: 453: 445: 443: 4821:Notation for the SVD uses W instead of V 3827:Please add a description in the article 1552:This article is about PCA, and PCA only. 4860:Knowledge vital articles in Mathematics 2139:two, three, or higher dimensional space 363:when more than 10 sections are present. 160: 71: 30: 4875:C-Class vital articles in Mathematics 4302:. Here the first principal component 7: 4295:{\displaystyle \langle xy\rangle =0} 3907:{\displaystyle \langle xy\rangle =0} 2373:please; don't lead with a tutorial. 206:This article is within the scope of 101:This article is within the scope of 4885:High-importance Statistics articles 4603:while first principal component is 4481:singular values, (column) mean is 4260:{\displaystyle \langle y\rangle =0} 3872:{\displaystyle \langle y\rangle =0} 1797:How about "In data analysis" then? 640:when you are editing. Thank you. 60:It is of interest to the following 4900:High-priority mathematics articles 4327:{\displaystyle \propto {\hat {y}}} 25: 18:Talk:Principal components analysis 2143:distance from a point to the line 1255:, and several related procedures 357:may be automatically archived by 226:Knowledge:WikiProject Mathematics 4855:Knowledge level-5 vital articles 3625:{\displaystyle \mathbb {R} ^{n}} 2408:Matlab's "PCA Residual" function 2167:{\displaystyle \mathbb {R} ^{n}} 2145:. 