Can you explain why we got a=-1/sqrt(2) and b=1/sqrt(2) in the second part of the question?

Solve the PCA problem - this is the first eigenvector…

Correct me if I'm wrong, but I think you took the wrong matrix.

You took the matrix given in the question, but that is not X; X is only its first column.

To get X, I think you need to append a 1 to every x, so the first column stays the same and the second is all 1's.

I still didn't get the right answer, though, but I think what you did was wrong.

I first normalized the data (subtracting (0.6, 0.6) from every vector).

Then $XX^T$ is:

$$XX^T = \begin{pmatrix} 16/5 & -9/5 \\ -9/5 & 16/5 \end{pmatrix}$$
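
As a check on the matrix above, here is the arithmetic written out. This assumes the five points are (2,0), (1,0), (0,2), (0,1), (0,0), as in the matrix posted further down the thread, and that X holds the centered points as columns:

$$\bar{x} = \tfrac{1}{5}\big[(2,0) + (1,0) + (0,2) + (0,1) + (0,0)\big] = (0.6,\ 0.6)$$

$$X = \begin{pmatrix} 1.4 & 0.4 & -0.6 & -0.6 & -0.6 \\ -0.6 & -0.6 & 1.4 & 0.4 & -0.6 \end{pmatrix}$$

$$(XX^T)_{11} = (XX^T)_{22} = 1.4^2 + 0.4^2 + 3 \cdot 0.6^2 = 3.2 = \tfrac{16}{5}$$

$$(XX^T)_{12} = (XX^T)_{21} = 2(1.4)(-0.6) + 2(0.4)(-0.6) + (-0.6)^2 = -1.8 = -\tfrac{9}{5}$$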

I found that the eigenvalues are 5 and 7/5 by looking at the characteristic polynomial.

The maximal eigenvalue is 5, so I looked for a solution to:

$XX^T v = 5v$

and got two equations that were linearly dependent and resulted in $v_1 = -v_2$ (where $v = (v_1, v_2)$).

Finally, together with the constraint $v_1^2 + v_2^2 = 1$, the result is $1/\sqrt{2}$ and $-1/\sqrt{2}$.
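
Spelling out the eigenvalue steps described above (a sketch that takes the matrix exactly as stated in this post):

$$\det(XX^T - \lambda I) = \left(\tfrac{16}{5} - \lambda\right)^2 - \left(\tfrac{9}{5}\right)^2 = 0 \;\Rightarrow\; \lambda = \tfrac{16}{5} \pm \tfrac{9}{5} \in \left\{5,\ \tfrac{7}{5}\right\}$$

$$\lambda = 5:\quad \left(\tfrac{16}{5} - 5\right) v_1 - \tfrac{9}{5} v_2 = -\tfrac{9}{5}\,(v_1 + v_2) = 0 \;\Rightarrow\; v_1 = -v_2$$

$$v_1^2 + v_2^2 = 1 \;\Rightarrow\; v = \pm\left(\tfrac{1}{\sqrt{2}},\ -\tfrac{1}{\sqrt{2}}\right)$$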

I think you can multiply the result by -1 and still get an optimal solution.

Actually, I'm not sure. If the above calculation is correct, we would get the same results even if we add (1000, 0) to all the points. But the same a and b can't be correct if we add (1000, 0) to all points.

So - I'm not sure.
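
One way to test this worry numerically is the minimal MATLAB/Octave sketch below. It assumes the five points from the question are the rows of P, and that a and b mean the slope and intercept of the line y = a*x + b through the centroid. Both are assumptions, since the question text is not shown here. The names P, Q, sets and ab are made up for illustration.

```matlab
P = [2 0; 1 0; 0 2; 0 1; 0 0];               % assumed data, one point per row
Q = P + repmat([1000 0], size(P, 1), 1);     % the same points shifted by (1000, 0)

sets = {P, Q};
ab   = zeros(2, 2);                          % row k holds [a b] for sets{k}
for k = 1:2
    Z  = sets{k};
    mu = mean(Z, 1);                         % centroid of the point set
    C  = Z - repmat(mu, size(Z, 1), 1);      % centered data
    [V, D]   = eig(C' * C);                  % eigen-decomposition of the 2x2 scatter matrix
    [~, idx] = max(diag(D));                 % position of the largest eigenvalue
    v = V(:, idx);                           % first principal direction (sign is arbitrary)
    a = v(2) / v(1);                         % slope of that direction
    b = mu(2) - a * mu(1);                   % intercept so the line passes through the centroid
    ab(k, :) = [a b];
    fprintf('a = %.4f, b = %.4f\n', a, b);
end
```

If this reading is right, the direction (and so a) should come out the same for both sets, while b should move from 1.2 to 1001.2, because the line has to pass through the shifted centroid. That would explain why the eigenvector alone cannot supply both a and b.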

I agree with Guy. (-1/sqrt(2), 1/sqrt(2)) is indeed the first eigenvector, but it is not the right assignment for a and b.

The right assignment is a = -1, b = 0, because this is the correct way to go from a vector to a line equation. PCA finds a direction, so there is no logic in b != 0. I think the answer is just wrong.

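X = [2 0; 1 0; 0 2; 0 1; 0 0];   % five data points, one per row
X = X';                          % transpose: one point per column (2x5)
X = X - 0.6;                     % center the data (both row means are 0.6)
[V,~] = eig(X*X');               % columns of V are the eigenvectors of X*X'
V                                % display the eigenvectors
eig(X*X')                        % display the eigenvalues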

the solution is right …

Please note that you have to normalize the data first (i.e. make the mean of each row 0).

I think that PCA really returns only the direction, so (-1/sqrt(2), 1/sqrt(2)) means that a = -1.

I calculated the line that minimizes the sum of square distance from the points to it, using the formula found here:

https://en.wikipedia.org/wiki/Distance_from_a_point_to_a_line

This is supposed to be equivalent to PCA.

The result I got was really a=-1, and b=6/5. I may have a calculation error, but I think the result looks reasonable (a is as expected, b = 6/5 does give a line that is close to the points).
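
For comparison, the same line can be read off the PCA output directly. This assumes a and b are the slope and intercept of y = ax + b (the question itself is not quoted in this thread), and uses the fact that the line minimizing the sum of squared perpendicular distances passes through the mean (0.6, 0.6), here with direction (1, -1):

$$y - \tfrac{3}{5} = -\left(x - \tfrac{3}{5}\right) \;\Rightarrow\; y = -x + \tfrac{6}{5}, \qquad a = -1,\quad b = \tfrac{6}{5}$$

This agrees with the values above.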

I also think that b = 0; otherwise the distances from the points are not measured perpendicular to the line, which is what is expected from PCA.

If you draw the line according to $a=-1/\sqrt{2}$ and $b=1/\sqrt{2}$, you will see that the projections of the points are not perpendicular to this line.
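
The candidate answers in this thread can be compared with a short MATLAB/Octave sketch that applies the point-to-line distance formula mentioned earlier. It assumes the five points are the rows of P and that each candidate (a, b) describes the line y = a*x + b. The variable names P, cand and ssd are illustrative only.

```matlab
P    = [2 0; 1 0; 0 2; 0 1; 0 0];                  % assumed data, one point per row
cand = [-1, 6/5; -1, 0; -1/sqrt(2), 1/sqrt(2)];    % rows are [a b] candidates from the thread

ssd = zeros(size(cand, 1), 1);                     % sum of squared distances per candidate
for k = 1:size(cand, 1)
    a = cand(k, 1);
    b = cand(k, 2);
    % signed perpendicular distance from each point to the line a*x - y + b = 0
    d = (a * P(:, 1) - P(:, 2) + b) / sqrt(a^2 + 1);
    ssd(k) = sum(d .^ 2);
    fprintf('a = %7.4f, b = %7.4f, sum of squared distances = %.4f\n', a, b, ssd(k));
end
```

Under these assumptions the sums should come out as 1.4 for (a, b) = (-1, 6/5), 5 for (-1, 0), and about 1.84 for (-1/sqrt(2), 1/sqrt(2)). The smallest value, 1.4 = 7/5, equals the smaller eigenvalue of the scatter matrix, which is what PCA predicts for the residual around the first principal direction.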