The lstsq routine handles any system: overdetermined, underdetermined, or well-determined. Its output is what you would get from pinv(a) * b, but it is faster than computing the pseudoinverse. Here's why:
General advice: do not compute the inverse of a matrix unless you actually need it. Solving a system for a particular right-hand side is faster than inverting its matrix.
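As a quick sanity check of the two claims above, here is a minimal sketch. The matrices are random and made up for illustration (they are not from the question), and it only checks that the results agree, not how long each call takes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined system (6 equations, 3 unknowns); arbitrary example data.
a_tall = rng.standard_normal((6, 3))
b_tall = rng.standard_normal(6)

x_lstsq = np.linalg.lstsq(a_tall, b_tall, rcond=None)[0]
x_pinv = np.linalg.pinv(a_tall) @ b_tall      # same answer via the pseudoinverse

# Square system; adding 4*I keeps this example matrix comfortably invertible.
a_sq = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)
b_sq = rng.standard_normal(4)

x_solve = np.linalg.solve(a_sq, b_sq)         # solves for this right-hand side only
x_inv = np.linalg.inv(a_sq) @ b_sq            # forms the full inverse first

print(np.allclose(x_lstsq, x_pinv), np.allclose(x_solve, x_inv))
```

To compare speed on your own data, wrapping each call in timeit is probably the simplest option.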
However, your approach of solving the normal equations a^T a x = a^T b is faster, even though you invert the matrix. What gives? The point is that inverting a^T a is valid only when a has full column rank. So you have restricted the problem to this particular situation, and gained speed as a trade-off for less generality and, as I will show below, less safety.
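To illustrate the full-column-rank restriction, here is a small sketch with a made-up rank-deficient matrix (its second column is twice the first). The product of its transpose with itself is singular, so the inversion route is expected to fail, while lstsq still returns the minimum-norm solution:

```python
import numpy as np

# Hypothetical example: column 2 = 2 * column 1, so the rank is 1, not 2.
a = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])

rank = np.linalg.matrix_rank(a)

# The normal-equations matrix is singular here, so inverting it should fail.
try:
    x_inv = np.linalg.inv(a.T @ a) @ (a.T @ b)
    inversion_failed = False
except np.linalg.LinAlgError:
    inversion_failed = True

# lstsq and pinv both cope with the rank deficiency.
x_lstsq = np.linalg.lstsq(a, b, rcond=None)[0]
x_pinv = np.linalg.pinv(a) @ b

print(rank, inversion_failed, x_lstsq)
```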
But matrix inversion is still inefficient. If you know that a has full column rank, the following is faster than any of your three attempts:
np.linalg.solve(np.dot(a.T, a), np.dot(a.T, b))
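If you are not sure whether a has full column rank, one option is to check first and fall back to lstsq. The helper below is hypothetical (the name solve_least_squares is mine, not a NumPy function) and is only a sketch:

```python
import numpy as np

def solve_least_squares(a, b):
    """Hypothetical helper: normal equations if a has full column rank, else lstsq."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if np.linalg.matrix_rank(a) == a.shape[1]:
        # Full column rank: a.T @ a is invertible, so solve the normal equations.
        return np.linalg.solve(a.T @ a, a.T @ b)
    # Rank deficient: let lstsq return the minimum-norm solution.
    return np.linalg.lstsq(a, b, rcond=None)[0]

rng = np.random.default_rng(1)
a_full = rng.standard_normal((8, 3))          # arbitrary full-rank example
b_full = rng.standard_normal(8)
x_full = solve_least_squares(a_full, b_full)

a_def = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])   # rank 1
x_def = solve_least_squares(a_def, np.array([1.0, 2.0, 3.0]))
```

Note that matrix_rank itself computes an SVD, so this check gives back most of the speed advantage. It makes sense mainly as an illustration, or when the rank is checked once and reused for many right-hand sides.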
Still, lstsq is preferable to the above when dealing with ill-conditioned matrices. Forming the product a^T a essentially squares the condition number, so you are more likely to get meaningless results. Here is a cautionary example, using the SciPy linalg module (which is essentially equivalent to NumPy's but has more methods):
import numpy as np
import scipy.linalg as sl
a = sl.hilbert(10)
Here lstsq gives nearly the same result as solve (the unique solution of this system). However, sol3 is completely wrong because of numerical issues (which you won't even be warned about).
sol1:
[ -9.89821788e+02, 9.70047434e+04, -2.30439738e+06, 2.30601241e+07, -1.19805858e+08, 3.55637424e+08, -6.25523002e+08, 6.44058066e+08, -3.58346765e+08, 8.31333426e+07]
sol2:
[ -9.89864366e+02, 9.70082635e+04, -2.30446978e+06, 2.30607638e+07, -1.19808838e+08, 3.55645452e+08, -6.25535946e+08, 6.44070387e+08, -3.58353147e+08, 8.31347297e+07]
sol3:
[ 1.06913852e+03, -4.61691763e+04, 4.83968833e+05, -2.08929571e+06, 4.55280530e+06, -5.88433943e+06, 5.92025910e+06, -5.56507455e+06, 3.62262620e+06, -9.94523917e+05]
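For anyone who wants to experiment with this effect, here is a rough sketch in plain NumPy. It is not the exact code that produced the numbers above: the Hilbert matrix is built by hand (it should match sl.hilbert), and the right-hand side is an assumption of mine, constructed from a known solution of all ones so that the error can be measured directly.

```python
import numpy as np

def hilbert(n):
    """Hilbert matrix H[i, j] = 1 / (i + j + 1), built by hand with NumPy."""
    i = np.arange(n)
    return 1.0 / (i[:, None] + i[None, :] + 1)

# Small case: cond(a.T @ a) is close to cond(a) squared.
a_small = hilbert(4)
cond_small = np.linalg.cond(a_small)
cond_small_normal = np.linalg.cond(a_small.T @ a_small)
print(cond_small, cond_small_normal)

# Size 10, as in the example above; cond(a) is already huge here.
a = hilbert(10)
x_true = np.ones(10)            # assumed known solution, for illustration only
b = a @ x_true

x_lstsq = np.linalg.lstsq(a, b, rcond=None)[0]
x_normal = np.linalg.solve(a.T @ a, a.T @ b)   # normal equations

err_lstsq = np.linalg.norm(x_lstsq - x_true)
err_normal = np.linalg.norm(x_normal - x_true)
print(err_lstsq, err_normal)
```

The exact error values depend on the platform and the LAPACK build, but the normal-equations error should come out far larger than the lstsq error.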