Why is my lecturer's version of LU decomposition faster than mine? (Python, NumPy)

I am taking a numerical analysis course at my university, where we are studying LU decomposition. I tried to implement my own version before looking at my lecturer's. I thought mine was pretty fast, but when I actually compared them, my lecturer's version turned out to be much faster, even though it uses loops! Why is this?

Lecturer's version

import numpy as np

def LU_decomposition(A):
    """Perform LU decomposition using the Doolittle factorisation."""

    L = np.zeros_like(A)
    U = np.zeros_like(A)
    N = np.size(A, 0)

    for k in range(N):
        L[k, k] = 1
        U[k, k] = (A[k, k] - np.dot(L[k, :k], U[:k, k])) / L[k, k]
        for j in range(k+1, N):
            U[k, j] = (A[k, j] - np.dot(L[k, :k], U[:k, j])) / L[k, k]
        for i in range(k+1, N):
            L[i, k] = (A[i, k] - np.dot(L[i, :k], U[:k, k])) / U[k, k]

    return L, U

My version

def lu(A, non_zero=1):
    '''
    Given a matrix A, factorize it into two matrices L and U, where L is
    lower triangular and U is upper triangular. This implements Doolittle's
    method, which sets l_ii = 1, i.e. L is a unit triangular matrix.

    :param      A: Matrix to be factorized. NxN
    :type       A: numpy.array

    :param non_zero: Value to which l_ii is assigned. Must be non-zero.
    :type  non_zero: non-zero float.

    :return: (L, U)
    '''
    # Check if the matrix is square
    if A.shape[0] != A.shape[1]:
        return 'Input argument is not a square matrix.'

    # Store the size of the matrix
    n = A.shape[0]

    # Instantiate two zero matrices NxN (L, U)
    L = np.zeros((n,n), dtype = float)
    U = np.zeros((n,n), dtype = float)

    # Start algorithm
    for k in range(n):
        # Specify non-zero value for l_kk (Doolittle's)
        L[k, k] = non_zero
        # Case k = 0 is trivial
        if k == 0:
            # Complete first row of U
            U[0, :] = A[0, :] / L[0, 0]
            # Complete first column of L
            L[:, 0] = A[:, 0] / U[0, 0]
        # Case k = n-1 is trivial
        elif k == n-1:
            # Obtain  u_nn
            U[-1, -1] = (A[-1, -1] - np.dot(L[-1, :], U[:, -1])) / L[-1, -1]

        else:
            # Obtain u_kk
            U[k, k] = (A[k, k] - np.dot(L[k, :], U[:, k])) / L[k, k]
            # Complete kth row of U
            U[k, k+1:] = (A[k, k+1:] -
                          [np.dot(L[k, :], U[:, i]) for i in range(k+1, n)]) / L[k, k]
            # Complete kth column of L
            L[k+1:, k] = (A[k+1:, k] -
                          [np.dot(L[i, :], U[:, k]) for i in range(k+1, n)]) / U[k, k]
    return L, U

Benchmarking

I used the following commands:

A = np.random.randint(1, 10, size = (4,4))
%timeit lu(A)
57.5 µs ± 2.67 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit LU_decomposition(A)
42.1 µs ± 776 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

And also, how is it that the scipy version is so much faster still?

%timeit scipy.linalg.lu(A)
6.47 µs ± 219 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
2 answers

Plain Python is interpreted and slow. numpy is fast because its array operations run in compiled C code, but every numpy call you make from Python still pays the overhead of going through the Python layer, and your loops themselves are still Python.

Scipy is faster again because it hands the whole factorisation to compiled LAPACK routines, so nothing in the inner loops is executed as Python at all.

If you want to close the gap with your own code, Cython is one way to compile Python-like code.
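
To make the LAPACK point concrete, here is a minimal sketch (mine, not part of the original question or answer) using scipy.linalg.lu and scipy.linalg.lu_factor; note that scipy also returns a permutation, because the LAPACK routine pivots for numerical stability:

import numpy as np
from scipy.linalg import lu, lu_factor, lu_solve

A = np.random.rand(4, 4)

# lu() returns P, L, U with A = P @ L @ U; the pivoting happens inside LAPACK.
P, L, U = lu(A)
print(np.allclose(P @ L @ U, A))        # True

# lu_factor/lu_solve keep the factorisation in LAPACK's packed format,
# which is the cheapest way to reuse it for solving linear systems.
lu_piv = lu_factor(A)
x = lu_solve(lu_piv, np.ones(4))
print(np.allclose(A @ x, np.ones(4)))   # True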


First, count what a single line of your code, for example the update of U[k, k+1:], actually does:

  • a Python list is built by the comprehension [np.dot(L[k, :], U[:, i]) for i in range(k+1, n)]
  • that list is converted to an array and a temporary ndarray is created for A[k, k+1:] - temp_list
  • another temporary ndarray is created for temp_ndarray / L[k, k]
  • finally, the result is copied into the U[k, k+1:] slice

All of those temporaries cost CPU time and memory allocations. Numpy is fast because of what happens inside a call, not because calling it is cheap! A single numpy call costs roughly as much as X plain-Python operations, so it only pays off when each call processes enough data. On a 4x4 matrix the per-call overhead dominates, so the extra lists and temporary arrays make your version slower, not faster.
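
A quick way to convince yourself of this (my own illustration, not from the original answer): time the same numpy call on a tiny array and on a large one.

import numpy as np
import timeit

small = np.random.rand(3)
big = np.random.rand(1_000_000)

# On the tiny array the time is almost entirely per-call overhead;
# on the big array that overhead is negligible next to the real work.
t_small = timeit.timeit(lambda: np.dot(small, small), number=100_000) / 100_000
t_big = timeit.timeit(lambda: np.dot(big, big), number=100) / 100
print(f"3 elements:         {t_small:.2e} s per call")
print(f"1_000_000 elements: {t_big:.2e} s per call")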

Scipy is in another league entirely: it hands the whole factorisation to compiled LAPACK routines, so no Python runs in the loop at all. That is hard to beat from pure Python :)
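
If you still want a pure numpy version of your own, here is a sketch (mine, with the same no-pivoting assumption as both versions in the question) that does each step with a single matrix-vector product instead of a list comprehension, so no Python lists or extra temporaries are built:

import numpy as np

def lu_vectorised(A):
    """Doolittle LU where each row/column update is one matrix-vector product."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)          # unit diagonal, as in Doolittle's method
    U = np.zeros((n, n))
    for k in range(n):
        # Whole k-th row of U in one call
        U[k, k:] = A[k, k:] - L[k, :k] @ U[:k, k:]
        # Whole k-th column of L below the diagonal in one call
        L[k+1:, k] = (A[k+1:, k] - L[k+1:, :k] @ U[:k, k]) / U[k, k]
    return L, U

On a 4x4 matrix this still will not beat the explicit loops by much, because the per-call overhead discussed above dominates; the gap only shows up on larger matrices.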
