I tried this little job to rewrite the filter using the delay statement z
For example, for e = 4, I renamed input u and printed y
d1*z= u d2*z= c*u + a*d1 d3*z= d2 + a*(d3-d1*z) d4*z= d3 + a*(d4-d2*z) d5*z= d4 + a*(d5-d3*z) y = (b2*d3*z + b3*d4*z + b4*d5*z)
Note that di are filter states.
d3 * z is the next d3 value (it is apparently the d2 variable in your code)
Then you can remove di to write the transfer function y / u to z.
Then you will find that the minimal representation requires only e states by factorizing / simplifying the above transfer function.
The denominator z*(za)^3 , that is, the pole at the point 0, and the other in with multiplicity (e-1).
Then you can put your filter in the standard representation of the state space matrix:
z*X = A*X + B*u y = C*X + d*u
With a special shape of the poles, you can decompose the transfer in a partial decomposition of the shares and get the matrices A and B in this special form (matlab as a designation)
A = [0 1 0 0; B=[0; 0 a 1 0; 0; 0 0 a 1; 0; 0 0 0 a] 1]
C and d are a little easier than ... They are extracted from the numerators and the direct expression of the partial expansion of the fraction. These are polynomials in bi, c (degree 1) and a (degree e)
For e = 4 we have
C=[ a^3*b2 - a^2*b3 + a*b4 , -a^2*b2 + a*b3 + (ca^2)*b4 , a*b2 + (ca^2)*b3 + (-2*a^2*c-2*a^2-a+a^4)*b4 , (ca^2)*b2 + (-a^2-a^2*ca)*b3 + (-2*a*c+2*a^3)*b4 ] d= -a*b2 - a*c*b3 + a^2*b4
If you can find recurrence in controlling C and d and precommute them
then the filter can be reduced to those simple vector operations:
z*X = a*[ 0; x2 ; x3 ; x4 ... xe ] + [x2 ; x3 ; x4 ... xe ; u ]; Y = C*[ x1 ; x2 ; x3 ; x4 ... xe ] + d*u
Or expressed as a function (Xnext,y)=filter(X,u,a,C,d,e) pseudocode:
y = dot_product( C , X) + d*u;
Please note that if you use BLAS features, your code will be ported to many hardware, not just Applecentric, and I think the performance will not be much different.
EDIT: About Partial Expansion of the Fraction
A pure expansion of the partial fraction would give an idea of the diagonal state space and the matrix B, complete 1s. This is also an interesting option. (filters in parallel)
My version used above is more like a cascade or ladder (filters in the series).