In the case where n is much longer than 2 ^ p, you can avoid the pain in quadratic time by doing something like this:
def mod1(n,p): while n.bit_length() > 3*p: k = n.bit_length() // p k1 = k>>1 k1p = k1*p M = (1<<k1p)-1 n = (n & M) + (n >> k1p) Mp = (1<<p)-1 while n.bit_length() > p: n = (n&Mp) + (n>>p) if n==Mp: return 0 return n
[EDITED because I messed up the formatting before; thanks to Benjamin for that. Moral: Do not copy or paste from the standby window in SO. Excuse me!]
(Note: the criterion for shortening the length of n, rather than taking p from it and choosing k1 exactly, is a little wrong, but that doesn't matter, so I didn't worry about fixing them.)
If I take p = 12345 and n = 9 ** 200000 (yes, I know that p is then not simple, but it does not matter here), then this is about 13 times faster.
Unfortunately, this will not help you , because in the LL test, n is never more than about (2 ^ p) ^ 2. Unfortunately.
source share