It will be much faster to remove small factors until the remainder is simple.
static private long maxfactor (long n)
{
long k = 2;
while (k * k <= n)
{
if (n % k == 0)
{
n /= k;
}
else
{
++k;
}
}
return n;
}
For example, if n = 784, it does 9 modulo operations instead of a few hundred. Counting even with sqrt limit will still do 21 modulo ops only in maxfactor, and another dozen in primo.
New optimized version here