O (n ^ 2). I think this is optimal since you have n ^ 2 cells.
Note that the upper left corner and lower right corner of any square lie along the same diagonal.
Now, if we could process each diagonal in O (n) time, we would have an O (n ^ 2) time algorithm.
Say we have a candidate for the upper left corner. We can calculate the number of continuous black cells below and to the right of it and take at least two of them and call it T.
For the lower right candidate, we can calculate the number of continuous black cells to his left, and at the top and take a minimum of two, call him.
Once we have two numbers T and B, we can tell if a given upper left, lower right candidate really gives us a square with all black borders.
Now these two numbers can be calculated for each cell O (n ^ 2) times by doing four passes over the entire matrix (How?).
So let's say this is done.
Now consider the diagonal. Our goal is to find the upper left lower right pair along this diagonal in O (n) time.
Suppose we have T computed in the array T [1 ... m], where m is the length of the diagonal. Similarly, we have an array B [1 ... m]. T [1] corresponds to the upper left edge of the diagonal and T [m] in the lower right corner. Similar to B.
Now we are processing the diagonal as follows, for each candidate to the upper left I am trying to find the lower right candidate j, which will give the largest square. Note that j> = i. Also note that if (i, j) is a candidate, then (i ', j) does not exist, where i'> i.
Note that i and j form a square if T[i] >= j-i+1 and B[j] >= j-i+1 .
i.e. T[i] +i - 1 >= j and B[j] -j - 1 >= -i .
So, we form new arrays such that TL[k] = T[k] + k -1 and BR[k] = B[k] -k - 1 .
So, given the two new arrays TL and BR and i, we need to answer the following queries:
What is the largest j such that TL [i]> = j and BR [j]> = -i?
Now suppose we were able to process BR for maximum range requests (this can be done in O (m) time).
Now, given TL [i], in the range [i, TL [i]] we find the maximum value of BR, for example, BR [j1].
Now, if BR [j1]> = -i, we find the maximum BR in the range [j1, TL [i]] and continue this path.
As soon as we find a candidate (TL [i], BR [j]), we can ignore the array BR [1 ... j] for future i.
This allows us to process each diagonal in O (n) time, giving a complete O (n ^ 2) algorithm.
I forgot a lot of details and gave a sketch, because it was already too long. Feel free to edit this with clarifications.
Phew