What is the best way to find the density of values in a given matrix?

Suppose I have an n:n matrix of short values, all non-negative:

 0 1 0 3
 1 0 5 6
 7 1 0 4
 6 2 7 9

I am looking for the m:m submatrix inside it that contains the most values greater than 0. My problem is that the solution I have so far does not scale well with n (and m).

In fact, the n:n matrix represents product prices, and the axes represent days counted from a given (arbitrary) day, so you can search for prices over a certain period of time. The m:m matrix is actually a 7 x 7 matrix that contains a subset (as a window) of the prices. I am looking for the part of the n:n matrix where I have the most prices.

In the above example, the m:m matrix is

 7 1
 6 2

where m is 2.

Here are the relevant parts of the prototype I have written so far:

 private static class ResultMatrixData {
     private byte fillCount;
     private short distanceFromToday;

     public ResultMatrixData() {
         fillCount = 0;
         distanceFromToday = Short.MAX_VALUE;
     }

     public ResultMatrixData(short[][] pricesMatrix, short iArg, short jArg) {
         byte fillCount = 0;
         for (int i = iArg; i < iArg + 7; i++) {
             for (int j = jArg; j < jArg + 7; j++) {
                 if (pricesMatrix[i][j] > 0) {
                     fillCount++;
                 }
             }
         }
         this.fillCount = fillCount;
         distanceFromToday = iArg > jArg ? iArg : jArg;
     }
 }

 private ResultMatrixData calculateSingleResult(short[][] pricesMatrix) {
     ResultMatrixData bestSoFar = new ResultMatrixData();
     ResultMatrixDataComparator comparator = new ResultMatrixDataComparator();
     for (short i = 0; i < NUMBER_OF_DAYS - 6; i++) {
         for (short j = 0; j < NUMBER_OF_DAYS - 6; j++) {
             ResultMatrixData current = new ResultMatrixData(pricesMatrix, i, j);
             if (comparator.compare(current, bestSoFar) >= ResultMatrixDataComparator.GREATER_THAN) {
                 bestSoFar = current;
             }
         }
     }
     return bestSoFar;
 }

 private static class ResultMatrixDataComparator implements Comparator<ResultMatrixData> {
     private static final int LESS_THAN = -1;
     private static final int EQUAL = 0;
     private static final int GREATER_THAN = 1;

     @Override
     public int compare(ResultMatrixData first, ResultMatrixData second) {
         if (first.fillCount > second.fillCount) {
             return GREATER_THAN;
         } else if (first.fillCount < second.fillCount) {
             return LESS_THAN;
         } else {
             if (first.distanceFromToday < second.distanceFromToday) {
                 return GREATER_THAN;
             } else if (first.distanceFromToday > second.distanceFromToday) {
                 return LESS_THAN;
             }
         }
         return EQUAL;
     }
 }

My problem is that the runtime appears to grow quadratically (I have not done an exact asymptotic analysis):

 n (days)  | running time in ms
 ----------+-------------------
  1 * 365  |   48
  2 * 365  |  123
  3 * 365  |  278
  4 * 365  |  482
  5 * 365  |  733
  6 * 365  | 1069
  7 * 365  | 1438
  8 * 365  | 1890
  9 * 365  | 2383
 10 * 365  | 2926
 11 * 365  | 3646
 12 * 365  | 4208
 13 * 365  | 5009

Do you have any suggestions on how I can optimize this algorithm?

Note: this is not homework.

Edit: As others said in the answers, the time complexity here is O((n - m)^2). I am looking for something sub-quadratic that scales well as n tends to infinity.

4 answers

Theoretically speaking, in the worst case you cannot do better than O((n - m)^2), which is effectively O(n^2) unless n is bounded by C * m for some positive constant C. The reason is that even if you knew the matrix was all zeros except for a single cell, in the worst case you still could not answer the question without examining all the cells of the n:n matrix except at most m^2 of them.

Still, I would suggest the following algorithm, which can even answer more general queries than the one specified.

  • Create a matrix A of the same size as the original matrix M, in which cell (i, j) holds the number of non-zeros in the rectangle between (0, 0) and (i, j). You can fill it in a single left-to-right, top-to-bottom pass, computing: A(i, j) = A(i-1, j) + (A(i, j-1) - A(i-1, j-1)) + (M(i, j) != 0). Alternatively, instead of (A(i, j-1) - A(i-1, j-1)) you can keep a running counter of the non-zeros encountered so far in row i. In this pseudo-code, A(0, j) and A(i, 0) mean 0, assuming indices start at 1.

  • Now you can query in O(1) the number of non-zeros in any rectangular submatrix M((i, j)..(l, k)) of M, including the m:m ones (and, as already mentioned, you have (n - m + 1)^2 of them): num_of_non_zeros in M((i, j)..(l, k)) = A(l, k) - A(i-1, k) - A(l, j-1) + A(i-1, j-1)

Note that you can combine steps 1 and 2 in the same double loop, checking the number of non-zeros of the m:m submatrix M((i-m+1, j-m+1)..(i, j)) immediately after computing A(i, j).

So you get a simple quadratic algorithm that easily extends to other, similar queries.
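The prefix-count table described above can be sketched as follows; class and method names here are my own, and the 2x2 window size matches the question's small example rather than the real 7x7 case:

```java
public class SummedAreaDemo {
    // a[i][j] = number of non-zero cells in m[0..i-1][0..j-1]
    // (the table is padded with a zero row/column, so indices are 1-based).
    static int[][] buildCounts(short[][] m) {
        int n = m.length;
        int[][] a = new int[n + 1][n + 1];
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= n; j++) {
                a[i][j] = a[i - 1][j] + a[i][j - 1] - a[i - 1][j - 1]
                        + (m[i - 1][j - 1] != 0 ? 1 : 0);
            }
        }
        return a;
    }

    // Non-zero count of the win x win window with top-left corner (i, j), 0-based.
    static int windowCount(int[][] a, int i, int j, int win) {
        return a[i + win][j + win] - a[i][j + win] - a[i + win][j] + a[i][j];
    }

    public static void main(String[] args) {
        short[][] prices = {
            {0, 1, 0, 3},
            {1, 0, 5, 6},
            {7, 1, 0, 4},
            {6, 2, 7, 9}
        };
        int[][] a = buildCounts(prices);
        int win = 2, best = -1, bi = -1, bj = -1;
        for (int i = 0; i + win <= prices.length; i++) {
            for (int j = 0; j + win <= prices.length; j++) {
                int c = windowCount(a, i, j, win); // O(1) per window
                if (c > best) { best = c; bi = i; bj = j; }
            }
        }
        System.out.println(best + " at (" + bi + "," + bj + ")"); // prints: 4 at (2,0)
    }
}
```

On the question's 4x4 example this reports the fully filled window at (2, 0), i.e. the "7 1 / 6 2" block; building the table is O(n^2) and each window query is O(1) thereafter.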


There are 2 pieces of information you are not using enough:

  • You already maintain a running “best” result. You can break out of evaluating a particular window as soon as it can no longer beat the current “best” (so if you have already seen a 2x2 with three non-zero elements, and you just hit your second zero while evaluating some window, you can break out of it).

  • You also know the maximum possible count for a window: m x m (so 4 in your example). The moment you find a window with m^2 non-zeros, you can stop entirely; it is the best you could get.

However, neither of these suggestions is an algorithmic improvement.

You can try a "scanning window": 1. starting at (0, 0), compute the score for the top-left m x m window using a full scan, as you do now.

  • slide one column to the right: take your score, subtract the count of the leftmost (lowest-index) column, and add the count of the neighboring column to the right.

  • repeat step 2 until you reach the end of the row, then move down one row (subtract the counts of the top row, add those of the row below you).

  • now progress "left" (towards lower column indices) until you reach the edge, at which point move down one row again and head right (step 2).

This saves you some recomputation if m is large. Just to demonstrate the iteration order of this algorithm, here is a 3x3 window scanning a 7x7 board:

  going right -->
  xxx.... .xxx... ..xxx.. ...xxx. ....xxx
  xxx.... .xxx... ..xxx.. ...xxx. ....xxx
  xxx.... .xxx... ..xxx.. ...xxx. ....xxx
  ....... ....... ....... ....... .......
  ....... ....... ....... ....... .......
  ....... ....... ....... ....... .......
  ....... ....... ....... ....... .......

  hit the edge, go down one row, then head left <--
  ....... ....... ... .......
  ....xxx ...xxx. ... xxx....
  ....xxx ...xxx. ... xxx....
  ....xxx ...xxx. ... xxx....
  ....... ....... ... .......
  ....... ....... ... .......
  ....... ....... ... .......

This way you only recompute 6 cells instead of 9 each time you move to the next window: the leaving and entering edges. The advantage grows with m.
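The sliding update above can be sketched as follows, under a simplifying assumption: each row band restarts at the left edge instead of snaking back and forth, but the per-step update (drop the leaving column, add the entering column) is the same. Method names are my own:

```java
public class SlidingWindowDemo {
    // Number of non-zero cells in rows row..row+win-1 of column col.
    static int columnCount(short[][] m, int row, int col, int win) {
        int c = 0;
        for (int r = row; r < row + win; r++) {
            if (m[r][col] != 0) {
                c++;
            }
        }
        return c;
    }

    // Best non-zero count over all win x win windows.
    static int bestWindowCount(short[][] m, int win) {
        int n = m.length;
        int best = -1;
        for (int i = 0; i + win <= n; i++) {
            // Full scan for the leftmost window of this row band.
            int count = 0;
            for (int j = 0; j < win; j++) {
                count += columnCount(m, i, j, win);
            }
            best = Math.max(best, count);
            // Slide right: 2*win cells touched instead of win*win.
            for (int j = 1; j + win <= n; j++) {
                count += columnCount(m, i, j + win - 1, win)
                       - columnCount(m, i, j - 1, win);
                best = Math.max(best, count);
            }
        }
        return best;
    }
}
```

On the question's 4x4 example, `bestWindowCount(prices, 2)` finds the fully filled 2x2 window (count 4).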

Parallelizing this:

You can scan each row band as a separate task (across several cores, or even machines).

    Task 1              |   Task 2              |   Task N
  xxx....     ....xxx   | .......     .......   | .......     .......
  xxx....     ....xxx   | xxx....     ....xxx   | .......     .......
  xxx.... --> ....xxx   | xxx.... --> ....xxx   | ....... --> .......
  .......     .......   | xxx....     ....xxx   | xxx....     ....xxx
  .......     .......   | .......     .......   | xxx....     ....xxx
  .......     .......   | .......     .......   | xxx....     ....xxx
  .......     .......   | .......     .......   | .......     .......

You then just pick the best result among the ones returned by each task (each task returns the best result for its row band).
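One possible sketch of this row-band parallelization (my own, not from the answer) uses Java parallel streams, with each band's scan as an independent task whose results are reduced with max:

```java
import java.util.stream.IntStream;

public class ParallelBandsDemo {
    // Non-zero count of the win x win window with top-left corner (i, j).
    static int windowCount(short[][] m, int i, int j, int win) {
        int c = 0;
        for (int r = i; r < i + win; r++) {
            for (int k = j; k < j + win; k++) {
                if (m[r][k] != 0) {
                    c++;
                }
            }
        }
        return c;
    }

    // Best count within the row band whose windows start at row i.
    static int bestInBand(short[][] m, int i, int win) {
        int best = -1;
        for (int j = 0; j + win <= m.length; j++) {
            best = Math.max(best, windowCount(m, i, j, win));
        }
        return best;
    }

    static int bestOverall(short[][] m, int win) {
        return IntStream.rangeClosed(0, m.length - win)
                .parallel()                       // one band per task
                .map(i -> bestInBand(m, i, win))
                .max()
                .orElse(-1);
    }
}
```

Each band is independent, so no synchronization is needed beyond the final max reduction; in a distributed setting the same split works with each machine returning its band's best.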

Theoretical estimates: for m the size of the "window" and M the size of the board, there are (M - m + 1) x (M - m + 1) such windows, and in the worst case you have to look at all of them. So I don't think you can escape the O(n^2) curve here; you can only play with the constant factors.


From the question, I assume the following: find the m x m submatrix B_{i,j} of A that contains the smallest number of zeros.

if it is correct:

  • compute (or bound) the maximum element present in the matrix and take its negation; call this value POISON
  • go through all elements of A and replace the unwanted entries (<= 0) with POISON
  • compute the full prefix sums of each row ( http://en.wikipedia.org/wiki/Prefix_sum ):

     y_0 = x_0;
     for (int i = 1; i < n; i++) {
         y_i = y_{i-1} + x_i;
     }

  • apply the following to every row's prefix sums:

    x_i = x_i - x_{i-m}

    then delete the first m - 1 elements

  • the row window sums form a matrix; transpose it (you may be able to avoid this depending on your implementation, but if you do, the implementation gets more complicated)

  • repeat step 3 on the output of step 5

  • repeat step 4 on the output of step 6
  • find the maximum element of the resulting matrix; if it is at (i, j), then in the original matrix the densest m x m submatrix starts at (j, i)

Complexity in terms of the size of A (n): O(n * n).
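Steps 3 and 4 above (per-row prefix sums, then m-wide window sums by differencing) can be sketched in isolation like this; class and method names are my own, and the sketch works on one plain int row rather than the poisoned matrix:

```java
public class RowWindowSums {
    // prefix[i] = x[0] + ... + x[i]
    static int[] prefixSums(int[] x) {
        int[] y = new int[x.length];
        y[0] = x[0];
        for (int i = 1; i < x.length; i++) {
            y[i] = y[i - 1] + x[i];
        }
        return y;
    }

    // Sums of every m-wide window of x, via prefix differences:
    // window ending at i has sum prefix[i] - prefix[i - m].
    static int[] windowSums(int[] x, int m) {
        int[] p = prefixSums(x);
        int[] w = new int[x.length - m + 1];
        w[0] = p[m - 1];
        for (int i = m; i < x.length; i++) {
            w[i - m + 1] = p[i] - p[i - m];
        }
        return w;
    }
}
```

For example, `windowSums(new int[]{1, 2, 3, 4}, 2)` yields `{3, 5, 7}`. Applying this to every row, transposing, and applying it again produces the m x m window sums of the whole matrix.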


I'm not sure how far you could get with this approach, but here we go:
I would try swapping rows and columns to move the zeros away from the upper-left corner.
That way, the resulting m:m matrix will be found in the upper-left corner.

We need to evaluate whether swapping two rows/columns is worthwhile.
To do that, we build a cost function based on this weight matrix:

  7 6 5 4 3 2 1
  6 6 5 4 3 2 1
  5 5 5 4 3 2 1
  4 4 4 4 3 2 1
  3 3 3 3 3 2 1
  2 2 2 2 2 2 1
  1 1 1 1 1 1 1

In other words, the weight of row i, column j (0-based) is min(n - i, n - j).

Each zero found costs its cell's weight, and we want to minimize the total cost.
A swap is worthwhile if it reduces the total cost.
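A minimal sketch of this cost function (class and method names are my own): each zero is charged min(n - i, n - j), and the heuristic tries swaps that lower the total:

```java
public class ZeroCostDemo {
    // Weight of cell (i, j) in an n x n matrix, 0-based:
    // largest in the upper-left corner, 1 along the bottom/right edges.
    static int weight(int n, int i, int j) {
        return Math.min(n - i, n - j);
    }

    // Total cost: each zero cell contributes its weight.
    static int totalCost(short[][] m) {
        int n = m.length;
        int cost = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (m[i][j] == 0) {
                    cost += weight(n, i, j);
                }
            }
        }
        return cost;
    }
}
```

On the question's 4x4 example the zeros sit at (0,0), (0,2), (1,1) and (2,2), for a total cost of 4 + 2 + 3 + 2 = 11; a row or column swap is accepted only if it brings this number down.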

To make the cost evaluation cheaper, we could use a sparse-matrix structure:

  • the positions of the zeros in each row, i.e. a set of columnIndex per rowIndex
  • the positions of the zeros in each column, i.e. a set of rowIndex per columnIndex

Now we have a sorting problem over rows and columns.
A suboptimal approach is to solve the two sub-problems separately:

  • swap rows,
  • swap columns,
  • iterate until the cost stops improving.

There is an advantage to swapping two rows i and k if:

 weight.atRow(i).sumOfIndices(zerosPerRow.at(i))
   + weight.atRow(k).sumOfIndices(zerosPerRow.at(k))
   > weight.atRow(i).sumOfIndices(zerosPerRow.at(k))
   + weight.atRow(k).sumOfIndices(zerosPerRow.at(i))

Note that this is not a total order, so not all sorting algorithms will work.

It may be worth reducing the combinatorics further with additional heuristics: swap the rows with the most zeros toward the bottom, the columns with the most zeros toward the right.

Obviously, completely filled rows/columns do not need to be sorted again once moved to the top/left.

So sorting subsets of rows/columns of the same rank may be a reasonable (sub)optimal algorithm.


Source: https://habr.com/ru/post/1491701/

