Search for a two-dimensional dataset area

Question

Search for a two-dimensional dataset area

I have a .txt file containing about 100,000 points in a two-dimensional field. When I draw the dots, there is a well-defined 2-D area (think of a two-dimensional disk that has changed a bit).

What is the easiest way to calculate the region of this region? Any way to make this easy in matlab?

I made a polygonal approximation by finding a bunch (for example, 40) at the border of the region and calculating the area of the polygonal region in Matlab, but I was wondering if there is another, less tedious method than finding 40 points on the border.

+6

algorithm matlab computational-geometry

db1234 Aug 4 '11 at 0:20

source share

5 answers

Consider the following example:

%# random points x = randn(300,1); y = randn(300,1); %# convex hull dt = DelaunayTri(x,y); k = convexHull(dt); %# area of convex hull ar = polyarea(dt.X(k,1),dt.X(k,2)) %# plot plot(dt.X(:,1), dt.X(:,2), '.'), hold on fill(dt.X(k,1),dt.X(k,2), 'r', 'facealpha', 0.2); hold off title( sprintf('area = %g',ar) )

There is a short screencast by Doug Hull that solves this exact problem.

EDIT:

I am posting a second answer inspired by the solution proposed by @ Jean-FrançoisCorbett .

First, I create random data and using an interactive brush tool , I delete some points to make it look like the desired shape of the "kidney" ...

In order to have a basic level for comparison, we can manually track the area of the environment using the IMFREEHAND function (I do this using my laptop touchpad, so it’s not the most accurate drawing!). Then we find the region of this polygon using POLYAREA . Like my previous answer, I also compute a convex hull:

Now, and based on the previous SO question, I answered (2D histogram), the idea is to lay a grid over the data. The choice of grid resolution is very important, mine was numBins = [20 30]; for the data used.

Then we count the number of squares containing a sufficient number of points (I used at least 1 point as a threshold, but you could try a higher value). Finally, we multiply this number by the area of one square of the grid to get an approximate total area.

 %### DATA ### %# some random data X = randn(100000,1)*1; Y = randn(100000,1)*2; %# HACK: remove some point to make data look like a kidney idx = (X<-1 & -4<Y & Y<4 ); X(idx) = []; Y(idx) = []; %# or use the brush tool %#brush on %### imfreehand ### figure line('XData',X, 'YData',Y, 'LineStyle','none', ... 'Color','b', 'Marker','.', 'MarkerSize',1); daspect([1 1 1]) hROI = imfreehand('Closed',true); pos = getPosition(hROI); %# pos = wait(hROI); delete(hROI) %# total area ar1 = polyarea(pos(:,1), pos(:,2)); %# plot hold on, plot(pos(:,1), pos(:,2), 'Color','m', 'LineWidth',2) title('Freehand') %### 2D histogram ### %# center of bins numBins = [20 30]; xbins = linspace(min(X), max(X), numBins(1)); ybins = linspace(min(Y), max(Y), numBins(2)); %# map X/Y values to bin-indices Xi = round( interp1(xbins, 1:numBins(1), X, 'linear', 'extrap') ); Yi = round( interp1(ybins, 1:numBins(2), Y, 'linear', 'extrap') ); %# limit indices to the range [1,numBins] Xi = max( min(Xi,numBins(1)), 1); Yi = max( min(Yi,numBins(2)), 1); %# count number of elements in each bin H = accumarray([Yi(:), Xi(:)], 1, [numBins(2) numBins(1)]); %# total area THRESH = 0; sqNum = sum(H(:)>THRESH); sqArea = (xbins(2)-xbins(1)) * (ybins(2)-ybins(1)); ar2 = sqNum*sqArea; %# plot 2D histogram/thresholded_histogram figure, imagesc(xbins, ybins, H) axis on, axis image, colormap hot; colorbar; %#caxis([0 500]) title( sprintf('2D Histogram, bins=[%d %d]',numBins) ) figure, imagesc(xbins, ybins, H>THRESH) axis on, axis image, colormap gray title( sprintf('H > %d',THRESH) ) %### convex hull ### dt = DelaunayTri(X,Y); k = convexHull(dt); %# total area ar3 = polyarea(dt.X(k,1), dt.X(k,2)); %# plot figure, plot(X, Y, 'b.', 'MarkerSize',1), daspect([1 1 1]) hold on, fill(dt.X(k,1),dt.X(k,2), 'r', 'facealpha',0.2); hold off title('Convex Hull') %### plot ### figure, hold on %# plot histogram imagesc(xbins, ybins, H>=1) axis on, axis image, colormap gray %# plot grid lines xoff = diff(xbins(1:2))/2; yoff = diff(ybins(1:2))/2; xv1 = repmat(xbins+xoff,[2 1]); xv1(end+1,:) = NaN; yv1 = repmat([ybins(1)-yoff;ybins(end)+yoff;NaN],[1 size(xv1,2)]); yv2 = repmat(ybins+yoff,[2 1]); yv2(end+1,:) = NaN; xv2 = repmat([xbins(1)-xoff;xbins(end)+xoff;NaN],[1 size(yv2,2)]); xgrid = [xv1(:);NaN;xv2(:)]; ygrid = [yv1(:);NaN;yv2(:)]; line(xgrid, ygrid, 'Color',[0.8 0.8 0.8], 'HandleVisibility','off') %# plot points h(1) = line('XData',X, 'YData',Y, 'LineStyle','none', ... 'Color','b', 'Marker','.', 'MarkerSize',1); %# plot convex hull h(2) = patch('XData',dt.X(k,1), 'YData',dt.X(k,2), ... 'LineWidth',2, 'LineStyle','-', ... 'EdgeColor','r', 'FaceColor','r', 'FaceAlpha',0.5); %# plot freehand polygon h(3) = plot(pos(:,1), pos(:,2), 'g-', 'LineWidth',2); %# compare results title(sprintf('area_{freehand} = %g, area_{grid} = %g, area_{convex} = %g', ... ar1,ar2,ar3)) legend(h, {'Points' 'Convex Jull','FreeHand'}) hold off

Here is the final result of all three superimposed methods with area approximation mapping:

+6

Amro Aug 4 '11 at 0:40

source share

I know almost nothing, so don’t put a lot of attention into it ... think about Delaney’s triangulation. Then remove all outer (larger) edges of the case that exceed a certain maximum. Repeat until you delete anything. Fill in the remaining triangles.

This will cause you to have other ejection points.

+2

Ed staub Aug 4 '11 at 1:49

source share

I suggest using a space-filling curve, such as a z-curve or better a moore curve. Sfc fills a complete gap and indexes every point well. For example, for all f (x) = y, you can sort the points of the curve in ascendending order, and from this result you take so many points until you get a complete circuit. Then these points can be used to calculate the area. Since you have many points, perhaps you want to use fewer points and use a cluster that makes the result less accurate.

0

Bytemain Aug 4 '11 at 3:58

source share

I think you can get the boundary points using the convex hull algorithm with a restriction on the length of the edge (you need to sort the points along the vertical axis). Thus, this will follow the convexity of your region. I suggest a length of about 0.02. In any case, you can experiment with different lengths by pulling the result and viewing it visually.

0

Sergey Bankevich Aug 4 '11 at 11:01

source share

Jean-François Corbett · Accepted Answer · 2011-08-04T11:57:53+0000

My answer is simple and perhaps the least elegant and accurate. But first, a comment on the previous answers:

Since your shape usually has the shape of a kidney (not convex), calculating the area of its convex hull will not be performed, and an alternative is to determine its concave body (see, for example, http://www.concavehull.com/home.php?main_menu= 1 ) and calculate its area. But defining a concave hull is much more complicated than a convex hull. Plus, tear points can cause problems in both the convex and concave shells.

Delaunay triangulation followed by cropping, as suggested in @Ed Staub's answer, might be a bit simpler.

My own suggestion is: How accurate are your surface calculations? I think not really. With a concave body or clipped Delaunay triangulation, you still have to make an arbitrary choice where the “border” of your figure is (the edge is not a sharp knife, and I see fragmentation dots splashing around It).

Therefore, a simpler algorithm can be just as good for your application.

Separate the image in an orthogonal grid. Scroll through all the "pixels" of the grid or squares; if the given square contains at least one point (or, possibly, two points?), mark the square as full, otherwise empty. Finally, add the area of all full squares. Bingo.

The only parameter is the resolution length (square size). Its value should be set to something similar to the trim length in the case of Delaunay triangulation, i.e. "The points in my form are closer to each other than this length, and indicates further than this length should be ignored."

Perhaps an additional parameter is the number of point thresholds for the square, which is considered complete. Maybe 2 would be nice to ignore the tip of the glasses, but that might determine the basic shape too much for your taste ... Try 1 and 2, and maybe take an average of both. Or use 1 and remove the squares that have no neighbors (game style). Cute, empty squares whose 8 neighbors are filled should be considered full to avoid holes in the middle of the form.

There is no end to how much this algorithm can be refined, but due to the arbitrariness inherent in defining the problem in your particular application, any refinement is probably the equivalent of the “polish Turk” algorithm.

Search for a two-dimensional dataset area

EDIT:

More articles: