Recognition of handwritten circles, diamonds and rectangles.

I am looking for some tips on recognizing three handwritten shapes - circles, diamonds and rectangles. I have tried several approaches, but none of them worked well, so perhaps you could point me in a different, better direction.

What I tried:

1) A simple algorithm based on the dot product between the points of the handwritten shape and the points of an ideal shape. It does not work badly at recognizing a rectangle, but it fails on circles and diamonds. The problem is that the dot products of a drawn circle and a drawn diamond against the ideal shapes come out quite similar.
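To make the problem concrete, here is an illustrative sketch (not the asker's actual code) of a dot-product similarity between two shapes, assuming both are already resampled to the same number of points. It also reproduces the reported failure: a circle and a diamond sampled at the same angles score very close to each other.

```javascript
// Center each shape on its centroid and normalize its scale, then
// compare the two flattened point vectors by cosine similarity.
function normalizeShape(points) {
  const n = points.length;
  const cx = points.reduce((s, p) => s + p.x, 0) / n;
  const cy = points.reduce((s, p) => s + p.y, 0) / n;
  const centered = points.map(p => ({ x: p.x - cx, y: p.y - cy }));
  const norm = Math.sqrt(centered.reduce((s, p) => s + p.x * p.x + p.y * p.y, 0));
  return centered.map(p => ({ x: p.x / norm, y: p.y / norm }));
}

function shapeSimilarity(a, b) {
  const na = normalizeShape(a), nb = normalizeShape(b);
  return na.reduce((s, p, i) => s + p.x * nb[i].x + p.y * nb[i].y, 0);
}

// Sample an ideal circle and an ideal diamond (|x| + |y| = 1)
// at the same 64 angles.
const circle = [], diamond = [];
for (let i = 0; i < 64; i++) {
  const t = (2 * Math.PI * i) / 64;
  circle.push({ x: Math.cos(t), y: Math.sin(t) });
  const r = 1 / (Math.abs(Math.cos(t)) + Math.abs(Math.sin(t)));
  diamond.push({ x: r * Math.cos(t), y: r * Math.sin(t) });
}
```

`shapeSimilarity(circle, diamond)` comes out above 0.9, which is exactly why this measure struggles to separate the two classes.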

2) The same approach, but using dynamic time warping (DTW) as the similarity measure. Same problems.
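For reference, a minimal sketch of what approach 2 might look like - dynamic time warping over two point sequences, with Euclidean distance between matched points as the local cost (an illustrative implementation, not the asker's code):

```javascript
// Classic DTW dynamic program: D[i][j] holds the cost of the best
// alignment of a[0..i) with b[0..j).
function dtw(a, b) {
  const dist = (p, q) => Math.hypot(p.x - q.x, p.y - q.y);
  const D = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(Infinity)
  );
  D[0][0] = 0;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      D[i][j] = dist(a[i - 1], b[j - 1]) +
        Math.min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1]);
    }
  }
  return D[a.length][b.length];
}
```

The distance is 0 for identical sequences and grows with the accumulated mismatch along the best warping path.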

3) Neural networks. I tried a few variants - feeding the data points into feedforward and Kohonen networks, or feeding in a rasterized image. The Kohonen network always classified all the data (even the set used for training) into the same category. Feedforward with points was better (but on the same level as approaches 1 and 2), and with a rasterized image it was very slow - I need at least size×size input neurons, and for small raster sizes the circle is indistinguishable from the diamond even for me ;) - and also to no avail. I think this is because all of these shapes are closed figures? I am not a big ANN specialist (one semester of courses), so maybe I am using them incorrectly?

4) Encoding the shape as a Freeman chain code and using some similarity-computing algorithm on it. I expected that in FCC the shapes would be really distinct from each other. No success here either (though I have not explored this path very deeply).
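For clarity, this is roughly what a Freeman chain code encoder looks like for a sequence of grid points (a sketch; direction numbering conventions vary, and it assumes consecutive points differ by at most one grid step on each axis):

```javascript
// Encode each step between consecutive points as one of 8 directions:
// 0 = east, 1 = NE, 2 = north, ... 7 = SE (with y pointing up).
function chainCode(points) {
  const dirs = {
    "1,0": 0, "1,1": 1, "0,1": 2, "-1,1": 3,
    "-1,0": 4, "-1,-1": 5, "0,-1": 6, "1,-1": 7,
  };
  const code = [];
  for (let i = 1; i < points.length; i++) {
    const dx = Math.sign(points[i].x - points[i - 1].x);
    const dy = Math.sign(points[i].y - points[i - 1].y);
    code.push(dirs[dx + "," + dy]);
  }
  return code;
}
```

For example, three sides of a unit square traced counter-clockwise encode as east, north, west: `[0, 2, 4]`.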

I am building this for an Android application, but I don't think the platform matters here.

+6
11 answers

Given the possible variation in the handwritten inputs, I would suggest that the neural network approach is the way to go; you will find it difficult or impossible to accurately model these classes by hand. LastCoder's attempt works to a certain extent, but it will not cope with much variation or promise high accuracy without further work - hand-engineered approaches of this type were superseded a long time ago.

State-of-the-art handwritten character classification results are nowadays usually achieved with convolutional neural networks (CNNs). Given that you have only 3 classes, the problem should be easier than digit or character classification, although from experience with MNIST (a dataset of handwritten digits) I expect that your circles, squares and diamonds will sometimes be hard even for people to tell apart.

So, if it were up to me, I would use a CNN. I would feed binary images taken from the drawing area into the first layer of the network. This may require some preprocessing. If the drawn shapes cover only a small part of the input space, you may benefit from thickening them (i.e. increasing the line width) to make the shapes more invariant to small differences. It may also help to center the shape in the image, although the pooling step may alleviate the need for this.

I would also note that, generally, more training data is better. There is often a trade-off between spending time gathering a bigger dataset and spending it improving your model. Synthesizing more examples (e.g. by skewing, rotating, shifting, or stretching the ones you have), or spending a few hours drawing shapes, may do you more good than the same time spent trying to improve your model.
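As a sketch of that kind of synthesis, one way to generate extra training examples from a single drawn shape is to apply small affine transforms to its points. The function and parameter names here are illustrative, not from any particular library:

```javascript
// Rotate, scale, and shift a point sequence to synthesize a new
// training example from an existing one.
function transformPoints(points, { angleDeg = 0, scale = 1, dx = 0, dy = 0 } = {}) {
  const a = (angleDeg * Math.PI) / 180;
  const cos = Math.cos(a), sin = Math.sin(a);
  return points.map(p => ({
    x: scale * (p.x * cos - p.y * sin) + dx,
    y: scale * (p.x * sin + p.y * cos) + dy,
  }));
}
```

Sampling `angleDeg`, `scale`, `dx`, `dy` from small random ranges for each copy turns one drawing into many slightly different ones.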

Good luck with your application!

+3

Here is some working code for a shape classifier. http://jsfiddle.net/R3ns3/ I pulled the threshold numbers (the *Threshold variables in the code) out of thin air, so of course they can be tuned for better results.

I use the bounding box, the average point in each sub-section, the angle between consecutive points, the polar angle from the bounding-box center, and corner detection. It can classify drawn rectangles, diamonds and circles. The code records points while the mouse button is held down and tries to classify the shape when you stop drawing.

HTML

<canvas id="draw" width="300" height="300" style="position:absolute; top:0px; left:0px; margin:0; padding:0; width:300px; height:300px; border:2px solid blue;"></canvas>

JS

var state = {
  width: 300,
  height: 300,
  pointRadius: 2,
  cornerThreshold: 125,
  circleThreshold: 145,
  rectangleThreshold: 45,
  diamondThreshold: 135,
  canvas: document.getElementById("draw"),
  ctx: document.getElementById("draw").getContext("2d"),
  drawing: false,
  points: [],
  getCorners: function(angles, pts) {
    var list = pts || this.points;
    var corners = [];
    for (var i = 0; i < angles.length; i++) {
      if (angles[i] <= this.cornerThreshold) {
        corners.push(list[(i + 1) % list.length]);
      }
    }
    return corners;
  },
  draw: function(color, pts) {
    var list = pts || this.points;
    this.ctx.fillStyle = color;
    for (var i = 0; i < list.length; i++) {
      this.ctx.beginPath();
      this.ctx.arc(list[i].x, list[i].y, this.pointRadius, 0, Math.PI * 2, false);
      this.ctx.fill();
    }
  },
  classify: function() {
    // get bounding box
    var left = this.width, right = 0, top = this.height, bottom = 0;
    for (var i = 0; i < this.points.length; i++) {
      var pt = this.points[i];
      if (left > pt.x) left = pt.x;
      if (right < pt.x) right = pt.x;
      if (top > pt.y) top = pt.y;
      if (bottom < pt.y) bottom = pt.y;
    }
    var center = { x: (left + right) / 2, y: (top + bottom) / 2 };
    this.draw("#00f", [
      { x: left, y: top },
      { x: right, y: top },
      { x: left, y: bottom },
      { x: right, y: bottom },
    ]);
    // find average point in each sector (9 sectors)
    var sects = [
      { x: 0, y: 0, c: 0 }, { x: 0, y: 0, c: 0 }, { x: 0, y: 0, c: 0 },
      { x: 0, y: 0, c: 0 }, { x: 0, y: 0, c: 0 }, { x: 0, y: 0, c: 0 },
      { x: 0, y: 0, c: 0 }, { x: 0, y: 0, c: 0 }, { x: 0, y: 0, c: 0 }
    ];
    // sector width/height, with a tiny epsilon so points on the far
    // edge still land in sector index 2
    var x3 = (right + (1 / (right - left)) - left) / 3;
    var y3 = (bottom + (1 / (bottom - top)) - top) / 3;
    for (var i = 0; i < this.points.length; i++) {
      var pt = this.points[i];
      var sx = Math.floor((pt.x - left) / x3);
      var sy = Math.floor((pt.y - top) / y3);
      var idx = sy * 3 + sx;
      sects[idx].x += pt.x;
      sects[idx].y += pt.y;
      sects[idx].c++;
      if (sx == 1 && sy == 1) {
        return "UNKNOWN";
      }
    }
    // get the significant points (clockwise)
    var sigPts = [];
    var clk = [0, 1, 2, 5, 8, 7, 6, 3];
    for (var i = 0; i < clk.length; i++) {
      var pt = sects[clk[i]];
      if (pt.c > 0) {
        sigPts.push({ x: pt.x / pt.c, y: pt.y / pt.c });
      } else {
        return "UNKNOWN";
      }
    }
    this.draw("#0f0", sigPts);
    // find angle between each run of 3 consecutive points
    var angles = [];
    for (var i = 0; i < sigPts.length; i++) {
      var a = sigPts[i],
          b = sigPts[(i + 1) % sigPts.length],
          c = sigPts[(i + 2) % sigPts.length],
          ab = Math.sqrt(Math.pow(b.x - a.x, 2) + Math.pow(b.y - a.y, 2)),
          bc = Math.sqrt(Math.pow(b.x - c.x, 2) + Math.pow(b.y - c.y, 2)),
          ac = Math.sqrt(Math.pow(c.x - a.x, 2) + Math.pow(c.y - a.y, 2)),
          deg = Math.floor(Math.acos((bc * bc + ab * ab - ac * ac) / (2 * bc * ab)) * 180 / Math.PI);
      angles.push(deg);
    }
    console.log(angles);
    var corners = this.getCorners(angles, sigPts);
    // get polar angle of corners
    for (var i = 0; i < corners.length; i++) {
      corners[i].t = Math.floor(Math.atan2(corners[i].y - center.y, corners[i].x - center.x) * 180 / Math.PI);
    }
    console.log(corners);
    // what's the shape?
    if (corners.length <= 1) { // circle
      return "CIRCLE";
    } else if (corners.length == 2) { // circle || diamond
      // difference of polar angles
      var diff = Math.abs((corners[0].t - corners[1].t + 180) % 360 - 180);
      console.log(diff);
      if (diff <= this.circleThreshold) {
        return "CIRCLE";
      } else {
        return "DIAMOND";
      }
    } else if (corners.length == 4) { // rectangle || diamond
      // sum of polar angles of corners
      var sum = Math.abs(corners[0].t + corners[1].t + corners[2].t + corners[3].t);
      console.log(sum);
      if (sum <= this.rectangleThreshold) {
        return "RECTANGLE";
      } else if (sum >= this.diamondThreshold) {
        return "DIAMOND";
      } else {
        return "UNKNOWN";
      }
    } else {
      alert("draw neater please");
      return "UNKNOWN";
    }
  }
};

state.canvas.addEventListener("mousedown", (function(e) {
  if (!this.drawing) {
    this.ctx.clearRect(0, 0, 300, 300);
    this.points = [];
    this.drawing = true;
    console.log("drawing start");
  }
}).bind(state), false);

state.canvas.addEventListener("mouseup", (function(e) {
  this.drawing = false;
  console.log("drawing stop");
  this.draw("#f00");
  alert(this.classify());
}).bind(state), false);

state.canvas.addEventListener("mousemove", (function(e) {
  if (this.drawing) {
    var x = e.pageX, y = e.pageY;
    this.points.push({ "x": x, "y": y });
    this.ctx.fillStyle = "#000";
    this.ctx.fillRect(x - 2, y - 2, 4, 4);
  }
}).bind(state), false);
+4

The linear Hough transform of a square or diamond should be easy to recognize. Both will produce four point masses. The square's will be in pairs at zero and 90 degrees, with the same y-coordinates for both pairs; in other words, a rectangle in Hough space. A diamond's will be at two other angles corresponding to how thin the diamond is, e.g. 45 and 135 degrees, or 60 and 120.
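To make the "four point masses" concrete, here is a toy linear Hough accumulator (an illustrative sketch, not OpenCV's implementation): each point votes for every (θ, ρ) line passing through it, and the accumulator's peaks give the dominant line directions.

```javascript
// Accumulate votes over (theta, rho) bins and return the top peaks.
function houghPeaks(points, thetaSteps = 180, rhoMax = 200, topK = 4) {
  const acc = new Map(); // key "t:rho" -> vote count
  for (const { x, y } of points) {
    for (let t = 0; t < thetaSteps; t++) {
      const theta = (Math.PI * t) / thetaSteps;
      const rho = Math.round(x * Math.cos(theta) + y * Math.sin(theta));
      if (Math.abs(rho) > rhoMax) continue;
      const key = t + ":" + rho;
      acc.set(key, (acc.get(key) || 0) + 1);
    }
  }
  return [...acc.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topK)
    .map(([key, votes]) => {
      const [t, r] = key.split(":").map(Number);
      return { thetaDeg: (180 * t) / thetaSteps, rho: r, votes };
    });
}

// Perimeter points of an axis-aligned square.
const square = [];
for (let i = 0; i <= 50; i++) {
  square.push({ x: 10 + i, y: 10 }, { x: 10 + i, y: 60 });
  square.push({ x: 10, y: 10 + i }, { x: 60, y: 10 + i });
}
const peaks = houghPeaks(square);
```

For this square the four top peaks land at θ ≈ 0° and θ ≈ 90°, in two ρ-pairs - exactly the four point masses described above.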

For the circle you need the circular Hough transform, and it will produce a single bright cluster of points in the 3D (x, y, r) Hough space.

OpenCV implements both the linear and the circular Hough transform, and it is possible to run OpenCV on Android. Both implementations include thresholding for identifying lines and circles. See pp. 329 and 331 of the documentation here.

If you are not familiar with Hough transformations, the Wikipedia page is not bad.

Another algorithm you may find interesting and possibly useful is given in this article on polygon similarity. I implemented it many years ago, and it is still around here. If you can convert the shapes into closed vector loops, this algorithm could compare them against templates, and the similarity metric would show the goodness of fit. The algorithm ignores rotational orientation, so if your definitions of square and diamond are relative to the axes of the drawing surface, you would have to modify the algorithm a bit to distinguish those cases.

+2

What you have here is a fairly standard classification task in what is arguably a vision domain. You could approach it in several ways, but the best way is not known in advance and can sometimes depend on fine details of the problem.

So this is not an answer per se, but there is a website - Kaggle.com - that hosts classification competitions. One of the sample/learning tasks they list is reading single handwritten numerical digits. That is close enough to this problem that the same methods will almost certainly apply quite well.

I suggest you go to https://www.kaggle.com/c/digit-recognizer and look around.

If that is too vague, I can tell you from my reading and from playing around in this problem space that Random Forests are a better baseline starting point than neural networks.

+1

In this case (three simple objects), you could try RANSAC fitting of an ellipse (to get a circle) and of lines (to get the sides of a rectangle or diamond) - on each connected component, if there are several objects to classify at the same time. Based on your actual setup (expected size, etc.), the RANSAC parameters (how close a point has to be to count as a voter, how many voters you need at minimum) have to be tuned. Once you have found a line with the RANSAC fit, remove the points close to it and continue with the next line. The angles between the lines should clearly distinguish a diamond from a rectangle.
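A minimal RANSAC line-fit sketch along those lines (the tolerance and iteration count are placeholder guesses to be tuned, as the answer says):

```javascript
// Repeatedly sample two points, form the line through them, count how
// many points lie within `tol` of that line, and keep the model with
// the most inliers.
function ransacLine(points, iterations = 200, tol = 2) {
  let best = { inliers: [], a: 0, b: 0, c: 0 };
  for (let it = 0; it < iterations; it++) {
    const i = Math.floor(Math.random() * points.length);
    const j = Math.floor(Math.random() * points.length);
    if (i === j) continue;
    const p = points[i], q = points[j];
    // Line through p and q as ax + by + c = 0, normalized so that
    // |a*x + b*y + c| is the point-to-line distance.
    let a = q.y - p.y, b = p.x - q.x;
    const norm = Math.hypot(a, b);
    if (norm === 0) continue;
    a /= norm; b /= norm;
    const c = -(a * p.x + b * p.y);
    const inliers = points.filter(pt => Math.abs(a * pt.x + b * pt.y + c) <= tol);
    if (inliers.length > best.inliers.length) best = { inliers, a, b, c };
  }
  return best;
}
```

To follow the answer's procedure, you would call this, remove `best.inliers` from the point set, and repeat for the next side; the recovered line directions then separate diamond from rectangle.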

+1

A very simple approach, optimized to classify precisely these three objects, may be as follows:

  • calculate the center of gravity of the object to be classified
  • then calculate the distance from the center to the object's points as a function of angle (from 0 to 2π)
  • classify the resulting graph based on its smoothness and/or variance, as well as the position and height of its local maxima and minima (possibly after smoothing the graph)
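A sketch of this signature idea: compute the centroid-to-point distance as a function of angle, and use its relative spread as one simple classification feature (real thresholds would need tuning on actual drawings):

```javascript
// Distance from the centroid to each point, sorted by polar angle.
function radialSignature(points) {
  const cx = points.reduce((s, p) => s + p.x, 0) / points.length;
  const cy = points.reduce((s, p) => s + p.y, 0) / points.length;
  return points
    .map(p => ({
      angle: Math.atan2(p.y - cy, p.x - cx),
      r: Math.hypot(p.x - cx, p.y - cy),
    }))
    .sort((a, b) => a.angle - b.angle);
}

// Relative spread of the signature: (max - min) / mean radius.
function radialSpread(points) {
  const rs = radialSignature(points).map(s => s.r);
  const mean = rs.reduce((s, r) => s + r, 0) / rs.length;
  return (Math.max(...rs) - Math.min(...rs)) / mean;
}

// A circle's signature is nearly flat; a square's corner radius is
// sqrt(2) times its edge-midpoint radius, so its spread is large.
const circlePts = [], squarePts = [];
for (let i = 0; i < 100; i++) {
  const t = (2 * Math.PI * i) / 100;
  circlePts.push({ x: Math.cos(t), y: Math.sin(t) });
}
for (let i = 0; i <= 20; i++) {
  const u = -1 + (2 * i) / 20;
  squarePts.push({ x: u, y: -1 }, { x: u, y: 1 }, { x: -1, y: u }, { x: 1, y: u });
}
```

The positions of the signature's maxima (four for a square or diamond, none for a circle) carry the rest of the information this answer describes.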
+1

I suggest doing this in the following steps:

  • Take the convex hull of the image (assuming the shapes are convex)
  • Segment it using clustering algorithms
  • Try to fit curves or straight lines to the clusters, and use a training set to calibrate the measurements and thresholds for classification
  • For your application, try splitting into 4 clusters
  • Once you have classified the clusters as lines or curves, you can combine that information to determine the shape: circle, rectangle or diamond
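As a starting point for the first step above, here is a standard convex hull routine (Andrew's monotone chain; a generic sketch, not tied to any particular library):

```javascript
// Returns the convex hull of a point set in counter-clockwise order.
function convexHull(points) {
  const pts = [...points].sort((a, b) => a.x - b.x || a.y - b.y);
  const cross = (o, a, b) =>
    (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
  const build = list => {
    const hull = [];
    for (const p of list) {
      // Pop points that would make a clockwise (or collinear) turn.
      while (hull.length >= 2 &&
             cross(hull[hull.length - 2], hull[hull.length - 1], p) <= 0) {
        hull.pop();
      }
      hull.push(p);
    }
    return hull;
  };
  const lower = build(pts);
  const upper = build(pts.slice().reverse());
  return lower.slice(0, -1).concat(upper.slice(0, -1));
}
```

Interior points (e.g. stray pixels inside the drawn outline) drop out, leaving only the outline to cluster and fit.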
+1

I think the existing answers are good, but perhaps a better way to think about it is that you should try to break the problem down into meaningful parts.

  • If possible, sidestep the full problem. For example, if you are recognizing gestures, analyze the gestures in real time. With gestures you can give the user feedback on how your program interprets their gesture, and the user will adjust what they are doing accordingly.
  • Clean up the image in question. Before you do anything else, come up with an algorithm to try to select exactly what you are trying to analyze, and apply an appropriate filter (possibly a convolution) to remove image artifacts before processing.
  • Once you have found what you are going to analyze, analyze it and return a score: one for circle, one for noise, one for line, and the last for pyramid.
  • Repeat this step with the next viable candidate until you come up with the best candidate that is not noise.

I suspect you will find that you do not need a complicated algorithm to find a circle, line, or pyramid, but rather that it is about structuring your code appropriately.

+1

If I were you, I would use existing image processing libraries such as AForge.
Take a look at this sample article:
http://www.aforgenet.com/articles/shape_checker

0

I have a jar on GitHub that can help, if you are willing to unpack it and comply with the Apache license. You could also recreate it in any other language.

It's an edge detector. The best next steps from there might be to:

  • find the angles (median of 90 degrees)
  • find the median and maximum radius
  • find the slant/horizontal angle
  • let a decision-maker determine which shape it is

Play with it and see what works for you.

My jar is open to the public at this address. It is not finished yet, but it can help.

Just thought I might help. If anyone wants to contribute to the project, please do.

0

I did this recently with the identification of circles (bone centers) in medical images.

Note: steps 1-2 apply if you are capturing the image.

Pseudocode steps

Step 1. Detect the edges.
edges = edge_map(of the source image) (using edge detector(s))
(layman's terms: find the lines/edges - make them searchable)

Step 2. Trace each unique edge.
I would use a nearest-neighbor search (9x9 or 25x25) to identify/follow/track each edge, collecting each point into a list (the points become neighbors) and recording the gradient at each point.
This step gives: a set of edges,
(where one edge/curve/line = a list of [point_gradient_data_structure]s)
(layman's terms: collect the set of points along each edge of the image)

Step 3. Analysis of each edge (points and gradient data)
For each edge,
if the gradient is similar over a given region/set of neighbors (a range of points along the edge), then we have a straight line.
If the gradient changes gradually, we have a curve.
Each region/range of points that forms a straight line or a curve has an average (center) and other gradient statistics.
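The line/curve test in step 3 can be sketched as a simple rule over the gradient directions collected along an edge (ignoring angle wrap-around for brevity; a real implementation would handle it, and the tolerance is an illustrative guess):

```javascript
// Classify a run of edge-point gradient directions (in degrees):
// a straight line keeps a nearly constant direction, while a curve's
// direction drifts gradually over the run.
function classifySegment(gradientsDeg, tolDeg = 10) {
  const mean = gradientsDeg.reduce((s, g) => s + g, 0) / gradientsDeg.length;
  const maxDev = Math.max(...gradientsDeg.map(g => Math.abs(g - mean)));
  return maxDev <= tolDeg ? "line" : "curve";
}
```

A near-constant run such as `[0, 1, 2, 1, 0]` classifies as a line, while a steadily drifting run such as `[0, 30, 60, 90]` classifies as a curve.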

Step 4. Object Detection
We can use the summary information from step 3 to draw conclusions about diamonds, circles, or squares (i.e. 4 lines whose endpoints are next to each other, with appropriate gradients, represent a diamond or a square; one or more curves with sufficient points/gradients around a common focal point make up a full circle).

Note: using an image pyramid can improve the algorithm's performance, both in quality of results and in speed.

This method (steps 1-4) would work for well-defined shapes, would also be able to identify shapes drawn less than perfectly, and could handle slightly broken lines (if necessary).


Note: when using some of the machine learning methods mentioned by other posters, it would be useful/important to have good "classifiers" to break the problem down into smaller parts/components, so the solution can better understand/"see" the objects. I think machine learning may be a bit heavyweight for this question, but it can still give reasonable results. PCA (as used in face recognition) could also work.

0

Source: https://habr.com/ru/post/958966/

