Microsoft Kinect and background / environmental noise

I am currently programming with the Microsoft Kinect for Windows SDK 2 on Windows 8.1. Everything is going well, but my home dev environment clearly has far less background noise than the "real world" will.

I would like recommendations from those who have experience with real-world Kinect applications. How does the Kinect (especially v2) fare in a live environment with passers-by, spectators, and unexpected objects in the background? I expect that, as a rule, the space between the Kinect sensor and the user will be free of interference; what really concerns me is the background noise behind the user.

I know that the Kinect does not track well in direct sunlight (whether falling on the sensor or on the user). Are there other lighting conditions or external factors I need to account for in code?

The answers I'm looking for:

  • What problems can arise in a live environment?
  • How did you handle them in code or work around them?

2 answers

I built an application in a home setting, much as you describe, and then presented it in a public setting. The result was humbling: there were many failures I would never have expected in a controlled environment. However, it helped me, because it led me to add some interesting changes to my code, most of which focus on reliably detecting people.

  • Add checks that verify a tracked "person" is actually human

    When I showed my application in the middle of a presentation floor with many other objects and people around, I found that even chairs could briefly be mistaken for people, which caused my application to switch between the user and an inanimate object, losing track of the user and losing their progress. To counter these and other false-positive human detections, I added my own extra checks for a person. My most successful method was to compare the proportions of the human body, measured in head units (image of the head-unit proportions). Below is the code for how I did it (SDK version 1.8, C#):

      bool PersonDetected = false;
      double MaximumDiff = 0.2; // the tolerance I used
      double[] humanRatios = { 1.0, 4.0, 2.33, 3.0 };
      /* Array indexes
       * 0 - Head (shoulder to head)
       * 1 - Leg length (foot to knee to hip)
       * 2 - Width (shoulder to shoulder center to shoulder)
       * 3 - Torso (hips to shoulder)
       */
      ....
      // Distance() is a helper that returns the distance between two joints.
      double[] currentRatios = new double[4];
      double headSize = Distance(skeletons[0].Joints[JointType.ShoulderCenter],
                                 skeletons[0].Joints[JointType.Head]);
      currentRatios[0] = 1.0;
      currentRatios[1] = (Distance(skeletons[0].Joints[JointType.FootLeft], skeletons[0].Joints[JointType.KneeLeft])
                        + Distance(skeletons[0].Joints[JointType.KneeLeft], skeletons[0].Joints[JointType.HipLeft])) / headSize;
      currentRatios[2] = (Distance(skeletons[0].Joints[JointType.ShoulderLeft], skeletons[0].Joints[JointType.ShoulderCenter])
                        + Distance(skeletons[0].Joints[JointType.ShoulderCenter], skeletons[0].Joints[JointType.ShoulderRight])) / headSize;
      currentRatios[3] = Distance(skeletons[0].Joints[JointType.HipCenter], skeletons[0].Joints[JointType.ShoulderCenter]) / headSize;

      int correctProportions = 0;
      for (int i = 1; i < currentRatios.Length; i++)
      {
          double diff = currentRatios[i] - humanRatios[i];
          if (Math.Abs(diff) <= MaximumDiff)
              correctProportions++;
      }

      if (correctProportions >= 2)
          PersonDetected = true;

    Another method I had success with was summing the squared distances between consecutive joints. I found that non-human detections had much more variable cumulative distances, while people were fairly consistent. I learned the threshold with a one-dimensional support vector machine (I found that genuine users came in below 9):

      // in AllFramesReady or SkeletonFrameReady
      Skeleton data;
      ...
      float lastPosX = 0; // trying to detect false positives
      float lastPosY = 0;
      float lastPosZ = 0;
      float diff = 0;

      foreach (Joint joint in data.Joints)
      {
          // add the squared distance from the previous joint
          diff += (joint.Position.X - lastPosX) * (joint.Position.X - lastPosX);
          diff += (joint.Position.Y - lastPosY) * (joint.Position.Y - lastPosY);
          diff += (joint.Position.Z - lastPosZ) * (joint.Position.Z - lastPosZ);

          lastPosX = joint.Position.X;
          lastPosY = joint.Position.Y;
          lastPosZ = joint.Position.Z;
      }

      if (diff < 9) // this is what my SVM learned
          PersonDetected = true;
  • Use player identifiers and indexes to remember who is who

    This is related to the previous problem: if the Kinect swapped the two users it was tracking, my application would crash because of the sudden changes in the data. To counter this, I tracked both each player's skeleton index and their player ID. To learn more about how I did this, see Kinect User Detection. A minimal sketch of the locking idea follows.
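
    Here is a rough sketch of locking onto one user by TrackingId (SDK 1.8, C#); the handler and field names are my illustration, not the exact code from my app:

      // Lock onto one skeleton's TrackingId and keep following it,
      // instead of blindly taking the first slot in the array.
      // Requires: using System.Linq; using Microsoft.Kinect;
      int _lockedTrackingId = 0; // 0 = no one locked yet

      void Sensor_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
      {
          using (SkeletonFrame frame = e.OpenSkeletonFrame())
          {
              if (frame == null) return;

              Skeleton[] skeletons = new Skeleton[frame.SkeletonArrayLength];
              frame.CopySkeletonDataTo(skeletons);

              // Look for the skeleton we were already following.
              Skeleton locked = skeletons.FirstOrDefault(
                  s => s.TrackingState == SkeletonTrackingState.Tracked
                       && s.TrackingId == _lockedTrackingId);

              if (locked == null)
              {
                  // Our user is gone; lock onto a new fully tracked skeleton.
                  locked = skeletons.FirstOrDefault(
                      s => s.TrackingState == SkeletonTrackingState.Tracked);
                  _lockedTrackingId = (locked != null) ? locked.TrackingId : 0;
              }

              if (locked != null)
              {
                  // Process only this skeleton; a change in array order
                  // no longer switches the application to another person.
              }
          }
      }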

  • Add customizable options to adapt to different situations

    When I presented, the tilt angle and other basic Kinect parameters (near mode, for example) that worked in my development environment did not work in the new one. Let the user adjust some of these options so they can get the best setup for the job; see the sketch below.
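
    As a rough sketch of what I mean (SDK 1.8, C#), with AppSettings as a hypothetical stand-in for whatever configuration UI you build:

      // Apply user-adjustable sensor options.
      // Requires: using System; using Microsoft.Kinect;
      class AppSettings
      {
          public int TiltAngle;    // degrees
          public bool UseNearMode; // for users who must stand close to the sensor
          public bool SeatedMode;  // when chairs or crowds hide the lower body
      }

      void ApplySensorSettings(KinectSensor sensor, AppSettings settings)
      {
          // Clamp the tilt to the hardware range before applying it.
          int angle = Math.Max(sensor.MinElevationAngle,
                      Math.Min(sensor.MaxElevationAngle, settings.TiltAngle));
          sensor.ElevationAngle = angle;

          // Near mode for cramped installations.
          sensor.DepthStream.Range = settings.UseNearMode
              ? DepthRange.Near
              : DepthRange.Default;

          // Seated mode tracks only the upper-body joints.
          sensor.SkeletonStream.TrackingMode = settings.SeatedMode
              ? SkeletonTrackingMode.Seated
              : SkeletonTrackingMode.Default;
      }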

  • Expect people to do stupid things

    The next time I presented, I had adjustable tilt, and, as you can guess, someone burned out the Kinect motor. Anything on the Kinect that can be broken, someone will break. Leaving a warning in the documentation is not enough: add precautionary checks around the Kinect hardware so people are not upset when they inadvertently break something. Here is code that checks whether the user has adjusted the motor more than 20 times in two minutes:

      int motorAdjustments = 0;
      DateTime firstAdjustment;
      ...
      // in motor adjustment code
      if (motorAdjustments == 0)
          firstAdjustment = DateTime.Now;
      ++motorAdjustments;

      if (motorAdjustments < 20)
      {
          // adjust the tilt
      }
      else
      {
          DateTime timeCheck = firstAdjustment;
          if (DateTime.Now > timeCheck.AddMinutes(2))
          {
              // reset all variables
              motorAdjustments = 1;
              firstAdjustment = DateTime.Now;
              // adjust the tilt
          }
      }

    I should note that all of these were problems I hit with the first version of the Kinect, and I don't know how many of them have been solved in the second version, since unfortunately I haven't gotten my hands on one yet. However, I would still implement some of these methods, at least as safety checks, because there will always be exceptional cases, especially in computer vision.


Outlaw Lemur described in detail most of the problems that you may encounter in real-world scenarios.

With Kinect for Windows version 2, you do not need to adjust a motor: there is no motor, and the sensor has a wider field of view. That will make your life easier.

I would like to add the following tips and tricks:

1) Avoid direct light (natural or artificial)

Kinect has an infrared sensor that can easily be confused. The sensor should not face any direct light source. You can emulate such conditions in your home/office by playing with a common laser pointer and flashlights.

2) If you are tracking only one person, select the closest tracked user

If your application needs only one player, that player must be a) fully tracked and b) closer to the sensor than the others. This is an easy way to get participants to understand who is being tracked without making your interface more complex.

    public static Body Default(this IEnumerable<Body> bodies)
    {
        Body result = null;
        double closestBodyDistance = double.MaxValue;

        foreach (var body in bodies)
        {
            if (body.IsTracked)
            {
                var position = body.Joints[JointType.SpineBase].Position;
                var distance = position.Length();

                if (result == null || distance < closestBodyDistance)
                {
                    result = body;
                    closestBodyDistance = distance;
                }
            }
        }

        return result;
    }

3) Use tracking identifiers to distinguish between different players

Each body has a TrackingId property. Use it when players cross each other or move to random positions. Do not use this property as an alternative to face recognition, though.

    ulong _trackingID1 = 0;
    ulong _trackingID2 = 0;

    void BodyReader_FrameArrived(object sender, BodyFrameArrivedEventArgs e)
    {
        using (var frame = e.FrameReference.AcquireFrame())
        {
            if (frame != null)
            {
                frame.GetAndRefreshBodyData(_bodies);

                var bodies = _bodies.Where(b => b.IsTracked).ToList();

                if (bodies != null && bodies.Count >= 2 && _trackingID1 == 0 && _trackingID2 == 0)
                {
                    _trackingID1 = bodies[0].TrackingId;
                    _trackingID2 = bodies[1].TrackingId;

                    // Alternatively, assign body1 and body2 according to their distance from the sensor.
                }

                Body first = bodies.Where(b => b.TrackingId == _trackingID1).FirstOrDefault();
                Body second = bodies.Where(b => b.TrackingId == _trackingID2).FirstOrDefault();

                if (first != null)
                {
                    // Do something...
                }

                if (second != null)
                {
                    // Do something...
                }
            }
        }
    }

4) Display alerts when the player is too far away or too close to the sensor

To achieve greater accuracy, players need to stand at a certain distance: not too far or too close to the sensor. Here's how to check it out:

    const double MIN_DISTANCE = 1.0; // in meters
    const double MAX_DISTANCE = 4.0; // in meters

    double distance = body.Joints[JointType.SpineBase].Position.Z; // in meters, too

    if (distance > MAX_DISTANCE)
    {
        // Prompt the player to move closer.
    }
    else if (distance < MIN_DISTANCE)
    {
        // Prompt the player to move farther.
    }
    else
    {
        // Player is at the right distance.
    }

5) Always know when a player has entered or left the scene

Vitruvius provides an easy way to detect when someone has entered or left the scene.

Here is the source code, and here is how to use it in your application:

    UsersController userReporter = new UsersController();
    userReporter.BodyEntered += UserReporter_BodyEntered;
    userReporter.BodyLeft += UserReporter_BodyLeft;
    userReporter.Start();

    void UserReporter_BodyEntered(object sender, UsersControllerEventArgs e)
    {
        // A new user has entered the scene. Get the ID from the e parameter.
    }

    void UserReporter_BodyLeft(object sender, UsersControllerEventArgs e)
    {
        // A user has left the scene. Get the ID from the e parameter.
    }

6) Have a visual cue about which player is being tracked

If there are a lot of people around the player, you may need to show on screen who is being tracked. You can display the depth frame bitmap or use Microsoft Kinect Interactions.

Here is an example of removing the background and keeping only the player's pixels.
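
A minimal sketch of the body-index approach (Kinect v2, C#), assuming a BodyIndexFrameReader is already open, both buffers are allocated to match the body index frame description, and _trackedIndex holds the index of the body your app selected:

    // Tint the tracked player's pixels so spectators can see who has control.
    int _trackedIndex = 0;  // index (0-5) of the body you selected
    byte[] _indexData;      // sized to the BodyIndexFrame description
    byte[] _displayPixels;  // BGRA buffer behind a WriteableBitmap

    void BodyIndexReader_FrameArrived(object sender, BodyIndexFrameArrivedEventArgs e)
    {
        using (BodyIndexFrame frame = e.FrameReference.AcquireFrame())
        {
            if (frame == null) return;

            frame.CopyFrameDataToArray(_indexData);

            for (int i = 0; i < _indexData.Length; i++)
            {
                // Values 0-5 are tracked bodies; 255 means "no body here".
                bool isPlayer = _indexData[i] == _trackedIndex;

                _displayPixels[i * 4 + 0] = 0;                              // B
                _displayPixels[i * 4 + 1] = isPlayer ? (byte)255 : (byte)0; // G
                _displayPixels[i * 4 + 2] = 0;                              // R
                _displayPixels[i * 4 + 3] = isPlayer ? (byte)200 : (byte)0; // A
            }

            // Push _displayPixels into your WriteableBitmap here.
        }
    }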

7) Avoid glossy floors

Some floors (bright, glossy) can reflect people, and Kinect may confuse some of their joints (for example, it can stretch a user's legs toward the reflected body). If you cannot avoid glossy floors, use the FloorClipPlane property of the BodyFrame, as sketched below. The best solution, though, is a simple rug where you expect people to stand. The rug also acts as an indicator of the correct distance, so you provide a better user experience.
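
Here is a minimal sketch of that idea (Kinect v2, C#). FloorClipPlane gives the (A, B, C, D) of the floor plane equation Ax + By + Cz + D = 0 with a unit normal, so the expression below is a signed distance in meters:

    // Discard joints that appear below the floor plane, a typical
    // symptom of a reflection on a glossy surface.
    bool IsBelowFloor(Joint joint, Vector4 floor)
    {
        float signedDistance = floor.X * joint.Position.X
                             + floor.Y * joint.Position.Y
                             + floor.Z * joint.Position.Z
                             + floor.W;
        return signedDistance < 0f;
    }

    // Inside your BodyFrame processing:
    // Vector4 floor = frame.FloorClipPlane;
    // if (IsBelowFloor(body.Joints[JointType.FootLeft], floor)) { /* ignore it */ }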

