Disclaimer: the reason this question has gone unanswered is probably that it is still an open research problem. I can't give you a direct answer, but I can offer some background and useful resources on the topic.
There are two main approaches to extracting a skeleton from a depth map: machine learning and purely algorithmic methods.
For machine learning, you need a large set of training samples of people performing the target moves, which you then use to train your favorite learning algorithm. This is the approach Microsoft took and shipped on the Xbox (source); it works very well, but it reportedly took millions of samples to make it reliable ... quite a drawback.
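To make the learning approach concrete, the published Microsoft/Kinect work classifies each depth pixel into a body part using very cheap depth-comparison features: the difference in depth between two probe pixels whose offsets are normalized by the depth at the center pixel, which makes the feature roughly invariant to how far the body is from the camera. The sketch below is illustrative only; the function name, offset convention, and out-of-bounds constant are my assumptions, not Microsoft's code.

```python
import numpy as np

def depth_feature(depth, x, y, u, v):
    """Toy depth-comparison feature (assumed names/conventions).

    depth : 2D array of depth values (meters), indexed [row, col]
    (x, y): pixel at which the feature is evaluated
    u, v  : 2D offsets in world units; dividing by the local depth
            converts them to pixels, giving depth invariance.
    """
    d = depth[y, x]
    # Scale the offsets by the depth at (x, y): a nearby body spans
    # more pixels, so the probe points move proportionally further.
    u_px = (x + int(round(u[0] / d)), y + int(round(u[1] / d)))
    v_px = (x + int(round(v[0] / d)), y + int(round(v[1] / d)))

    def probe(p):
        px, py = p
        h, w = depth.shape
        if 0 <= px < w and 0 <= py < h:
            return depth[py, px]
        return 1e6  # large constant for probes outside the image

    return probe(u_px) - probe(v_px)
```

Thousands of such features, chosen by a decision forest trained on the labeled samples, are enough to classify pixels into body parts, from which joint positions are then estimated.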
The “algorithmic” approach (i.e., without a training set) can be implemented in various ways and is an open research problem. It is typically based on modeling the space of plausible body poses and trying to match them to the observed depth image. This is the approach PrimeSense (the company behind the Kinect's depth-camera technology) chose for its NITE skeleton tracking tool.
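The core idea of such model-fitting can be sketched as generate-and-score: propose candidate poses, measure how well each one explains the observed 3D points, and keep the best. This is a deliberately minimal toy under my own assumptions (bare joint positions instead of a full articulated surface model, a nearest-point cost instead of a proper likelihood); it is not PrimeSense's algorithm.

```python
import numpy as np

def score_pose(joints, point_cloud):
    """Toy fitting cost: mean distance from each hypothesized joint
    (N x 3 array) to its nearest observed depth point (M x 3 array).
    A real tracker fits an articulated body model with kinematic
    constraints; this only illustrates the generate-and-score idea."""
    # Pairwise distances between every joint and every cloud point.
    diff = joints[:, None, :] - point_cloud[None, :, :]
    dists = np.linalg.norm(diff, axis=2)
    return dists.min(axis=1).mean()

def best_pose(candidates, point_cloud):
    """Return the index of the candidate pose that best explains
    the observed points (lowest cost)."""
    scores = [score_pose(j, point_cloud) for j in candidates]
    return int(np.argmin(scores))
```

Real systems replace the brute-force candidate set with local optimization from the previous frame's pose, which is what makes per-frame tracking feasible.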
The OpenKinect community maintains a wiki that lists some interesting research papers on this topic. You may also be interested in this thread on the OpenNI mailing list.
If you are looking for a ready-made skeleton tracking tool, PrimeSense has released NITE (closed source), the tracker they built: it is part of the OpenNI framework. It is what powers most of the skeleton-tracking videos you may have seen. I believe it can track up to two skeletons at a time, but that needs confirmation.