It depends on which transformation you use to convert 3D world coordinates to 2D screen coordinates, e.g. perspective, isometric, etc. Usually your game will have a forward transform (3D → 2D) and an inverse transform (2D → 3D), and the inverse loses information: every 3D point maps to a single 2D point, but many 3D points share that 2D point, so going back from it may not give you the 3D point you started with. You can often project the mouse point onto an object to recover the missing dimension.
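As a rough illustration of why the inverse is ambiguous and how projecting onto an object resolves it, here is a minimal sketch. It assumes a simple pinhole perspective camera at the origin looking down +z, and it uses a plane to stand in for the "object". The function names, the `focal` parameter, and the plane representation (`normal · p = offset`) are made up for this example and are not from any particular engine.

```python
from typing import Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def project(p: Vec3, focal: float = 1.0) -> Vec2:
    """Forward transform (3D -> 2D) for an assumed pinhole camera at the origin."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal * x / z, focal * y / z)


def unproject_onto_plane(s: Vec2, normal: Vec3, offset: float,
                         focal: float = 1.0) -> Vec3:
    """Inverse transform (2D -> 3D): intersect the ray through the screen
    point with the plane normal . p = offset to recover the lost depth."""
    direction = (s[0], s[1], focal)  # ray from the camera through the screen point
    denom = sum(n * d for n, d in zip(normal, direction))
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the plane")
    t = offset / denom
    if t <= 0:
        raise ValueError("intersection is behind the camera")
    return (t * direction[0], t * direction[1], t * direction[2])


# Two different 3D points land on the same screen point...
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)
screen = project(near)
same_screen = project(far)

# ...so the screen point alone cannot tell them apart. Intersecting the ray
# with a known surface (here the plane z = 4) picks one of them.
recovered = unproject_onto_plane(screen, (0.0, 0.0, 1.0), 4.0)
```

With an isometric or other orthographic projection the idea is the same, except that the rays are parallel instead of all passing through the camera position.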
For drag and drop, you usually have the user specify the operation (translation in the projection plane, zooming in or out, rotation around a reference point). Your input is the mouse coordinate at the start and at the end of the drag. Transform both into your 3D coordinate system to get two 3D points; their difference gives you dx, dy, dz for the drag.
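A minimal sketch of the drag computation, under the same kind of assumptions: a pinhole camera at the origin looking down +z, and a drag constrained to a plane given as `normal · p = offset`. The names `screen_to_plane` and `drag_delta` are hypothetical and only serve to illustrate the idea.

```python
from typing import Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def screen_to_plane(mouse: Vec2, normal: Vec3, offset: float,
                    focal: float = 1.0) -> Vec3:
    """Map a mouse position to the 3D point where its view ray hits the plane."""
    direction = (mouse[0], mouse[1], focal)
    denom = sum(n * d for n, d in zip(normal, direction))
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the drag plane")
    t = offset / denom
    return (t * direction[0], t * direction[1], t * direction[2])


def drag_delta(start: Vec2, end: Vec2, normal: Vec3, offset: float,
               focal: float = 1.0) -> Vec3:
    """Return (dx, dy, dz) between the 3D points under the start and end mouse positions."""
    a = screen_to_plane(start, normal, offset, focal)
    b = screen_to_plane(end, normal, offset, focal)
    return (b[0] - a[0], b[1] - a[1], b[2] - a[2])


# Drag across a plane parallel to the screen at depth z = 4.
delta = drag_delta((0.0, 0.0), (0.5, 0.25), (0.0, 0.0, 1.0), 4.0)
```

Because the plane here is parallel to the screen, dz comes out as zero; choosing a tilted plane (for example, the ground) would give a non-zero dz for the same mouse movement.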