I managed to do this without explicitly using matrices. I used Java, so the syntax is different, but comparable. One of the things I used is a mix() function. It returns value1 when factor is 1 and value2 when factor is 0, and interpolates linearly between the two for any factor in between.
private double mix(double value1, double value2, double factor) {
    return (value1 * factor) + (value2 * (1 - factor));
}
When I call this function, I pass the perspective scale as value1 and the orthographic scale as value2, for example: mix(focalLength/voxel.z, orthoZoom, factor)
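To show where that call might sit, here is a minimal sketch of projecting a single point with the blended scale. The class name BlendedProjection, the project() helper, and the sample numbers are assumptions made for illustration; they are not taken from the linked program, and the point is assumed to be in camera space with z in front of the camera.

```java
public class BlendedProjection {
    // Same blend as mix() above: value1 at factor 1, value2 at factor 0.
    static double mix(double value1, double value2, double factor) {
        return (value1 * factor) + (value2 * (1 - factor));
    }

    // Hypothetical helper: scales a camera-space point (x, y, z) to 2D.
    // factor = 1 gives pure perspective, factor = 0 gives pure orthographic.
    static double[] project(double x, double y, double z,
                            double focalLength, double orthoZoom, double factor) {
        double scale = mix(focalLength / z, orthoZoom, factor);
        return new double[] { x * scale, y * scale };
    }

    public static void main(String[] args) {
        double focalLength = 100.0; // assumed sample value
        double orthoZoom = 2.0;     // assumed sample value
        for (double factor : new double[] { 0.0, 0.5, 1.0 }) {
            double[] p = project(1.0, 1.0, 25.0, focalLength, orthoZoom, factor);
            System.out.println("factor " + factor + " -> (" + p[0] + ", " + p[1] + ")");
        }
    }
}
```

With these sample values the perspective scale at z = 25 is 100/25 = 4 and the orthographic scale is 2, so the blended scale moves from 2 to 4 as factor goes from 0 to 1.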
When choosing your focal length and orthographic zoom, it is useful to know that anything at a distance of focalLength/orthoZoom from the camera will be projected to the same point throughout the transition, because at that depth the perspective scale focalLength/z equals orthoZoom.
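The fixed distance can be checked numerically. At z = focalLength/orthoZoom both inputs to mix() are the same number, so the blend returns that number for every factor. The sketch below assumes sample values of 100 and 2 and uses hypothetical names (PivotDistanceCheck, scaleAt); it is only a sanity check, not part of the original program.

```java
public class PivotDistanceCheck {
    // Same blend as mix() above: value1 at factor 1, value2 at factor 0.
    static double mix(double value1, double value2, double factor) {
        return (value1 * factor) + (value2 * (1 - factor));
    }

    // Hypothetical helper: blended scale applied to a point at depth z.
    static double scaleAt(double z, double focalLength, double orthoZoom, double factor) {
        return mix(focalLength / z, orthoZoom, factor);
    }

    public static void main(String[] args) {
        double focalLength = 100.0;               // assumed sample value
        double orthoZoom = 2.0;                   // assumed sample value
        double pivotZ = focalLength / orthoZoom;  // depth that should not move

        for (int i = 0; i <= 4; i++) {
            double factor = i / 4.0;
            // Scale at the pivot depth stays at orthoZoom; other depths change.
            System.out.println("factor " + factor
                    + ": scale at pivot = " + scaleAt(pivotZ, focalLength, orthoZoom, factor)
                    + ", scale at z=25 = " + scaleAt(25.0, focalLength, orthoZoom, factor));
        }
    }
}
```

Points nearer than the pivot depth grow as the view moves toward perspective, and points farther away shrink, which is why picking this distance to match the subject keeps it steady on screen.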
Hope this helps. You can download my program to see how it looks at https://github.com/npetrangelo/3rd-Dimension/releases .