The general idea is to pass your raw images through an encoder and let it produce the file. The encoder takes care of generating the keyframes and the intermediate (P and B) frames, along with any metadata the decoder needs. In addition, an encoding tool such as ffmpeg saves the video in a well-known container format and structures the video headers properly. Doing all of this by hand is difficult, tedious, and error-prone.
Whether you use ffmpeg or some other encoder is up to you. I suggest ffmpeg because it has the functionality you need. If you want to do all of this in code, ffmpeg is open source, so you can wrap the parts you need in a .NET wrapper and call them that way. Keep ffmpeg's licensing in mind if you are developing a redistributable application.
This should get you started: Creating movies from image files using ffmpeg / mencoder
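As a rough sketch of what that looks like when driven from code, the snippet below only builds the ffmpeg argument list for a numbered image sequence. The function name, the file names, and the choice of `libx264` with `yuv420p` are my assumptions, not something the question requires, and `libx264` is only available if your ffmpeg build includes it (which also matters for the licensing point above).

```python
from typing import List


def build_image_sequence_command(pattern: str, output: str, fps: int = 25) -> List[str]:
    """Build an ffmpeg argument list that encodes numbered images into a video.

    Hypothetical helper: `pattern` is a printf-style name such as
    "frame_%04d.png"; `fps` is the rate at which the images are read.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return [
        "ffmpeg",
        "-framerate", str(fps),  # every image is shown for 1/fps seconds
        "-i", pattern,           # numbered input images
        "-c:v", "libx264",       # assumed codec; needs a build with libx264
        "-pix_fmt", "yuv420p",   # pixel format most players can handle
        output,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_image_sequence_command("frame_%04d.png", "out.mp4")))
```

To actually encode, pass the list to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is on the PATH.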
To add sound, check this out: https://stackoverflow.com/questions/1329333/how-can-i-add-audio-mp3-to-a-flv-just-video-with-ffmpeg
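For reference, a minimal sketch of that muxing step, again only building the command. The file names and the helper name are hypothetical, and encoding the audio to AAC is my assumption; the video stream is copied as-is rather than re-encoded.

```python
from typing import List


def build_mux_command(video: str, audio: str, output: str) -> List[str]:
    """Build an ffmpeg argument list that adds an audio track to a silent video.

    Hypothetical helper; the codec choices below are assumptions.
    """
    return [
        "ffmpeg",
        "-i", video,        # input 0: the video-only file
        "-i", audio,        # input 1: the audio file
        "-map", "0:v:0",    # take the first video stream from input 0
        "-map", "1:a:0",    # take the first audio stream from input 1
        "-c:v", "copy",     # do not re-encode the video
        "-c:a", "aac",      # assumed audio codec for the output container
        "-shortest",        # stop when the shorter of the two streams ends
        output,
    ]


if __name__ == "__main__":
    print(" ".join(build_mux_command("out.mp4", "speech.mp3", "with_sound.mp4")))
```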
Now, if you want to synchronize audio and video (say the image sequence shows people speaking and the audio is their speech), you have a much harder problem on your hands. At that point you need to multiplex the audio and video frames correctly based on their durations. FFmpeg will probably not do this for you, since it gives each image in the sequence the same duration, which usually does not line up with the audio frames.
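If you happen to know when each image should appear relative to the audio, the per-image durations can at least be computed up front. The sketch below does that and writes them out as a list for ffmpeg's concat demuxer (`-f concat`), which accepts a `duration` line per file. Treat this as one possible approach that I have not verified against your material: the timestamps, file names, and helper names are all hypothetical, and file names containing quotes are not handled.

```python
from typing import List, Sequence


def frame_durations(timestamps: Sequence[float], end_time: float) -> List[float]:
    """Return how long each image stays on screen.

    `timestamps` are the (assumed known) start times in seconds, strictly
    increasing; `end_time` is when the last image should disappear.
    """
    if not timestamps:
        return []
    if any(b <= a for a, b in zip(timestamps, timestamps[1:])):
        raise ValueError("timestamps must be strictly increasing")
    if end_time <= timestamps[-1]:
        raise ValueError("end_time must come after the last timestamp")
    boundaries = list(timestamps) + [end_time]
    return [boundaries[i + 1] - boundaries[i] for i in range(len(timestamps))]


def concat_list(files: Sequence[str], durations: Sequence[float]) -> str:
    """Render a concat-demuxer list with one `duration` entry per image."""
    if len(files) != len(durations):
        raise ValueError("need exactly one duration per file")
    lines = ["ffconcat version 1.0"]
    for name, seconds in zip(files, durations):
        lines.append(f"file '{name}'")
        lines.append(f"duration {seconds:.3f}")
    if files:
        # The last file is commonly repeated, since the final duration
        # may otherwise be ignored by the demuxer.
        lines.append(f"file '{files[-1]}'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    durations = frame_durations([0.0, 0.5, 2.0], end_time=3.0)
    print(concat_list(["a.png", "b.png", "c.png"], durations))
```

Whether the result actually lines up with the speech still depends on how accurate your timestamps are, so check the output by eye and ear before relying on it.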