Side by side with audio from one of the inputs

Two separate scenes from Gymkata. Probably a poor choice of example, because it looks like two people are hiding behind a wall, peeking out with their faces and watching an epic battle.
Assuming both videos have the same format, frame size, frame rate, etc.:
ffmpeg -i videoandaudio.mp4 -i video.mp4 -filter_complex \
"[0:v]pad=iw*2:ih,setpts=PTS-STARTPTS[bg]; \
[1:v]setpts=PTS-STARTPTS[fg]; \
[bg][fg]overlay=shortest=1:x=w,format=yuv420p[filtered]" \
-map "[filtered]" -map 0:a -codec:a copy output.mp4
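This filtergraph pads the first video to twice its width, resets the timestamps of both inputs so they start at zero, overlays the second video on the right half (x=w is the width of the overlaid video), ends the output when the shorter input ends, and converts the result to yuv420p. The audio is taken from the first input and stream-copied.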
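As a rough alternative sketch, the same layout can probably be produced with the hstack filter, which stacks inputs horizontally and so avoids the manual pad and overlay offset. This is not the command from the answer above; it assumes an ffmpeg build that includes hstack and inputs with the same height and pixel format. The file names are the same placeholders, and the variable names and the existence check are only illustrative.

```shell
#!/bin/sh
# Alternative sketch: hstack instead of pad + overlay (not the original command).
# File names are placeholders carried over from the example above.
left="videoandaudio.mp4"   # supplies the left video and the audio
right="video.mp4"          # supplies the right video
out="output.mp4"

# Reset timestamps on both inputs, stack them horizontally,
# stop at the shorter input, and convert to yuv420p.
filtergraph="[0:v]setpts=PTS-STARTPTS[left];\
[1:v]setpts=PTS-STARTPTS[right];\
[left][right]hstack=inputs=2:shortest=1,format=yuv420p[filtered]"

# Run ffmpeg only when it is installed and both inputs exist;
# otherwise just print the filtergraph for inspection.
if command -v ffmpeg >/dev/null 2>&1 && [ -f "$left" ] && [ -f "$right" ]; then
    ffmpeg -i "$left" -i "$right" -filter_complex "$filtergraph" \
        -map "[filtered]" -map 0:a -codec:a copy "$out"
else
    printf '%s\n' "$filtergraph"
fi
```

The mapping and audio handling are unchanged from the overlay version: only the video part of the filtergraph differs, so the audio still comes from the first input and is copied without re-encoding.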