Pretty much what that person did was create a solid, still frame in a program like Photoshop, then fill in the caption bubble with blue (or green). Most video editing programs have a "bluescreen" (chroma key) option, which simply makes the blue portions of the video transparent so that you can add the frame as a top layer, overlapping the footage below. Then, to add the motion, they cropped the mouth out of the original clip and used a timing hold or freeze frame to lip-sync. It's a pretty complicated description, but the execution is VERY easy.
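
If it helps to see what the "bluescreen" step is doing under the hood, here's a rough sketch in plain Python. It's only an illustration, not how any particular editor does it: the function names, the pure-blue key color, and the tolerance of 60 are all made up for the example, and a frame is just a list of rows of (R, G, B) pixels.

```python
# Illustrative sketch only: names, key color, and tolerance are assumptions.
# A frame is a list of rows; each pixel is an (R, G, B) tuple with values 0-255.

def is_key_color(pixel, key=(0, 0, 255), tolerance=60):
    """Return True if every channel of the pixel is within `tolerance` of the key color."""
    return all(abs(c - k) <= tolerance for c, k in zip(pixel, key))

def composite(top, bottom, key=(0, 0, 255), tolerance=60):
    """Lay `top` over `bottom`; wherever `top` is the key color, `bottom` shows through."""
    return [
        [b if is_key_color(t, key, tolerance) else t
         for t, b in zip(top_row, bottom_row)]
        for top_row, bottom_row in zip(top, bottom)
    ]

# Tiny 2x2 example: the still frame (top) has two blue "holes".
WHITE = (255, 255, 255)
BLUE = (0, 0, 255)
top = [[WHITE, BLUE],
       [BLUE, WHITE]]
bottom = [[(10, 20, 30), (40, 50, 60)],
          [(70, 80, 90), (100, 110, 120)]]

result = composite(top, bottom)
```

The white pixels of the top layer stay put and the blue ones get swapped for whatever footage is underneath, which is why the moving mouth shows through the still frame. The tolerance is there because blue in a real frame is rarely exact.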