Upload an image, select your native language (L1) and your target language (L2), and get a generated caption along with text-to-speech audio of it.