Vocal_Eyes😎🕶👨‍🦯

Converts an image into scene graph triplets (subject, predicate, object), then turns those triplets into a short factual description.
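
As a rough illustration of the second stage only (triplets to text), here is a minimal sketch. It is not the project's actual code: the `Triplet` class, the `describe` function, and the fixed sentence template are all hypothetical, and the image-to-triplet step (which would need a scene graph model) is left out entirely.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Triplet:
    """One scene graph edge: subject --predicate--> object (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


def describe(triplets: List[Triplet]) -> str:
    """Turn triplets into one short sentence each, skipping exact duplicates."""
    seen = set()
    sentences = []
    for t in triplets:
        if t in seen:
            continue  # scene graph models often emit the same edge more than once
        seen.add(t)
        # Assumed template; a real system would handle articles and plurals.
        sentences.append(f"A {t.subject} is {t.predicate} a {t.obj}.")
    return " ".join(sentences)


if __name__ == "__main__":
    graph = [
        Triplet("man", "holding", "cane"),
        Triplet("dog", "sitting on", "sidewalk"),
        Triplet("man", "holding", "cane"),  # duplicate, dropped
    ]
    print(describe(graph))
```

Keeping the description template-based means the output states only what the triplets contain, which fits the "short factual description" goal.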

Examples