Many natural language processing (NLP) models can understand language but are ambiguous about images.
Between Images and Text: CLIP
Many natural language processing (NLP) models can understand language but are ambiguous about images.