How is that relevant? If an AI model is marketed as multimodal, then reading a clock is one of the things you’d expect it to be able to do, because it’s explicitly marketed as being able to understand images.
LLMs don’t seem suited to tasks like OCR.
There are a lot of other edge cases. They don’t work well with math, whether the calculation is explicit or merely implied. I’ve had cases where the LLM quoted a summary figure verbatim from a news release instead of actually computing it from the component figures provided.
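To make that concrete, here’s a minimal sketch of the recompute-and-compare pattern I ended up falling back on; the segment names and numbers are hypothetical, purely for illustration:

    # Per-segment figures as stated in a (hypothetical) news release, in $bn.
    segments = {
        "hardware": 12.4,
        "services": 7.9,
        "licensing": 1.3,
    }

    computed_total = sum(segments.values())  # 21.6
    quoted_total = 23.1  # the headline figure the LLM copied from the release text

    # Never trust the model's arithmetic: recompute from the parts and compare.
    if abs(computed_total - quoted_total) > 0.05:
        print(f"Mismatch: computed {computed_total:.1f} vs quoted {quoted_total:.1f}")

If you have to do that check yourself anyway, the model didn’t do the calculation for you; it just produced something that looked like one.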
Another massive failure with LLMs was subtitle editing. It’s still far faster and easier to use a specialized subtitle editing suite. I honestly couldn’t get the LLM to hold the source SRT file “in memory” (so I could then apply prompts to it).
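For comparison, the deterministic version of the most common subtitle fix (shifting every timestamp) is a few lines of stdlib Python; a rough sketch, assuming well-formed SRT timestamps and a positive offset:

    import re
    from datetime import timedelta

    # SRT timestamps look like "00:01:02,345 --> 00:01:04,000".
    TS = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

    def shift(match, offset):
        h, m, s, ms = (int(g) for g in match.groups())
        t = timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms) + offset
        total_ms = int(t.total_seconds() * 1000)
        h, rem = divmod(total_ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    with open("input.srt", encoding="utf-8") as f:
        text = f.read()

    delay = timedelta(seconds=2)  # push all subtitles 2 seconds later
    with open("output.srt", "w", encoding="utf-8") as f:
        f.write(TS.sub(lambda m: shift(m, delay), text))

Every edit is exact and reviewable, which is precisely what I couldn’t get the LLM to guarantee.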
They are marketed as “AI” when they are really just producing plausible output (which can often be very helpful) in response to a prompt.
Are these marketers in the room with us right now? Do we really have any “AI” that is “explicitly marketed as being able to understand images”? And if some such “AI”s really exist, did you read all the fine print under the asterisks?
https://help.openai.com/en/articles/8400551-chatgpt-image-inputs-faq
Now read that FAQ. I see just a bunch of descriptions of limitations, not an “I can read and correctly understand 100 percent of images”.
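For what it’s worth, this is roughly what those “image inputs” look like in practice; a minimal sketch assuming the OpenAI Python SDK’s chat-completions image format and a hypothetical image URL, with no claim about whether the answer comes back right:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What time does this clock show?"},
                # Hypothetical URL; any publicly reachable image works here.
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/clock.jpg"}},
            ],
        }],
    )
    print(response.choices[0].message.content)

The API accepts the image either way; the FAQ is hedging about what the model does with it afterwards.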
I think there’s a vast difference between “I can take images as input for prompts, with limitations” and “I’m using the wrong tool for a completely absurd use case”, which is what your microscope analogy implies.
An LLM is the wrong tool for image analysis, even if the providers say it’s possible. Possibility doesn’t mean effectiveness, or even usefulness. Like a microscope and onions.