How is that relevant? If an AI model is marketed as multimodal, then reading a clock is one of the things you’d expect it to be able to do, because it’s explicitly marketed as being able to understand images.
LLMs don’t seem suited to tasks like OCR.
There are a lot of other edge cases. They don’t work well with math, whether the calculation is explicit or merely implied. I’ve had cases where the LLM quoted a summary figure verbatim from a news release instead of actually computing it from the component figures provided.
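To make that concrete, here’s a minimal sketch of the recompute-and-compare pattern I ended up falling back on; the segment names and numbers are hypothetical, purely for illustration:

    # Per-segment figures as stated in a (hypothetical) news release, in $bn.
    segments = {
        "hardware": 12.4,
        "services": 7.9,
        "licensing": 1.3,
    }

    computed_total = sum(segments.values())  # 21.6
    quoted_total = 23.1  # the headline figure the LLM copied from the release text

    # Never trust the model's arithmetic: recompute from the parts and compare.
    if abs(computed_total - quoted_total) > 0.05:
        print(f"Mismatch: computed {computed_total:.1f} vs quoted {quoted_total:.1f}")

If you have to do that check yourself anyway, the model didn’t do the calculation for you; it just produced something that looked like one.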
Another massive failure with LLMs was subtitle editing. It’s still far faster and easier to use a specialized subtitle editing suite. I honestly couldn’t get the LLM to hold the source SRT file “in memory” (so I could then apply prompts to it).
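For comparison, the deterministic version of the most common subtitle fix (shifting every timestamp) is a few lines of stdlib Python; a rough sketch, assuming well-formed SRT timestamps and a positive offset:

    import re
    from datetime import timedelta

    # SRT timestamps look like "00:01:02,345 --> 00:01:04,000".
    TS = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

    def shift(match, offset):
        h, m, s, ms = (int(g) for g in match.groups())
        t = timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms) + offset
        total_ms = int(t.total_seconds() * 1000)
        h, rem = divmod(total_ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    with open("input.srt", encoding="utf-8") as f:
        text = f.read()

    delay = timedelta(seconds=2)  # push all subtitles 2 seconds later
    with open("output.srt", "w", encoding="utf-8") as f:
        f.write(TS.sub(lambda m: shift(m, delay), text))

Every edit is exact and reviewable, which is precisely what I couldn’t get the LLM to guarantee.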
They are marketed as “AI” when they are really just producing plausible output (which can often be very helpful) in response to a prompt.
Are these marketers in the room with us right now? Do we really have any “AI” that is “explicitly marketed as being able to understand images”? And if some such “AI”s really exist, did you read all the fine print under the asterisks?
https://help.openai.com/en/articles/8400551-chatgpt-image-inputs-faq
Now read that FAQ. I see just a bunch of descriptions of limitations, not an “I can read and correctly understand 100 percent of images”.
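For what it’s worth, this is roughly what those “image inputs” look like in practice; a minimal sketch assuming the OpenAI Python SDK’s chat-completions image format and a hypothetical image URL, with no claim about whether the answer comes back right:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What time does this clock show?"},
                # Hypothetical URL; any publicly reachable image works here.
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/clock.jpg"}},
            ],
        }],
    )
    print(response.choices[0].message.content)

The API accepts the image either way; the FAQ is hedging about what the model does with it afterwards.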
I think there’s a vast difference between “I can take images as input for prompts, with limitations” and “I’m using the wrong tool for a completely absurd use case”, which is what your microscope analogy implies.
An LLM is the wrong tool for image analysis, even if the providers say it’s possible. Possibility doesn’t mean effectiveness, or even usefulness. Like a microscope and onions.