• vrighter@discuss.tchncs.de
    2 days ago

    How is that relevant? If an AI model is marketed as multimodal, then reading a clock is one of the things you’d expect it to be able to do, because it’s explicitly marketed as being able to understand images.

    • Alphane Moon@lemmy.world
      1 day ago

      LLMs don’t seem suited for tasks like OCR.

      There are a lot of other edge cases. They don’t work well with math either, whether the calculations are explicit or merely implied. I’ve had cases where the LLM gave copytext from news releases as a summary figure instead of actually doing the calculation from the parts provided.

      Another massive failure for LLMs was subtitle editing. It’s still way faster and easier to use a specialized subtitle editing suite. I honestly couldn’t even get the LLM to copy the source SRT file “to memory” (so I could then apply prompts to it).
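      For contrast, the kind of deterministic cue handling a subtitle suite does is a few lines of ordinary code — a minimal Python sketch (not from the thread; assumes a well-formed SRT file):

```python
import re

def parse_srt(text):
    """Split SRT text into (index, timing, text_lines) cues, deterministically."""
    cues = []
    # SRT cues are separated by blank lines: index, timing line, then text.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) >= 2:
            cues.append((int(lines[0]), lines[1], lines[2:]))
    return cues

sample = """1
00:00:01,000 --> 00:00:03,000
Hello there.

2
00:00:04,000 --> 00:00:06,000
Second line."""

cues = parse_srt(sample)
print(len(cues))  # → 2
```

      Every run gives the same cues back, which is exactly the property that’s hard to get out of a prompt.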

      They are marketed as “AI” when they are really just giving plausible output (that can often be very helpful) to a prompt.

    • Lembot_0004@discuss.online
      2 days ago

      It’s explicitly marketed as being able to understand images.

      Are these marketers in this very room now? Do we really have any “AI” that is “explicitly marketed as being able to understand images”? Did you read all the fine print under the asterisks, if there really are any such “AI”s?