• kersplomp@piefed.blahaj.zone
    11 hours ago

    Really cool idea, but the site seems a bit biased toward the Chinese models, or is otherwise set up weirdly. I’m not able to reproduce how consistently bad the others are when I try them in WebDev Arena, which is generally accepted as the gold standard for testing AI web dev ability.

    • AppleTea@lemmy.zip
      10 hours ago

      Each model is allowed 2000 tokens to generate its clock. Here is its prompt: Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting.

      Are you using the same prompt?
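
For anyone who wants to try reproducing this, here is a minimal sketch of filling in the quoted prompt. The `${time}` placeholder and the 2000-token budget come from the text quoted above; the function name `build_prompt` and the `HH:MM:SS` time format are assumptions, since the thread does not say how the site formats the time or which client it calls.

```python
from datetime import datetime
from string import Template
from typing import Optional

# Prompt text as quoted in the thread; ${time} is its only placeholder.
PROMPT_TEMPLATE = Template(
    "Create HTML/CSS of an analog clock showing ${time}. "
    "Include numbers (or numerals) if you wish, and have a CSS animated "
    "second hand. Make it responsive and use a white background. "
    "Return ONLY the HTML/CSS code with no markdown formatting."
)

# Output budget quoted in the thread; pass it to whichever client you use.
MAX_OUTPUT_TOKENS = 2000


def build_prompt(now: Optional[datetime] = None) -> str:
    """Fill the placeholder with a time string (HH:MM:SS is an assumption)."""
    now = now or datetime.now()
    return PROMPT_TEMPLATE.substitute(time=now.strftime("%H:%M:%S"))


if __name__ == "__main__":
    print(build_prompt(datetime(2025, 1, 1, 10, 10)))
```

Sending the identical string with the same output limit to each model is probably the fairest way to compare against what the site shows.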

      • kersplomp@piefed.blahaj.zone
        6 hours ago

        There are a couple of differences. It’s giving the model the current time as part of the prompt, which is interesting. The other difference is that it’s asking for the clock to be responsive. But even when I use that exact prompt (inserting the time, obviously), it works fine on Claude, OpenAI, and Gemini.

        So there’s definitely an issue specific to this page somewhere. Maybe it’s not iframing them? I’m on mobile so I can’t check.
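
To illustrate the iframe guess: the thread doesn't establish how the site actually renders each result, so this is only a sketch of one way to isolate a model's HTML/CSS so that its styles can't leak into the host page or into the other clocks. The helper name `wrap_in_iframe` is hypothetical; `srcdoc` and `sandbox` are standard HTML attributes.

```python
from html import escape


def wrap_in_iframe(generated_html: str, title: str = "clock") -> str:
    """Return an <iframe> that renders generated_html as its own document."""
    # srcdoc carries the whole document inside an attribute value, so it
    # has to be attribute-escaped (quotes and ampersands included).
    srcdoc = escape(generated_html, quote=True)
    # An empty sandbox attribute applies every restriction (no scripts,
    # no forms); CSS animations inside the frame still run.
    return (
        f'<iframe sandbox title="{escape(title, quote=True)}" '
        f'srcdoc="{srcdoc}"></iframe>'
    )


if __name__ == "__main__":
    print(wrap_in_iframe("<style>body { background: white; }</style>"))
```

If the outputs were instead injected straight into one shared page, a selector like `body` or `.clock` from one model could restyle another model's clock, which would look like the model itself producing a broken result.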