• qt0x40490FDB@lemmy.ml
    1 day ago

    So, how would you define AGI, and what sorts of tasks require reasoning? I would have thought earning the gold medal on the IMO would have been a reasoning task, but I’m happy to learn why I’m wrong.

    • terrific@lemmy.ml
      1 day ago

      I definitely think that’s remarkable. But I don’t think scoring high on an external measure like a test is enough to prove the ability to reason. For reasoning, the process matters, IMO.

      Reasoning models work by chain-of-thought, which has been shown to give false reassurance about the process they actually follow (https://arxiv.org/abs/2305.04388).

      Maybe passing some math test is enough evidence for you, but I think it matters what’s inside the box. For me, it has only proved that tests are a poor measure of the ability to reason.

      • qt0x40490FDB@lemmy.ml
        24 hours ago

        I’m sorry, but this reads to me like “I am certain I am right, so evidence that implies I’m wrong must be wrong.” And while sometimes that really is the right approach to take, more often than not you really should update the confidence in your hypothesis rather than discarding contradictory data.

        But there must be SOMETHING which is a good measure of the ability to reason, yes? If reasoning is an actual thing that actually exists, then it must be detectable, and there must be a way to detect it. What benchmark do you propose?

        You don’t have to seriously answer, but I hope you see where I’m coming from. I assume you’ve read Searle, and I cannot express to you the contempt in which I hold him. I think, if we are to be scientists and not philosophers (and good philosophers should be scientists too) we have to look to the external world to test our theories.

        For me, what goes on inside does matter, but what goes on inside everyone everywhere is just math, and I haven’t formed an opinion about what math is really most efficient at instantiating reasoning, or thinking, or whatever you want to talk about.

        To be honest, the other day I was convinced it was actually derivatives and integrals, and, because of this, that analog computers would make much better AIs than digital computers. (But Hava Siegelmann’s book is expensive, and, while I had briefly lifted my book-buying moratorium, I think I have to impose it again.)

        Hell, maybe Penrose is right and we need quantum effects (I really really really doubt it, but, to the extent that it is possible for me, I try to keep an open mind).

        🤷‍♂️

        • terrific@lemmy.ml
          22 hours ago

          I’m not sure I can give a satisfying answer. There are a lot of moving parts here, and a big issue is definitions, which you also touch upon with your reference to Searle.

          I agree with the sentiment that there must be some objective measure of reasoning ability. To me, reasoning is more than following logical rules. It’s also about interpreting the intent of the task. The reasoning models are very sensitive to initial conditions and tend to drift when the question is not super precise or if they don’t have sufficient context.

          The AI models are, in a sense, very sensitive to their input. Organic intelligence, on the other hand, is resilient and also heuristic. I don’t have a specific test in mind, but it should measure the ability to solve a very ill-posed problem.

    • cmhe@lemmy.world
      1 day ago

      I think we should also set some energy limits for those tests. Previously it was assumed that the tests were taken by humans, who can do them after eating some crackers and drinking a bit of water.

      Now we are comparing that to massive data centers that need nuclear reactors to have enough power to work through these problems…