• Adalast@lemmy.world
    link
    fedilink
    arrow-up
    1
    ·
    11 hours ago

    I wasn’t attempting to attack what you said, merely pointing out that once you cross the line into philosophy things get really murky really fast.

    You assert that LLMs aren’t taught the rules, but every word is not just a word. The tokenization process includes part of speech tagging, predicate tagging, etc. The ‘rules’ that you are talking about are actually encapsulated in the tokenization process. The way the tokenization process for LLMs, at least as of a few years ago when I read a textbook on building LLMs, is predicated on the rules of the language. Parts of speech, syntax information, word commonality, etc. are all major parts of the ingestion process before training is done. They may not have had a teacher giving them the ‘rules’, but that does not mean it was not included in the training.

    And circling back to the philosophical question of what it means to “learn” or “know” something, you actually exhibited what I was talking about in your response on the math question. Putting to piles of apples on a table and counting them to find the total is a naïve application of the principals of addition to a situation, but it is not describing why addition operates the way it does. That answer does not get discussed until Number Theory in upper division math courses in college. If you have never taken that course or studied Number Theory independently, you do not know ‘why’ adding two numbers together gives you the total, you know ‘that’ adding two numbers together gives you the total, and that is enough for your life.

    Learning, and by extension knowledge, have many forms and processes that certainly do not look the same by comparison. Learning as a child is unrecognizable when compared directly to learning as an adult, especially in our society. Non-sapient animals all learn and have knowledge, but the processes for it are unintelligible to most people, save those who study animal intelligence. So to say the LLM does or does not “know” anything is to assert that their “knowing” or “learning” will be recognizable and intelligible to the lay man. Yes, I know that it is based on statistical mechanics, I studied those in my BS for Applied Mathematics. I know it is selecting the most likely word to follow what has been generated. The thing is, I recognize that I am doing exactly the same process right now, typing this message. I am deciding what sequence of words and tones of language will be approachable and relatable while still conveying the argument I wish to levy. Did I fail? Most certainly. I’m a pedantic neurodivergent piece of shit having a spirited discussion online, I am bound to fail because I know nothing about my audience aside from the prompt to which you gave me to respond. So I pose the question, when behaviors are symmetric, and outcomes are similar, how can an attribute be applied to one but not the other?