LLMs are solving the MCAT, the bar exam, the SAT, etc. like they’re nothing. At this point their performance is superhuman. However, they’ll often trip on super simple common-sense questions, and they struggle with creative thinking.

Is this literally proof that standardized tests are not a good measure of intelligence?

  • originalfrozenbanana@lemm.ee
    8 months ago

    Citation needed that LLMs are passing these tests like they’re nothing.

    LLMs don’t have intelligence, they are sentence generators. Sometimes those sentences are correct, sometimes they’re gobbledygook.

    For instance, they fabricate real-looking but nevertheless totally fake citations in research papers: https://www.nature.com/articles/s41598-023-41032-5

    To your point, we already know standardized tests are biased and poor tools to measure intelligence. Partly that’s because they don’t actually measure intelligence; they often measure rote knowledge. We don’t need LLMs to make that determination; we already can.

    • givesomefucks@lemmy.world
      8 months ago

      OP picked standardized tests that only require memorization because they have zero idea what a real IQ test like the WAIS is like.

      They also have no idea how those IQ tests work. You kind of have to go in “blind” to get an accurate result, and an LLM can’t do anything “blind” because you have to train it.

      A chatbot can’t even take a real IQ test. If we trained a chatbot to take a real IQ test, it would be a pointless test.

      • JackGreenEarth@lemm.ee
        8 months ago

        Nobody is a blank slate. Everyone has knowledge from their past experience, and instincts from their genetics. AIs are the same. They are trained on various things just as humans have experienced various things, but they can be just as blind as each other on the contents of the test.

        • givesomefucks@lemmy.world
          8 months ago

          No, they wouldn’t.

          Because real IQ tests aren’t just multiple-choice exams.

          You would have to train it to handle the different tasks, and training it on the tasks would make it better at them, raising its scores.

          I don’t know if the issue is that you don’t know how IQ tests work or what LLMs can do.

          But it’s probably both instead of one or the other.

    • EdibleFriend@lemmy.world
      8 months ago

      Talked about this a few times over the last few weeks but here we go again…

      I am teaching myself to write and had been using ChatGPT for super basic grammar assistance. It seemed like an ideal thing: toss a sentence I was iffy about into it and ask what it thought. After all, I wasn’t going to be asking it some college-level shit. A few days ago I asked it about something I was unsure about. I honestly can’t remember the details, but it completely ignored the part of the sentence I wasn’t sure about and told me something else was wrong. What it said was wrong was just…not wrong. The ‘correction’ it gave me was some shit a third grader would look at and say ‘uhhhhh…I’m gonna ask someone else now…’

      • Ottomateeverything@lemmy.world
        8 months ago

        That’s because LLMs aren’t intelligent. They’re just parrots that repeat what they’ve heard before. Selling this stuff as an “AI” with any “intelligence” is extremely misleading, and it’s causing people to think it’s going to be able to do things it can’t.

        Case in point, you were using it and trusting it until it became very obvious it was wrong. How many people never get to that point? How much has it done wrong before then? Etc.