• brucethemoose@lemmy.world · 11 days ago

One thing about Anthropic/OpenAI models is that they go off the rails over many conversation turns or long contexts. Like when they need to remember a lot of vending machine conversation, I guess.

    A more objective look: https://arxiv.org/abs/2505.06120v1

    https://github.com/NVIDIA/RULER

    Gemini is much better. TBH the only models I’ve seen that are half decent at this are:

• “Alternating attention” models (hybrids that interleave full attention with SSM or sliding-window layers) like Gemini, Jamba Large, or Falcon H1, depending on the iteration. Some recent versions of Gemini kinda lose this, then get it back.

    • Models finetuned specifically for this, like roleplay models or the Samantha model trained on therapy-style chat.

But most models are overtuned for one-shot prompts like “fix this table” or “write me a function,” and their makers don’t invest much in long-context performance because it’s not very flashy.
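
    If you want to sanity-check this yourself, here’s a minimal needle-in-a-haystack probe in the spirit of RULER’s recall tasks. It assumes an OpenAI-compatible endpoint; the model name, the needle text, and the filler are placeholders I made up, not anything from the benchmark:

    ```python
    # Minimal long-context recall probe (needle-in-a-haystack style).
    # Assumes an OpenAI-compatible API; model name and needle are placeholders.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    NEEDLE = "The vending machine's passcode is 7481."
    FILLER = "The sky was a uniform grey that afternoon. " * 50

    def probe(approx_tokens: int) -> str:
        # Build a long distractor context with the needle buried in the middle.
        n_chunks = max(1, approx_tokens // 400)  # FILLER is roughly 400 tokens
        chunks = [FILLER] * n_chunks
        chunks.insert(n_chunks // 2, NEEDLE)
        haystack = "\n".join(chunks)
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; swap in whatever you're testing
            messages=[
                {"role": "user", "content": haystack},
                {"role": "user", "content": "What is the vending machine's passcode?"},
            ],
        )
        return resp.choices[0].message.content

    # Recall is usually fine at short lengths and degrades as context grows.
    for approx_tokens in (1_000, 8_000, 32_000):
        answer = probe(approx_tokens)
        print(approx_tokens, "7481" in answer, answer[:80])
    ```

    RULER does this far more rigorously (multiple needles, variable tracking, aggregation tasks), but even a crude probe like this makes the degradation curve pretty visible.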