• brucethemoose@lemmy.world

    Also, for anyone interested: desktop inference and quantization are my autistic interest. Ask me anything.

    I don’t like Gemma 4 much so far, but if you want to try it anyway:


    But TBH I’d point most people to Qwen 3.5/3.6 or Step 3.5 instead. They seem big, but being sparse MoEs, they can run quite quickly on single-GPU desktops: https://huggingface.co/models?other=ik_llama.cpp&sort=modified
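
    For a concrete sense of what that looks like, here’s a minimal sketch using llama-cpp-python (the Python bindings for llama.cpp). This is just for illustration, not a specific recipe from the thread: the model filename below is a placeholder, not a real release.

    ```python
    # Minimal sketch: run a quantized sparse-MoE GGUF on a single GPU with
    # llama-cpp-python (pip install llama-cpp-python). The model path is a
    # placeholder; grab a real quant from the Hugging Face link above.
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen-moe-Q4_K_M.gguf",  # placeholder filename
        n_gpu_layers=-1,  # offload every layer that fits onto the GPU
        n_ctx=8192,       # context window; lower this if VRAM is tight
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Why are sparse MoEs fast?"}],
        max_tokens=128,
    )
    print(out["choices"][0]["message"]["content"])
    ```

    The reason big sparse MoEs stay quick despite their size: only a few experts activate per token, so generation speed tracks a much smaller dense model.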

      • brucethemoose@lemmy.world

        Ughhh, I could go on forever, but to keep it short:

        Basically, the ollama devs are Tech Bros. They’re scammer-adjacent. I’ve been in local inference for years and wouldn’t touch ollama if you paid me to. I’d trust the Gemini API over them any day.

        I’d recommend base llama.cpp, ik_llama.cpp, or kobold.cpp, but if you must use a “turnkey,” popular UI, LMStudio is way better.

        But the problem is, if you want a performant local LLM, nothing about local inference is really turnkey. It’s too hardware-sensitive, and the field moves too fast.
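
        As one illustration of that hardware sensitivity, here’s a hypothetical tuning sweep, again assuming llama-cpp-python and a placeholder model path: the right n_gpu_layers value depends entirely on how much VRAM your card has, and nothing ships with it pre-dialed for your machine.

        ```python
        # Hypothetical sweep: find how many layers your GPU can actually hold
        # and what throughput each setting gives. Model path is a placeholder.
        import time
        from llama_cpp import Llama

        PROMPT = "Explain quantization in one sentence."

        for n_layers in (8, 16, 32, -1):  # -1 means "offload everything"
            try:
                llm = Llama(model_path="model-Q4_K_M.gguf", n_ctx=2048,
                            n_gpu_layers=n_layers, verbose=False)
            except Exception:
                print(f"n_gpu_layers={n_layers}: failed to load (likely VRAM)")
                break
            t0 = time.time()
            out = llm(PROMPT, max_tokens=64)
            n_tok = out["usage"]["completion_tokens"]
            print(f"n_gpu_layers={n_layers}: {n_tok / (time.time() - t0):.1f} tok/s")
            del llm  # free the weights before trying the next setting
        ```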