• brucethemoose@lemmy.world
    21 hours ago

    It’s not theoretical. They’ve already released a 300B LLM dubbed Pangu Pro, trained on Huawei NPUs:

    https://huggingface.co/papers/2505.21411

    And it’s open weights!

    https://huggingface.co/IntervitensInc/pangu-pro-moe-model

    It’s actually a really neat model: the experts are split into 8 ‘groups’ and routed so that the same number are active in each group at any given time. In other words, it’s specifically architected for 8X Huawei NPU servers, so that there’s no excessive cross-communication or idle time between them.
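    The grouping idea is easy to sketch. This is a hypothetical toy version of grouped top-k routing (not the actual Pangu Pro code; function and parameter names are made up): instead of picking the top experts globally, you pick the top-k *within each group*, so every group (i.e. every NPU hosting one group) activates the same number of experts and no device sits idle:

    ```python
    import numpy as np

    def grouped_topk_routing(logits, n_groups=8, k_per_group=1):
        """Toy sketch: select k experts from EACH group, so all 8
        devices (one expert group per NPU) get equal work."""
        n_experts = logits.shape[-1]
        assert n_experts % n_groups == 0
        group_size = n_experts // n_groups
        chosen = []
        for g in range(n_groups):
            group_logits = logits[g * group_size:(g + 1) * group_size]
            # top-k within this group only, never across groups
            top = np.argsort(group_logits)[-k_per_group:]
            chosen.extend(g * group_size + i for i in top)
        return sorted(chosen)

    # 64 experts in 8 groups, 1 active per group -> always 8 active,
    # exactly one per device, regardless of what the logits look like
    active = grouped_topk_routing(np.random.randn(64))
    print(active)
    ```

    A plain global top-8 could dump all 8 active experts onto one NPU for a given token; the per-group constraint is what guarantees balanced load and minimal cross-device traffic.
    
    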

    So yeah, even if it’s not a B200, proof’s in the puddin, and huge models are being trained and run on these things.