OK. I'm at wit's end attempting to convince Google's LLM to pronounce an English name correctly.

Powderhorn@beehaw.org · 2 months ago

OK. I'm at wit's end attempting to convince Google's LLM to pronounce an English name correctly.

PonyOfWar@pawb.social · 2 months ago

Did you try spelling it with phonetics?

Powderhorn@beehaw.org · 2 months ago

I know IPA (the linguistic term, not the beer … OK, I also know the beer, but that’s not important right now) … and, yeah, I tried that, but on a laptop without a numpad, it’s a bit of a slog.

What was maddening was the LLM got it right somewhere around 10% of the time after I corrected it. This was a voice conversation, so every time I corrected it, that should have been clear data. Aren’t these systems simply supposed to be pattern recognition? How is it outputting wildly different pronunciations (N>5) with constant inputs?

TehPers@beehaw.org · edit-2 2 months ago

The models themselves are nondeterministic. Also, they tend to include a hidden (or sometimes visible) random seed that gets input into the models as well.

Powderhorn@beehaw.org · 2 months ago

How delightful. I mean, I knew there were reasons you don’t get the same results twice, but I’ve not dived into how all this works, as it seems to be complete bullshit. But it’s nice to hear that’s a feature.

howrar@lemmy.ca · edit-2 2 months ago

I’m pretty sure whatever voice system you’re using is just transcribing things to text and feeding it into an LLM, so it wouldn’t actually have that audio data. I’m not aware of any audio equivalent of LLMs existing.

Powderhorn@beehaw.org · edit-2 2 months ago

The equivalent is NLP (natural language processing), which was already a huge research area in the '90s. In fact, had I not been a fucking idiot and caught the journalism bug, with my studies in CS and linguistics, I’d likely be doing quite well.

This said, that was about voice input being converted to text – e.g., Dragon Naturally Speaking – but apparently little progress has been made going in the other direction. NotebookLM had other weird glitches where standard English words get weird vowels some 5% of the time.