I know LLMs are all the rage, but I need to talk about just how good the voice recognition of ChatGPT is. It's absolutely amazing, and I just can't deal with other voice recognition anymore. Why isn't this talked about more? The model they use for voice to text is simply the best, it's the first voice recognition I've used that just understands me as well as a human would.
I always work with Claude, use ChatGPT for dictation and when I have no patience, and use Gemini for discussing Buddhism. Highly recommend the Buddhism part, Gemini knows it too well.
I gotta give it to them, ChatGPT voice recognition is quite amazing, including their chatbot. For overall LLM response quality, I don't think they are very special. Many LLMs, at the premium levels, are quite good, or perhaps better. Most recently, I've been defaulting to Gemini 2.5 Pro for most technical questions. I'm guessing there's also something about the combination of Gemini and the (obviously) superior Google search index that makes it stand out.
maybe the LLMs are all talking about it amongst themselves and we just can't hear them
Gemini doesn't know I always open ChatGPT first to transcribe. ChatGPT doesn't know I always paste its transcription into Gemini.
@shiyi.gu Is this true? chatgpt for voice dictation, Gemini as your general purpose?
I used it to try and practice my French and it was pretty good, especially interactive mode on the app, however I think there was a limit for interactive mode and I stopped using it to preserve it haha
It's the only one that seems to understand me when I switch from French to English to Spanish mid prompt
And on that note, I wish there was a voice recognition LLM hybrid specialized for coding dictation. Something where I could be like: "defn add let foo 2 + 2 in the body use map to sum it actually rename it to bar" And it'll edit in almost realtime what I say, like a human would.
I have high hopes & rooting for Neuralink. Conveying word-thoughts with fingers or thumbs is so clumsy. For example, to get "conveying" in the previous sentence I had to attempt it 4 times. (I'm lying horizontally, stretching my neck, using my index finger to glide type.)
@didibus out here, vibe coding reality in #off-topic
@feedmyinbox02_clojuri Have you tried the ChatGPT voice dictation? Bet it would get "conveying" right the first time
I think what they need to add is a mixed mode where they fuse the speech to text and agentic LLM text editing. Because now if you just say: "I wanted to ask no delete that I am in need of help" It'll just type that. We need to merge it with the agent intelligence so it understands which words are the actual text to type and which are higher-level directions for editing
If someone wants to steal this idea and make a startup with VC funding go for it by the way haha
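To make the mixed-mode idea above a bit more concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions: the function name `apply_spoken_corrections` and the list of correction phrases are made up for this example, and a hard-coded regex stands in for the agent that would actually decide what is a direction and what is text.

```python
import re

# Hypothetical correction phrases (an assumption for this sketch).
# A real system would let an LLM classify directions vs. dictated text.
CORRECTION = re.compile(r"\b(?:no )?(?:delete|scratch) that\b", re.IGNORECASE)


def apply_spoken_corrections(transcript: str) -> str:
    """Keep only what was dictated after the last correction phrase.

    Simplification: a correction phrase discards everything said before it,
    which is much cruder than what a human listener would do.
    """
    parts = CORRECTION.split(transcript)
    return parts[-1].strip()


if __name__ == "__main__":
    print(apply_spoken_corrections(
        "I wanted to ask no delete that I am in need of help"
    ))
```

With the example sentence from the thread, this keeps "I am in need of help" and drops the rest. Anything like "rename it to bar" would need real understanding of the text being edited, which is exactly the part the regex can't do.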
Eye tracking. Head tracking. Multi party conversation tracking
Zoom 2.0?
I guess it'd have to integrate into your os
Control the mouse, keyboard, see the screen, camera, etc
@didibus interesting idea. I still dislike speech; hate to have to vocalize but it's an option. As you say, it knows no difference between instructions and a string. It's strings all the way down.
If neural patterns producing strings versus "instructions for your string-producer" are distinct, Neuralink would be nice.
Well, call me old fashioned, but I'd rather speech than brain surgery 🤣
But I do think that speech sucks because it sucks. Like talking to my wife or coworkers isn't annoying or difficult. If I could talk to a computer the same way, it'd be great.