I’ve been looking for a way to use speech recognition to automate the transcription of interviews, meetings, speeches, conference presentations, and so on.
I spend a lot of time on the phone interviewing experts for the articles and reports I write. Normally I conduct the interview with a headset and do my best to type a transcript of what is said. I’m a slow and terrible typist, so my transcript misses a lot and comes out with many misspellings that are impossible to correct. For an hour-long interview, it usually takes me another hour to go back through the transcript, fixing mistakes, filling in gaps, and guessing at uninterpretable words.
I would greatly benefit from a speech recognition solution that could create a fairly accurate transcript from audio, for example, live over the phone or from an mp3 file.
This need was emphasized to me even more this week, when I attended a conference and spent two days trying to take notes and capture useful quotes from speakers. I have a digital voice recorder and have all of the presentations in mp3 format, but it’s going to be quite a challenge to comb through all of that audio to find relevant quotes for the articles I will be writing about the conference. How much easier it would be if I had a software application that could convert all of those mp3s into fairly accurate text transcripts!
Unfortunately, it appears that voice recognition software is not ready to handle meetings and other settings where multiple voices are involved. These systems have to be trained to recognize the voice of a single user.
I’m using this blog post to mark and share some possible solutions I have encountered. I plan to add to this list as time goes on, if and when the technology improves.
+ Dragon NaturallySpeaking by Nuance is supposed to be the best reasonably priced speech recognition software for professional use. Nuance says Dragon is not able to transcribe multiple voices, but I’m tempted to shell out the $200 just to see what kind of results I might get with it. Suppose it were 50 percent accurate transcribing unfamiliar voices? That might be good enough for me.
+ Windows has its own built-in speech recognition capability. I plan to test this out to see whether I can make it work somehow. However, it’s hard to believe that Microsoft could come up with a better solution than a specialist company like Nuance.
+ One suggestion I’ve run into a lot is to transcribe a meeting or lecture by “parroting” or “re-speaking.” In other words, using speech recognition software like Dragon, you listen to the recording of the meeting on headphones and repeat what you hear into your computer mic. Because Dragon is trained to your voice, it can create an automatic transcript. Sounds laborious, but it would probably be better than having to type it all out myself.
+ I also heard about a company called Koemei that has a cloud-based solution for converting video and audio assets into text. It looks as if this might work pretty well; however, their entry-level service is $149 per month. That sounds like a lot, but maybe someday…. For $20 per month I would definitely try it.
+ Another idea I have thought of is to call my Google Voice number and play the audio recording into my voicemail. Google Voice automatically transcribes my voicemails into text and often does an acceptable job — good enough so I could paste the results into a word processor and make quick corrections. I’m not sure yet if Google Voice can handle long audio streams, though. I’m thinking about testing this solution to see if I can make it work somehow.
+ Here’s an interesting video by Chaelaz showing how to use YouTube’s closed-captioning transcription service to convert audio to text. Looks as if you would have to create a video first and upload it to YouTube, but that’s an interesting possible work-around for what I’m trying to do.
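A figure like "50 percent accurate" is easier to judge if each of the options above is scored the same way. One common measure is word error rate: the number of word substitutions, insertions, and deletions needed to turn the machine transcript into a hand-corrected one, divided by the length of the hand-corrected one. The Python sketch below is only an illustration under simple assumptions (lowercased text, words split on whitespace, punctuation left as is); the function name and the sample sentences are made up for the example and do not come from any of the products listed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between two transcripts, divided by the
    number of words in the reference (hand-corrected) transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # a reference word is missing (deletion)
                curr[j - 1] + 1,     # an extra word was inserted
                prev[j - 1] + cost,  # words match, or one was substituted
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word and one dropped word out of four.
print(word_error_rate("the quick brown fox", "the quack brown"))  # 0.5
```

Scoring a few minutes of the same recording with each tool against a carefully typed transcript would give a rough basis for comparing them. A word error rate of 0.5 corresponds loosely to the "50 percent accurate" case.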
ARB — 21 June 2013