I’ve been looking for a way to use speech recognition to automate the transcription of interviews, meetings, speeches, conference presentations, and so on.
I spend a lot of time on the phone interviewing experts for the articles and reports I write. Normally I conduct the interview with a headset and do my best to type a transcript of what is said. I’m slow and a terrible typist, so my transcript misses a lot and comes out with many misspellings that are hard to correct. For an hour-long interview, it usually takes me another hour to go through the transcript, fixing mistakes, filling in gaps, and guessing at uninterpretable words.
I would greatly benefit from a speech recognition solution that could create a fairly accurate transcript from audio, for example, live over the phone or from an mp3 file.
This need was emphasized to me even more this week, when I attended a conference and spent two days trying to take notes and capture useful quotes from speakers. I have a digital voice recorder and have all of the presentations in mp3 format, but it’s going to be quite a challenge to comb through all of that audio to find relevant quotes for the articles I will be writing about the conference. How much easier it would be if I had a software application that could convert all of those mp3s into fairly accurate text transcripts!
Unfortunately, it appears that voice recognition software is not ready to handle meetings and similar settings where multiple voices are involved. These systems have to be trained to recognize the voice of a single user.
I’m using this blog post to mark and share some possible solutions I have encountered. I plan to add to this list as time goes on, if and when the technology continues to improve.
+ Dragon NaturallySpeaking by Nuance is supposed to be the best reasonably priced speech recognition software for professional use. Nuance says Dragon is not able to transcribe multiple voices, but I’m tempted to shell out the $200 just to see what kind of results I might get with it. Suppose it were 50 percent accurate transcribing unfamiliar voices? That might be good enough for me.
+ Windows has its own built-in speech recognition capability. I plan to test this out to see whether I can make it work somehow. However, it’s hard to believe that Microsoft could come up with a better solution than a specialist company like Nuance.
+ One suggestion I’ve run into a lot is to transcribe a meeting or lecture by “parroting” or “re-speaking.” In other words, using speech rec software like Dragon, you listen to the recording of the meeting on headphones and repeat what you hear into your computer mic. Because Dragon is trained to your voice, it can create an automatic transcript. Sounds laborious, but it would probably be better than having to type it all out myself.
+ I also heard about a company called Koemei that has a cloud-based solution for converting video and audio assets into text. Looks as if this might work pretty well. However, their entry-level service is $149 per month. That sounds like a lot, but maybe someday…. For $20 per month I would definitely try it.
+ Another idea I have thought of is to call my Google Voice number and play the audio recording into my voicemail. Google Voice automatically transcribes my voicemails into text and often does an acceptable job — good enough so I could paste the results into a word processor and make quick corrections. I’m not sure yet if Google Voice can handle long audio streams, though. I’m thinking about testing this solution to see if I can make it work somehow.
+ Here’s an interesting video by Chaelaz showing how to use YouTube’s closed-captioning transcription service to convert audio to text. Looks as if you would have to create a video first and upload it to YouTube, but that’s an interesting possible work-around for what I’m trying to do.
ARB — 21 June 2013
14 thoughts on “Using Speech Recognition to Automatically Transcribe Interviews, Meetings, and Speeches”
Hi – You have hit upon the next big thing for the computer – speech-to-text transcription software that can handle several different speakers, or even just two. From what I’ve read, two of the biggest difficulties are (1) inability of the software to detect a change from one voice to another and (2) inability of the software to supply punctuation where needed. I would be so interested to see what you find on your journey toward a better system for multi-voice transcription. Also, I just heard about Koemei on a jobs posting site which lists a prospective client looking for someone to “enter the Koemei interface” in order to proofread and clean up already transcribed texts. Would you happen to know how that is done? Thanks so much for your inquiring mind – cordially, canteloube
Take a look at voicescripttech.com…it handles multiple speakers
Thanks for sharing this. Are you aware of any new solutions? I’m still looking for an app to transcribe my interviews automatically. Cheers
Like Sergio, I’m wondering if there have been any new developments. The Voicemail-to-email service offered by my phone company does a pretty good job with short voicemail messages.
Now, if I could only find software for my computer to produce transcriptions of recorded lectures and podcast interviews! I’m hearing-impaired and have a lot of trouble understanding what professors and presenters are saying. I’m retired and on a budget, so feature-rich remote services like Voicescript are too costly.
Hi, I can suggest Speechlogger.appspot.com, a new web app based on Google’s speech recognition technology plus its own algorithm for automatic punctuation. It also has auto save, different export options, instant voice translation and more. Best of all, it’s free and doesn’t require any registration.
Hope you find it helpful.
VoiceScript seems to be the only viable thing I can find out there – and I’ve been looking for a couple of weeks. It handles multiple speakers and works in many languages. Manual transcription is around 2 bucks a minute and the VoiceScript website says that their service can save 60%. So, if they do multiple speakers, automatically for anything less than $0.80 a minute, that would be pretty cheap – as a commercial offering. I am not sure about ‘casual’ usage. It may not be designed for that – but I think it is worth a call to find out. I sent an email to them to see what they offer. Back to you all when I get a response. Brad.
I’ve been looking for the same thing. I want to pipe various spoken podcasts into a tool so I can then index the text for future searching. I’ve read (but haven’t tested) that Adobe Soundbooth can batch transcribe .mp3 files to text using an included tool called Adobe Media Encoder. These aren’t tools that I’ve ever used, so take this with a grain of salt.
VoiceScript works. Saw a demo and actually used the system. Very clean.
They are focussed on bigger business, no shrink-wrapped application. But, if you’ve got a bunch of ongoing meetings, they are worth a call.
Adobe Soundbooth no longer exists. Is there any updated way to load in mp3s and get text?
You can try Speechlogger. In our experience it works fine with high-quality English recordings. You can either play from an external device to your PC’s line in, or (a bit tricky) install a virtual line in, as explained on the site.
Hope this helps.
Check this out – conversational speech transcription being added to Watson http://blogs.wsj.com/digits/2015/05/28/speech-recognition-gets-conversational/
Dragon Professional Individual now transcribes audio files of any speaker — without training. Of course it won’t distinguish between speakers but it’s pretty cool that you don’t have to train your profile anymore. I haven’t actually upgraded yet to see how accurate it is but that was one of the features Nuance was marketing at the end of 2015 when it was released.
Speechlogger worked fairly well for me as long as it had a direct audio feed. I selected Stereo Mix as the default mic in Windows and it worked. I like how you don’t have to leave your cursor in a specific place and it just keeps on going.
You could do something similar in a Google Doc using the new voice typing feature but you can’t use your computer for anything else while it’s going because once you move the mouse out of the Google Doc, the mic shuts off. Also you kinda have to sit there and watch it the whole time in case the mic freezes. But if you have another computer nearby, you could access the same doc and be editing it in realtime.
Also looking for software to convert audio of lectures, talks etc. to text files. The audio is recorded on a Sony voice recorder and uploaded as mp3 files to a Mac. It seems like the voice recognition engine used by the text messaging app on iPhones is good. Is it available in a standalone app?
I’m Tom and I run https://vocalmatic.com. It’s an auto-transcription platform where you can upload your audio files and we will convert them into text.
You can then edit it using our online editor which features the transcribed text and the audio player all on the same page.
Hopefully this saves you all a lot of time!
You’ll get 30 minutes of auto-transcription for free.
Give it a try and let me know what you think!