Kicking the tires on Adobe CS4 speech transcription

By S Simmons. Filed in Adobe Premiere Pro, Mac software

Adobe’s CS4 Production Premium suite of applications was packed full of big new features. There’s mocha for After Effects, native RED .R3D editing in Premiere Pro and After Effects, XML import of Final Cut Pro projects, a unified interface among the applications, better dynamic linking and a lot more. And there is also the ability to automatically transcribe video and audio clips. For editors doing documentary work or cutting a lot of talking heads, this could be a killer feature. Automated transcription almost seems too good to be true … and you know what they say about something that is too good to be true…

The transcribe feature is available in both Premiere Pro CS4 and Soundbooth CS4. In both applications the transcription function is part of the metadata browser. The CS4 applications are metadata kings with more tabs and titles than anyone will probably ever need. Since transcribed text is metadata, it is saved with the file itself. Transcribe a clip in Soundbooth and that transcription will show up in Premiere Pro as well. In the metadata pane is a Speech Transcript window:

Control begins here. The little yellow triangle alerts you if your source media has changed and you need to transcribe again to update the text. When you are ready, hit the Transcribe… button. The Speech Transcription Options let you select languages and dialects, the quality of the operation and whether there is more than one speaker. Hit OK and off it goes to transcribing:

The clip I tested with was an English speaker with a Canadian accent. The clip is 1:08 long and Soundbooth took 2:00 to do the transcription at High quality. If you’ve ever used any other automated transcription software you know its accuracy can be way off, so I figured I would start with the best quality it can produce. How did it do? See for yourself:

Adobe CS4 speech transcription test clip from Scott Simmons on Vimeo.

This is the transcription from Soundbooth:

let’s see everybody’s dollar missing channels like it was Brandon Man you know what its hands on me having Harwood who I think is the ideology is at you know what to expect and seventy but you know what is a ready response so concerns in nicely it was so much understeer than you can imagine stars in the mid one year actually the thing about it there’s a little bit in technology have ended here to keep it from me in a moron you knew it definitely sounds like a Trans-Am car this is you know guards not like the M for a new moon but it was definitely true in August I shouldn’t of late sixties trends and cars I want to get too hard on that call him now will roll in on the straightaway

Premiere Pro CS4 does transcription as well by launching Adobe Media Encoder and handling it in that separate application. This is nice as you can continue to work in PPro while it is transcribing as well as set up a batch. This is the transcription from Premiere Pro but on the medium quality setting. It is definitely faster, taking only about 30 seconds to transcribe the same piece of video:

see everybody’s dollar missing channels like it was Brandon Man you know what it’s had some nice having car with RU I thank the ideology is at and what that is expected and seventy but you know what is a ready response so concerns in nicely it was so much understeer there and then the engine starts to come in one year I actually think Jonah the technology happening here to keep it from me in a moron RU you it definitely sounds like a Trans-Am car this is you know guards not like the M3 new moon but it was definitely true in August I shouldn’t of late sixties trends and cars a lot to get too hard on that call him now will roll in on the straightaway

So that’s almost comical. Now I admit that this might not be the absolute most pristine footage to transcribe since according to the Adobe user manual “Accurate speech transcripts require good audio quality. Background noise significantly reduces accuracy. To remove such noise, use the tools and processes in Soundbooth.” This is commentary from a moving car so there is some car noise but overall it’s not too far off from what you might record on a documentary-style shoot or an interview where you didn’t have total control of the surroundings. And let’s be honest, I really wouldn’t expect any software to be able to transcribe a phrase like “bags of ya-ya juice” … but it did get “Trans-Am car” so go figure. I also tried the transcription on a controlled talking head interview and it did do better, probably 30 – 40% more accurate.
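For anyone who wants a number rather than an eyeballed "30 – 40%", one common yardstick is word error rate: the number of word substitutions, insertions and deletions needed to turn the machine transcript into a hand-typed reference, divided by the length of the reference. The sketch below is not anything Adobe ships; the function name and the two sample sentences are made up for illustration, and the reference sentence in particular is invented, not what was actually said in the test clip.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented sentences, for illustration only.
reference = "the engine note definitely sounds like a trans-am car"
machine = "it definitely sounds like a trans-am car"

print(f"word error rate: {word_error_rate(reference, machine):.0%}")
```

Type out a minute or so of a clip by hand, run the same minute through the transcriber, and the resulting figure gives a rough basis for comparing quality settings or noisy versus clean audio.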

Thankfully you can go into the transcribe pane and correct the words that transcription missed. There are a number of options available when you right-click in the Speech Transcript window for inserting and merging words:

That could really take a long time for hours and hours of footage but then it might be a good task for an intern to take on! I think the best thing to say about this feature is that it now actually exists and can be built upon and improved in future versions. Obviously the clearer the speaking is, the better the speaker enunciates and the less background noise the better. Sometimes you just can’t control those things so it would be a decision among the post team if it is worth the time, money and man-hours to utilize this feature.
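To get a feel for the machine time involved, the two timings from my test (a 1:08 clip taking 2:00 at High quality and about 30 seconds at medium) can be scaled up. This is a back-of-the-envelope sketch only: it assumes processing time grows in direct proportion to clip length, which a single short clip cannot prove, and the 10 hours of footage is a made-up figure.

```python
def to_seconds(clock: str) -> int:
    """Convert 'm:ss' or 'h:mm:ss' into a number of seconds."""
    total = 0
    for part in clock.split(":"):
        total = total * 60 + int(part)
    return total


def processing_factor(clip_length: str, elapsed: str) -> float:
    """How many seconds of processing each second of footage needed."""
    return to_seconds(elapsed) / to_seconds(clip_length)


def estimate_hours(footage_hours: float, factor: float) -> float:
    """Linear extrapolation; an assumption, not a measured result."""
    return footage_hours * factor


high = processing_factor("1:08", "2:00")    # High quality in Soundbooth
medium = processing_factor("1:08", "0:30")  # medium quality via Premiere Pro

footage_hours = 10  # hypothetical amount of interview footage
for label, factor in (("high", high), ("medium", medium)):
    hours = estimate_hours(footage_hours, factor)
    print(f"{label}: {factor:.2f}x real time, roughly {hours:.1f} hours")
```

Correction time by a human comes on top of this and will depend entirely on how clean the audio is.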

My first thought when I heard about this speech transcription was how great it would be to take a bunch of interviews into Soundbooth and do a transcription that I could then print out for an edit in Avid or Final Cut Pro. I was hoping that you would be able to transcribe video or audio and then be able to take that transcription to a text editor with timecode intact. While you can copy/paste the transcription into a text editor, there are no associated timecode numbers that come along. Bad if you want to use these apps as a transcribing system only. But if you are staying in Premiere Pro for your edit you now have an amazing new way to navigate clips. Click a word in the speech transcript and the playhead immediately jumps to that word and you can play, mark IN and OUT points, whatever you normally do. Plus there is a search field at the top of the metadata window that allows you to search for a single word:

Searching could be vastly improved if you could search for a whole phrase instead of just a single word. I drool at the thought of how handy this could be with a very long clip that has been transcribed, corrected for the mistakes, and then printed with a paper edit built from the printout. Soundbooth actually has an Export > Speech Transcription option that exports an XML file. The terribly bad Adobe CS4 help files say this is for exporting words as cue points but I wonder if some smart XML expert could do something more with the file.
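As a rough idea of what that "something more" might look like, here is a sketch that turns word-level XML into timecoded lines for a paper edit. Big caveat: I have not documented the layout of the file Soundbooth writes, so the `<transcript>` and `<word start="…">` names below are entirely hypothetical, as are the start times in the sample and the 30 fps non-drop frame rate. Anyone trying this would need to open a real export and adjust the element and attribute names to match.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL layout: the real Soundbooth export will differ.
# Words are borrowed from the test transcript; start times are invented.
SAMPLE_XML = """<transcript>
  <word start="0.0">see</word>
  <word start="0.4">everybody's</word>
  <word start="1.1">dollar</word>
  <word start="1.5">missing</word>
  <word start="2.0">channels</word>
  <word start="2.5">like</word>
</transcript>"""


def seconds_to_timecode(seconds: float, fps: int = 30) -> str:
    """Format seconds as HH:MM:SS:FF (non-drop-frame, assumed frame rate)."""
    total_frames = round(seconds * fps)
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hours, remainder = divmod(total_seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{frames:02d}"


def timecoded_lines(xml_text: str, words_per_line: int = 4, fps: int = 30) -> list[str]:
    """Group words into short lines, each prefixed with its first word's timecode."""
    root = ET.fromstring(xml_text)
    words = [
        (float(node.get("start")), (node.text or "").strip())
        for node in root.iter("word")
    ]
    lines = []
    for i in range(0, len(words), words_per_line):
        chunk = words[i:i + words_per_line]
        timecode = seconds_to_timecode(chunk[0][0], fps)
        lines.append(f"{timecode}  " + " ".join(text for _, text in chunk))
    return lines


for line in timecoded_lines(SAMPLE_XML):
    print(line)
```

Grouping a fixed number of words per line is just the simplest thing that works; breaking on pauses between words would probably read better on a printout.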

Overall I would say speech transcription is off to a great start in Adobe CS4 and is one more compelling reason to add the Adobe Creative Suite 4 Production Premium to your editing toolkit. With good quality audio and a few version upgrades this might be one feature an editor will wonder how they ever lived without.

16 comments to “Kicking the tires on Adobe CS4 speech transcription”

  1. Comment by Bryce:

    So can I do this transcription in soundbooth, create a PDF and use it in Avid Script Sync?

  2. Comment by editblog-admin:

    Since you can copy/paste the text you can put it into any document. Using it with Avid ScriptSync would be a great use! This was suggested on an Avid blog back when the Soundbooth beta was out. If you try it out let us know how it works.

  3. Comment by Gazelle:

    I think this is very cool tech. and that the other big guys will follow suit very soon. Although, to a degree I think Adobe missed the mark a bit in that this could very easily be turned into a Closed Captioning tool. I think the STT has a lot of potential.

  4. Comment by Michael Critz:

    I heard about this feature on the TWiP podcast. I’m glad I got to see it in action. Honestly, Premiere edits like a pig dances… not so good. So, I’m not inclined to start using Premiere instead of FCP anytime soon.

  5. Comment by Jon Chappell:

    I can’t wait for Apple to implement this. Imagine this coupled with Final Cut Server – you could type in a few words and instantly find that elusive video clip among thousands of others.

    One question – if you correct a word does it “learn” the correction? It’d be very frustrating if it’s making the same mistakes over and over again.

  6. Comment by editblog-admin:

    Good question on learning the correction, Jon. Since the correction is made in the transcript and that’s all metadata, it should stay with that clip from that point forward. What I don’t know is whether it is smart enough that, if you have 5 takes of the same script, you can make the change on take 1 and then run transcription again on takes 2-5 and have it get it right. I suspect not.

  7. Comment by DMarcus:

    I at least have to give props to Adobe for making an attempt to tackle a process that most of us doc editors and producers have to deal with on a constant basis. A few years back I even tried to use speech recognition software to help transcribe hours of footage. The results were just as comical as Scott’s. So to me, to have an assistant take the time to go through and correct perhaps 50-70% of a transcription…I’d rather just get it done correctly the first time. This kind of technology unfortunately takes hours upon hours of voice training to truly get it done right.

    Now I wish Apple would get off their butts and make some real feature improvements to FCP…I mean make an effort guys! It’s been like 4 years.

  8. Comment by Dylan Reeve:

    I’ve been dubious about this since I heard about it the first time. I’ve seen and used quite a few speech-to-text apps, and the only time I’ve seen anything close to accurate results has been when the app was trained for the voice, the sound was clear and without confusing background noise, and the speaker was mindful of the process and avoided running words together. Basically pretty much everything 90% of our video isn’t.

    I don’t know what technology exists, but it’s hard to imagine it ever getting a whole lot better for the average clip.

  9. Comment by Fred Blatz:

    I’ve tested the transcription feature in Soundbooth. It isn’t perfect but my results were far superior to those of the test clip above. I think the engine noise undermined the accuracy of the transcription. Although even with pristine audio you’ll need to listen to the clip and correct the transcript. But I found it was pretty accurate.

    The other cool feature is that it weds the text to the time code location in the clip. Very useful if you have long interviews and you’re trying to figure out where something was said. As a documentary filmmaker the feature is a must have. It will save you thousands of $$ in transcription fees.

  10. Comment by Mark Gillespie:

    I think this might be just what I’ve been looking for…I produce a weekly podcast, and never bothered to transcribe all of the episodes so that I could make PDF transcripts available on the website. I was looking for a reason to upgrade anyway…

  11. Comment by randall martin:

    My results from a recorded voice over were just as bad…and for me it is worthless….had similar issues with Dragon Naturally Speaking which I thought I could use instead of manually transcribing interviews. Not so…I can type faster!!!!

  12. Comment by David McW:

    It worked reasonably well with a QuickTime MOV file but I cannot get it to work with an MXF file….the transcribe button is greyed out when presented with an MXF file….any thoughts anyone? Thanks

  13. Comment by J Fairweather:

    It took 2:35 to transcribe a 44:12 piece. For whatever reason, Encoder appeared behind the Premiere window, which was confusing as nothing appeared to be happening, but this was a minor issue. If you have (as I do) eight hours of interviews, the thought of running a 28 hour batch doesn’t translate (excuse the pun) into a timely workflow. This was a studio shoot with subjects from all over the country, including NY, TX, Chicago, and latin accents. The quality varied, but I don’t think this feature is ready for prime time.
    For my money and my needs, it would be faster to drop in markers on the fly and get on with the editing.

TrackBacks / PingBacks

  1. Pingback
    A look at Adobe CS4 speech transcription at FreshDV
  2. Pingback
    CS4 – Questions, Doubts, Impressions and Confessions of an FCP Cutter « adam schreck
  3. Pingback
    Jonathan Stray » Lifelogging + Machine Transcription = Public Reporters’ Notebooks