Anyone getting into audio/video editing for the first time is almost immediately struck by the sheer scale and complexity of it all. Even if you have the physical hardware, the proper software, and the creative spark to produce media, that doesn’t make the process of editing it all into a cohesive product any less daunting. For those of us struggling under the Sisyphean weight of complicated editing workflows, a new product aims to relieve the struggle. Enter Descript, an automatic transcription tool.
Descript uses machine learning to transcribe your raw audio and video files into a dialogue script. That alone makes it an incredibly valuable tool for anyone looking to transcribe podcasts, YouTube videos, or whatever kind of media you produce. But this is just the beginning of what makes this app so special.
Descript is the world’s first audio word processor. Using the transcript the app creates from your audio, you can edit the text script to change the media itself. Removing the “umms” and “ahhs” from your speech — or removing whole sentences at a time — is as simple as using the backspace key on a word processor.
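For the curious, here is one way to picture how deleting text could translate into cutting audio. This is only a rough Python sketch of the general idea, not Descript's actual implementation: it assumes (hypothetically) that every transcribed word carries a start and end time, and the `keep_ranges` function and the sample `words` list are made up for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical transcript format: (text, start_seconds, end_seconds) per word.
Word = Tuple[str, float, float]

def keep_ranges(words: List[Word], deleted: Set[int]) -> List[Tuple[float, float]]:
    """Return the audio time ranges to keep after deleting words by index.

    Neighbouring kept words are merged into one range, so any pause
    between them survives; a deleted word splits the range in two.
    """
    ranges: List[Tuple[float, float]] = []
    prev_kept = None  # index of the last word we kept
    for i, (_text, start, end) in enumerate(words):
        if i in deleted:
            continue
        if ranges and prev_kept == i - 1:
            # Previous word was kept too: extend the current range.
            ranges[-1] = (ranges[-1][0], end)
        else:
            # First kept word, or the word before it was deleted: new range.
            ranges.append((start, end))
        prev_kept = i
    return ranges

# Made-up example: backspacing over the "umm" at index 1.
words = [("So", 0.0, 0.3), ("umm", 0.3, 0.8), ("welcome", 0.9, 1.4), ("back", 1.4, 1.8)]
print(keep_ranges(words, {1}))
```

An editor built this way would then stitch the kept ranges together in order to produce the trimmed audio.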
As a would-be podcaster, I played around with the app over the weekend, so I can share my initial impressions. While it’s not for me (not yet, anyway), it is incredibly easy, fun, and quite frankly mind-blowing to use.
First things first, let’s talk about the cost.
The app charges by the minute of audio you upload. New users can upload up to 30 minutes of audio for free, but anything past that requires paying 15 cents per minute or signing up for a monthly subscription. Keep in mind these costs apply to the total raw audio uploaded, not the finished audio produced. So if you’re the type (like me) to record several hours of audio per week only to trim it down to a single hour of product, this may be a bit on the wasteful side.
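To put rough numbers on the per-minute option, here is a small Python sketch. It uses only the figures quoted above (30 free minutes, 15 cents per minute after that) and assumes the free minutes are a one-time allowance deducted from the total upload. The monthly subscription is left out because I don't have its price, and the function name is my own.

```python
FREE_MINUTES = 30            # free allowance for new users (from the pricing above)
RATE_CENTS_PER_MINUTE = 15   # pay-as-you-go rate (from the pricing above)

def pay_as_you_go_cost_cents(raw_minutes_uploaded: int) -> int:
    """Estimate the per-minute cost, in cents, of uploading raw audio.

    Assumption: the free minutes come off the top once, and every
    remaining minute is billed at the flat rate. Working in whole
    cents avoids floating-point rounding.
    """
    billable_minutes = max(0, raw_minutes_uploaded - FREE_MINUTES)
    return billable_minutes * RATE_CENTS_PER_MINUTE

# Three hours of raw recording, even if it is trimmed to one hour later.
cents = pay_as_you_go_cost_cents(180)
print(f"${cents / 100:.2f}")
```

Under those assumptions, three hours of raw audio comes to 150 billable minutes, or $22.50, no matter how short the finished episode ends up being.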
As for the transcription itself, the program’s machine-learning engine turned my dulcet tones into the appropriate written words with nearly complete accuracy. I did have a few issues with the program understanding other speakers, but I believe that may have been a fault on my end that I’ll go into later. If the automatic transcription isn’t accurate enough for you, you can also pay extra to have your audio transcribed by real human professionals.
The app can divide audio between different people speaking, but not automatically. If you have a separate audio file for each speaker, then each file will be labeled separately from the start. If multiple speakers are on the same audio track (like mine), then you’ll have to label the different speakers in the script yourself. I believe this is why the program had difficulty transcribing speakers other than me. With everyone on the same audio track, the machine seemed to attune itself to my voice (the first speaker on the recording) and tried to interpret other people’s words as if I were the one saying them.
As for the audio editing aspect of this program, well, it really needs to be experienced to be believed. I was told what the program could do beforehand, but actually editing audio just by changing words around on a script is something else entirely. Cutting out non sequitur sentences, removing unnecessary articles, or even changing the order of words around to better suit the flow of conversation — through a literal word processor — will make you feel like an arcane grammar wizard.
Will this replace your entire audio/video workflow? Probably not. At least not yet. In addition to the cost, which may be prohibitive for some users, there are editing issues that have nothing to do with word choice. I found myself frustrated by my inability to adjust the timing of the pauses between words, which sometimes left awkward gaps between sentences (or not enough space between words). Of course, I only had the program for a weekend, so this could very well be attributed to user error.
Whatever flaws, real or imagined, this program may have, it’s important to keep in mind that Descript is the first of its kind.
It can only improve from here, not to mention potentially inspire a wave of similar programs that may very well function better. Whether or not Descript is right for you, what’s undeniable is that this program is the start of something amazing.