How Journalists and Researchers Transcribe Interviews Instantly

The Hidden Cost of Manual Interview Transcription

The standard rule of thumb in journalism is that transcribing an interview takes three to four times the length of the recording. A 45-minute interview becomes three hours of painstaking keyboard work. For researchers conducting multiple interviews per week, transcription can consume more time than any other single task in the project — more than analysis, more than writing.

Professional transcription services solve the time problem but add cost ($1–2 per minute is common) and introduce a third party to confidential or sensitive source material. Automated upload services are faster and cheaper but still require the recording to exist first, adding a full workflow step before you can start reviewing anything.

Real-Time Transcription Changes the Workflow

With live transcription, the text exists by the time the interview ends. There is no upload step, no waiting, and no separate processing stage. You walk out of the interview with a complete draft transcript ready for annotation and quotation extraction.

For phone or video interviews — the standard in most modern journalism and qualitative research — Voxxpen captures the audio directly from the browser tab running the call, with no recording required and nothing visible to the interviewee.

How to Transcribe an Interview in Real Time

1. Open your interview call in a browser tab: Zoom, Google Meet, Teams, or any web-based calling tool.
2. Open Voxxpen in a second tab before the call begins.
3. Click Start Session, select the call tab, and enable Share tab audio.
4. Conduct the interview as normal. The transcript builds as both speakers talk.
5. When the call ends, download the transcript as .docx for editing.

Speaker diarisation (automatically labelling who said what) is on the Voxxpen roadmap but not yet available. Currently, the transcript is a single continuous stream — sufficient for quotation extraction, but speakers need to be manually labelled if a structured transcript is required.
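Until diarisation arrives, one way to speed up manual labelling is to type a short tag such as "I:" or "R:" at the start of each paragraph where the speaker changes, then expand all the tags in one pass. The Python sketch below assumes that hypothetical shorthand, a plain-text copy of the transcript with paragraphs separated by blank lines, and a hypothetical `label_speakers` helper. It is an illustration, not a Voxxpen feature.

```python
import re

# Hypothetical shorthand: a paragraph that starts with a tag such as "I:"
# marks a change of speaker.
TAG = re.compile(r"^([A-Za-z]+):\s*(.*)$", re.DOTALL)


def label_speakers(transcript: str, names: dict[str, str]) -> list[str]:
    """Expand shorthand speaker tags into full names, one entry per paragraph.

    Paragraphs are separated by blank lines. An untagged paragraph keeps the
    previous speaker; text before the first tag is labelled "Unknown".
    """
    labelled = []
    speaker = "Unknown"
    for paragraph in re.split(r"\n\s*\n", transcript.strip()):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        match = TAG.match(paragraph)
        # Only known tags count as speaker changes, so "Note: ..." is left alone.
        if match and match.group(1) in names:
            speaker = names[match.group(1)]
            paragraph = match.group(2).strip()
        labelled.append(f"{speaker}: {paragraph}")
    return labelled


if __name__ == "__main__":
    raw = "I: How did the project start?\n\nR: It began as a pilot.\n\nWe expanded it later."
    for line in label_speakers(raw, {"I": "Interviewer", "R": "Respondent"}):
        print(line)
```

Carrying the previous speaker forward means you only need to tag the paragraphs where the speaker actually changes.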

Extracting Quotes and Key Passages

The immediate advantage of a searchable text transcript over a recording is quotation extraction. Instead of rewinding audio to find an exact phrasing, you can search for a keyword with Ctrl+F (Cmd+F on a Mac) and jump straight to the passage, which saves significant time when writing under deadline pressure.
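When a keyword appears many times, it helps to see every occurrence with a little surrounding text before choosing which one to quote. This is a minimal Python sketch of that keyword-in-context view, assuming the transcript is available as plain text; `find_quotes` and its `context` parameter are hypothetical names chosen for illustration.

```python
import re


def find_quotes(transcript: str, keyword: str, context: int = 60) -> list[str]:
    """Return every case-insensitive match of `keyword` with `context`
    characters of surrounding text on each side."""
    snippets = []
    # re.escape lets keywords like "C++" be matched literally.
    for match in re.finditer(re.escape(keyword), transcript, re.IGNORECASE):
        start = max(0, match.start() - context)
        end = min(len(transcript), match.end() + context)
        # Collapse line breaks and repeated spaces so each snippet is one line.
        snippets.append("..." + " ".join(transcript[start:end].split()) + "...")
    return snippets


if __name__ == "__main__":
    text = "We cut the budget in March.\nLater the budget recovered."
    for snippet in find_quotes(text, "budget", context=10):
        print(snippet)
```

Once you have picked a snippet, check it against the audio before quoting it.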

For researchers conducting thematic analysis, a text transcript can be imported directly into qualitative analysis software such as NVivo or ATLAS.ti, or pasted into a structured Google Doc, without any conversion step.

Accuracy and Review

Automatic transcription is highly accurate on clearly spoken English recorded in good audio conditions. For journalism and research, always review the transcript before quoting from it, paying particular attention to proper nouns, technical terminology, and any passage you intend to quote directly. Reviewing an automated transcript typically takes 10–20% of the interview duration, compared with 300–400% for manual transcription.
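For the 45-minute interview used as the example at the start of this article, the two rules of thumb work out as follows. This is a small Python sketch of the arithmetic only; the multipliers are the article's estimates rather than measurements, and `time_estimates` is a hypothetical helper.

```python
def time_estimates(interview_minutes: float) -> dict[str, tuple[float, float]]:
    """Return (low, high) minute ranges for manual transcription and for
    reviewing an automated transcript, using the article's rules of thumb."""
    manual = (interview_minutes * 3, interview_minutes * 4)        # 300-400%
    review = (interview_minutes * 0.10, interview_minutes * 0.20)  # 10-20%
    return {"manual": manual, "review": review}


if __name__ == "__main__":
    estimates = time_estimates(45)
    print(estimates["manual"])  # (135, 180)
    print(estimates["review"])  # (4.5, 9.0)
```

That is 135 to 180 minutes of typing against roughly 4.5 to 9 minutes of checking for the same recording.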

A Note on Consent and Legal Considerations

Whether you need explicit consent to transcribe a conversation depends on your jurisdiction and your profession's ethical standards. In most cases, if recording the conversation is permissible, transcribing it in real time carries the same legal and ethical status. Confirm with your editor, institution, or legal counsel for any ambiguous situations. This is a general note and not legal advice.

Archiving and Searching Across Interviews

One of the underrated benefits of building a transcript archive is the ability to search across many interviews at once. When preparing a follow-up piece or revisiting research six months later, searching a folder of transcripts with Windows Search or Spotlight (or grep, for plain-text copies, since .docx files are compressed) returns results across every interview you've ever conducted, something no audio archive can offer.
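If you would rather script the search than rely on desktop indexing, the sketch below walks a folder and searches both .txt files and .docx transcripts using only the Python standard library. It relies on the fact that a .docx file is a ZIP archive whose body text lives in `word/document.xml`. The folder layout and the `docx_text` and `search_archive` helpers are assumptions for illustration, and headers, footnotes, and text boxes are not handled specially.

```python
import re
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

# Namespace of the WordprocessingML elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path: Path) -> str:
    """Return the body text of a .docx file, one line per paragraph.

    Only word/document.xml is read; headers, footers and footnotes are
    stored in other parts of the archive and are skipped.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's text is split across one or more <w:t> runs.
        lines.append("".join(node.text or "" for node in para.iter(f"{W_NS}t")))
    return "\n".join(lines)


def search_archive(folder: str, keyword: str) -> list[tuple[str, str]]:
    """Return (file name, matching line) pairs for a case-insensitive keyword."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        try:
            if suffix == ".docx":
                text = docx_text(path)
            elif suffix == ".txt":
                text = path.read_text(encoding="utf-8", errors="replace")
            else:
                continue
        except (zipfile.BadZipFile, KeyError):
            # Skip Word lock files (~$name.docx) and other unreadable archives.
            continue
        for line in text.splitlines():
            if pattern.search(line):
                hits.append((path.name, line.strip()))
    return hits
```

Calling `search_archive("transcripts", "budget")` on a hypothetical `transcripts` folder would return one (file name, line) pair per matching paragraph, across every interview in the folder and its subfolders.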