Speech to text

Comparison of speech to text (transcription) software

Speech to text software

NotebookLM [Last visited: 2026-06-06]

Input file format: Audio file (max 200 MB)^[1]
Supported languages: 80+ languages^[2]
Speaker identification: Prompt-based
Price: Free and paid tiers available
Output file format: TXT (Prompt-based)
Notes: (1) Timestamp formatting is not supported. (2) Mixed-language audio (e.g. Mandarin Chinese and English) is automatically translated into the target language specified in the user prompt.

雅婷逐字稿

Input file format: Audio file or video file
Supported languages: (1) Mandarin Chinese & English, (2) Mandarin Chinese, English & Taiwanese, (3) English
Speaker identification: Yes
Price: Free and paid tiers available
Output file format: PDF, TXT, ODT, DOCX, SRT, CSV
Notes:

Gemini

Input file format: Audio file or video file.^[3] The Gemini app does not support direct uploads of audio files larger than 20 MB; for larger files, either use the File API or upload the file to Google Drive first and then link it from within the Gemini app.^[4]
Supported languages:
Speaker identification: Prompt-based
Price: Free and paid tiers available
Output file format: TXT (Prompt-based)
Notes:
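
The 20 MB rule above can be checked before uploading. The following is a minimal sketch, not part of any Google SDK: the function names and route labels are hypothetical, and it assumes "20 MB" means 20 × 1024 × 1024 bytes, which the help pages cited above do not spell out.

```python
from pathlib import Path

# Limit taken from the Gemini entry above; treat the exact byte count as an assumption.
GEMINI_APP_DIRECT_UPLOAD_LIMIT = 20 * 1024 * 1024  # 20 MB in bytes


def choose_upload_route(size_bytes: int) -> str:
    """Return a hypothetical route label for an audio file of the given size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= GEMINI_APP_DIRECT_UPLOAD_LIMIT:
        return "direct-upload"
    # Larger files: use the File API, or upload to Google Drive and link the file.
    return "file-api-or-google-drive"


def route_for_file(path: str) -> str:
    """Look up the file size on disk and pick a route."""
    return choose_upload_route(Path(path).stat().st_size)
```

For example, route_for_file("meeting.mp3") would return "direct-upload" for a 5 MB recording; the file name is only an illustration.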

Clipchamp [Last visited: 2026-06-06]

Input file format: Audio file or video file
Supported languages: 80+ languages^[5]^[6]
Speaker identification:
Output file format: SRT
Comments: The free version appears to have no limit on video duration, and AI transcription of video or audio is also free. However, in testing, the subtitle shown for each timecode was often not a complete sentence.
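
One way to work around the incomplete-sentence issue noted above is to merge consecutive subtitle cues until one ends with sentence-final punctuation. This is a rough sketch only: it assumes the cues have already been parsed from the SRT file into (start, end, text) tuples, that the captions contain punctuation, and that joining with a space is acceptable (it is not ideal for Chinese text). The names are hypothetical and not part of Clipchamp.

```python
from typing import List, Tuple

Cue = Tuple[str, str, str]  # (start timestamp, end timestamp, text)

# Punctuation treated as the end of a sentence (Latin and CJK forms).
SENTENCE_END = (".", "!", "?", "。", "！", "？")


def merge_cues_into_sentences(cues: List[Cue]) -> List[Cue]:
    """Merge consecutive cues until the accumulated text ends a sentence."""
    merged: List[Cue] = []
    pending_start = None
    pending_end = None
    pending_text: List[str] = []
    for start, end, text in cues:
        text = text.strip()
        if not text:
            continue  # skip empty cues
        if pending_start is None:
            pending_start = start  # the first cue of a new sentence sets the start
        pending_end = end
        pending_text.append(text)
        if text.endswith(SENTENCE_END):
            merged.append((pending_start, pending_end, " ".join(pending_text)))
            pending_start, pending_end, pending_text = None, None, []
    if pending_text:
        # Trailing text without final punctuation is kept as its own cue.
        merged.append((pending_start, pending_end, " ".join(pending_text)))
    return merged
```

The merged cue keeps the start time of its first fragment and the end time of its last one, so the result can be written back out as SRT.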

Meeting Ink - AI notetaker to transcribe and summarize your meetings and recordings.

Input file: Audio files
Supported languages:
Speaker identification: Yes
Real-Time Subtitles or Translation: Pro plan only (paid)
Free limit: 30 minutes max

Whisper Web - a Hugging Face Space by Xenova

Input file: Audio files
Supported languages: English
Speaker identification: No
Output file format: TXT or JSON (contains timestamp info.)
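
Since the JSON export carries timestamps, it can in principle be converted to SRT. The sketch below assumes the exported JSON is a list of objects, each with a "timestamp" pair in seconds and a "text" field; that shape is an assumption and should be checked against an actual export. The function names are hypothetical.

```python
import json


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(json_text: str) -> str:
    """Convert a JSON list of timestamped chunks (assumed shape) into SRT text."""
    chunks = json.loads(json_text)
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Assumption: a final chunk may lack an end time; reuse the start.
            end = start
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    return "\n".join(blocks)
```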

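To generate text from a video, you can use YouTube's automatic captioning (Use automatic captioning - YouTube Help); it takes about half a day. [Last visited: 2018-09-04] Tutorial: "YouTube is so generous, it adds subtitles for you automatically!" | T客邦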

Input: Video
Language:
Sample code:
Related:

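Web Speech to Text. Tutorial: "Free! Speech-to-text subtitles for Chinese videos, with support for very large videos and long recordings"

Input: Computer video, audio, YouTube URL
Language: Chinese, English, Japanese, Korean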

Voicetapp - AI Voice to Text Transcription

Language: Chinese, English, and other languages
Sample code:
Related:
Free limit: 5 minutes

Good Tape

Supported languages:
Input file: Audio files
Speaker identification: Available
Real-Time Subtitles or Translation: Not Available
Free limit: 20 minutes max

Lark | Business Chat & Collaboration Tool (Feishu - Wikipedia, the free encyclopedia)

Language:
Sample code:
Related:
Free limit:

iTranscribe: Transcribe Audio & Video to Text

Language:
Sample code:
Related:
Free limit:

剪映 (JianYing) official site - a versatile, easy-to-use desktop video editing application (Chinese software)

Language:
Sample code:
Related:
Free limit:

$ MacWhisper on macOS

Input file format: Audio file or video file
Supported languages:
Speaker identification: Yes
Price: Free or Pro plan
Output file format: TXT, DOCX, SRT, VTT, JSON and more
Notes:

Speech to text API

openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision

Supported languages: 99 languages
Input file: Audio files
Speaker identification: Not built in; requires integration with (1) m-bain/whisperX: WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization) or (2) pyannote/pyannote-audio: Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Real-Time Subtitles or Translation: Not Available
Related:
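- aaaddress1/Whisper.py: a ready-to-run Windows build of the OpenAI Whisper transcript generator, for users who do not want to run pip install (on Windows). Introduction: "WhisperDesktop, free offline speech-to-text software: a hands-on comparison of AI video subtitles"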
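
Because Whisper itself does not identify speakers, a common integration idea is to run a diarizer separately and label each transcript segment with the speaker whose turn overlaps it most. The sketch below illustrates only that idea with plain tuples; it is not the WhisperX or pyannote-audio API, and all names are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start, end, text) from the transcriber
Turn = Tuple[float, float, str]     # (start, end, speaker) from the diarizer


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn], unknown: str = "UNKNOWN"
) -> List[Tuple[str, str]]:
    """Label each segment with the speaker whose turn overlaps it the most."""
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg_start, seg_end, turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append((best_speaker, text))
    return labeled
```

Segments that overlap no diarization turn are labelled with the unknown placeholder rather than guessed.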

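Speech API - Speech Recognition | Google Cloud: "speech-to-text powered by machine learning". The free tier includes 60 minutes of speech recognition; for details see Pricing | Cloud Speech API Documentation | Google Cloud. [Last visited: 2018-09-04]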

Input: Microphone & audio file (audio files longer than 1 minute must first be uploaded to Google Cloud Storage)
Language: 120 languages ^[7]
Sample code:
Related: Troubleshooting of Google cloud speech to text
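
The 1-minute rule above can be checked locally before deciding how to submit a file. This is a minimal sketch using only the Python standard library; it handles WAV data only, the helper names are hypothetical, and it does not call the Google Cloud client library.

```python
import io
import wave

# "Longer than 1 minute" is taken from the entry above.
INLINE_LIMIT_SECONDS = 60.0


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Duration of a WAV file, computed from its frame count and frame rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def needs_cloud_storage(wav_bytes: bytes) -> bool:
    """True if the audio is long enough that it should be uploaded to storage first."""
    return wav_duration_seconds(wav_bytes) > INLINE_LIMIT_SECONDS


def make_silent_wav(seconds: float, rate: int = 8000) -> bytes:
    """Build a mono 16-bit silent WAV in memory (used here only for demonstration)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()
```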

Bing Speech API - Speech Recognition Software | Microsoft Azure

Input: Audio file. Format: wav & ogg^[8]
Language: Traditional Chinese, Simplified Chinese, English, and others (see the language support list)^[9]
Sample code: Azure-Samples/SpeechToText-REST: REST Samples of Speech To Text API
Related:

OLAMI Chinese Speech Recognition API | OLAMI Artificial Intelligence Open Platform (VIA Technologies) [Last visited: 2018-09-05]

Input: Audio file. Format: wav & speex ^[10]
Language: Traditional Chinese & Simplified Chinese ^[11]
Sample code: olami-developers/olami-api-quickstart-curl-samples
Related: Troubleshooting of Olami speech to text

Speech Recognition - iFLYTEK Open Platform [Last visited: 2018-09-06]

Input: Speex audio file shorter than 1 minute ^[12]
Language: Chinese (Mandarin), English, Chinese (Cantonese), Chinese (Sichuanese)
Sample code:
Related:

Amazon Transcribe – Automatic Speech Recognition – AWS (API documentation: What Is Amazon Transcribe? - Amazon Transcribe) [Last visited: 2018-09-05]

Input: Audio file (stored in an S3 bucket). "Valid formats for the audio are mp3, mp4, wav and flac."^[13]
Language: English, Spanish
Sample code:
Related:
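
The format and sample-rate constraints quoted for Amazon Transcribe can be checked before a job is submitted. The sketch below hard-codes the values quoted in this entry and its footnote as of the 2018 visit, so they may be out of date; the function name is hypothetical and nothing here calls AWS.

```python
from pathlib import PurePath

# Values quoted from the StartTranscriptionJob documentation cited above (2018).
VALID_FORMATS = {"mp3", "mp4", "wav", "flac"}
MIN_SAMPLE_RATE_HZ = 8000
MAX_SAMPLE_RATE_HZ = 48000


def check_transcribe_input(filename: str, sample_rate_hz: int) -> list:
    """Return a list of problems; an empty list means the input looks acceptable."""
    problems = []
    extension = PurePath(filename).suffix.lower().lstrip(".")
    if extension not in VALID_FORMATS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if not MIN_SAMPLE_RATE_HZ <= sample_rate_hz <= MAX_SAMPLE_RATE_HZ:
        problems.append(f"sample rate out of range: {sample_rate_hz} Hz")
    return problems
```

The check relies on the file extension only; it does not inspect the audio data itself.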

SYSTRAN/faster-whisper: Faster Whisper transcription with CTranslate2

Language: Same as OpenAI Whisper (a reimplementation of the Whisper model using CTranslate2)
Sample code: [1]
Related:
Free limit:
Instruction: 雄::gsyan: Using Faster Whisper to transcribe audio and video into text files (subtitles or transcripts)
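
For orientation, a minimal usage sketch in the style of the faster-whisper README follows. It assumes the faster-whisper package is installed and that a CPU with int8 quantization is acceptable; the model size, the helper names, and the line format are illustrative choices, not recommendations from this page.

```python
def format_line(start: float, end: float, text: str) -> str:
    """Render one transcribed segment as a timestamped line."""
    return f"[{start:.2f}s -> {end:.2f}s] {text.strip()}"


def transcribe(path: str, model_size: str = "small") -> list:
    """Transcribe an audio file and return one formatted line per segment."""
    # Imported lazily so this module loads even when faster-whisper is not installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy iterator of segments plus an info object.
    segments, info = model.transcribe(path, beam_size=5)
    print(f"Detected language: {info.language}")
    return [format_line(s.start, s.end, s.text) for s in segments]
```

Calling transcribe("audio.mp3") downloads the chosen model on first use; the file name is only an illustration.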

Const-me/Whisper: High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model on Windows

Language:
Sample code:
Related:
Free limit:

Related pages

[1] Frequently asked questions - NotebookLM Help: "The current limit is 500,000 words per source or up to 200MB for local uploads. There's no page limit."
[2] Change output language in NotebookLM - Computer - NotebookLM Help
[3] Upload & analyze files in Gemini Apps - Computer - Gemini Apps Help
[4] Audio understanding - generateContent API | Google AI for Developers
[5] How to use autocaptions in Clipchamp - Microsoft Support
[6] Language support - Speech service - Azure AI services | Microsoft Learn
[7] Language Support | Cloud Speech-to-Text API | Google Cloud
[8] Speech-to-text REST API reference - Speech service - Azure Cognitive Services | Microsoft Docs
[9] Language support - Speech service - Azure Cognitive Services | Microsoft Docs
[10] Documentation Center - OLAMI - OLAMI Artificial Intelligence Open Platform
[11] olami-api-quickstart-curl-samples/cloud-speech-recognition at master · olami-developers/olami-api-quickstart-curl-samples
[12] Speech Dictation · iFLYTEK REST API Developer Guide
[13] StartTranscriptionJob - Amazon Transcribe: "For best results, use a lossless format, such as FLAC or WAV with PCM 16-bit encoding. Your audio input can be sampled at any rate between 8000 and 48000 Hz. We suggest that you use 8000 Hz for low-quality audio and 16000 Hz for high-quality audio."

