Speech to text

From LemonWiki共筆

Comparison of speech to text (transcription) software


Speech to text software

NotebookLM [Last visited: 2026-06-06]

  • Input file format: Audio file (max 200 MB)[1]
  • Supported languages: 80+ languages[2]
  • Speaker identification: Prompt-based
  • Price: Free and paid tiers available
  • Output file format: TXT (Prompt-based)
  • Notes: (1) Timestamp formatting is not supported. (2) Mixed-language audio (e.g. Mandarin Chinese/English) is automatically translated into the target language specified in the user prompt.


雅婷逐字稿

  • Input file format: Audio file or video file
  • Supported languages: (1) Mandarin Chinese & English, (2) Mandarin Chinese, English & Taiwanese, (3) English
  • Speaker identification: Yes
  • Price: Free and paid tiers available
  • Output file format: PDF, TXT, ODT, DOCX, SRT, CSV
  • Notes:

Gemini

  • Input file format: Audio file or video file.[3] The Gemini app does not support direct uploads of audio files larger than 20 MB; for larger files, either use the File API or upload the file to Google Drive first and then link it from within the Gemini app.[4]
  • Supported languages:
  • Speaker identification: Prompt-based
  • Price: Free and paid tiers available
  • Output file format: TXT (Prompt-based)
  • Notes:
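
The 20 MB rule in the Gemini entry above can be checked before uploading. The following is a minimal sketch only: the function names and the route labels are hypothetical, and it assumes 1 MB means 1024 * 1024 bytes, which the source does not specify.

```python
import os

# Threshold from the Gemini entry: direct audio uploads larger than 20 MB are not supported.
# Assumption: 1 MB is treated as 1024 * 1024 bytes.
DIRECT_UPLOAD_LIMIT_BYTES = 20 * 1024 * 1024


def choose_upload_route(size_bytes: int) -> str:
    """Return "direct" for files within the limit, else "file_api_or_drive"."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= DIRECT_UPLOAD_LIMIT_BYTES:
        return "direct"
    return "file_api_or_drive"


def route_for_file(path: str) -> str:
    """Look up the file size on disk and pick a route for it."""
    return choose_upload_route(os.path.getsize(path))
```

Calling route_for_file on a local recording returns the label to act on; the upload itself is left out because it depends on which of the two workarounds is used.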

Clipchamp [Last visited: 2026-06-06]

  • Input file format: Audio file or video file
  • Supported languages: 80+ languages[5][6]
  • Speaker identification:
  • Output file format: SRT
  • Notes: The free version appears to place no limit on video duration, and its AI transcription of video or audio is also free. In testing, however, the subtitle shown for each timecode was not always a complete sentence.

Meeting Ink - AI notetaker to transcribe and summarize your meetings and recordings.

  • Input file: Audio files
  • Supported languages:
  • Speaker identification: Yes
  • Real-Time Subtitles or Translation: Pro plan only (paid)
  • Free limit: 30 minutes max

Whisper Web - a Hugging Face Space by Xenova

  • Input file: Audio files
  • Supported languages: English
  • Speaker identification: No
  • Output file format: TXT or JSON (includes timestamp information)
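
Because several tools above export SRT while Whisper Web exports JSON with timestamps, a small converter can bridge the two. This is a sketch under a stated assumption: each JSON entry is taken to look like {"timestamp": [start, end], "text": "..."} with times in seconds. That shape and the function names are assumptions, so check them against an actual export first.

```python
def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: list) -> str:
    """Build SRT text from chunks shaped like {"timestamp": [start, end], "text": ...}."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:  # assumed: the final chunk may lack an end time
            end = start
        text = chunk["text"].strip()
        blocks.append(f"{index}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n")
    # Cues are separated by one blank line, as SRT requires.
    return "\n".join(blocks)
```

To use it, load the exported file with json.load and write the returned string to a file with the .srt extension.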

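To generate text from a video, you can use YouTube's automatic captions (Use automatic captioning - YouTube Help); this takes about half a day. [Last visited: 2018-09-04] Tutorial: "YouTube is really generous: it adds captions for you automatically!" | T客邦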

  • Input: Video
  • Language:
  • Sample code:
  • Related:

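Web Speech to Text. Tutorial: "Free! Speech-to-text subtitles for Chinese videos, with support for very large videos and long recordings"

  • Input: Computer video, audio, YouTube URL
  • Language: Chinese, English, Japanese, Korean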

Voicetapp - AI Voice to Text Transcription

  • Language: Chinese, English, and other languages
  • Sample code:
  • Related:
  • Free limit: 5 minutes

Good Tape

  • Supported languages:
  • Input file: Audio files
  • Speaker identification: Available
  • Real-Time Subtitles or Translation: Not Available
  • Free limit: 20 minutes max


Lark | Business Chat & Collaboration Tool (飞书 (Feishu) - Wikipedia, the free encyclopedia)

  • Language:
  • Sample code:
  • Related:
  • Free limit:

iTranscribe: Transcribe Audio & Video to Text

  • Language:
  • Sample code:
  • Related:
  • Free limit:

剪映 (Jianying) official website: an all-round, easy-to-use desktop video editor (note: Chinese software)

  • Language:
  • Sample code:
  • Related:
  • Free limit:

MacWhisper (macOS)

  • Input file format: Audio file or video file
  • Supported languages:
  • Speaker identification: Yes
  • Price: Free or Pro plan
  • Output file format: TXT, DOCX, SRT, VTT, JSON and more
  • Notes:

Speech to text API

openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision (recommended)
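
As a rough illustration of calling openai/whisper from Python, the sketch below uses the package's load_model and transcribe calls. The wrapper names transcribe_file and format_segments are hypothetical helpers, the "base" model size is only an example choice, and the package (installed as openai-whisper) also needs ffmpeg available on the system.

```python
def transcribe_file(path: str, model_name: str = "base") -> dict:
    """Transcribe one audio file with the openai-whisper package."""
    # Imported lazily so format_segments can be used without the package installed.
    import whisper

    model = whisper.load_model(model_name)
    return model.transcribe(path)


def format_segments(segments: list) -> str:
    """Render segments (dicts with "start", "end", "text") as one line each."""
    lines = []
    for seg in segments:
        lines.append(f"[{seg['start']:7.2f} -> {seg['end']:7.2f}] {seg['text'].strip()}")
    return "\n".join(lines)
```

The dictionary returned by transcribe_file carries the full transcript under "text" and the timed pieces under "segments", so format_segments(result["segments"]) gives a quick timestamped view.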

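Speech API - Speech Recognition | Google Cloud: "Speech-to-text uses machine learning technology." The free tier includes 60 minutes of speech recognition; for details see Pricing | Cloud Speech API Documentation | Google Cloud. [Last visited: 2018-09-04]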

Bing Speech API - Speech recognition software | Microsoft Azure

OLAMI Chinese speech recognition API | OLAMI Artificial Intelligence Open Platform (VIA Technologies) [Last visited: 2018-09-05]

Speech recognition - iFLYTEK Open Platform [Last visited: 2018-09-06]

  • Input: Speex audio file shorter than 1 minute[12]
  • Language: Chinese (Mandarin), English, Chinese (Cantonese), Chinese (Sichuanese)
  • Sample code:
  • Related:

Amazon Transcribe – Automatic Speech Recognition – AWS (API documentation: What Is Amazon Transcribe? - Amazon Transcribe) [Last visited: 2018-09-05]

  • Input: Audio file stored in an S3 bucket. "Valid formats for the audio are mp3, mp4, wav and flac."[13]
  • Language: English, Spanish
  • Sample code:
  • Related:
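
The Amazon Transcribe entry above requires the audio to sit in an S3 bucket in one of four formats. A minimal sketch of submitting such a file with boto3 follows. The helper names, the bucket and key values, and the "en-US" default are illustrative assumptions, and the call needs AWS credentials configured.

```python
import os

# Formats quoted in the Amazon Transcribe entry.
VALID_FORMATS = {"mp3", "mp4", "wav", "flac"}


def media_format_for(key: str) -> str:
    """Return the media format implied by an S3 object key's extension."""
    ext = os.path.splitext(key)[1].lstrip(".").lower()
    if ext not in VALID_FORMATS:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}")
    return ext


def build_job_request(job_name: str, bucket: str, key: str,
                      language_code: str = "en-US") -> dict:
    """Assemble keyword arguments for boto3's start_transcription_job."""
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "MediaFormat": media_format_for(key),
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
    }


def start_job(job_name: str, bucket: str, key: str) -> dict:
    """Submit the job; requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("transcribe")
    return client.start_transcription_job(**build_job_request(job_name, bucket, key))
```

Keeping the request builder separate from the network call lets the format check fail early, before anything is sent to AWS.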

SYSTRAN/faster-whisper: Faster Whisper transcription with CTranslate2

Const-me/Whisper: High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model on Win  

  • Language:
  • Sample code:
  • Related:
  • Free limit:


Related pages