
== Speech to text API ==
{{Gd}} [https://github.com/openai/whisper openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision]
* Supported languages: 99 languages
* Input file: Audio files
* Speaker identification: Not built in; requires integration with (1) [https://github.com/m-bain/whisperX m-bain/whisperX: WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)] or (2) [https://github.com/pyannote/pyannote-audio pyannote/pyannote-audio: Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding]
* Real-Time Subtitles or Translation: Not Available
* Related:
** [https://github.com/aaaddress1/Whisper.py?fbclid=IwAR1rwZH-USj2NIt8pLYRGhIqWvQWUj1FQTx83qpBncno3ANWDUBI_duWr9M aaaddress1/Whisper.py: Come on, who is going to use it if you still have to run pip install? A ready-to-run OpenAI Whisper transcript generator for Windows] on {{Win}} Introduction: [https://www.playpcesor.com/2023/04/whisperdesktop-ai.html WhisperDesktop: free standalone speech-to-text software, with a hands-on comparison of AI video subtitles]

[https://cloud.google.com/speech/?hl=zh-tw Speech API - Speech Recognition  |  Google Cloud] "Speech-to-text powered by machine learning". The free tier includes 60 minutes of speech recognition; see [https://cloud.google.com/speech-to-text/pricing Pricing  |  Cloud Speech API Documentation  |  Google Cloud]. {{access | date = 2018-09-04}}
* Input: Microphone or audio file (audio files longer than 1 minute must be uploaded to Google Cloud Storage)
* Language: 120 languages <ref>[https://cloud.google.com/speech-to-text/docs/languages?hl=zh-tw Language Support  |  Cloud Speech-to-Text API  |  Google Cloud]</ref>
* Sample code:
* Related: [[Troubleshooting of Google cloud speech to text]]
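
The one-minute limit in the Input item above can be checked locally before deciding how to submit a file. The following is a minimal sketch, not part of any Google client library: it uses only Python's standard <code>wave</code> module, so it works for uncompressed WAV files only, and the names <code>wav_duration_seconds</code> and <code>needs_cloud_storage</code> are hypothetical helpers introduced for illustration.

```python
import wave

# Assumption: the 1-minute threshold comes from the Input item above.
SYNC_LIMIT_SECONDS = 60.0


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def needs_cloud_storage(path, limit_seconds=SYNC_LIMIT_SECONDS):
    """Return True when the file is longer than the limit.

    Hypothetical helper: such files would have to be uploaded to
    Google Cloud Storage before recognition, per the note above.
    """
    return wav_duration_seconds(path) > limit_seconds
```

For example, <code>needs_cloud_storage("meeting.wav")</code> would return <code>True</code> for a 5-minute recording (the file name is only an illustration).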

[https://azure.microsoft.com/zh-tw/services/cognitive-services/speech/ Bing Speech API - Speech Recognition Software | Microsoft Azure]
* Input: Audio file. Format: wav & ogg<ref>[https://docs.microsoft.com/zh-tw/azure/cognitive-services/speech-service/rest-speech-to-text Speech-to-text API reference (REST) - Speech service - Azure Cognitive Services | Microsoft Docs]</ref>
* Language: Traditional Chinese, Simplified Chinese, English, and others; see the full list<ref>[https://docs.microsoft.com/zh-tw/azure/cognitive-services/speech-service/language-support#speech-to-text Language support - Speech service - Azure Cognitive Services | Microsoft Docs]</ref>
* Sample code: [https://github.com/Azure-Samples/SpeechToText-REST Azure-Samples/SpeechToText-REST: REST Samples of Speech To Text API]
* Related:

[https://tw.olami.ai/open/website/apiandsolution/api_solution OLAMI Chinese Speech Recognition API | OLAMI Artificial Intelligence Open Platform (VIA Technologies)] {{access | date = 2018-09-05}}
* Input: Audio file. Format: wav & speex <ref>[https://tw.olami.ai/wiki/?mp=api_asr&content=api_asr1.html Documentation Center - OLAMI - OLAMI Artificial Intelligence Open Platform]</ref>
* Language: Traditional Chinese & Simplified Chinese <ref>[https://github.com/olami-developers/olami-api-quickstart-curl-samples/tree/master/cloud-speech-recognition olami-api-quickstart-curl-samples/cloud-speech-recognition at master · olami-developers/olami-api-quickstart-curl-samples]</ref>
* Sample code: [https://github.com/olami-developers/olami-api-quickstart-curl-samples/tree/master/cloud-speech-recognition olami-developers/olami-api-quickstart-curl-samples]
* Related: [[Troubleshooting of Olami speech to text]]

[https://www.xfyun.cn/doccenter/asr Speech Recognition - iFLYTEK Open Platform] {{access | date=2018-09-06}}
* Input: Speex audio file shorter than 1 minute <ref>[https://doc.xfyun.cn/rest_api/%E8%AF%AD%E9%9F%B3%E5%90%AC%E5%86%99.html Speech Dictation · iFLYTEK REST_API Development Guide]</ref>
* Language: Chinese (Mandarin), English, Chinese (Cantonese), Chinese (Sichuanese)
* Sample code:
* Related:

[https://aws.amazon.com/tw/transcribe/ Amazon Transcribe – Automatic Speech Recognition – AWS] (API documentation: [https://docs.aws.amazon.com/transcribe/latest/dg/what-is-transcribe.html What Is Amazon Transcribe? - Amazon Transcribe]) {{access | date=2018-09-05}}
* Input: Audio file (stored in an S3 bucket). "Valid formats for the audio are mp3, mp4, wav and flac. For best results, use a lossless format, such as FLAC or WAV with PCM 16-bit encoding. Your audio input can be sampled at any rate between 8000 and 48000 Hz. We suggest that you use 8000 Hz for low-quality audio and 16000 Hz for high-quality audio."<ref>[https://docs.aws.amazon.com/transcribe/latest/dg/API_StartTranscriptionJob.html StartTranscriptionJob - Amazon Transcribe]</ref>
* Language: English, Spanish
* Sample code:
* Related:
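
The format and sample-rate constraints quoted in the Input item above can be checked before uploading a file to S3. The sketch below is an illustration only and does not call the AWS API: <code>check_transcribe_input</code> is a hypothetical helper, the format is guessed from the file extension, and the caller is assumed to already know the sample rate.

```python
from pathlib import Path

# Values taken from the documentation quoted above (accessed 2018-09-05).
VALID_FORMATS = {"mp3", "mp4", "wav", "flac"}
MIN_SAMPLE_RATE_HZ = 8000
MAX_SAMPLE_RATE_HZ = 48000


def check_transcribe_input(filename, sample_rate_hz):
    """Return a list of problems; an empty list means the input looks valid.

    Hypothetical helper: the format is inferred from the extension only,
    not from the actual file contents.
    """
    problems = []
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in VALID_FORMATS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if not MIN_SAMPLE_RATE_HZ <= sample_rate_hz <= MAX_SAMPLE_RATE_HZ:
        problems.append(f"sample rate out of range: {sample_rate_hz} Hz")
    return problems
```

Returning a list of problems rather than raising on the first one lets a batch script report every issue with a file in a single pass.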

[https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file SYSTRAN/faster-whisper: Faster Whisper transcription with CTranslate2]
* Language: Same as OpenAI Whisper (faster-whisper is a reimplementation of the Whisper model using CTranslate2)
* Sample code: [https://colab.research.google.com/drive/1TqmzTY5ZXcYBoBGbwSVBtwxlFajMIcRc?usp=sharing Google Colab notebook]
* Related:
* Free limit:
* Instruction: [https://gsyan888.blogspot.com/2023/11/faster-whisper.html 雄::gsyan: Using Faster Whisper to transcribe audio and video into text files (subtitles or transcripts)]
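
Turning transcription output into a subtitle file, as the instruction link above describes, mostly comes down to formatting timestamps. The following is a minimal sketch that does not import faster-whisper itself: it assumes the segments have already been converted to <code>(start, end, text)</code> tuples with times in seconds, and <code>format_srt_timestamp</code> and <code>segments_to_srt</code> are hypothetical helper names.

```python
def format_srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SRT text from an iterable of (start, end, text) tuples.

    Assumption: times are in seconds; text is stripped of surrounding
    whitespace before being written.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_line = f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}"
        blocks.append(f"{index}\n{time_line}\n{text.strip()}\n")
    return "\n".join(blocks)
```

The resulting string can be written to a <code>.srt</code> file with UTF-8 encoding; for a plain transcript, joining only the text of each segment is enough.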

[https://github.com/Const-me/Whisper Const-me/Whisper: High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model] on {{Win}}
* Language:
* Sample code:
* Related:
* Free limit: