Troubleshooting of Google Cloud Speech-to-Text

{{Template:Speech to text}}

Troubleshooting notes for Google [https://cloud.google.com/speech-to-text/ Cloud Speech-to-Text] (speech recognition).

== Brief instruction ==
* If the audio file is '''longer''' than 1 minute: (1) use the endpoint {{kbd | key=<nowiki>speech:longrunningrecognize</nowiki>}}, NOT {{kbd | key=<nowiki>speech:recognize</nowiki>}}; (2) upload the file to [https://console.cloud.google.com/storage/ Google Cloud Storage] (GCS).
* If the audio file is '''shorter''' than 1 minute: (1) use either {{kbd | key=<nowiki>speech:longrunningrecognize</nowiki>}} or {{kbd | key=<nowiki>speech:recognize</nowiki>}}; (2) use a file stored on the local computer, or upload it to [https://console.cloud.google.com/storage/ Google Cloud Storage] (GCS).
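
The two rules above can be condensed into a small shell helper. This is only a minimal sketch: the function name <code>choose_speech_method</code> is hypothetical, and treating exactly 60 seconds as "short" is an assumption, not something stated on this page.

```shell
# Hypothetical helper: pick the API method from the audio duration (whole seconds).
choose_speech_method() {
    duration_seconds=$1
    if [ "$duration_seconds" -gt 60 ]; then
        # Longer than 1 minute: asynchronous method, audio must be on GCS.
        echo "speech:longrunningrecognize"
    else
        # 1 minute or shorter (assumed boundary): the synchronous method is enough.
        echo "speech:recognize"
    fi
}

choose_speech_method 45    # prints speech:recognize
choose_speech_method 300   # prints speech:longrunningrecognize
```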

== ERROR: (gcloud.auth.application-default.print-access-token) The Application Default Credentials are not available ==
input & output
<pre>
$ gcloud auth application-default print-access-token
ERROR: (gcloud.auth.application-default.print-access-token) The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
</pre>

Solution<ref>[https://cloud.google.com/docs/authentication/production Setting Up Authentication for Server to Server Production Applications  |  Authentication  |  Google Cloud]</ref><ref>[https://cloud.google.com/speech-to-text/docs/quickstart-protocol Quickstart: Using the Command Line  |  Cloud Speech API Documentation  |  Google Cloud]</ref>: Run the following command. A browser window opens automatically; follow the steps on the web page.
<pre>
$ gcloud auth application-default login
</pre>
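
The error message also names a second route: defining the environment variable <code>GOOGLE_APPLICATION_CREDENTIALS</code> so that it points to a credentials file. The sketch below only checks whether that variable is usable; the function name <code>check_adc_env</code> and the key path in the comment are hypothetical.

```shell
# Hypothetical helper: report whether GOOGLE_APPLICATION_CREDENTIALS
# is set and points to a readable file.
check_adc_env() {
    if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] && [ -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
        echo "ok: $GOOGLE_APPLICATION_CREDENTIALS"
    else
        echo "missing: set GOOGLE_APPLICATION_CREDENTIALS or run 'gcloud auth application-default login'"
        return 1
    fi
}

# Example (the path is a placeholder, not a real file):
# export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/service-account.json"
# check_adc_env
```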

== Invalid audio channel count ==
error output
<pre>
  {
    "error": {
      "code": 400,
      "message": "Invalid audio channel count",
      "status": "INVALID_ARGUMENT"
    }
  }
</pre>

Solution: Convert the audio file from stereo to mono (one channel).
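
One possible way to do the conversion is with <code>ffmpeg</code>. This page does not prescribe a tool, so the use of ffmpeg is an assumption, and the wrapper name <code>to_mono</code> and the file names are hypothetical.

```shell
# Hypothetical wrapper around ffmpeg (assumed to be installed):
# downmix the input to a single audio channel.
to_mono() {
    input_file=$1
    output_file=$2
    # -ac 1 sets the number of output audio channels to 1 (mono).
    ffmpeg -i "$input_file" -ac 1 "$output_file"
}

# Example:
# to_mono stereo.flac mono.flac
```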

== Invalid recognition 'config': bad encoding ==
error output
<pre>
  {
    "error": {
      "code": 400,
      "message": "Invalid recognition 'config': bad encoding..",
      "status": "INVALID_ARGUMENT"
    }
  }
</pre>

Solution: Specify the encoding of the audio file in the request. For details, see [https://cloud.google.com/speech-to-text/docs/encoding Introduction to Audio Encoding  |  Cloud Speech-to-Text API  |  Google Cloud] and [https://cloud.google.com/speech-to-text/docs/reference/rest/v1/RecognitionConfig#AudioEncoding RecognitionConfig  |  Cloud Speech-to-Text API  |  Google Cloud]. You can use VLC media player to check the encoding of the audio file<ref>[https://forum.videolan.org/viewtopic.php?t=95136#p315198 How to view audio bitrate in VLC - The VideoLAN Forums]</ref>. If the codec (encoding) of the audio file is not in the [https://cloud.google.com/speech-to-text/docs/reference/rest/v1/RecognitionConfig#AudioEncoding list of supported encodings], convert the file with an [[Audio converter | audio converter]].
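
As a command-line alternative to VLC, the codec can be inspected with <code>ffprobe</code> and the file re-encoded with <code>ffmpeg</code>. Both tools are an assumption (this page only mentions VLC and a generic audio converter), and the function names <code>audio_codec</code> and <code>to_flac</code> are hypothetical.

```shell
# Print the codec name of the first audio stream (requires ffprobe from FFmpeg).
audio_codec() {
    ffprobe -v error -select_streams a:0 \
        -show_entries stream=codec_name \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Re-encode the input to FLAC, the encoding used in the request examples on this page.
to_flac() {
    ffmpeg -i "$1" -c:a flac "$2"
}

# Example:
# audio_codec talk.mp3
# to_flac talk.mp3 talk.flac
```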

== If the audio file's duration is longer than 1 minute use LongRunningRecognize with a 'uri' parameter ==
input
<pre>
$ curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v1p1beta1/speech:recognize \
    -d @sync-request.json
</pre>

file content of sync-request.json<ref>[https://cloud.google.com/speech-to-text/docs/languages Language Support  |  Cloud Speech-to-Text API  |  Google Cloud]</ref>
<pre>
{
  "config": {
    "encoding": "FLAC",
    "sampleRateHertz": 44100,
    "languageCode": "cmn-Hant-TW",
    "alternativeLanguageCodes": ["en-US"],
    "enableWordTimeOffsets": false
  },
  "audio": {
    "uri": "gs://<bucket_name>/<audio file name>"
  }
}

error message<ref>[http://volkanpaksoy.com/archive/2017/12/12/Playing-with-Google-Speech-API/ Playing with Google Speech API - Playground for the mind]</ref>
<pre>
{
  "error": {
    "code": 400,
    "message": "Sync input too long. For audio longer than 1 min use LongRunningRecognize with a 'uri' parameter.",
    "status": "INVALID_ARGUMENT"
  }
}
</pre>

Solution: (1) If the audio file is shorter than 1 min, keep using the endpoint {{kbd | key=<nowiki>speech:recognize</nowiki>}}. (2) If the audio file is longer than 1 min, upload it to [https://console.cloud.google.com/storage/ Google Cloud Storage] (GCS) and change the endpoint from {{kbd | key=<nowiki>speech:recognize</nowiki>}} to {{kbd | key=<nowiki>speech:longrunningrecognize</nowiki>}}:
<pre>
$ curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v1p1beta1/speech:longrunningrecognize \
    -d @sync-request.json
</pre>
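
The long-running workflow has two extra steps around the request above: uploading the audio to GCS beforehand and fetching the result afterwards. The sketch below assumes that <code>gsutil</code> is installed, that the response of {{kbd | key=<nowiki>speech:longrunningrecognize</nowiki>}} contains an operation <code>name</code>, and that the operation can be queried under <code>v1p1beta1/operations/</code>. The function names, bucket name and operation name are hypothetical.

```shell
# Hypothetical helper: copy a local audio file into a GCS bucket.
upload_audio() {
    local_file=$1
    bucket_name=$2
    gsutil cp "$local_file" "gs://${bucket_name}/"
}

# Hypothetical helper: fetch the status (and, once done, the transcript)
# of a long-running operation by the "name" returned from the request.
get_operation() {
    operation_name=$1
    curl -s -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
        "https://speech.googleapis.com/v1p1beta1/operations/${operation_name}"
}

# Example (placeholders, replace with your own values):
# upload_audio talk.flac my-bucket
# get_operation 1234567890
```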

== Request payload size exceeds the limit: 10485760 bytes ==
error output
<pre>
  {
    "error": {
      "code": 400,
      "message": "Request payload size exceeds the limit: 10485760 bytes.",
      "status": "INVALID_ARGUMENT"
    }
  }
</pre>

Solution: Either (1) use a shorter audio file (under 1 min), or (2) for an audio file longer than 1 min, change the endpoint from {{kbd | key=<nowiki>speech:recognize</nowiki>}} to {{kbd | key=<nowiki>speech:longrunningrecognize</nowiki>}} and upload the file to GCS as described in [[#Brief instruction | Brief instruction]].
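
A rough pre-flight check can show whether a local file is likely to hit this limit. The sketch assumes the audio is sent inline as base64 in the <code>audio.content</code> field (the examples on this page use <code>audio.uri</code> instead) and ignores the size of the surrounding JSON, so it is an estimate only. The function name <code>fits_inline</code> is hypothetical.

```shell
# Limit taken from the error message above.
PAYLOAD_LIMIT_BYTES=10485760

# Hypothetical helper: succeed if the base64-encoded file stays under the limit.
fits_inline() {
    file_bytes=$(wc -c < "$1")
    # Base64 encodes every 3 input bytes as 4 output characters (rounded up).
    encoded_bytes=$(( (file_bytes + 2) / 3 * 4 ))
    [ "$encoded_bytes" -lt "$PAYLOAD_LIMIT_BYTES" ]
}

# Example:
# fits_inline talk.flac && echo "can be sent inline" || echo "upload to GCS instead"
```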

== sample_rate_hertz (16000) in RecognitionConfig must either be unspecified or match the value in the FLAC header ==
input & output
<pre>
$ curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v1/speech:longrunningrecognize \
    -d @sync-request.json

{
  "error": {
    "code": 400,
    "message": "sample_rate_hertz (16000) in RecognitionConfig must either be unspecified or match the value in the FLAC header (44100).",
    "status": "INVALID_ARGUMENT"
  }
}
</pre>

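Solution: Check the actual sample rate of the audio file, then either set <code>sampleRateHertz</code> in the request to that value (44100 in the example above) or omit the field.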
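
The sample rate can be read from the file with <code>ffprobe</code> and compared with the value in the request. The use of ffprobe is an assumption (any audio inspection tool works), and the function names and file name are hypothetical.

```shell
# Print the sample rate (Hz) of the first audio stream (requires ffprobe from FFmpeg).
audio_sample_rate() {
    ffprobe -v error -select_streams a:0 \
        -show_entries stream=sample_rate \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Succeed if the file's sample rate equals the configured sampleRateHertz.
sample_rate_matches() {
    audio_file=$1
    configured_rate=$2
    [ "$(audio_sample_rate "$audio_file")" = "$configured_rate" ]
}

# Example:
# sample_rate_matches talk.flac 16000 || echo "fix or remove sampleRateHertz"
```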

== Invalid JSON payload received. Unknown name \"alternative_language_codes\" at 'config': Cannot find field ==
input & output
<pre>
$ curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v1/speech:longrunningrecognize \
    -d @sync-request.json

{
  "error": {
    "code": 400,
    "message": "Invalid JSON payload received. Unknown name \"alternative_language_codes\" at 'config': Cannot find field.",
    "status": "INVALID_ARGUMENT",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.BadRequest",
        "fieldViolations": [
          {
            "field": "config",
            "description": "Invalid JSON payload received. Unknown name \"alternative_language_codes\" at 'config': Cannot find field."
          }
        ]
      }
    ]
  }
}
</pre>

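Solution: The <code>v1</code> endpoint did not recognize the <code>alternativeLanguageCodes</code> field used in the request. Change the URL from {{kbd | key=<nowiki>https://speech.googleapis.com/v1/speech:longrunningrecognize</nowiki>}} to {{kbd | key=<nowiki>https://speech.googleapis.com/v1p1beta1/speech:longrunningrecognize</nowiki>}}.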
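
To avoid this mismatch, the API version can be derived from the request file itself. This sketch simply follows the observation on this page that <code>v1</code> rejected <code>alternativeLanguageCodes</code>; the function name <code>pick_api_version</code> is hypothetical, and other beta-only fields are not covered.

```shell
# Hypothetical helper: choose the API version from the fields used in a request file.
pick_api_version() {
    request_file=$1
    if grep -q '"alternativeLanguageCodes"' "$request_file"; then
        # Field reported as unknown by the v1 endpoint in the error above.
        echo "v1p1beta1"
    else
        echo "v1"
    fi
}

# Example:
# url="https://speech.googleapis.com/$(pick_api_version sync-request.json)/speech:longrunningrecognize"
```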

== Related ==

* {{Gd}} [https://cloud.google.com/speech-to-text/docs/best-practices Best Practices  |  Cloud Speech API Documentation  |  Google Cloud]
* official document: [https://cloud.google.com/speech-to-text/docs/troubleshooting Troubleshooting of Google Speech-to-text API]
* [https://groups.google.com/forum/#!forum/cloud-speech-discuss cloud-speech-discuss - Google Group]
* [https://github.com/GoogleCloudPlatform/php-docs-samples/tree/master/speech/ php-docs-samples/speech at master · GoogleCloudPlatform/php-docs-samples]
* [[Text to speech]]

== References ==

<references/>

{{Template:Troubleshooting}}

[[Category:Google]] [[Category:NLP]] [[Category:Tool]]