Troubleshooting of Google Cloud Speech-to-Text

From LemonWiki共筆

Troubleshooting notes for Google Cloud Speech-to-Text (speech recognition)

ERROR: (gcloud.auth.application-default.print-access-token) The Application Default Credentials are not available

input & output

$ gcloud auth application-default print-access-token
ERROR: (gcloud.auth.application-default.print-access-token) The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.

Solution[1][2]: Run the following command. A browser window will then open automatically so you can sign in with your Google account and grant access.

$ gcloud auth application-default login
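The error above means that no Application Default Credentials (ADC) could be found. The helper below can check, before calling the API, whether one of the two credential sources mentioned in the error is in place. It is a simplified sketch, and it assumes only those two sources:

- the GOOGLE_APPLICATION_CREDENTIALS environment variable;
- the file that gcloud auth application-default login writes, which on Linux and macOS is ~/.config/gcloud/application_default_credentials.json.

The real lookup in Google's client libraries also considers other places, such as the Compute Engine metadata server and a custom gcloud config directory. The function name find_adc_file is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_adc_file(env: Optional[Mapping[str, str]] = None,
                  home: Optional[str] = None) -> Optional[Path]:
    """Return the credentials file ADC would most likely use, or None.

    Simplified sketch: only checks GOOGLE_APPLICATION_CREDENTIALS and the
    Linux/macOS file written by `gcloud auth application-default login`.
    """
    env = os.environ if env is None else env

    # 1. An explicitly configured credentials file takes precedence.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit)
        # A variable pointing at a missing file still leaves ADC unavailable.
        return path if path.is_file() else None

    # 2. The user credentials saved by the gcloud login command.
    base = Path.home() if home is None else Path(home)
    default = base / ".config" / "gcloud" / "application_default_credentials.json"
    return default if default.is_file() else None
```

If find_adc_file() returns None, running the gcloud auth application-default login command shown above (or pointing GOOGLE_APPLICATION_CREDENTIALS at a service-account key file) should make print-access-token succeed.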


Invalid audio channel count

error output

  {
    "error": {
      "code": 400,
      "message": "Invalid audio channel count",
      "status": "INVALID_ARGUMENT"
    }
  }

Solution: Convert the audio file from stereo (two channels) to mono (one channel).
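For uncompressed WAV files, the downmix can be done with Python's standard wave and array modules, by averaging each left/right sample pair. This is a sketch under several assumptions:

- the input is 16-bit PCM WAV;
- stereo to mono by averaging is acceptable for the recording;
- for FLAC or other compressed formats, a tool such as ffmpeg or sox is the more usual route.

The function names downmix_pcm16 and stereo_to_mono_wav are hypothetical.

```python
import array
import sys
import wave


def downmix_pcm16(frames: bytes) -> bytes:
    """Average interleaved 16-bit stereo samples (L, R, L, R, ...) into mono."""
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is little-endian
    # Floor-average each L/R pair; the result always fits in 16 bits.
    mono = array.array(
        "h", [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples) - 1, 2)]
    )
    if sys.byteorder == "big":
        mono.byteswap()  # back to little-endian for the output file
    return mono.tobytes()


def stereo_to_mono_wav(src_path: str, dst_path: str) -> None:
    """Write a mono copy of a 16-bit PCM stereo WAV file."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected a 16-bit PCM stereo WAV file")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)  # keep the original sample rate
        dst.writeframes(downmix_pcm16(frames))
```

For example, stereo_to_mono_wav("input.wav", "mono.wav") (placeholder file names) produces a single-channel file with the same sample rate. The sampleRateHertz value in the request therefore does not need to change.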

Invalid recognition 'config': bad encoding

error output

  {
    "error": {
      "code": 400,
      "message": "Invalid recognition 'config': bad encoding..",
      "status": "INVALID_ARGUMENT"
    }
  }

Solution: Specify the encoding of the audio file in the "encoding" field of "config". For details, see the "Introduction to Audio Encoding" and "RecognitionConfig" pages of the Cloud Speech-to-Text API documentation. You can use VLC media player to view the encoding of an audio file[3]. If the codec (encoding) of the audio file is not in the list of supported encodings on those pages, convert the file to a supported codec first.
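A quick first check, before opening VLC, is to look at the file's leading bytes. The Python sketch below guesses an "encoding" value from them:

- "fLaC" is reported as FLAC.
- "RIFF...WAVE" with a 16-bit PCM format chunk is reported as LINEAR16. This assumes the "fmt " chunk comes first, as in most simple WAV files.
- "OggS" is reported as OGG_OPUS. This assumes the Ogg stream carries Opus.

Anything else returns None, meaning "convert it first". The function names are hypothetical, and only the container header is inspected, not the codec inside it.

```python
import struct
from typing import Optional


def guess_encoding(header: bytes) -> Optional[str]:
    """Guess a Speech-to-Text "encoding" value from the first bytes of a file."""
    if header.startswith(b"fLaC"):
        return "FLAC"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        # Assumes the 'fmt ' chunk immediately follows the RIFF header.
        if header[12:16] == b"fmt " and len(header) >= 36:
            (audio_format,) = struct.unpack("<H", header[20:22])
            (bits_per_sample,) = struct.unpack("<H", header[34:36])
            if audio_format == 1 and bits_per_sample == 16:  # 1 = integer PCM
                return "LINEAR16"
        return None  # another WAV codec: convert before sending
    if header.startswith(b"OggS"):
        return "OGG_OPUS"  # assumption: the Ogg stream carries Opus
    return None


def guess_file_encoding(path: str) -> Optional[str]:
    """Read the first bytes of a file and guess its encoding."""
    with open(path, "rb") as f:
        return guess_encoding(f.read(64))
```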

For audio longer than 1 min use LongRunningRecognize with a 'uri' parameter

input

$ curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
    https://speech.googleapis.com/v1p1beta1/speech:recognize \
    -d @sync-request.json

file content of sync-request.json[4]

{
  "config": {
      "encoding":"FLAC",
      "sampleRateHertz": 44100,
      "languageCode": "cmn-Hant-TW",
      "alternativeLanguageCodes": ["en-US"],
      "enableWordTimeOffsets": false
  },
  "audio": {
      "uri":"gs://<bucket_name>/<audio file name>"
  }
}

error message[5]

{
  "error": {
    "code": 400,
    "message": "Sync input too long. For audio longer than 1 min use LongRunningRecognize with a 'uri' parameter.",
    "status": "INVALID_ARGUMENT"
  }
}

Solution: Either (1) use an audio file shorter than 1 minute, or (2) for audio longer than 1 minute, change the endpoint URL from speech:recognize to speech:longrunningrecognize:

$ curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
    https://speech.googleapis.com/v1p1beta1/speech:longrunningrecognize \
    -d @sync-request.json
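Unlike speech:recognize, the speech:longrunningrecognize call does not return the transcript immediately. Its response is an operation object whose "name" identifies the job, and the result is fetched later from the operations endpoint. For this API version, that endpoint is https://speech.googleapis.com/v1p1beta1/operations/<name>, which can also be queried with curl and the same Authorization header.

The Python sketch below polls that endpoint with the standard library and collects the top transcript of each result. Its assumptions:

- the function names are hypothetical;
- the access token comes from gcloud auth application-default print-access-token;
- the 10-second polling interval is an arbitrary choice.

```python
import json
import time
import urllib.request

BASE = "https://speech.googleapis.com/v1p1beta1"


def operation_url(start_response: dict) -> str:
    """Build the status URL from the longrunningrecognize response {"name": ...}."""
    return f"{BASE}/operations/{start_response['name']}"


def transcripts(operation: dict) -> list:
    """Return the top transcript of each result, or [] if not done yet."""
    if not operation.get("done"):
        return []
    if "error" in operation:
        raise RuntimeError(operation["error"].get("message", "operation failed"))
    results = operation.get("response", {}).get("results", [])
    return [r["alternatives"][0].get("transcript", "")
            for r in results if r.get("alternatives")]


def wait_for(start_response: dict, token: str, interval: float = 10.0) -> list:
    """Poll the operation until it finishes, then return its transcripts."""
    req = urllib.request.Request(
        operation_url(start_response),
        headers={"Authorization": f"Bearer {token}"},
    )
    while True:
        with urllib.request.urlopen(req) as resp:
            op = json.load(resp)
        if op.get("done"):
            return transcripts(op)
        time.sleep(interval)
```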

Request payload size exceeds the limit: 10485760 bytes

error output


  {
    "error": {
      "code": 400,
      "message": "Request payload size exceeds the limit: 10485760 bytes.",
      "status": "INVALID_ARGUMENT"
    }
  }

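Solution: Either (1) use a shorter audio file (under 1 minute), or (2) for audio longer than 1 minute, switch from speech:recognize to speech:longrunningrecognize. With option (2), upload the audio to Cloud Storage and reference it through the "uri" field (as in sync-request.json above) rather than sending it inline in the request body.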
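When audio is sent inline, it travels base64-encoded in the "content" field, which makes it about 4/3 of its original size. As a result, a file well under 10 MB on disk can still exceed the limit. The sketch below checks this before sending.

It assumes the 10485760-byte limit applies to the whole JSON body, as the error message suggests. The names LIMIT, inline_request_bytes and fits_inline are hypothetical.

```python
import base64
import json

LIMIT = 10485760  # bytes, taken from the error message (10 MiB)


def inline_request_bytes(audio: bytes, config: dict) -> int:
    """Size of the JSON body when the audio is embedded as base64 content."""
    body = {
        "config": config,
        "audio": {"content": base64.b64encode(audio).decode("ascii")},
    }
    return len(json.dumps(body).encode("utf-8"))


def fits_inline(audio: bytes, config: dict) -> bool:
    """True if the inline request stays within the payload limit."""
    return inline_request_bytes(audio, config) <= LIMIT
```

For example, 8,000,000 bytes of audio become 10,666,668 bytes of base64 text, so such a request is rejected even though the raw file is below 10 MiB. If fits_inline is False, the Cloud Storage "uri" route described above avoids the limit.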


sample_rate_hertz (16000) in RecognitionConfig must either be unspecified or match the value in the FLAC header

input & output

$ curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
    https://speech.googleapis.com/v1/speech:longrunningrecognize \
    -d @sync-request.json

{
  "error": {
    "code": 400,
    "message": "sample_rate_hertz (16000) in RecognitionConfig must either be unspecified or match the value in the FLAC header (44100).",
    "status": "INVALID_ARGUMENT"
  }
}

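Solution: Check the actual sample rate of the audio file (in this case the FLAC header says 44100 Hz). Then either set "sampleRateHertz" to that value, or remove the field so the API reads the rate from the FLAC header.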
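The sample rate can be read directly from the FLAC header. Per the FLAC format, a FLAC file starts with "fLaC" followed by the mandatory STREAMINFO metadata block, which stores the sample rate (20 bits), the channel count minus one (3 bits) and the bits per sample minus one (5 bits).

The Python sketch below decodes those fields, which also shows whether the file is stereo (see the channel-count error above). It then checks a request config the way the error message describes. It assumes a plain FLAC file with no ID3 tag in front; the function names are hypothetical.

```python
from typing import Tuple


def flac_stream_info(header: bytes) -> Tuple[int, int, int]:
    """Return (sample_rate_hz, channels, bits_per_sample) from the first 42 bytes."""
    if header[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")
    # Block header: 1 bit last-block flag + 7-bit type (0 = STREAMINFO) + 24-bit length.
    if header[4] & 0x7F != 0 or len(header) < 22:
        raise ValueError("STREAMINFO block not found")
    b = header[8:]  # STREAMINFO payload
    # Bytes 0-9 hold block sizes and frame sizes; the packed fields start at byte 10.
    sample_rate = (b[10] << 12) | (b[11] << 4) | (b[12] >> 4)
    channels = ((b[12] >> 1) & 0x07) + 1
    bits_per_sample = (((b[12] & 0x01) << 4) | (b[13] >> 4)) + 1
    return sample_rate, channels, bits_per_sample


def read_flac_stream_info(path: str) -> Tuple[int, int, int]:
    """Read the FLAC marker, block header and 34-byte STREAMINFO from a file."""
    with open(path, "rb") as f:
        return flac_stream_info(f.read(42))


def config_sample_rate_ok(config: dict, header_rate: int) -> bool:
    """Mirror the API rule: sampleRateHertz unspecified or equal to the header."""
    return "sampleRateHertz" not in config or config["sampleRateHertz"] == header_rate
```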

Invalid JSON payload received. Unknown name \"alternative_language_codes\" at 'config': Cannot find field

input & output

$ curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
    https://speech.googleapis.com/v1/speech:longrunningrecognize \
    -d @sync-request.json

{
  "error": {
    "code": 400,
    "message": "Invalid JSON payload received. Unknown name \"alternative_language_codes\" at 'config': Cannot find field.",
    "status": "INVALID_ARGUMENT",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.BadRequest",
        "fieldViolations": [
          {
            "field": "config",
            "description": "Invalid JSON payload received. Unknown name \"alternative_language_codes\" at 'config': Cannot find field."
          }
        ]
      }
    ]
  }
}

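Solution: The v1 API rejects the "alternativeLanguageCodes" field. Change the endpoint URL from https://speech.googleapis.com/v1/speech:longrunningrecognize to https://speech.googleapis.com/v1p1beta1/speech:longrunningrecognize (the beta API version that accepts this field).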
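When requests are built programmatically, the endpoint can be chosen from the request itself, combining this fix with the 1-minute rule from the LongRunningRecognize section. The Python sketch below makes both choices:

- It uses v1p1beta1 when the config contains a field that v1 rejected. BETA_ONLY_FIELDS is a hypothetical, deliberately incomplete set based only on the error above.
- It uses speech:longrunningrecognize for audio longer than 60 seconds.

The function name endpoint_for is hypothetical.

```python
BETA_ONLY_FIELDS = {"alternativeLanguageCodes"}  # assumption: from the error above
HOST = "https://speech.googleapis.com"


def endpoint_for(config: dict, duration_s: float) -> str:
    """Pick API version and method for a request (illustrative sketch)."""
    # v1 rejected alternativeLanguageCodes, so fall back to the beta version.
    version = "v1p1beta1" if BETA_ONLY_FIELDS & config.keys() else "v1"
    # Sync recognition only accepts audio up to 1 minute.
    method = "longrunningrecognize" if duration_s > 60 else "recognize"
    return f"{HOST}/{version}/speech:{method}"
```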

Related forum

References

