Troubleshooting Google Cloud Speech-to-Text

Brief instructions

  • For audio longer than 1 minute: (1) use the speech:longrunningrecognize endpoint, NOT speech:recognize; (2) upload the file to Google Cloud Storage (GCS).
  • For audio shorter than 1 minute: (1) use either speech:recognize or speech:longrunningrecognize; (2) use either a file stored on the local computer or a file uploaded to Google Cloud Storage (GCS).
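
The two rules above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name choose_endpoint, the constant names, and the choice of the v1p1beta1 base URL (copied from the curl examples on this page) are hypothetical, and the 60-second threshold is taken from the "1 min" wording of the API error message.

```python
# Hypothetical helper: pick the Speech-to-Text method from the audio length.
BASE_URL = "https://speech.googleapis.com/v1p1beta1"  # version used in the curl examples
SYNC_LIMIT_SECONDS = 60  # "1 min" threshold quoted in the API error message


def choose_endpoint(duration_seconds: float, is_gcs_uri: bool) -> str:
    """Return the request URL for an audio clip of the given length."""
    if duration_seconds > SYNC_LIMIT_SECONDS:
        # Long audio: must be referenced by a gs:// URI and sent to the
        # long-running method.
        if not is_gcs_uri:
            raise ValueError("audio longer than 1 min must be uploaded to GCS first")
        return f"{BASE_URL}/speech:longrunningrecognize"
    # Short audio: speech:recognize is enough (longrunningrecognize also works).
    return f"{BASE_URL}/speech:recognize"
```

For example, choose_endpoint(90, True) returns the speech:longrunningrecognize URL, while choose_endpoint(90, False) raises ValueError because the file has not been uploaded yet.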

ERROR: (gcloud.auth.application-default.print-access-token) The Application Default Credentials are not available

input & output

$ gcloud auth application-default print-access-token
ERROR: (gcloud.auth.application-default.print-access-token) The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.

Solution[1][2]: Run the following command. A browser window opens automatically; follow the steps on the web page to sign in.

$ gcloud auth application-default login
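
To check beforehand whether credentials are likely to be found, something like the following sketch may help. It assumes the default Linux/macOS location of the file written by gcloud auth application-default login (~/.config/gcloud/application_default_credentials.json); Windows and custom gcloud configurations use other paths. The function name find_adc_file and its parameters are hypothetical.

```python
import os
from pathlib import Path


def find_adc_file(env=None, home=None):
    """Return the credentials file that would probably be used, or None."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # 1. Explicit key file named by the environment variable from the error message.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit and Path(explicit).is_file():
        return Path(explicit)

    # 2. Assumption: default Linux/macOS location created by
    #    `gcloud auth application-default login`.
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    return well_known if well_known.is_file() else None
```

If find_adc_file() returns None, the error above is the expected outcome and the login command needs to be run first.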

Invalid audio channel count

error output

  {
    "error": {
      "code": 400,
      "message": "Invalid audio channel count",
      "status": "INVALID_ARGUMENT"
    }
  }

Solution: Convert the audio file from stereo (two channels) to mono (one channel).
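
Any audio converter can do the conversion. As an illustration only, the sketch below downmixes a WAV file using the Python standard library. It assumes the input is uncompressed 16-bit PCM stereo; the function name stereo_to_mono is hypothetical, and other formats (FLAC, MP3, and so on) need a real audio tool instead.

```python
import array
import wave


def stereo_to_mono(src_path, dst_path):
    """Average the left and right channels of a 16-bit PCM stereo WAV file."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo PCM WAV file")
        rate = src.getframerate()
        # Samples are interleaved: L0, R0, L1, R1, ...
        samples = array.array("h", src.readframes(src.getnframes()))

    # One output sample per input frame: the mean of left and right.
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)),
    )

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)  # sample rate is left unchanged
        dst.writeframes(mono.tobytes())
```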

Invalid recognition 'config': bad encoding

error output

  {
    "error": {
      "code": 400,
      "message": "Invalid recognition 'config': bad encoding..",
      "status": "INVALID_ARGUMENT"
    }
  }

Solution: Specify the encoding of the audio file in the request config. For details, see "Introduction to Audio Encoding | Cloud Speech-to-Text API | Google Cloud" and "RecognitionConfig | Cloud Speech-to-Text API | Google Cloud". You can use VLC media player to view the encoding of the audio file[3]. If the codec (encoding) of the file is not in the list of supported encodings on those pages, convert the file to a supported encoding with an audio converter.
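
As an alternative to inspecting the file in a media player, the container type can often be guessed from the first bytes of the file. The sketch below is a rough heuristic, not an official tool: the function name guess_encoding is hypothetical, it recognizes only three containers, and it assumes that a WAV file holds uncompressed 16-bit PCM (which is what LINEAR16 means) without verifying it.

```python
def guess_encoding(path):
    """Guess a RecognitionConfig "encoding" value from the file's magic bytes.

    Returns None when the container is not recognized.
    """
    with open(path, "rb") as f:
        head = f.read(64)

    if head.startswith(b"fLaC"):
        return "FLAC"
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        # Assumption: the WAV payload is 16-bit PCM; this is not checked here.
        return "LINEAR16"
    if head.startswith(b"OggS") and b"OpusHead" in head:
        return "OGG_OPUS"
    return None
```

A None result suggests that the file should be converted before it is submitted.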

For audio longer than 1 min use LongRunningRecognize with a 'uri' parameter

input

$ curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v1p1beta1/speech:recognize \
    -d @sync-request.json

Contents of sync-request.json[4]

{
  "config": {
    "encoding": "FLAC",
    "sampleRateHertz": 44100,
    "languageCode": "cmn-Hant-TW",
    "alternativeLanguageCodes": ["en-US"],
    "enableWordTimeOffsets": false
  },
  "audio": {
    "uri": "gs://<bucket_name>/<audio file name>"
  }
}

error message[5]

{
  "error": {
    "code": 400,
    "message": "Sync input too long. For audio longer than 1 min use LongRunningRecognize with a 'uri' parameter.",
    "status": "INVALID_ARGUMENT"
  }
}

Solution: (1) For audio shorter than 1 minute, use the speech:recognize endpoint. (2) For audio longer than 1 minute, upload the file to Google Cloud Storage (GCS) and change the endpoint in the request URL from speech:recognize to speech:longrunningrecognize:

$ curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v1p1beta1/speech:longrunningrecognize \
    -d @sync-request.json

Request payload size exceeds the limit: 10485760 bytes

error output

  {
    "error": {
      "code": 400,
      "message": "Request payload size exceeds the limit: 10485760 bytes.",
      "status": "INVALID_ARGUMENT"
    }
  }

Solution: (1) Use a shorter audio file (less than 1 minute), or (2) for audio longer than 1 minute, upload the file to Google Cloud Storage (GCS) and change the endpoint from speech:recognize to speech:longrunningrecognize.
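
Whether a local file can be sent inside the request body can be estimated in advance. The sketch below assumes the audio is embedded as base64 text in the JSON request (which inflates it to roughly 4/3 of its size on disk) and reserves a guessed 1024 bytes for the rest of the JSON; the names base64_size and fits_inline and the overhead value are hypothetical.

```python
import os

PAYLOAD_LIMIT_BYTES = 10485760  # limit quoted in the error message


def base64_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_inline(path, overhead_bytes=1024):
    """Estimate whether the file can be embedded in the request body.

    overhead_bytes is a rough allowance for the surrounding JSON (assumption).
    """
    return base64_size(os.path.getsize(path)) + overhead_bytes <= PAYLOAD_LIMIT_BYTES
```

Under these assumptions a file of 8 MiB already exceeds the limit once encoded, so it has to be uploaded to Google Cloud Storage instead of being sent inline.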


sample_rate_hertz (16000) in RecognitionConfig must either be unspecified or match the value in the FLAC header

input & output

$ curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v1/speech:longrunningrecognize \
    -d @sync-request.json

{
  "error": {
    "code": 400,
    "message": "sample_rate_hertz (16000) in RecognitionConfig must either be unspecified or match the value in the FLAC header (44100).",
    "status": "INVALID_ARGUMENT"
  }
}

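Solution: Check the actual sample rate of the audio file, then either set sampleRateHertz in the config to that value (44100 in this example) or omit the field.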
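
For FLAC files the sample rate can be read directly from the file header. The following sketch parses the STREAMINFO block that the FLAC format places right after the "fLaC" marker. It assumes a plain FLAC stream with no tag data prepended before the marker; the function name flac_stream_info is hypothetical.

```python
def flac_stream_info(path):
    """Return (sample_rate_hz, channel_count) from a FLAC file's STREAMINFO block."""
    with open(path, "rb") as f:
        head = f.read(22)
    if len(head) < 22 or head[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing fLaC marker)")
    if head[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")

    # STREAMINFO starts at byte 8. After 10 bytes of block/frame sizes come
    # 20 bits of sample rate followed by 3 bits holding (channels - 1).
    sample_rate = (head[18] << 12) | (head[19] << 4) | (head[20] >> 4)
    channels = ((head[20] >> 1) & 0x07) + 1
    return sample_rate, channels
```

The first value returned is the number to put in sampleRateHertz; the second shows whether the file is mono or stereo.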

Invalid JSON payload received. Unknown name "alternative_language_codes" at 'config': Cannot find field

input & output

$ curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v1/speech:longrunningrecognize \
    -d @sync-request.json

{
  "error": {
    "code": 400,
    "message": "Invalid JSON payload received. Unknown name \"alternative_language_codes\" at 'config': Cannot find field.",
    "status": "INVALID_ARGUMENT",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.BadRequest",
        "fieldViolations": [
          {
            "field": "config",
            "description": "Invalid JSON payload received. Unknown name \"alternative_language_codes\" at 'config': Cannot find field."
          }
        ]
      }
    ]
  }
}

Solution: Change the request URL from https://speech.googleapis.com/v1/speech:longrunningrecognize to https://speech.googleapis.com/v1p1beta1/speech:longrunningrecognize. In this example the v1 endpoint did not recognize the alternative_language_codes field.

References

