How to optimize your OpenAI API token usage

From LemonWiki共筆
Jump to navigation Jump to search

How to optimize your OpenAI API token usage

Methods for enhancing the efficiency of your OpenAI API token usage.[edit]

Prevent Sending Duplicate Content to the OpenAI API[edit]

Cache the API result

If the results of a task calling the OpenAI API do not need to frequently change, consider storing the results for reuse. For example, you can store the results of short article translations, and before calling the API to translate again, check if it has already been translated before.

Constraining Bot Responses to Predefined Options[edit]

While it's acceptable to pose open-ended questions to explore the capabilities of ChatGPT, keep in mind that such questions can lead to longer, more creative responses that might increase costs. To achieve concise and cost-effective answers, (1) consider refining your question by providing specific and limited options for the AI to select from.

(2) if not applicable, respond with N/A. This prevents responses with text outside the options. For example, when analyzing news articles to find key technology-related news summaries, if we don't predefine how the AI should respond when no relevant information is found, the AI might respond with variations like:

  • NA
  • None
  • No major technology news today
  • No major technology news reports today
  • Due to lack of news content, unable to provide technology news summary

This variation in responses which not only uses additional tokens but also increases the effort needed for subsequent data cleaning.

For example:

  • Initial question for exploration:
Please offer five keywords for the following articles:

```
Long text
``` 
  • Refined question:
Please select one of the following keywords: keyword1, keyword2, keyword3, keyword4, keyword5, for the subsequent articles:

```
Long text
``` 

Handling Multiple Article Packages[edit]

original promot

Please select the of the keywords for the subsequent articles: keyword1, keyword2, keyword3, keyword4, keyword5.
```
short text of article No.1
``` 

another prompt:

Please select the keywords for the subsequent articles: keyword1, keyword2, keyword3, keyword4, keyword5.
```
short text of article No.2
``` 

Refined prompt:

Each row is the article number and content. For each article, select the keywords: keyword1, keyword2, keyword3, keyword4, keyword5. Provide your answer in the CSV format: "article number", "comma_separated_keywords"

```
No1. short text of article No.1 (without return symbol)
No2. short text of article No.2 (without return symbol)
...
No5. short text of article No.5 (without return symbol)
``` 

No Additional Explanation Needed[edit]

While GPT-4 often attempts to provide explanations for its answers, if you have already explored the topic, you can frame your questions in a way that skips the elaboration. For example:

For the subsequent articles, please select from the keywords: keyword1, keyword2, keyword3, keyword4, keyword5. No further explanation required.
```
Long text
``` 

Select the appropriate model[edit]

For complex tasks, GPT-4 is recommended, while simpler tasks like translation can utilize GPT-3.5. The same task can start with the xx-mini model. For example, if the o1-mini model can already handle general tasks adequately, there's no need to use the more costly and time-consuming o1 model. For more information, please refer to the following article: Models - OpenAI API.

Enable the JSON mode[edit]

  • "Compatible with gpt-4-1106-preview and gpt-3.5-turbo-1106.[1]"

Choose Long Article Splitting Strategy (chunk)[edit]

Different language models have token limits that affect how much text they can process. The max_length API parameter accounts for both input and output. Models like gpt3.5-turbo and gpt-4 have specific token limits like 4,097 and 8,192, respectively. Exceeding these limits requires you to split articles into smaller pieces for processing.

You can choose not to split articles, but this restricts you to processing only portions of them. Choices include focusing on the beginning and end or just the final paragraphs.

For article splitting, tools like LangChain's text-split-explorer can help, offering options for delimiters, chunk size, and chunk overlap.

Preparation Before API Result Verification[edit]

Prepare several sample texts and verify the API results to ensure they are as expected before processing a large number of articles.

Using OpenAI Batch API for Non-Real-Time Processing[edit]

If real-time response is not required for the output data, you can consider using the OpenAI Batch API

Another version of this article[edit]

Related pages[edit]

Further reading[edit]

References[edit]