LLMs Usage FAQ



{{Tip | tip= Using [https://platform.openai.com/docs/models/o3 OpenAI o3] model as an example: (1) Context Window (200,000): total quota for input + output, (2) Max Output Tokens (100,000): single response limit. Actual input space: 200,000 - expected output length}}
== How to Make AI Process Long Articles ==
📝 Problem: Context Length Limitations
LLMs are constrained by the length of their context window. Long-article translation is a typical example: because the entire text cannot be processed at once, the article must be split into segments.
💬 Processing Methods: Chunking While Maintaining Context Coherence
When processing long texts, we need to adopt a chunking strategy. To help the model understand the context of earlier chapters while it processes later paragraphs, an effective approach is:
# First summarize the previous chapters
# Send the summary to the AI together with the full text of the next chapter to be processed
# This maintains context coherence while saving tokens
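The steps above can be sketched in Python. This is a minimal illustration, not a specific library's API: <code>llm</code> stands for any user-supplied function that sends a prompt to a model and returns its reply, and the character-based chunk size is a stand-in for a proper token count.

```python
def chunk_text(text, max_chars=2000):
    """Split text into chunks of roughly max_chars, breaking on paragraph boundaries."""
    paragraphs = text.split("\n\n")
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) > max_chars:
            chunks.append(current)
            current = p
        else:
            current = current + "\n\n" + p if current else p
    if current:
        chunks.append(current)
    return chunks

def process_long_text(text, llm, max_chars=2000):
    """Process each chunk while passing along a running summary of earlier
    chapters, so the model keeps cross-chunk context without re-reading
    everything. `llm` is a hypothetical callable: prompt in, reply out."""
    summary, results = "", []
    for chunk in chunk_text(text, max_chars):
        prompt = (
            f"Summary of previous chapters:\n{summary}\n\n"
            f"Process the following chapter:\n{chunk}"
        )
        results.append(llm(prompt))
        # Refresh the summary so the next chunk sees a compact history
        summary = llm(f"Summarize briefly:\n{summary}\n{chunk}")
    return results
```

In practice you would count tokens with the model's own tokenizer rather than characters, and keep the summary short so it does not eat into the input budget described in the tip above.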
'''Overlapping Chunking Strategy'''
Another chunking strategy suits transcript editing. Transcript formats (such as SRT) typically include timestamps and the corresponding subtitle text:
<pre>
1
00:00:00,001 --> 00:00:02,000
So you answer me first
2
00:00:02,000 --> 00:00:06,000
Which country has left you with such a long constitutional gap
3
00:00:06,000 --> 00:00:10,000
Then tell me which country doesn't have such provisions
</pre>
If segment 3 is sent to the AI for editing on its own, errors are likely because the preceding dialogue context is missing. In this case, we can adopt a chunking strategy that allows the chunks to partially overlap. Here's an example prompt for improving Chinese transcripts<ref>[https://medium.com/@planetoid/how-to-add-punctuation-to-whisper-transcripts-using-ai-619362c9160c How to Add Punctuation to Whisper Transcripts Using AI | Medium]</ref>:
<pre>
Your task is to improve Chinese spoken interview transcript paragraphs. You need to add punctuation, ensure paragraph coherence, maintain the original meaning, and rewrite portions of text as needed. Please use Traditional Chinese commonly used in Taiwan.
This is the previous paragraph:
<previous_paragraph>
{PREVIOUS_PARAGRAPH}
</previous_paragraph>
This is the current paragraph:
<current_paragraph>
{CURRENT_PARAGRAPH}
</current_paragraph>
This is the following paragraph:
<next_paragraph>
{NEXT_PARAGRAPH}
</next_paragraph>
</pre>
This method allows AI to reference both preceding and following context simultaneously, ensuring coherence and accuracy in processing results.
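A short sketch of how the overlapping prompts can be assembled in Python. The template mirrors the prompt above; <code>segments</code> is an assumed list of subtitle texts already extracted from the transcript, and sending each prompt to a model is left to the caller.

```python
# Prompt skeleton matching the example above; the task instruction and the
# XML-style context tags are filled per segment.
PROMPT_TEMPLATE = """\
This is the previous paragraph:
<previous_paragraph>
{prev}
</previous_paragraph>
This is the current paragraph:
<current_paragraph>
{cur}
</current_paragraph>
This is the following paragraph:
<next_paragraph>
{nxt}
</next_paragraph>"""

def overlapping_prompts(segments):
    """Build one prompt per segment, overlapping each with its neighbours
    so the model always sees the preceding and following dialogue."""
    prompts = []
    for i, cur in enumerate(segments):
        prev = segments[i - 1] if i > 0 else ""
        nxt = segments[i + 1] if i + 1 < len(segments) else ""
        prompts.append(PROMPT_TEMPLATE.format(prev=prev, cur=cur, nxt=nxt))
    return prompts
```

Only the <code>&lt;current_paragraph&gt;</code> part of each reply is kept; the neighbouring segments serve purely as context, which is what makes the overlap safe to discard.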


== How to Solve AI Forgetting Training Content ==
