== How to Make AI Process Long Articles ==

'''📝 Problem: Context Length Limitations'''

LLMs are constrained by the length of their context windows. Taking long-article translation as an example: because the entire text cannot be processed at once, the article has to be split into segments.

'''💬 Processing Methods'''

'''Method 1: Switch to a model that supports a longer output window''', such as Google Gemini:

# GPT-4o: "16,384 max output tokens"<ref>[https://platform.openai.com/docs/models/gpt-4o Model - OpenAI API]</ref>, roughly 5,461 Chinese characters (16,384 / 3)
# gemini-2.5-pro: "65,536 max output tokens"<ref>[https://ai.google.dev/gemini-api/docs/models#gemini-2.5-pro Gemini 2.5 Pro]</ref>, roughly 21,845 Chinese characters (65,536 / 3)
# GPT-5: "128,000 max output tokens"<ref>[https://platform.openai.com/docs/models/gpt-5 Model - OpenAI API]</ref>, roughly 42,666 Chinese characters (128,000 / 3)

'''Method 2: Start a new conversation''' and carry the earlier conversation over to it. For an existing conversation, you can try a prompt such as:

<pre>
As the first prompt for a new conversation, please organize our previous dialogue into:
1. Clear operational steps
2. Instructions to verify the success of each prerequisite step
</pre>

'''Method 3: Chunking with context continuity'''

When processing long texts, we need a chunking strategy<ref>[https://ihower.tw/blog/archives/12373 Evaluating RAG chunking strategies with Traditional Chinese – ihower { blogging }]</ref>.
To help the model understand the context of earlier chapters while it processes later paragraphs, one effective approach is a '''chunking strategy with previous-chapter summarization''':

# First summarize the chapters processed so far.
# Send that summary, together with the full text of the next chapter, to the AI.
# This preserves contextual coherence while saving tokens.

'''Overlapping chunking strategy'''

Another chunking strategy is suited to transcript editing. Transcript formats typically pair timestamps with subtitle text:

<pre>
1
00:00:00,001 --> 00:00:02,000
So you answer me first

2
00:00:02,000 --> 00:00:06,000
Which country has left you with such a long constitutional gap

3
00:00:06,000 --> 00:00:10,000
Then tell me which country doesn't have such provisions
</pre>

If segment 3 is sent to the AI on its own, editing errors are likely because the preceding dialogue is missing. In this case, we can chunk the content so that adjacent chunks partially overlap. Here is an example prompt for improving Chinese transcripts<ref>[https://medium.com/@planetoid/how-to-add-punctuation-to-whisper-transcripts-using-ai-619362c9160c How to Add Punctuation to Whisper Transcripts Using AI | Medium]</ref>:

<pre>
Your task is to improve Chinese spoken interview transcript paragraphs. You need to add punctuation, ensure paragraph coherence, maintain the original meaning, and rewrite portions of text as needed. Please use Traditional Chinese commonly used in Taiwan.

This is the previous paragraph:
<previous_paragraph>
{PREVIOUS_PARAGRAPH}
</previous_paragraph>

This is the current paragraph:
<current_paragraph>
{CURRENT_PARAGRAPH}
</current_paragraph>

This is the following paragraph:
<next_paragraph>
{NEXT_PARAGRAPH}
</next_paragraph>
</pre>

This lets the AI reference both the preceding and the following context at once, ensuring coherent and accurate results.
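The summarize-then-process loop described above can be sketched as follows. Note that <code>llm_summarize</code> and <code>llm_process</code> are hypothetical placeholders for whatever model call you actually use; here they are stubbed so the control flow runs standalone.

```python
# Sketch of "chunking with previous-chapter summarization":
# each chapter is sent together with a running summary of earlier chapters.
# llm_summarize / llm_process are hypothetical stand-ins for real model calls.

def llm_summarize(text: str) -> str:
    """Placeholder: a real call would ask the model to summarize `text`."""
    return text[:100]  # stub: truncation stands in for a model-written summary

def llm_process(summary: str, chapter: str) -> str:
    """Placeholder: a real call would translate/edit `chapter`, using `summary` as context."""
    return chapter  # stub: returns the chapter unchanged

def process_chapters(chapters: list[str]) -> list[str]:
    results = []
    summary_so_far = ""
    for chapter in chapters:
        # Send the summary of earlier chapters plus the full next chapter.
        results.append(llm_process(summary_so_far, chapter))
        # Fold the newly processed chapter into the running summary,
        # keeping the context passed forward short to save tokens.
        summary_so_far = llm_summarize(summary_so_far + "\n" + chapter)
    return results
```

The key design point is that only the compact summary, never the full earlier text, is carried forward with each request.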
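The overlapping strategy can likewise be sketched as a sliding window over already-parsed subtitle segments, where each segment's prompt includes its neighbours. The template below is an abbreviated version of the example prompt above, and all function names are illustrative:

```python
# Sketch of the overlapping chunking strategy: each chunk is sent with its
# previous and next neighbours so the model sees surrounding context.
# PROMPT_TEMPLATE abbreviates the example prompt shown earlier; the helper
# name `overlapping_prompts` is illustrative, not an established API.

PROMPT_TEMPLATE = """Your task is to improve Chinese spoken interview transcript paragraphs.

This is the previous paragraph:
<previous_paragraph>
{prev}
</previous_paragraph>

This is the current paragraph:
<current_paragraph>
{cur}
</current_paragraph>

This is the following paragraph:
<next_paragraph>
{nxt}
</next_paragraph>"""

def overlapping_prompts(segments: list[str]) -> list[str]:
    """Build one prompt per segment, overlapping with its neighbours."""
    prompts = []
    for i, cur in enumerate(segments):
        prev = segments[i - 1] if i > 0 else ""
        nxt = segments[i + 1] if i < len(segments) - 1 else ""
        prompts.append(PROMPT_TEMPLATE.format(prev=prev, cur=cur, nxt=nxt))
    return prompts

# The three subtitle segments from the transcript example above.
segments = [
    "So you answer me first",
    "Which country has left you with such a long constitutional gap",
    "Then tell me which country doesn't have such provisions",
]
prompts = overlapping_prompts(segments)
```

With this windowing, the prompt for segment 3 automatically carries segments 2 and (if present) 4, so the model never edits a line in isolation.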