== How to Make AI Process Long Articles ==

'''📝 Problem: Context Length Limitations'''

LLMs are constrained by the length of their context windows. Taking long-article translation as an example: because the entire text cannot be processed at once, the article has to be split into segments.

'''💬 Processing Methods'''

'''Method 1: Switch to a model that supports a longer output window''', such as Google Gemini:

# GPT-4o: "16,384 max output tokens"<ref>[https://platform.openai.com/docs/models/gpt-4o Model - OpenAI API]</ref>, roughly 5,461 Chinese characters (16,384 / 3)
# gemini-2.5-pro: "65,536 max output tokens"<ref>[https://ai.google.dev/gemini-api/docs/models#gemini-2.5-pro Gemini 2.5 Pro]</ref>, roughly 21,845 Chinese characters (65,536 / 3)
# GPT-5: "128,000 max output tokens"<ref>[https://platform.openai.com/docs/models/gpt-5 Model - OpenAI API]</ref>, roughly 42,666 Chinese characters (128,000 / 3)

'''Method 2: Start a new conversation''' and carry the earlier conversation over to it. For an existing conversation, you can try a prompt such as:

<pre>
As the first prompt for a new conversation, please organize our previous dialogue into:
1. Clear operational steps
2. Instructions to verify the success of each prerequisite step
</pre>

'''Method 3: Chunking with context continuity'''

When processing long texts, we need a chunking strategy<ref>[https://ihower.tw/blog/archives/12373 Evaluating RAG chunking strategies with Traditional Chinese – ihower { blogging }]</ref>.
To help the model understand the context of earlier chapters while it processes later paragraphs, one effective approach is a '''chunking strategy with previous-chapter summarization''':

# First summarize the chapters processed so far.
# Send that summary, together with the full text of the next chapter, to the AI.
# This preserves contextual coherence while saving tokens.

'''Overlapping chunking strategy'''

Another chunking strategy is suited to transcript editing. Transcript formats typically pair timestamps with subtitle text:

<pre>
1
00:00:00,001 --> 00:00:02,000
So you answer me first

2
00:00:02,000 --> 00:00:06,000
Which country has left you with such a long constitutional gap

3
00:00:06,000 --> 00:00:10,000
Then tell me which country doesn't have such provisions
</pre>

If segment 3 is sent to the AI on its own, editing errors are likely because the preceding dialogue is missing. In this case, we can chunk the content so that adjacent chunks partially overlap. Here is an example prompt for improving Chinese transcripts<ref>[https://medium.com/@planetoid/how-to-add-punctuation-to-whisper-transcripts-using-ai-619362c9160c How to Add Punctuation to Whisper Transcripts Using AI | Medium]</ref>:

<pre>
Your task is to improve Chinese spoken interview transcript paragraphs. You need to add punctuation, ensure paragraph coherence, maintain the original meaning, and rewrite portions of text as needed. Please use Traditional Chinese commonly used in Taiwan.

This is the previous paragraph:
<previous_paragraph>
{PREVIOUS_PARAGRAPH}
</previous_paragraph>

This is the current paragraph:
<current_paragraph>
{CURRENT_PARAGRAPH}
</current_paragraph>

This is the following paragraph:
<next_paragraph>
{NEXT_PARAGRAPH}
</next_paragraph>
</pre>

This lets the AI reference both the preceding and the following context at once, ensuring coherent and accurate results.
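The summarize-then-process loop described above can be sketched as follows. Note that <code>llm_summarize</code> and <code>llm_process</code> are hypothetical placeholders for whatever model call you actually use; here they are stubbed so the control flow runs standalone.

```python
# Sketch of "chunking with previous-chapter summarization":
# each chapter is sent together with a running summary of earlier chapters.
# llm_summarize / llm_process are hypothetical stand-ins for real model calls.

def llm_summarize(text: str) -> str:
    """Placeholder: a real call would ask the model to summarize `text`."""
    return text[:100]  # stub: truncation stands in for a model-written summary

def llm_process(summary: str, chapter: str) -> str:
    """Placeholder: a real call would translate/edit `chapter`, using `summary` as context."""
    return chapter  # stub: returns the chapter unchanged

def process_chapters(chapters: list[str]) -> list[str]:
    results = []
    summary_so_far = ""
    for chapter in chapters:
        # Send the summary of earlier chapters plus the full next chapter.
        results.append(llm_process(summary_so_far, chapter))
        # Fold the newly processed chapter into the running summary,
        # keeping the context passed forward short to save tokens.
        summary_so_far = llm_summarize(summary_so_far + "\n" + chapter)
    return results
```

The key design point is that only the compact summary, never the full earlier text, is carried forward with each request.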
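The overlapping strategy can likewise be sketched as a sliding window over already-parsed subtitle segments, where each segment's prompt includes its neighbours. The template below is an abbreviated version of the example prompt above, and all function names are illustrative:

```python
# Sketch of the overlapping chunking strategy: each chunk is sent with its
# previous and next neighbours so the model sees surrounding context.
# PROMPT_TEMPLATE abbreviates the example prompt shown earlier; the helper
# name `overlapping_prompts` is illustrative, not an established API.

PROMPT_TEMPLATE = """Your task is to improve Chinese spoken interview transcript paragraphs.

This is the previous paragraph:
<previous_paragraph>
{prev}
</previous_paragraph>

This is the current paragraph:
<current_paragraph>
{cur}
</current_paragraph>

This is the following paragraph:
<next_paragraph>
{nxt}
</next_paragraph>"""

def overlapping_prompts(segments: list[str]) -> list[str]:
    """Build one prompt per segment, overlapping with its neighbours."""
    prompts = []
    for i, cur in enumerate(segments):
        prev = segments[i - 1] if i > 0 else ""
        nxt = segments[i + 1] if i < len(segments) - 1 else ""
        prompts.append(PROMPT_TEMPLATE.format(prev=prev, cur=cur, nxt=nxt))
    return prompts

# The three subtitle segments from the transcript example above.
segments = [
    "So you answer me first",
    "Which country has left you with such a long constitutional gap",
    "Then tell me which country doesn't have such provisions",
]
prompts = overlapping_prompts(segments)
```

With this windowing, the prompt for segment 3 automatically carries segments 2 and (if present) 4, so the model never edits a line in isolation.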