LLMs Usage FAQ



{{Tip | tip= Using [https://platform.openai.com/docs/models/o3 OpenAI o3] model as an example: (1) Context Window (200,000): total quota for input + output, (2) Max Output Tokens (100,000): single response limit. Actual input space: 200,000 - expected output length}}
== How to Make AI Process Long Articles ==
📝 Problem: Context Length Limitations
LLMs are constrained by the length of their context window. Long-article translation is a typical example: because the entire text cannot be processed at once, the article must be split into segments.
💬 Processing Methods: Chunking While Maintaining Context Coherence
When processing long texts, we need to adopt a chunking strategy. To help the model understand the context of earlier chapters while it processes later paragraphs, an effective approach is:
# First summarize the previous chapters
# Send the summary to the AI together with the full text of the next chapter to be processed
# This maintains context coherence while saving tokens
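The steps above can be sketched in Python. This is a minimal illustration, not a specific library's API: <code>llm</code> stands for any user-supplied function that sends a prompt to a model and returns its reply, and the character-based chunk size is a stand-in for a proper token count.

```python
def chunk_text(text, max_chars=2000):
    """Split text into chunks of roughly max_chars, breaking on paragraph boundaries."""
    paragraphs = text.split("\n\n")
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) > max_chars:
            chunks.append(current)
            current = p
        else:
            current = current + "\n\n" + p if current else p
    if current:
        chunks.append(current)
    return chunks

def process_long_text(text, llm, max_chars=2000):
    """Process each chunk while passing along a running summary of earlier
    chapters, so the model keeps cross-chunk context without re-reading
    everything. `llm` is a hypothetical callable: prompt in, reply out."""
    summary, results = "", []
    for chunk in chunk_text(text, max_chars):
        prompt = (
            f"Summary of previous chapters:\n{summary}\n\n"
            f"Process the following chapter:\n{chunk}"
        )
        results.append(llm(prompt))
        # Refresh the summary so the next chunk sees a compact history
        summary = llm(f"Summarize briefly:\n{summary}\n{chunk}")
    return results
```

In practice you would count tokens with the model's own tokenizer rather than characters, and keep the summary short so it does not eat into the input budget described in the tip above.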
'''Overlapping Chunking Strategy'''
Another chunking strategy suits transcript editing. Transcript formats (such as SRT) typically include timestamps and the corresponding subtitle text:
<pre>
1
00:00:00,001 --> 00:00:02,000
So you answer me first
2
00:00:02,000 --> 00:00:06,000
Which country has left you with such a long constitutional gap
3
00:00:06,000 --> 00:00:10,000
Then tell me which country doesn't have such provisions
</pre>
If segment 3 is sent to the AI for editing on its own, errors are likely because the preceding dialogue context is missing. In this case, we can adopt a chunking strategy that allows the chunks to partially overlap. Here's an example prompt for improving Chinese transcripts<ref>[https://medium.com/@planetoid/how-to-add-punctuation-to-whisper-transcripts-using-ai-619362c9160c How to Add Punctuation to Whisper Transcripts Using AI | Medium]</ref>:
<pre>
Your task is to improve Chinese spoken interview transcript paragraphs. You need to add punctuation, ensure paragraph coherence, maintain the original meaning, and rewrite portions of text as needed. Please use Traditional Chinese commonly used in Taiwan.
This is the previous paragraph:
<previous_paragraph>
{PREVIOUS_PARAGRAPH}
</previous_paragraph>
This is the current paragraph:
<current_paragraph>
{CURRENT_PARAGRAPH}
</current_paragraph>
This is the following paragraph:
<next_paragraph>
{NEXT_PARAGRAPH}
</next_paragraph>
</pre>
This method allows AI to reference both preceding and following context simultaneously, ensuring coherence and accuracy in processing results.
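A short sketch of how the overlapping prompts can be assembled in Python. The template mirrors the prompt above; <code>segments</code> is an assumed list of subtitle texts already extracted from the transcript, and sending each prompt to a model is left to the caller.

```python
# Prompt skeleton matching the example above; the task instruction and the
# XML-style context tags are filled per segment.
PROMPT_TEMPLATE = """\
This is the previous paragraph:
<previous_paragraph>
{prev}
</previous_paragraph>
This is the current paragraph:
<current_paragraph>
{cur}
</current_paragraph>
This is the following paragraph:
<next_paragraph>
{nxt}
</next_paragraph>"""

def overlapping_prompts(segments):
    """Build one prompt per segment, overlapping each with its neighbours
    so the model always sees the preceding and following dialogue."""
    prompts = []
    for i, cur in enumerate(segments):
        prev = segments[i - 1] if i > 0 else ""
        nxt = segments[i + 1] if i + 1 < len(segments) else ""
        prompts.append(PROMPT_TEMPLATE.format(prev=prev, cur=cur, nxt=nxt))
    return prompts
```

Only the <code>&lt;current_paragraph&gt;</code> part of each reply is kept; the neighbouring segments serve purely as context, which is what makes the overlap safe to discard.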


== How to Solve AI Forgetting Training Content ==
