== S ==
* AI Sycophancy: "Models already exhibit a tendency to align with the user's stated position at the pretraining stage, and Reinforcement Learning from Human Feedback (RLHF) may further amplify this behavior. Because preference models often treat agreement with the user's viewpoint as a marker of a high-quality response, models are more likely to accommodate than to correct the user, even when the user's position is wrong." (Sharma et al., 2023<ref>Sharma et al., "Towards Understanding Sycophancy in Language Models", Anthropic — https://arxiv.org/abs/2310.13548</ref>)
== T ==