Using AI to generate user-language subtitles fully synchronized with the video-language subtitles

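Sometimes a video has no subtitles in the user's language, or the user-language subtitles it does have are completely out of sync with the source-language ones, which makes reading the two side by side a hassle.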

Immersive Translate (沉浸式翻译) can translate subtitles directly, but because it translates without context, the quality is dreadful.

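For this case, Google Gemini Experimental 1206 is currently the best choice (see the replies for details): it is free, supports long context and long output, doesn't get lazy, and is fast. The steps: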

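  • Open https://aistudio.google.com/
  • Model selection (on the right): choose Google Gemini Experimental 1206
  • Upload: the subtitle file with its extension changed to .txt (srt has far fewer characters than ass, so it is processed faster)
  • Run "Translate this subtitle file into Simplified Chinese, keeping the original timestamps. Leave the content inside parentheses unchanged. Keep the tone and style consistent with the original, and generate a downloadable txt file"
  • Run "continue": if the output is too long it stops partway, so type "continue", "go", or similar and press Enter to make it carry on. Download the text that is already translated while Gemini keeps generating. If Gemini did not produce the text in a downloadable format, use the menu button at the top right to copy the text
  • If Gemini's output has omissions, tell it to regenerate
  • Merge: concatenate the downloaded or copied text and rename the file, and you have a usable native-language subtitle fully synchronized with the target-language subtitles. One episode takes roughly three minutes; using Baoyu's (宝玉) three-step translation takes longer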
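
The merge step can also be scripted. Below is a minimal sketch, assuming each downloaded or copied part is plain SRT text and ends on a cue boundary; the function name `merge_parts` and the file names in the usage comment are hypothetical, not part of any tool mentioned here. It joins the parts, drops cues whose timing line was already seen (the model sometimes repeats a cue after "continue"), and renumbers the result.

```python
import re

# An SRT timing line, e.g. "00:00:02,452 --> 00:00:05,788".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def merge_parts(parts):
    """Join partial SRT texts into one, dropping repeated cues and renumbering."""
    seen = set()
    cues = []
    for part in parts:
        text = part.replace("\r\n", "\n").strip()
        # Cues are separated by one or more blank lines.
        for block in re.split(r"\n\s*\n", text):
            lines = [ln for ln in block.split("\n") if ln.strip()]
            idx = next((i for i, ln in enumerate(lines) if TIMING.search(ln)), None)
            if idx is None:
                continue  # not a cue, e.g. a remark from the model
            timing = TIMING.search(lines[idx]).group(0)
            if timing in seen:
                continue  # this cue was already emitted in an earlier part
            seen.add(timing)
            cues.append((timing, lines[idx + 1:]))
    out = []
    for n, (timing, body) in enumerate(cues, start=1):
        out.append("\n".join([str(n), timing, *body]))
    return "\n\n".join(out) + "\n"


# Usage (hypothetical file names):
#   from pathlib import Path
#   parts = [Path(p).read_text(encoding="utf-8") for p in ("part1.txt", "part2.txt")]
#   Path("episode.zh.srt").write_text(merge_parts(parts), encoding="utf-8")
```

When a cue appears twice, the first copy is kept, so a part that was cut off in the middle of a cue still needs a manual look.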

Three-step translation prompt

You are a highly skilled translator tasked with translating various types of content from other languages into Chinese. Follow these instructions carefully to complete the translation task:

## Input

Depending on the type of input, follow these specific instructions:

1. If the input is a URL or a request to translate a URL:
First, request the built-in Action to retrieve the URL content. Once you have the content, proceed with the three-step translation process.
2. If the input is an image or PDF:
Get the content from image (by OCR) or PDF, and proceed with the three-step translation process.
3. Otherwise, proceed directly to the three-step translation process.

## Strategy

You will follow a three-step translation process:

1. Translate the input content into Chinese, respecting the original intent, keeping the original paragraph and text format unchanged, not deleting or omitting any content, including preserving all original Markdown elements like images, code blocks, etc.
2. Carefully read the source text and the translation, and then give constructive criticism and helpful suggestions to improve the translation. The final style and tone of the translation should match the style of 简体中文 colloquially spoken in China. When writing suggestions, pay attention to whether there are ways to improve the translation's
(i) accuracy (by correcting errors of addition, mistranslation, omission, or untranslated text),
(ii) fluency (by applying Chinese grammar, spelling and punctuation rules, and ensuring there are no unnecessary repetitions),
(iii) style (by ensuring the translations reflect the style of the source text and take into account any cultural context),
(iv) terminology (by ensuring terminology use is consistent and reflects the source text domain; and by only ensuring you use equivalent idioms Chinese).
3. Based on the results of steps 1 and 2, refine and polish the translation

## Glossary

Here is a glossary of technical terms to use consistently in your translations:

- AGI -> 通用人工智能
- LLM/Large Language Model -> 大语言模型
- Transformer -> Transformer
- Token -> Token
- Generative AI -> 生成式 AI
- AI Agent -> AI 智能体
- prompt -> 提示词
- zero-shot -> 零样本学习
- few-shot -> 少样本学习
- multi-modal -> 多模态
- fine-tuning -> 微调

## Output

For each step of the translation process, output your results within the appropriate XML tags:

<step1_initial_translation>
[Insert your initial translation here]
</step1_initial_translation>

<step2_reflection>
[Insert your reflection on the translation, write a list of specific, helpful and constructive suggestions for improving the translation. Each suggestion should address one specific part of the translation.]
</step2_reflection>

<step3_refined_translation>
[Insert your refined and polished translation here]
</step3_refined_translation>

Remember to consistently use the provided glossary for technical terms throughout your translation. Ensure that your final translation in step 3 accurately reflects the original meaning while sounding natural in Chinese.
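
Only the step 3 section of the reply is needed for the subtitle file. Here is a minimal sketch for pulling it out of a copied response, assuming the model used the tag names from the prompt above; the function name `extract_step3` is hypothetical. It tolerates a missing closing tag by cutting each chunk at the next step tag of any kind, and it joins the chunks when the step 3 opening tag appears more than once.

```python
import re


def extract_step3(response):
    """Return the text inside every <step3_refined_translation> section."""
    tag = "step3_refined_translation"
    chunks = []
    # Everything after each opening tag is a candidate chunk.
    for piece in re.split(rf"<{tag}>", response)[1:]:
        # Cut at the closing tag if present, otherwise at the next step tag.
        m = re.search(r"</?step\d_\w+>", piece)
        chunks.append((piece[:m.start()] if m else piece).strip())
    return "\n\n".join(c for c in chunks if c)
```

The result still has to be checked by hand, since the model may have dropped or repeated cues inside the section.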

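Finally, here is the handiwork of the "genius" gpt-4o (it returned the literal placeholder [翻译内容], "translation content", in place of every line):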

1
00:00:02,452 --> 00:00:05,788
([翻译内容]))
[翻译内容])[翻译内容]

2
00:00:05,913 --> 00:00:08,332
([翻译内容]))
[翻译内容]

3
00:00:08,458 --> 00:00:10,251
([翻译内容]))[翻译内容]

4
00:00:11,002 --> 00:00:13,755
([翻译内容])[翻译内容]
[翻译内容]

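Recently Google AI Studio has started interrupting frequently with "unsafe content" (even with the safety settings turned off). If this isn't a bug, a lot of subtitles can no longer be translated.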

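gpt-o1: can output a 20,000-character subtitle file in one go using the three-step translation method (although it errored once and the whole record was lost). Unfortunately it improvises too much (compared only once), and it is somewhat wasteful.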

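gpt-o1-mini: can output a 20,000-character subtitle file in one go using the three-step translation method, with decent results (slightly better than gemini-exp-1206 with three-step translation; the gap is small, but it is far less hassle). If you have access to it, it is currently the best.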

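With the three-step translation, o1 and o1-mini will very likely shift lines onto the wrong timestamps and drop timestamps, so I've stopped using them.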
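
Dropped or altered timestamps can be caught mechanically before loading the file in a player. Here is a minimal sketch, assuming both files are SRT; the names `timing_lines` and `compare_timings` are hypothetical. It lists the timing lines that exist in the source but not in the translation, and the other way round.

```python
import re

# An SRT timing line, e.g. "00:00:02,452 --> 00:00:05,788".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def timing_lines(srt_text):
    """Return every timing line in the text, in order of appearance."""
    return TIMING.findall(srt_text)


def compare_timings(source_srt, translated_srt):
    """Return (missing, extra): timings lost from, or added to, the translation."""
    src = timing_lines(source_srt)
    dst = timing_lines(translated_srt)
    src_set, dst_set = set(src), set(dst)
    missing = [t for t in src if t not in dst_set]
    extra = [t for t in dst if t not in src_set]
    return missing, extra
```

This only catches lost or changed timestamps. A line of dialogue attached to the wrong (but existing) timestamp passes the check and still needs spot-checking against the source.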

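gemini-exp-1206: although its output is several times longer than deepseek's, it still cannot finish in one go, and the three-step translation often has errors and omissions, which makes it more trouble to use. One-step translation works okay, and the translation quality is fine too.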

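When gemini-exp-1206 uses the three-step translation and the subtitle file is too long, only the first part of the […] content gets an opening tag, with no closing tag; then... the opening tag of the second part of […] is replaced with […], and it loops... You can stop the run when you see […] appear again, then copy and merge everything after the […] opening tags. (Splitting the three-step translation into 3 separate prompts doesn't work either: the second prompt outputs a line-by-line comparison.)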

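deepseek web version: although it cannot output everything in one go, its "continue generating" feature deserves a thumbs up (it merges automatically with the previous output). The (subtitle) translation is decent too, better than o1 (compared only once). Compared with o1-mini you have to click "continue generating" quite a few times, but it is free and hassle-free... deepseek also shifts lines onto the wrong timestamps and drops timestamps, and adding to the prompt doesn't help, so I've stopped using it...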

Summary: for now, only gemini-exp-1206 is reasonably stable...