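Gemini Text-to-Speech User Guide

How to Practice English with AI Voice Tools

🎯 Goals

Use Gemini and other TTS tools to turn CS50 study materials into audio that helps you:

✅ Learn the correct pronunciation of technical terms
✅ Improve your English listening comprehension
✅ Practice speaking and teaching
✅ Get used to the rhythm and intonation of technical English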

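📱 Method 1: Google Cloud Text-to-Speech (online demo)

Free online demo

URL: https://cloud.google.com/text-to-speech

Steps

Visit the site
- Open the link above
- Find the “Try the demo” or “Demo” button
Choose a language and voice
- Language: English (United States) / en-US
- Voice name:
  - Recommended male voice: en-US-Neural2-D (deep, authoritative)
  - Recommended female voice: en-US-Neural2-F (clear, friendly)
  - Other options: en-US-Wavenet-D/F (also natural-sounding)
Enter text
- Copy terms and example sentences from pronunciation-guide.md
- Copy dialogue excerpts from role-play-dialogue.md
- Keep each input short (1000 characters or fewer is recommended)
Adjust settings (optional)
- Speaking rate: 1.0 (normal speed); beginners can set 0.75 (slow)
- Pitch: 0.0 (default)
- Audio encoding: MP3 (easy to download)
Play and download
- Click the “SPEAK” button
- Listen to the full pronunciation
- Click the download button to save the audio as an MP3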
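
The demo settings above correspond to fields in the request body of the Cloud Text-to-Speech REST API. The following is a minimal sketch, assuming the v1 `text:synthesize` endpoint and its `input`, `voice`, and `audioConfig` fields. The helper name `build_request` is hypothetical, and the sketch only builds the JSON body; actually sending it requires an API key or other credentials, which are not covered here.

```python
import json

# Endpoint of the Cloud Text-to-Speech REST API (v1).
ENDPOINT = "https://texttospeech.googleapis.com/v1/text:synthesize"

def build_request(text, voice="en-US-Neural2-F", rate=1.0, pitch=0.0):
    """Build the JSON body for a synthesize call (hypothetical helper)."""
    if len(text) > 1000:
        # Mirrors the guide's advice to keep each input short.
        raise ValueError("text is longer than 1000 characters; split it first")
    return {
        "input": {"text": text},
        "voice": {"languageCode": "en-US", "name": voice},
        "audioConfig": {
            "audioEncoding": "MP3",
            "speakingRate": rate,
            "pitch": pitch,
        },
    }

if __name__ == "__main__":
    body = build_request("Algorithm. This is a step-by-step procedure.", rate=0.75)
    print(json.dumps(body, indent=2))
```

The API returns the audio as base64 text in an `audioContent` field, so a real client would decode that field before writing the MP3 file.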

Sample text (copy and use directly)

Basic terminology practice

Algorithm. This is a step-by-step procedure for solving a problem.
Function. This is a reusable block of code.
Variable. This stores a value that can change.
Integer. This is a whole number without decimals.
Array. This is a collection of elements stored in memory.
Loop. This allows us to repeat code multiple times.

Code reading practice

For loop. Initialize i to zero. While i is less than height, increment i.
Open brace.
Print a hash symbol.
Close brace.

Dialogue excerpt practice

Professor: Can you see a pattern here?
Student: Yes! As we go down each row, the number of spaces decreases.
Professor: Exactly right! Now, can you express that mathematically?

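📱 Method 2: Gemini Advanced (if you have a subscription)

Generating audio with Gemini

URL: https://gemini.google.com

Method A: Direct conversation

Sign in to Gemini
- Go to gemini.google.com
- Sign in with your Google account

Ask for TTS output

Example prompt:

Please read the following text aloud in a clear, professional teaching voice:

[paste your text here]

Adjust and regenerate

If the pronunciation is off, you can follow up with: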

That was good, but please:
- Slow down a bit
- Emphasize the word "algorithm"
- Pause between sentences

Method B: Generate SSML markup

SSML (Speech Synthesis Markup Language) gives you precise control over pronunciation.

Prompt:

Convert this text into SSML format for text-to-speech, with appropriate pauses and emphasis:

[your text]

Example SSML output:

<speak>
  <p>
    <s>Let's understand the <emphasis level="strong">greedy algorithm</emphasis>.</s>
    <break time="500ms"/>
    <s>At each step, we make the <emphasis>best</emphasis> choice available.</s>
  </p>
</speak>

Then use the SSML as the input to Google Cloud TTS.
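
If you would rather build the SSML yourself than ask Gemini for it, the same structure can be assembled with the Python standard library. This is a minimal sketch: the function name `build_ssml` and its parameters are hypothetical, and it only covers the `<s>`, `<break>`, and `<emphasis>` elements shown in the example output.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

def build_ssml(sentences, pause_ms=500, emphasize=()):
    """Wrap sentences in <speak>/<p>/<s> tags with a break between them.

    `emphasize` is a collection of phrases to wrap in <emphasis>.
    """
    parts = []
    for sentence in sentences:
        body = escape(sentence)  # escape &, <, > so the XML stays valid
        for phrase in emphasize:
            marked = f'<emphasis level="strong">{escape(phrase)}</emphasis>'
            body = body.replace(escape(phrase), marked)
        parts.append(f"<s>{body}</s>")
    joiner = f'<break time="{pause_ms}ms"/>'
    return "<speak><p>" + joiner.join(parts) + "</p></speak>"

if __name__ == "__main__":
    ssml = build_ssml(
        ["Let's understand the greedy algorithm.",
         "At each step, we make the best choice available."],
        emphasize=["greedy algorithm"],
    )
    ET.fromstring(ssml)  # raises ParseError if the markup is malformed
    print(ssml)
```

Parsing the result with `ElementTree` is a cheap well-formedness check before you paste the markup into a TTS tool. It does not verify that a given engine supports every tag.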

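📱 Method 3: Browser extensions (most convenient)

Recommended Chrome / Edge extensions

1. Read Aloud (recommended ⭐⭐⭐⭐⭐)

Install: search the Chrome Web Store for “Read Aloud: A Text to Speech Voice Reader”
Pros:
- Free
- Many voices to choose from (including Google's high-quality voices)
- Reads text on a web page directly
- Adjustable speed

How to use:

Install the extension
Select text in your .md file (opened in the browser)
Right-click → “Read Aloud”
Or click the toolbar icon to read the whole page

Settings:

Click the extension icon → Settings
Voice: choose Google US English or one of the Microsoft voices
Rate: 1.0 (adjustable)
Pitch: 1.0

2. Natural Reader

Install: search the Chrome Web Store for “Natural Reader”
Pros: natural-sounding voices, supports audio export
Cons: the free version has limits

3. Microsoft Edge built-in feature (if you use Edge)

Right-click in the browser → “Read aloud”
Uses Microsoft's high-quality TTS
Completely free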
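📱 Method 4: Local tools (offline use)

macOS

Using the built-in `say` command

Terminal commands:

# Basic usage
say "Hello, this is a test"

# Use a specific voice
say -v Samantha "Algorithm is a step-by-step procedure"

# List all available voices
say -v '?'

# Save to an audio file (options go before the text)
say -v Samantha -o output.aiff "Your text here"

# Convert to MP3 (requires ffmpeg)
ffmpeg -i output.aiff output.mp3

Recommended voices:

Samantha (American English, female)
Alex (American English, male)
Karen (Australian English, female)

Batch conversion script:

#!/bin/bash
# Save as tts.sh

# Read the text from a file (-f) and convert it to audio
say -v Samantha -f pronunciation-guide.md -o pronunciation.aiff

# Convert to MP3
ffmpeg -i pronunciation.aiff pronunciation.mp3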

Windows

Using PowerShell

PowerShell script:

# Basic usage
Add-Type -AssemblyName System.Speech
$speak = New-Object System.Speech.Synthesis.SpeechSynthesizer
$speak.Speak("Hello, this is a test")

# Save to an audio file
$speak.SetOutputToWaveFile("output.wav")
$speak.Speak("Your text here")
$speak.SetOutputToDefaultAudioDevice()

Full script (tts.ps1):

# TTS Script for Windows
param(
    [string]$InputFile = "text.txt",
    [string]$OutputFile = "output.wav"
)

Add-Type -AssemblyName System.Speech
$speak = New-Object System.Speech.Synthesis.SpeechSynthesizer

# Read the file
$text = Get-Content $InputFile -Raw

# Set the output
$speak.SetOutputToWaveFile($OutputFile)
$speak.Speak($text)
$speak.SetOutputToDefaultAudioDevice()

Write-Host "Audio saved to $OutputFile"

Usage:

.\tts.ps1 -InputFile "pronunciation-guide.md" -OutputFile "pronunciation.wav"

Linux

Using `espeak` or `festival`

Install:

# Ubuntu/Debian
sudo apt install espeak espeak-ng

# Or install festival
sudo apt install festival

Usage:

# espeak
espeak "Hello, this is a test"

# Save to an audio file
espeak "Your text here" -w output.wav

# Read from a file
espeak -f text.txt -w output.wav

# festival
echo "Hello, this is a test" | festival --tts

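📱 Method 5: Python script (the programmer's way)

Using the Google Text-to-Speech (gTTS) library

gTTS is an unofficial library that calls Google Translate's text-to-speech service, so it needs an internet connection.

Install

pip install gtts

Basic script

from gtts import gTTS
import os

# Simple example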
text = "Algorithm is a step-by-step procedure for solving a problem."
tts = gTTS(text=text, lang='en', slow=False)
tts.save("output.mp3")
os.system("start output.mp3")  # Windows
# os.system("open output.mp3")  # macOS
# os.system("xdg-open output.mp3")  # Linux
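
The three `os.system` variants above differ only in the launcher command, so the choice can be made at run time instead of by commenting lines in and out. A minimal sketch, assuming the default player is reachable through `start` (Windows), `open` (macOS), or `xdg-open` (Linux desktops); the helper names `open_command` and `play` are hypothetical.

```python
import platform
import subprocess

def open_command(path, system=None):
    """Return the command that opens `path` with the default player."""
    system = system or platform.system()
    if system == "Windows":
        # `start` is a cmd.exe built-in; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        return ["open", path]
    return ["xdg-open", path]  # most Linux desktops

def play(path):
    """Open the file without building a shell command string."""
    subprocess.run(open_command(path), check=False)

if __name__ == "__main__":
    print(open_command("output.mp3"))
```

Passing the command as a list also avoids the quoting problems that a string like `"start my file.mp3"` runs into when a file name contains spaces.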

Full script (reads from a file)

# tts_converter.py
from gtts import gTTS
import os
import sys

def text_to_speech(input_file, output_file, slow=False):
    """
    Convert text file to speech
    
    Args:
        input_file: Path to input text file
        output_file: Path to output MP3 file
        slow: Whether to speak slowly (default: False)
    """
    try:
        # Read the text file
        with open(input_file, 'r', encoding='utf-8') as f:
            text = f.read()
        
        # Create TTS object
        tts = gTTS(text=text, lang='en', slow=slow)
        
        # Save as MP3
        tts.save(output_file)
        
        print(f"✅ Audio saved to {output_file}")
        
        # Optionally play it
        play = input("Play the audio? (y/n): ")
        if play.lower() == 'y':
            os.system(f"start {output_file}")  # Windows
            # os.system(f"open {output_file}")  # macOS
            # os.system(f"xdg-open {output_file}")  # Linux
            
    except FileNotFoundError:
        print(f"❌ Error: File {input_file} not found")
    except Exception as e:
        print(f"❌ Error: {e}")

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python tts_converter.py <input_file> [output_file] [slow]")
        print("Example: python tts_converter.py pronunciation-guide.md output.mp3")
        sys.exit(1)
    
    input_file = sys.argv[1]
    output_file = sys.argv[2] if len(sys.argv) > 2 else "output.mp3"
    slow = sys.argv[3].lower() == "true" if len(sys.argv) > 3 else False
    
    text_to_speech(input_file, output_file, slow)

Usage:

# Normal speed
python tts_converter.py pronunciation-guide.md pronunciation.mp3

# Slow (good for beginners)
python tts_converter.py pronunciation-guide.md pronunciation-slow.mp3 true
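
Because the input files are Markdown, a TTS engine may read markup such as `#`, `**`, or backticks aloud or stumble over it. One option is to strip the most common markup before converting. The sketch below is a rough, regex-based approximation rather than a full Markdown parser, and the function name `strip_markdown` is hypothetical.

```python
import re

def strip_markdown(text):
    """Remove common Markdown markup so it is not read aloud."""
    # Fence lines are dropped; the code between them is kept.
    text = re.sub(r"^```.*$", "", text, flags=re.MULTILINE)
    # Inline code: keep the content, drop the backticks.
    text = re.sub(r"`([^`]*)`", r"\1", text)
    # Links and images: keep only the visible label.
    text = re.sub(r"!?\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Heading markers at the start of a line.
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]*", "", text, flags=re.MULTILINE)
    # Bullet markers at the start of a line.
    text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.MULTILINE)
    # Bold and italics written with asterisks (underscores are left alone
    # so identifiers such as get_int survive).
    text = re.sub(r"(\*\*|\*)(?=\S)(.+?)(?<=\S)\1", r"\2", text)
    return text.strip()

if __name__ == "__main__":
    sample = "# Title\n\n- **Algorithm**: a `step-by-step` procedure\n"
    print(strip_markdown(sample))
```

In `tts_converter.py` the call would fit between reading the file and creating the `gTTS` object.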

Advanced script (processes the text in chunks so no single file gets too large)

# tts_advanced.py
from gtts import gTTS
import os
import re

def split_text(text, max_length=5000):
    """Split text into chunks at sentence boundaries"""
    sentences = re.split(r'(?<=[.!?])\s+', text)
    chunks = []
    current_chunk = ""
    
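    for sentence in sentences:
        if len(current_chunk) + len(sentence) < max_length:
            current_chunk += sentence + " "
        else:
            # Skip the append while the chunk is still empty
            # (happens when the very first sentence is too long)
            if current_chunk:
                chunks.append(current_chunk.strip())
            current_chunk = sentence + " "
    
    # Ignore a trailing chunk that holds only whitespace
    if current_chunk.strip():
        chunks.append(current_chunk.strip())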
    
    return chunks

def convert_long_text(input_file, output_dir="audio_output"):
    """Convert long text file to multiple MP3 files"""
    # Create output directory
    os.makedirs(output_dir, exist_ok=True)
    
    # Read file
    with open(input_file, 'r', encoding='utf-8') as f:
        text = f.read()
    
    # Split into chunks
    chunks = split_text(text)
    
    print(f"📝 Split into {len(chunks)} chunks")
    
    # Convert each chunk
    for i, chunk in enumerate(chunks, 1):
        output_file = os.path.join(output_dir, f"part_{i:03d}.mp3")
        print(f"🔊 Converting chunk {i}/{len(chunks)}...")
        
        tts = gTTS(text=chunk, lang='en', slow=False)
        tts.save(output_file)
    
    print(f"✅ All audio files saved to {output_dir}/")

if __name__ == "__main__":
    import sys
    if len(sys.argv) < 2:
        print("Usage: python tts_advanced.py <input_file> [output_dir]")
        sys.exit(1)
    
    input_file = sys.argv[1]
    output_dir = sys.argv[2] if len(sys.argv) > 2 else "audio_output"
    
    convert_long_text(input_file, output_dir)

Usage:

# Convert a long text into multiple audio files
python tts_advanced.py role-play-dialogue.md audio_parts/

# Result: audio_parts/part_001.mp3, part_002.mp3, ...

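🎯 Recommended study plan

Week 1: Get to know the tools

Day 1-2: Set up the tools

Try the Google Cloud TTS demo
Install a browser extension (Read Aloud)
Test a local TTS tool

Day 3-4: Basic terms

Use TTS to read the Level 1 terms in pronunciation-guide.md
Read along and record yourself
Compare the TTS audio with your own pronunciation

Day 5-7: Operators and symbols

Read the Level 2 terms aloud
Practice reading complete lines of code
Check your pronunciation against TTS

Week 2: Dialogue practice

Day 1-3: Listen to the dialogues

Generate audio for role-play-dialogue.md with TTS
Listen 3-5 times until you understand the content
Read along with the professor's lines

Day 4-5: Role play

Play the student and respond to the TTS professor
Play the professor and record your own explanation
Compare your delivery with the TTS version

Day 6-7: Free practice

Explain the material yourself, based on teaching-script-problem-set-1.md
Explain from the code alone, without looking at the text
Record a 5-10 minute explanation video

Week 3-4: Create and produce

Day 1-7: Produce your own teaching audio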

🔧 Practical tips

1. Improve TTS output quality

Use punctuation to control rhythm

Bad:
"for int i equals zero while i less than n increment i"

Good:
"For loop. Initialize i to zero. While i is less than n, increment i."

Use SSML to control speed and pauses

<speak>
  <prosody rate="slow">This is spoken slowly.</prosody>
  <break time="1s"/>
  <prosody rate="fast">This is spoken quickly.</prosody>
</speak>

Spell out hard-to-read words

Bad: "Luhn's algorithm" (may be mispronounced)

Good: "Loon's algorithm" or "L-U-H-N apostrophe S algorithm"
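
Respelling can be automated so the source text keeps the correct spelling and only the string sent to the TTS engine changes. A minimal sketch, assuming a small hand-maintained table; the names `RESPELLINGS` and `respell` are hypothetical, and the single entry comes from the example above.

```python
import re

# Hypothetical respelling table; extend it with words your TTS engine gets wrong.
RESPELLINGS = {
    "Luhn": "Loon",
}

def respell(text, table=RESPELLINGS):
    """Replace whole words with spellings the TTS engine reads correctly."""
    for word, spoken in table.items():
        # \b keeps the match to whole words, so "Luhnish" is left alone.
        text = re.sub(rf"\b{re.escape(word)}\b", spoken, text)
    return text

if __name__ == "__main__":
    print(respell("Luhn's algorithm validates card numbers."))
```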

2. Create personalized study materials

Tip A: Q&A flashcards

Question: What is a greedy algorithm?
[pause for thinking]
Answer: A greedy algorithm makes the best choice at each step.
[pause]

Question: Give an example of a greedy algorithm.
[pause]
Answer: The coin change problem uses a greedy algorithm.

Tip B: Fill-in-the-blank exercises

Listen and fill in the blanks:

"For loop. Initialize i to _____ . While i is less than _____, 
increment _____ ."

[pause for 5 seconds]

Answer: zero, height, i.
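
Fill-in-the-blank prompts like the one above can be generated from any practice sentence. This is a minimal sketch with a hypothetical helper, `make_cloze`, that replaces the first occurrence of each chosen word with the spoken word "blank" and returns the matching answer line. It matches whole words only and is case-sensitive.

```python
import re

def make_cloze(sentence, hidden_words):
    """Return (prompt, answer) with each hidden word replaced by 'blank'."""
    prompt = sentence
    found = []
    for word in hidden_words:
        pattern = rf"\b{re.escape(word)}\b"
        if re.search(pattern, prompt):
            # Replace only the first occurrence of this word.
            prompt = re.sub(pattern, "blank", prompt, count=1)
            found.append(word)
    answer = "Answer: " + ", ".join(found) + "."
    return prompt, answer

if __name__ == "__main__":
    text = "For loop. Initialize i to zero. While i is less than height, increment i."
    prompt, answer = make_cloze(text, ["zero", "height"])
    print(prompt)
    print(answer)
```

The answers are listed in the order of `hidden_words`, so pass the words in the order they appear in the sentence.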

Tip C: Dictation practice

Listen carefully and write down the code:

[TTS reads: "int height equals get_int, open paren, quote Height colon 
space quote, close paren, semicolon"]

[pause]

Correct answer: int height = get_int("Height: ");
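
The spoken form of a code line can be produced mechanically by naming each symbol. The sketch below is illustrative only: the `SPOKEN` table is incomplete, the names `SPOKEN`, `TOKEN`, and `speak_code` are hypothetical, and the output is simpler than the dictation text above (it adds no commas and does not announce spaces inside string literals).

```python
import re

# Spoken names for common C symbols (an illustrative, incomplete table).
SPOKEN = {
    "(": "open paren", ")": "close paren",
    "{": "open brace", "}": "close brace",
    ";": "semicolon", ":": "colon", ",": "comma",
    '"': "quote", "=": "equals", "==": "equals equals",
    "<": "less than", ">": "greater than",
    "++": "plus plus", "%": "percent", "#": "hash",
}

# Identifiers, numbers, two-character operators, then any other single symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\+\+|==|\S")

def speak_code(line):
    """Turn one line of C into words a TTS engine can read aloud."""
    return " ".join(SPOKEN.get(tok, tok) for tok in TOKEN.findall(line))

if __name__ == "__main__":
    print(speak_code('int height = get_int("Height: ");'))
```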

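3. Batch processing

Bash script (Linux/macOS)

#!/bin/bash
# batch_tts.sh

# Convert multiple markdown files to audio
for file in *.md; do
    output="${file%.md}.mp3"
    echo "Converting $file to $output..."
    # Answer "n" to the script's "Play the audio?" prompt so the loop never blocks
    python tts_converter.py "$file" "$output" <<< "n"
done

echo "All files converted!"

Usage: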

chmod +x batch_tts.sh
./batch_tts.sh

🎵 Audio file management tips

File naming convention

cs50-ps1-pronunciation-guide.mp3
cs50-ps1-role-play-professor.mp3
cs50-ps1-role-play-student.mp3
cs50-ps1-teaching-script-full.mp3
cs50-ps1-teaching-script-part1.mp3

Directory structure

audio/
├── pronunciation/
│   ├── basics.mp3
│   ├── operators.mp3
│   └── keywords.mp3
├── dialogues/
│   ├── ps1-intro.mp3
│   ├── ps1-mario.mp3
│   ├── ps1-cash.mp3
│   └── ps1-credit.mp3
└── lectures/
    ├── week1-full.mp3
    ├── week1-part1.mp3
    └── week1-part2.mp3

Playlist

Create a .m3u file, for example cs50-ps1-complete.m3u (the #EXTM3U header must be its first line):

#EXTM3U
# cs50-ps1-complete.m3u
#EXTINF:300,PS1 Pronunciation Guide
pronunciation/basics.mp3
#EXTINF:600,PS1 Role Play Dialogue
dialogues/ps1-intro.mp3
#EXTINF:1200,PS1 Teaching Script
lectures/week1-full.mp3
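
A playlist in this format can also be generated from a list of tracks. A minimal sketch, assuming the extended M3U layout shown above, where each `#EXTINF` line gives the duration in seconds and the title; the function name `build_m3u` is hypothetical.

```python
def build_m3u(tracks):
    """Build extended-M3U text from (seconds, title, path) tuples."""
    lines = ["#EXTM3U"]  # the header must be the first line of the file
    for seconds, title, path in tracks:
        lines.append(f"#EXTINF:{seconds},{title}")
        lines.append(path)
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    playlist = build_m3u([
        (300, "PS1 Pronunciation Guide", "pronunciation/basics.mp3"),
        (600, "PS1 Role Play Dialogue", "dialogues/ps1-intro.mp3"),
        (1200, "PS1 Teaching Script", "lectures/week1-full.mp3"),
    ])
    with open("cs50-ps1-complete.m3u", "w", encoding="utf-8") as f:
        f.write(playlist)
```

The paths are relative, so the playlist file should be saved in the `audio/` directory next to the three subfolders.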

📊 Progress tracking

Study log template

# TTS Learning Log

## Date: 2025-12-14

### Today's Focus
- [ ] Pronunciation Guide - Section 1-5
- [ ] Role Play - Scene 1-3

### Time Spent
- Listening: 30 minutes
- Speaking practice: 45 minutes
- Recording: 15 minutes

### New Words Learned
1. Algorithm - /ˈælɡəˌrɪðəm/
2. Iterate - /ˈɪtəreɪt/
3. Modulo - /ˈmɑːdʒəloʊ/

### Challenges
- Struggled with "Luhn's" pronunciation
- Need more practice with nested loop explanation

### Next Steps
- Review Luhn's algorithm section
- Practice explaining bubble sort
- Record myself teaching Mario problem

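💡 Troubleshooting

Common questions

Q: What if the TTS pronunciation is inaccurate?

Try rewording the sentence with simpler words
Use a phonetic respelling or spell the word out (e.g. “L-U-H-N”)
Try a different TTS engine
Use SSML to specify the pronunciation

Q: What if the audio files are too large?

Split the audio into parts of 5-10 minutes each
Lower the bitrate (e.g. 64 kbps instead of 128 kbps)
Use a compressed format (MP3 instead of WAV)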
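
To see how much the bitrate matters, the size of a constant-bitrate MP3 can be estimated as bitrate × duration ÷ 8. A minimal sketch; the function name `mp3_size_mb` is hypothetical, and the estimate ignores metadata and variable-bitrate encoding.

```python
def mp3_size_mb(minutes, bitrate_kbps):
    """Estimate the size of a constant-bitrate MP3 in megabytes (10**6 bytes)."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * minutes * 60 / 1_000_000

if __name__ == "__main__":
    for kbps in (64, 128):
        print(f"10 minutes at {kbps} kbps: about {mp3_size_mb(10, kbps):.1f} MB")
```

By this estimate a 10-minute part takes roughly 4.8 MB at 64 kbps and 9.6 MB at 128 kbps, so halving the bitrate halves the file size.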

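Q: How do I make TTS sound more natural?

Choose a Neural or Wavenet voice
Add punctuation to control pauses
Use conversational rather than written language
Try different speaking rates (0.9-1.1x)

Q: What if a free TTS service has a character limit?

Use the batch script to convert the text in chunks
Use local tools (macOS say, Windows PowerShell)
Use a browser extension (Read Aloud has no limit)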

🌟 Advanced techniques

1. Build an interactive learning experience

Use Python to build a command-line study tool:

# interactive_learning.py
from gtts import gTTS
import os
import time

def quiz_question(question, answer):
    """Ask a question with TTS and wait for answer"""
    print(f"\n❓ {question}")
    
    # Generate and play question audio
    tts = gTTS(text=question, lang='en')
    tts.save("temp_question.mp3")
    os.system("start temp_question.mp3")  # Windows
    
    # Wait for user input
    time.sleep(3)  # Give time for audio to play
    user_answer = input("\n💭 Your answer: ")
    
    # Check answer
    if user_answer.lower().strip() == answer.lower().strip():
        print("✅ Correct!")
        tts = gTTS(text="Correct! Well done.", lang='en')
    else:
        print(f"❌ Incorrect. The answer is: {answer}")
        tts = gTTS(text=f"The correct answer is {answer}", lang='en')
    
    tts.save("temp_feedback.mp3")
    os.system("start temp_feedback.mp3")
    time.sleep(2)

# Example usage
questions = [
    ("What operator do we use for modulo in C?", "percent sign"),
    ("What does i plus plus do?", "increments i"),
    ("What algorithm validates credit card numbers?", "Luhn's algorithm"),
]

for q, a in questions:
    quiz_question(q, a)

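2. Speech recognition feedback (Speech-to-Text)

If you want to check your own pronunciation (this script needs the SpeechRecognition and PyAudio packages):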

# speech_recognition_practice.py
import speech_recognition as sr
from gtts import gTTS
import os

def pronunciation_practice(text):
    """Practice pronunciation with feedback"""
    # Play the text
    print(f"🔊 Listen: {text}")
    tts = gTTS(text=text, lang='en')
    tts.save("target.mp3")
    os.system("start target.mp3")
    
    # Wait for user to speak
    input("\n🎤 Press Enter when ready to speak...")
    
    # Record user
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("🔴 Recording... Speak now!")
        audio = r.listen(source)
    
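    # Recognize
    try:
        result = r.recognize_google(audio)
        print(f"📝 You said: {result}")

        # Compare words only: the recognizer returns no punctuation
        words = lambda s: "".join(c if c.isalnum() else " " for c in s.lower()).split()
        if words(result) == words(text):
            print("✅ Perfect pronunciation!")
        else:
            print("⚠️ Close, but not quite. Try again.")
    except (sr.UnknownValueError, sr.RequestError):
        print("❌ Couldn't understand. Please try again.")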

# Example
pronunciation_practice("Algorithm is a step-by-step procedure.")
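
A pass/fail comparison gives little feedback when only one word was misheard. As an alternative, the target and the recognized text can be scored word by word. The sketch below uses `difflib.SequenceMatcher` from the standard library; the helper names `words` and `similarity` are hypothetical, and the score measures how well the recognizer's transcript matches the target, not pronunciation quality directly.

```python
from difflib import SequenceMatcher

def words(s):
    """Lower-case the text and keep only its alphanumeric words."""
    return "".join(c if c.isalnum() else " " for c in s.lower()).split()

def similarity(target, heard):
    """Return a 0..1 score for how closely the recognized words match the target."""
    return SequenceMatcher(None, words(target), words(heard)).ratio()

if __name__ == "__main__":
    target = "Algorithm is a step-by-step procedure."
    heard = "algorithm is a step by step procedure"
    print(f"Match: {similarity(target, heard):.0%}")
```

In `pronunciation_practice`, a score like this could replace the exact comparison, with a threshold of your choosing deciding when to print the success message.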

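🎓 Summary

Study priorities

Listening first (40%) - listen to standard pronunciation often
Read-along practice (30%) - imitate the pronunciation and intonation
Independent speaking (20%) - explain without looking at the text
Feedback and improvement (10%) - record yourself and compare

📞 Need help?

If you run into problems:

Check the “Troubleshooting” section of this document
Search the tool's official documentation
Ask on CS50 Ed Discussion
Use an AI assistant (ChatGPT, Claude, Gemini) to debug

Happy studying! 🎉

Final tips:

🎯 Practice 30 minutes every day
📝 Keep a study log
🎙️ Don't be afraid to record yourself (no one else will hear it!)
🤝 Practicing with a partner works even better
🎉 Enjoy the learning process!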