Internationalization Workflow
This document defines the standard process for completing English translations whenever new documentation or blog posts are added to this repository.
The site uses Docusaurus's built-in i18n: zh-Hans is the default locale, and English is provided through the i18n/en/ directory. The root-level I18N.md is the complete reference manual; this document focuses on what to do each time new content is created.
Core Principles
1. Chinese goes in docs/ and blog/, English goes in the mirrored i18n/en/ path
Chinese is the default locale and lives directly under the project root's docs/ and blog/ directories. English translations live under i18n/en/ in the corresponding plugin directories, with a strict 1:1 path mirror.
Docs mirroring rule:
docs/Linux/new-article.md
→ i18n/en/docusaurus-plugin-content-docs/current/Linux/new-article.md
docs/DataStructer/sorting-algorithms/sorting-basics.md
→ i18n/en/docusaurus-plugin-content-docs/current/DataStructer/sorting-algorithms/sorting-basics.md
Blog mirroring rule:
blog/2026-05-01-new-post.md
→ i18n/en/docusaurus-plugin-content-blog/2026-05-01-new-post.md
Filenames must match exactly, including case and extension.
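The mirroring rule is mechanical, so it can be expressed as a path mapping. The sketch below assumes forward-slash paths relative to the project root. The helper name mirror_path is hypothetical and is not part of Docusaurus or this repository.

```python
from pathlib import PurePosixPath

# English plugin directories, taken from the mirroring rules above.
DOCS_MIRROR = PurePosixPath("i18n/en/docusaurus-plugin-content-docs/current")
BLOG_MIRROR = PurePosixPath("i18n/en/docusaurus-plugin-content-blog")


def mirror_path(source: str) -> str:
    """Map a Chinese docs/ or blog/ path to its English mirror path (hypothetical helper)."""
    parts = PurePosixPath(source).parts
    if parts and parts[0] == "docs":
        return str(DOCS_MIRROR.joinpath(*parts[1:]))
    if parts and parts[0] == "blog":
        return str(BLOG_MIRROR.joinpath(*parts[1:]))
    raise ValueError(f"not under docs/ or blog/: {source}")


print(mirror_path("docs/DataStructer/sorting-algorithms/sorting-basics.md"))
```

Because only the leading directory is swapped, the filename's case and extension are carried over unchanged.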
2. Only translate human-readable text, not code or technical identifiers
Translation scope:
| Content | Translate? |
|---|---|
| title, description | Yes |
| sidebar_position, tags, authors, slug | Keep identical |
| Prose paragraphs, headings | Yes |
| Code blocks (```) | No |
| Inline code (`) | No |
| Image URLs, JSX components, import / export | No |
| LaTeX formulas | No |
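The keep-identical row lends itself to an automated check. The sketch below reads top-level frontmatter fields as raw strings, using a simplified line-based parser rather than a full YAML parser. It then reports which keep-identical fields differ between the two locales. The names frontmatter_fields and mismatched_fields are hypothetical.

```python
KEEP_IDENTICAL = ("sidebar_position", "tags", "authors", "slug")


def frontmatter_fields(text: str) -> dict[str, str]:
    """Collect raw top-level frontmatter values; indented or list lines attach to the previous key."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    current = None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        if line[:1] in (" ", "\t", "-") and current is not None:
            fields[current] += "\n" + line.strip()  # continuation of the previous key
        elif ":" in line:
            key, _, value = line.partition(":")
            current = key.strip()
            fields[current] = value.strip()
    return fields


def mismatched_fields(zh_text: str, en_text: str) -> list[str]:
    """Return the keep-identical fields whose raw values differ between locales."""
    zh, en = frontmatter_fields(zh_text), frontmatter_fields(en_text)
    return [key for key in KEEP_IDENTICAL if zh.get(key) != en.get(key)]
```

Values are compared as raw text. As a result, writing the same tag list as [a, b] in one file and as a block list in the other is reported as a mismatch, which errs on the side of caution.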
3. New directories must have a mirrored _category_.json
If a new directory is created (for example, a new docs category or a new topic subdirectory), the English mirror path must contain a corresponding _category_.json with the same position and a translated label and description.
// docs/NewDir/_category_.json
{
  "label": "新目录",
  "position": 12,
  "link": {
    "type": "generated-index",
    "description": "中文描述。"
  }
}
// i18n/en/.../NewDir/_category_.json
{
  "label": "New Directory",
  "position": 12,
  "link": {
    "type": "generated-index",
    "description": "English description."
  }
}
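A pair of _category_.json files can be compared directly once both are loaded. The sketch below assumes the field layout shown in the example above. It treats an English label or description that equals the Chinese one as a sign the text was not translated. The helper names are hypothetical.

```python
import json


def load_json(path: str) -> dict:
    """Read a _category_.json file as UTF-8."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def category_problems(zh: dict, en: dict) -> list[str]:
    """Compare a Chinese _category_.json with its English mirror."""
    problems = []
    if zh.get("position") != en.get("position"):
        problems.append("position differs")
    zh_link, en_link = zh.get("link", {}), en.get("link", {})
    if zh_link.get("type") != en_link.get("type"):
        problems.append("link.type differs")
    # Text identical to the Chinese original was probably never translated.
    if en.get("label") == zh.get("label"):
        problems.append("label looks untranslated")
    if zh_link.get("description") and en_link.get("description") == zh_link.get("description"):
        problems.append("description looks untranslated")
    return problems
```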
4. New tags must update tags.yml
If an article uses a new tag, the English tags.yml must also be updated. Keep permalink identical; translate label.
# i18n/en/docusaurus-plugin-content-docs/current/tags.yml
New Tag:
  label: New Tag
  permalink: new-tag
5. Every documentation update must include the English translation
Whether you add or modify a document under docs/, update the corresponding English version under i18n/en/ in sync. This applies to all .md and .mdx files under docs/, including workflow documents, LabNotes, and technical guides.
Specific requirements:
- If you modify a paragraph in the Chinese document, update the corresponding English paragraph
- If you add a new section, add the same section to the English version
- If you delete content, delete it from the English version too
- Do not update only the Chinese version and leave the English version stale. This is worse than not translating at all, because readers will see inconsistent content across locales
Workflow documents (like this file and life-blog-writing-workflow.md) are especially critical because they serve as references for future work. If the Chinese version updates a process but the English version does not, English readers will follow an outdated workflow.
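The simplest drift to catch automatically is a Chinese document that has no English counterpart at all. The sketch below walks docs/ and lists .md and .mdx files that are missing from the mirror directory. It cannot tell whether an existing English file is up to date, so paragraph-level sync still needs review. The helper name missing_translations is hypothetical.

```python
from pathlib import Path


def missing_translations(docs_dir: str, en_dir: str) -> list[str]:
    """List .md/.mdx files under docs_dir with no file at the same relative path under en_dir."""
    docs, en = Path(docs_dir), Path(en_dir)
    missing = []
    for src in sorted(docs.rglob("*")):
        if src.suffix in (".md", ".mdx") and src.is_file():
            rel = src.relative_to(docs)
            if not (en / rel).is_file():
                missing.append(rel.as_posix())
    return missing


# Usage, from the project root:
# missing_translations("docs", "i18n/en/docusaurus-plugin-content-docs/current")
```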
Standard Workflow: Adding Documentation
Step 1: Create the Chinese document
Create the Chinese .md or .mdx file under docs/ as usual.
Step 2: Create the English translation
Create a file with the same name and path under i18n/en/docusaurus-plugin-content-docs/current/, translating the frontmatter and body.
Frontmatter comparison:
# Chinese
---
title: 自定义镜像与 Dockerfile 实践
sidebar_position: 6
description: 把临时调试、Dockerfile 构建整理成一套更稳妥的工作流。
---
# English
---
title: Custom Images and Dockerfile Practices
sidebar_position: 6
description: "Turning ad-hoc debugging and Dockerfile builds into a more reliable workflow."
---
Note: if description contains a colon, quote it. An unquoted colon followed by a space makes YAML parsing fail.
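When translated descriptions are produced in bulk, the quoting rule can be applied programmatically. The sketch below is a conservative heuristic, not a complete YAML emitter. It quotes on ": ", a trailing colon, " #", or a leading character that YAML treats specially, and it escapes backslashes and double quotes inside the quoted string. The function names are hypothetical.

```python
def needs_quotes(value: str) -> bool:
    """Conservative check for whether a frontmatter value should be quoted."""
    return (
        ": " in value
        or value.endswith(":")
        or " #" in value
        or value[:1] in "\"'[]{}&*!|>%@`#,?-"  # special at the start; empty strings also match
    )


def quote_yaml(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def description_line(value: str) -> str:
    return "description: " + (quote_yaml(value) if needs_quotes(value) else value)


print(description_line("Three patterns: temporary, persistent, and troubleshooting."))
```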
Step 3: If it is a new directory, create the English _category_.json
See Core Principle 3 above.
Step 4: If new tags are used, update the English tags.yml
See Core Principle 4 above.
Step 5: If new tag names are in Chinese, refresh sidebar translations
npm run write-translations:en
Then edit i18n/en/docusaurus-plugin-content-docs/current.json to translate the new sidebar entries.
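Docusaurus translation files map each key to an object with a message field, so leftover Chinese text is easy to spot. The sketch below flags entries whose message still contains CJK ideographs. The character ranges are an approximation, and the sidebar key shape used in the usage comment is only an example.

```python
import json
import re

# Approximate CJK ideograph ranges (Extension A plus the main unified block).
CJK = re.compile(r"[\u3400-\u9fff]")


def untranslated_keys(translations: dict) -> list[str]:
    """Return keys whose "message" still contains Chinese characters."""
    return sorted(
        key for key, entry in translations.items()
        if CJK.search(entry.get("message", ""))
    )


def check_file(path: str) -> list[str]:
    with open(path, encoding="utf-8") as f:
        return untranslated_keys(json.load(f))


# Usage: check_file("i18n/en/docusaurus-plugin-content-docs/current.json")
```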
Standard Workflow: Adding Blog Posts
Step 1: Create the Chinese blog post
Create the Chinese .md or .mdx file under blog/.
Step 2: Create the English translation
Create a file with the same name and path under i18n/en/docusaurus-plugin-content-blog/:
blog/2026-05-01-new-post.md
→ i18n/en/docusaurus-plugin-content-blog/2026-05-01-new-post.md
Blog authors.yml and tags.yml do not need translation (already in English or using universal identifiers).
For MDX blog posts, export const data objects need their text values translated (alt, caption, etc.), but image URLs remain unchanged.
Step 3: Generate Chinese audio
Use the standalone project tts-blog-generator (located in the parent directory D:\Code\tts-blog-generator\):
cd ../tts-blog-generator
python generate.py
The script automatically scans all .md / .mdx files under blog/, extracts prose, calls the TTS API, converts to MP3, and uploads to OSS. Already-generated audio is skipped; use --force to regenerate.
Step 4: Generate English audio
python generate.py --lang en --blog-dir "../Dev-Knowledge-Base/i18n/en/docusaurus-plugin-content-blog"
English audio uses the Chloe voice; its filenames are prefixed with en_ and stored under the OSS path Audio/blog/en/. Manifest keys are prefixed with en/ (e.g., en/agent-harness).
Step 5: Copy the manifest file to the project
cp output/blog-manifest.json ../Dev-Knowledge-Base/src/data/blogAudioManifest.json
cp output/blog-manifest.json ../Dev-Knowledge-Base/static/audio/blog/manifest.json
src/data/blogAudioManifest.json is the static import data source for the player component. static/audio/blog/manifest.json is the fallback access path. Both must be updated.
Do not copy output/manifest.json into the blog player. That file is the combined blog/docs manifest; blog playback should use output/blog-manifest.json.
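The two cp commands, and the rule against using output/manifest.json, can be folded into one helper. The sketch below assumes the directory layout used in this workflow, with the generator and the site as sibling directories. It selects the per-type manifest by name, so the combined manifest.json is never copied. The helper sync_manifest is hypothetical. Its docs entry mirrors the docs manifest commands listed later in this document.

```python
import shutil
from pathlib import Path

# Per-type source manifest and its two destinations in the site.
TARGETS = {
    "blog": ("blog-manifest.json", ["src/data/blogAudioManifest.json", "static/audio/blog/manifest.json"]),
    "docs": ("docs-manifest.json", ["src/data/docsAudioManifest.json", "static/audio/docs/manifest.json"]),
}


def sync_manifest(kind: str, generator_dir: str, site_dir: str) -> list[Path]:
    """Copy output/<kind>-manifest.json to both site locations and return the written paths."""
    source_name, destinations = TARGETS[kind]
    source = Path(generator_dir) / "output" / source_name
    copied = []
    for rel in destinations:
        dest = Path(site_dir) / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, dest)
        copied.append(dest)
    return copied


# Usage, from tts-blog-generator: sync_manifest("blog", ".", "../Dev-Knowledge-Base")
```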
Step 6: Test
cd ../Dev-Knowledge-Base
npm start
Visit the blog page and confirm:
- The play button is clickable
- Audio plays normally
- Multi-chunk audio transitions seamlessly (if applicable)
- The progress bar shows the correct total duration
- Switching locales plays the matching voice (Molly for Chinese, Chloe for English)
Docs Audio (Docs TTS)
Documentation categories such as docs/LookAround/ also support TTS audio playback. As with blog audio, the player is injected automatically (for docs, in DocItem/Layout), so MDX files do not need to import a player manually.
See Docs Audio Workflow for the complete maintenance procedure. This section keeps the i18n-specific checklist for adding or updating bilingual docs.
Architecture
- DocsAudioPlayer reuses BlogAudioPlayer, but reads src/data/docsAudioManifest.json and passes keyPrefix="docs/"
- Blog audio is also auto-injected via src/theme/BlogPostPage/index.js; no manual import is needed in MDX
- src/theme/DocItem/Layout/index.js injects the player for LookAround docs below the breadcrumbs and above the content
- Chinese manifest keys are docs/{slug} and English keys are en/docs/{slug}
- Docs audio is hosted under Audio/docs/ and Audio/docs/en/, separate from blog audio under Audio/blog/
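The key prefixes above follow one pattern across blog and docs audio. The sketch below encodes that pattern. It assumes that Chinese blog keys are the bare slug, which is inferred from the en/agent-harness example in the blog audio steps rather than stated outright. The function name manifest_key is hypothetical.

```python
def manifest_key(slug: str, lang: str = "zh", kind: str = "blog") -> str:
    """Build an audio manifest key: docs add "docs/", English adds a leading "en/"."""
    if kind not in ("blog", "docs"):
        raise ValueError(f"unknown kind: {kind}")
    key = f"docs/{slug}" if kind == "docs" else slug  # assumption: zh blog key is the bare slug
    return f"en/{key}" if lang == "en" else key
```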
Generate docs audio
cd ../tts-blog-generator
# Chinese LookAround
python generate.py --type docs --lang zh --force --article-jobs 2
# English LookAround
python generate.py --type docs --lang en \
--blog-dir "../Dev-Knowledge-Base/i18n/en/docusaurus-plugin-content-docs/current/LookAround" \
--force --article-jobs 2
Concurrency is article-level only. Chunks inside one article always run serially as _001, _002, and so on, so audio order follows text order.
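Because chunks are numbered _001, _002, and so on, a manifest entry can be checked for correct ordering. The sketch below assumes chunk URLs end in a three-digit suffix followed by .mp3. That naming is an assumption based on the numbering described above, not a documented contract of tts-blog-generator.

```python
import re

# Assumption: chunk files end in a three-digit suffix such as "_001.mp3".
CHUNK_SUFFIX = re.compile(r"_(\d{3})\.mp3$")


def chunks_in_order(urls: list[str]) -> bool:
    """True if the URLs carry _001, _002, ... suffixes in consecutive order."""
    numbers = []
    for url in urls:
        match = CHUNK_SUFFIX.search(url.split("?", 1)[0])  # ignore any query string
        if match is None:
            return len(urls) == 1  # a single unsuffixed file is acceptable
        numbers.append(int(match.group(1)))
    return numbers == list(range(1, len(urls) + 1))
```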
Single-article repair
python generate.py --type docs --lang zh \
--include mercedes-benz-g-class-history-category-industry-position \
--force --article-jobs 1 --chunk-char-limit 1200
For long Chinese articles with suspiciously short audio, truncation, or garbled-sounding narration at a specific timestamp, inspect the exact UTF-8 text from generate.extract_text() first, then reduce --chunk-char-limit and regenerate. The G-Class Chinese audio was repaired by replacing 7 long chunks with 15 smaller chunks around 1200 characters each.
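To see why a smaller --chunk-char-limit produces more, shorter chunks, consider a greedy sentence packer. The sketch below is only an illustration of that idea, with a hypothetical split_chunks function. It is not the splitting algorithm that generate.py actually uses, so inspect extract_text() output and the generator's own chunks when repairing a real article.

```python
import re

# Split after Chinese or Western sentence-ending punctuation (zero-width split).
SENTENCE_END = re.compile(r"(?<=[。！？!?.\n])")


def split_chunks(text: str, limit: int = 1200) -> list[str]:
    """Greedily pack sentences into chunks of at most `limit` characters."""
    chunks, current = [], ""
    for sentence in SENTENCE_END.split(text):
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Lowering limit only changes where chunk boundaries fall. Joining the chunks always reproduces the original text in order.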
Copy docs manifest
cp output/docs-manifest.json ../Dev-Knowledge-Base/src/data/docsAudioManifest.json
cp output/docs-manifest.json ../Dev-Knowledge-Base/static/audio/docs/manifest.json
Do not copy output/manifest.json into the docs player. That file is the combined blog/docs manifest; docs playback should use output/docs-manifest.json.
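After copying, confirming that both manifest copies hold the same data guards against updating only one of them. The sketch below compares the parsed JSON, so formatting differences are ignored. It assumes the paths used in this workflow. The function name manifests_match is hypothetical.

```python
import json
from pathlib import Path


def manifests_match(site_dir: str, kind: str = "docs") -> bool:
    """True if the src/data copy and the static/ copy of an audio manifest hold the same data."""
    site = Path(site_dir)
    data_name = {"docs": "docsAudioManifest.json", "blog": "blogAudioManifest.json"}[kind]

    def load(path: Path) -> object:
        return json.loads(path.read_text(encoding="utf-8"))

    return load(site / "src" / "data" / data_name) == load(site / "static" / "audio" / kind / "manifest.json")
```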
Test docs audio
Visit a LookAround article and confirm:
- The audio player appears below breadcrumbs
- The play button works
- Multi-chunk audio follows the URL order in the manifest
- The progress bar shows a reasonable total duration
- Non-LookAround docs do not show the player
- The English locale loads the en/docs/{slug} audio
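Checking that audio URLs are reachable can be scripted with the standard library. The sketch below sends a HEAD request with a Referer header, because OSS anti-hotlinking rules may reject requests from domains that are not whitelisted. It assumes the bucket answers HEAD requests and that the local dev server runs on Docusaurus's default port 3000. The helper names are hypothetical.

```python
from urllib.request import Request, urlopen


def is_audio_response(status: int, content_type: str) -> bool:
    """A URL passes if it answers 200 with an audio/* content type."""
    return status == 200 and content_type.split(";")[0].strip().lower().startswith("audio/")


def check_audio_url(url: str, referer: str = "http://localhost:3000/") -> bool:
    # Send a whitelisted Referer; OSS anti-hotlinking may reject other origins.
    request = Request(url, method="HEAD", headers={"Referer": referer})
    try:
        with urlopen(request, timeout=10) as response:
            return is_audio_response(response.status, response.headers.get("Content-Type", ""))
    except OSError:  # covers URLError and HTTPError (e.g. 403 from hotlink protection)
        return False
```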
MDX Compatibility Pitfalls
The most common sources of build errors in English translations:
1. Bare < in prose
MDX interprets < as the start of a JSX tag. Mathematical comparisons must be escaped or rewritten:
❌ Small data (<1M elements)
✅ Small data (≤1M elements)
✅ Small data (&lt;1M elements)
2. Bare { in prose
MDX interprets { as a JSX expression. Set notation must be wrapped in LaTeX:
❌ The set {a, b, c} has 3 elements.
✅ The set $\{a, b, c\}$ has 3 elements.
3. Colons in YAML frontmatter
description containing a colon must be quoted:
❌ description: Three patterns: temporary, persistent, and troubleshooting.
✅ description: "Three patterns: temporary, persistent, and troubleshooting."
4. Curly braces in LaTeX
MDX can still parse the braces in LaTeX \{...\} as a JSX expression. Make sure the formula is inside $...$ or $$...$$ delimiters:
❌ $[a]_R = \{x \in A \mid (a,x) \in R\}$ ← single $ may fail
✅ $$[a]_R = \{x \in A \mid (a,x) \in R\}$$ ← $$ is safer
5. Chinese comments inside code blocks
Code itself is not translated, but Chinese comments inside code blocks should be:
// ❌ int visited[MAX]; // 访问标记数组
// ✅ int visited[MAX]; // Visited marker array
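Pitfalls 1 and 2 can be caught before a full build with a rough line scanner. The sketch below skips frontmatter, fenced code, $$ blocks, inline code, inline math, and import/export lines. It then flags a < that does not start a tag, closing tag, or comment, and any remaining {. This is a heuristic with possible false positives on intentional JSX expressions, not a substitute for npm run build. The function name mdx_warnings is hypothetical.

```python
import re

INLINE_CODE = re.compile(r"`[^`]*`")
INLINE_MATH = re.compile(r"\$\$.*?\$\$|\$[^$]*\$")
BARE_LT = re.compile(r"<(?![A-Za-z/!])")  # "<" not starting a tag, closing tag, or comment


def mdx_warnings(text: str) -> list[tuple[int, str]]:
    """Report prose lines containing a bare '<' or '{' outside code, math, and frontmatter."""
    warnings = []
    in_fence = in_frontmatter = in_math_block = False
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if number == 1 and stripped == "---":
            in_frontmatter = True
            continue
        if in_frontmatter:
            in_frontmatter = stripped != "---"
            continue
        if stripped.startswith("```"):
            in_fence = not in_fence
            continue
        if stripped == "$$":
            in_math_block = not in_math_block
            continue
        if in_fence or in_math_block or stripped.startswith(("import ", "export ")):
            continue
        prose = INLINE_MATH.sub("", INLINE_CODE.sub("", line))
        if BARE_LT.search(prose):
            warnings.append((number, "bare <"))
        if "{" in prose:
            warnings.append((number, "bare {"))
    return warnings
```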
Custom Pages and i18n
Docusaurus does not automatically translate custom React pages or theme overrides. These pages must read useDocusaurusContext().i18n.currentLocale and maintain their own localized copy.
Currently affected pages:
- Docs intro page (src/components/DocsIntro/)
- Travel home page
- Blog overview page
- Site homepage
If you modify or add visible text to these pages, you must handle both Chinese and English.
Build Verification
After every new translation, verify:
npm run build # Build all locales (zh-Hans + en)
npm run build:en # Build English only, for faster verification
The build must complete with zero errors. Common error sources:
- Unescaped colons in YAML frontmatter
- Unescaped < and { in MDX prose
- Missing _category_.json in the English mirror path
- Filename case mismatches (Windows file systems are case-insensitive, but CI environments usually are not)
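Case mismatches pass locally on Windows and then fail in CI, so they are worth checking explicitly. The sketch below compares two lists of relative paths and reports pairs that match only when case is ignored. It assumes the lists come from walking the Chinese and English roots, for example with the hypothetical relative_files helper shown here.

```python
from pathlib import Path


def case_mismatches(zh_paths: list[str], en_paths: list[str]) -> list[tuple[str, str]]:
    """Pairs of relative paths that match only when case is ignored."""
    en_exact = set(en_paths)
    en_by_lower = {p.lower(): p for p in en_paths}
    pairs = []
    for zh in zh_paths:
        if zh in en_exact:
            continue
        en = en_by_lower.get(zh.lower())
        if en is not None:
            pairs.append((zh, en))
    return pairs


def relative_files(root: str) -> list[str]:
    """Relative .md/.mdx/.json paths under root, with their on-disk case."""
    base = Path(root)
    return sorted(
        p.relative_to(base).as_posix()
        for p in base.rglob("*")
        if p.suffix in (".md", ".mdx", ".json") and p.is_file()
    )
```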
Pre-publish Checklist
After adding new content with synchronized translations, verify at minimum:
Docs and blog general:
- Chinese files are under docs/ or blog/
- English files are under the corresponding i18n/en/ mirror path
- Filenames match exactly (including case and extension)
- sidebar_position is identical in both locales
- title and description are translated
- tags, authors, and slug are kept identical
- Code blocks are not translated
- New directories have a corresponding English _category_.json
- New tags have a corresponding English tags.yml entry
- Colons in YAML description are quoted
- Bare < and { in prose are escaped or wrapped in LaTeX
Blog audio:
- Chinese audio generated via python generate.py
- English audio generated via python generate.py --lang en --blog-dir "..."
- Manifest copied from output/blog-manifest.json to both src/data/blogAudioManifest.json and static/audio/blog/manifest.json
- OSS anti-hotlinking whitelist includes both the production domain and localhost
Docs audio:
- Chinese audio generated via python generate.py --type docs --lang zh ...
- English audio generated via python generate.py --type docs --lang en --blog-dir "..."
- Extracted TTS text was inspected as UTF-8 for long or repaired articles
- Long Chinese articles use a smaller --chunk-char-limit when needed
- Manifest copied from output/docs-manifest.json to both src/data/docsAudioManifest.json and static/audio/docs/manifest.json
- DocItem/Layout injects the player for the target docs category
- Target OSS audio URLs are reachable and return audio content
Build verification:
- npm run build passes
- npm run build:en passes (quick English-only verification)