Internationalization Workflow

This document defines the standard process for completing English translations whenever new documentation or blog posts are added to this repository.

The site uses Docusaurus's built-in i18n: zh-Hans is the default locale, and English is provided through the i18n/en/ directory. The root-level I18N.md serves as the complete reference manual; this document focuses on what to do each time new content is created.

Core Principles

1. Chinese goes in `docs/` and `blog/`, English goes in the mirrored `i18n/en/` path

Chinese is the default locale and lives directly under the project root's docs/ and blog/ directories. English translations live under i18n/en/ in the corresponding plugin directories, with a strict 1:1 path mirror.

Docs mirroring rule:

docs/Linux/new-article.md
  → i18n/en/docusaurus-plugin-content-docs/current/Linux/new-article.md

docs/DataStructer/sorting-algorithms/sorting-basics.md
  → i18n/en/docusaurus-plugin-content-docs/current/DataStructer/sorting-algorithms/sorting-basics.md

Blog mirroring rule:

blog/2026-05-01-new-post.md
  → i18n/en/docusaurus-plugin-content-blog/2026-05-01-new-post.md

Filenames must match exactly, including case and extension.
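The mirroring rule can be expressed as a small path mapping. The following is a minimal sketch, not part of the repository; the function name `mirror_path` is hypothetical, and the plugin directories are taken from the examples above.

```python
from pathlib import PurePosixPath

# Plugin directories copied from the mirroring rules in this document.
DOCS_MIRROR = PurePosixPath("i18n/en/docusaurus-plugin-content-docs/current")
BLOG_MIRROR = PurePosixPath("i18n/en/docusaurus-plugin-content-blog")


def mirror_path(source: str) -> str:
    """Return the English mirror path for a file under docs/ or blog/.

    Hypothetical helper: the relative path below docs/ or blog/ is kept
    unchanged, so case and extension are preserved exactly.
    """
    parts = PurePosixPath(source).parts
    if len(parts) < 2:
        raise ValueError(f"expected a file under docs/ or blog/: {source!r}")
    top, rest = parts[0], parts[1:]
    if top == "docs":
        return str(DOCS_MIRROR.joinpath(*rest))
    if top == "blog":
        return str(BLOG_MIRROR.joinpath(*rest))
    raise ValueError(f"not under docs/ or blog/: {source!r}")
```

Because the relative path is copied rather than rebuilt, the output cannot drift in case or extension from the Chinese source.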

2. Only translate human-readable text, not code or technical identifiers

Translation scope:

| Content | Translate? |
| --- | --- |
| `title`, `description` | Yes |
| `sidebar_position`, `tags`, `authors`, `slug` | Keep identical |
| Prose paragraphs, headings | Yes |
| Code blocks (triple-backtick fences) | No |
| Inline code (single backticks) | No |
| Image URLs, JSX components, `import` / `export` | No |
| LaTeX formulas | No |

3. New directories must have a mirrored `_category_.json`

If a new directory is created (whether a docs category or a topic subdirectory), the English mirror path must contain a corresponding _category_.json with the same position and a translated label and description.

// docs/NewDir/_category_.json
{
  "label": "新目录",
  "position": 12,
  "link": {
    "type": "generated-index",
    "description": "中文描述。"
  }
}

// i18n/en/.../NewDir/_category_.json
{
  "label": "New Directory",
  "position": 12,
  "link": {
    "type": "generated-index",
    "description": "English description."
  }
}
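The pairing rule for the two _category_.json files (same position, translated label and description) could be checked mechanically. This is a sketch under assumptions: `check_category_pair` is a hypothetical name, and it expects plain JSON text without the `//` path comments shown in the examples above.

```python
import json


def check_category_pair(zh_text: str, en_text: str) -> list[str]:
    """Compare a Chinese _category_.json with its English mirror.

    Hypothetical check: returns a list of human-readable problems,
    empty when the pair follows the rule.
    """
    zh, en = json.loads(zh_text), json.loads(en_text)
    problems = []
    if zh.get("position") != en.get("position"):
        problems.append("position differs")
    if zh.get("label") == en.get("label"):
        problems.append("label is not translated")
    zh_desc = zh.get("link", {}).get("description")
    en_desc = en.get("link", {}).get("description")
    if zh_desc is not None and zh_desc == en_desc:
        problems.append("description is not translated")
    return problems
```

A label that is already an English proper noun in the Chinese file would be reported as untranslated, so treat the result as a prompt for review rather than a hard failure.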

4. New tags must update `tags.yml`

If an article uses a new tag, the English tags.yml must also be updated. Keep permalink identical; translate label.

# i18n/en/docusaurus-plugin-content-docs/current/tags.yml
New Tag:
  label: New Tag
  permalink: new-tag

5. Every documentation update must include the English translation

Whether adding or modifying a document under docs/, the corresponding English version under i18n/en/ must be updated in sync. This applies to all .md and .mdx files under docs/, including workflow documents, LabNotes, and technical guides.

Specific requirements:

If you modify a paragraph in the Chinese document, the corresponding English paragraph must be updated
If you add a new section, the English version must add the same section
If you delete content, the English version must also delete it
Do not update only Chinese and leave the English version stale -- this is worse than not translating, because readers will see inconsistent content across locales

Workflow documents (like this file and life-blog-writing-workflow.md) are especially critical, because they serve as reference for future work. If the Chinese version updates a process but the English version does not, English readers will follow an outdated workflow.
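Part of the sync requirement in Core Principle 5 can be checked automatically: every .md or .mdx file under docs/ should have an English mirror file. The sketch below is an illustration only; `missing_english_docs` is a hypothetical name, and it detects missing files, not stale paragraphs.

```python
from pathlib import Path

# Mirror directory taken from the docs mirroring rule in this document.
DOCS_MIRROR = Path("i18n/en/docusaurus-plugin-content-docs/current")


def missing_english_docs(repo_root: Path) -> list[str]:
    """List docs/ files (relative paths) that have no English mirror file."""
    docs_dir = repo_root / "docs"
    missing = []
    for source in sorted(docs_dir.rglob("*")):
        if not source.is_file() or source.suffix not in {".md", ".mdx"}:
            continue
        relative = source.relative_to(docs_dir)
        if not (repo_root / DOCS_MIRROR / relative).is_file():
            missing.append(relative.as_posix())
    return missing
```

Content drift inside an existing English file still needs a manual comparison against the Chinese version.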

Standard Workflow: Adding Documentation

Step 1: Create the Chinese document

Create the Chinese .md or .mdx file under docs/ as usual.

Step 2: Create the English translation

Create a file with the same name and path under i18n/en/docusaurus-plugin-content-docs/current/, translating the frontmatter and body.

Frontmatter comparison:

# Chinese
---
title: 自定义镜像与 Dockerfile 实践
sidebar_position: 6
description: 把临时调试、Dockerfile 构建整理成一套更稳妥的工作流。
---

# English
---
title: Custom Images and Dockerfile Practices
sidebar_position: 6
description: "Turning ad-hoc debugging and Dockerfile builds into a more reliable workflow."
---

Note: If description contains a colon, it must be quoted. Otherwise YAML parsing will fail.
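The frontmatter rules (identical technical keys, translated title and description) can be sketched as a comparison of two already-parsed frontmatter mappings. The function name and the idea of passing plain dicts are assumptions; how the YAML is parsed is left to whatever loader the project prefers.

```python
# Keys that must match across locales vs. keys that must be translated,
# following the translation scope described in this document.
IDENTICAL_KEYS = ("sidebar_position", "tags", "authors", "slug")
TRANSLATED_KEYS = ("title", "description")


def compare_frontmatter(zh: dict, en: dict) -> list[str]:
    """Return problems found when comparing Chinese and English frontmatter."""
    problems = []
    for key in IDENTICAL_KEYS:
        if zh.get(key) != en.get(key):
            problems.append(f"{key} must be identical")
    for key in TRANSLATED_KEYS:
        if key not in zh:
            continue
        if key not in en:
            problems.append(f"{key} is missing in English")
        elif zh[key] == en[key]:
            problems.append(f"{key} is not translated")
    return problems
```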

Step 3: If it is a new directory, create the English `_category_.json`

See Core Principle 3 above.

Step 4: If new tags are used, update the English `tags.yml`

See Core Principle 4 above.

Step 5: If new tag names are in Chinese, refresh sidebar translations

npm run write-translations:en

Then edit i18n/en/docusaurus-plugin-content-docs/current.json to translate the new sidebar entries.

Standard Workflow: Adding Blog Posts

Step 1: Create the Chinese blog post

Create the Chinese .md or .mdx file under blog/.

Step 2: Create the English translation

Create a file with the same name and path under i18n/en/docusaurus-plugin-content-blog/:

blog/2026-05-01-new-post.md
  → i18n/en/docusaurus-plugin-content-blog/2026-05-01-new-post.md

Blog authors.yml and tags.yml do not need translation (already in English or using universal identifiers).

For MDX blog posts, export const data objects need their text values translated (alt, caption, etc.), but image URLs remain unchanged.

Step 3: Generate Chinese audio

Use the standalone project tts-blog-generator (located in the parent directory D:\Code\tts-blog-generator\):

cd ../tts-blog-generator
python generate.py

The script automatically scans all .md / .mdx files under blog/, extracts prose, calls the TTS API, converts to MP3, and uploads to OSS. Already-generated audio is skipped; use --force to regenerate.

Step 4: Generate English audio

python generate.py --lang en --blog-dir "../Dev-Knowledge-Base/i18n/en/docusaurus-plugin-content-blog"

English audio uses the Chloe voice. Filenames are prefixed with en_ and stored under Audio/blog/en/ on OSS. Manifest keys are prefixed with en/ (e.g., en/agent-harness).

Step 5: Copy the manifest file to the project

cp output/blog-manifest.json ../Dev-Knowledge-Base/src/data/blogAudioManifest.json
cp output/blog-manifest.json ../Dev-Knowledge-Base/static/audio/blog/manifest.json

src/data/blogAudioManifest.json is the static import data source for the player component. static/audio/blog/manifest.json is the fallback access path. Both must be updated. Do not copy output/manifest.json into the blog player. That file is the combined blog/docs manifest; blog playback should use output/blog-manifest.json.
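Since `cp` is not available in every Windows shell, the two copies can also be made from Python. This is a minimal sketch, not part of tts-blog-generator; the function name and its parameters are hypothetical, while the source and target paths are the ones named in this step.

```python
import shutil
from pathlib import Path

# Both targets named in this step, relative to the site repository.
BLOG_MANIFEST_TARGETS = (
    "src/data/blogAudioManifest.json",
    "static/audio/blog/manifest.json",
)


def copy_blog_manifest(generator_dir: Path, site_dir: Path) -> list[Path]:
    """Copy output/blog-manifest.json to both player locations."""
    source = generator_dir / "output" / "blog-manifest.json"
    written = []
    for target in BLOG_MANIFEST_TARGETS:
        destination = site_dir / target
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, destination)
        written.append(destination)
    return written
```

Called as `copy_blog_manifest(Path("."), Path("../Dev-Knowledge-Base"))` from the generator directory, it mirrors the two `cp` commands above.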

Step 6: Test

cd ../Dev-Knowledge-Base
npm start

Visit the blog page and confirm:

The play button is clickable
Audio plays normally
Multi-chunk audio transitions seamlessly (if applicable)
The progress bar shows the correct total duration
Switching locale uses the corresponding voice (Molly / Chloe)

Docs Audio (Docs TTS)

Documentation categories such as docs/LookAround/ also support TTS audio playback. Unlike blog audio, the docs audio player is automatically injected in DocItem/Layout, so MDX files do not need to import a player manually.

See Docs Audio Workflow for the complete maintenance procedure. This section keeps the i18n-specific checklist for adding or updating bilingual docs.

Architecture

DocsAudioPlayer reuses BlogAudioPlayer, but reads src/data/docsAudioManifest.json and passes keyPrefix="docs/"
Blog audio is also auto-injected via src/theme/BlogPostPage/index.js, so no manual import is needed in MDX
src/theme/DocItem/Layout/index.js injects the player for LookAround docs below breadcrumbs and above content
Chinese manifest keys are docs/{slug} and English keys are en/docs/{slug}
Docs audio is hosted under Audio/docs/ and Audio/docs/en/, separate from blog audio under Audio/blog/
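The key scheme described above can be summarized in one function. This is a sketch, not the player's code: `manifest_key` is a hypothetical name, and the unprefixed Chinese blog key is an assumption inferred from the en/agent-harness example.

```python
def manifest_key(slug: str, content_type: str = "blog", locale: str = "zh") -> str:
    """Build a manifest key for a slug.

    docs keys use the "docs/" prefix, English keys add a leading "en/".
    Assumption: Chinese blog keys are the bare slug.
    """
    if content_type not in {"blog", "docs"}:
        raise ValueError(f"unknown content type: {content_type!r}")
    key = f"docs/{slug}" if content_type == "docs" else slug
    return f"en/{key}" if locale == "en" else key
```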

Generate docs audio

cd ../tts-blog-generator

# Chinese LookAround
python generate.py --type docs --lang zh --force --article-jobs 2

# English LookAround
python generate.py --type docs --lang en \
  --blog-dir "../Dev-Knowledge-Base/i18n/en/docusaurus-plugin-content-docs/current/LookAround" \
  --force --article-jobs 2

Concurrency is article-level only. Chunks inside one article always run serially as _001, _002, and so on, so audio order follows text order.

Single-article repair

python generate.py --type docs --lang zh \
  --include mercedes-benz-g-class-history-category-industry-position \
  --force --article-jobs 1 --chunk-char-limit 1200

For long Chinese articles with suspiciously short audio, truncation, or garbled-sounding narration at a specific timestamp, inspect the exact UTF-8 text from generate.extract_text() first, then reduce --chunk-char-limit and regenerate. The G-Class Chinese audio was repaired by replacing 7 long chunks with 15 smaller chunks around 1200 characters each.
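To see why a smaller --chunk-char-limit produces more, shorter chunks, consider a greedy sentence packer. The sketch below is an illustration only and is not the actual logic of generate.py, whose chunking rules are not shown in this document; the function name and splitting rule are assumptions.

```python
import re


def split_into_chunks(text: str, char_limit: int = 1200) -> list[str]:
    """Greedily pack sentences into chunks of at most char_limit characters."""
    # Split after Chinese or Western sentence-ending punctuation, keeping it.
    sentences = [s for s in re.split(r"(?<=[。！？.!?])", text) if s]
    chunks, current = [], ""
    for sentence in sentences:
        # A single over-long sentence is hard-split at the limit.
        while len(sentence) > char_limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:char_limit])
            sentence = sentence[char_limit:]
        # Start a new chunk when this sentence would overflow the current one.
        if len(current) + len(sentence) > char_limit:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Joining the chunks in order reproduces the original text, which matches the requirement that audio order follows text order.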

Copy docs manifest

cp output/docs-manifest.json ../Dev-Knowledge-Base/src/data/docsAudioManifest.json
cp output/docs-manifest.json ../Dev-Knowledge-Base/static/audio/docs/manifest.json

Do not copy output/manifest.json into the docs player. That file is the combined blog/docs manifest; docs playback should use output/docs-manifest.json.

Test docs audio

Visit a LookAround article and confirm:

The audio player appears below breadcrumbs
The play button works
Multi-chunk audio follows the URL order in the manifest
The progress bar shows a reasonable total duration
Non-LookAround docs do not show the player
English locale loads en/docs/{slug} audio

MDX Compatibility Pitfalls

The most common sources of build errors in English translations:

1. Bare `<` in prose

MDX interprets < as the start of a JSX tag. Mathematical comparisons must be escaped:

❌ Small data (<1M elements)
✅ Small data (fewer than 1M elements)
✅ Small data (&lt;1M elements)

2. Bare `{` in prose

MDX interprets { as a JSX expression. Set notation must be wrapped in LaTeX:

❌ The set {a, b, c} has 3 elements.
✅ The set $\{a, b, c\}$ has 3 elements.
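Bare `<` and `{` characters can be caught before the build with a rough line scanner. The following is a heuristic sketch under assumptions, not an MDX parser: the function name is hypothetical, and lines that contain real JSX or `export const` objects will be reported too and need manual review.

```python
import re

# Opening or closing code fence (three or more backticks or tildes).
FENCE = re.compile(r"^\s*(`{3,}|~{3,})")
# Inline code and $...$ / $$...$$ math are stripped before scanning.
PROTECTED = re.compile(r"`[^`]*`|\$\$.*?\$\$|\$[^$]*\$")
# "<" followed by a digit, whitespace, or "=" is unlikely to be a real tag.
SUSPICIOUS_LT = re.compile(r"<(?=[\d\s=])")


def find_bare_mdx_chars(text: str) -> list[tuple[int, str]]:
    """Return (line_number, character) pairs for suspicious prose lines."""
    hits = []
    in_fence = False
    for number, line in enumerate(text.splitlines(), start=1):
        if FENCE.match(line):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        prose = PROTECTED.sub("", line)
        if SUSPICIOUS_LT.search(prose):
            hits.append((number, "<"))
        if "{" in prose:
            hits.append((number, "{"))
    return hits
```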

3. Colons in YAML frontmatter

description containing a colon must be quoted:

❌ description: Three patterns: temporary, persistent, and troubleshooting.
✅ description: "Three patterns: temporary, persistent, and troubleshooting."
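A quick pre-build check can flag unquoted frontmatter values that contain a colon followed by a space, which is the pattern shown in the failing example. This is a line-based sketch with a hypothetical function name; it does not parse YAML and ignores multi-line values.

```python
import re

# Matches simple "key: value" lines at the top level of the frontmatter.
KEY_VALUE = re.compile(r"^([A-Za-z_][\w-]*):\s+(.*)$")


def unquoted_colon_fields(frontmatter: str) -> list[str]:
    """Return keys whose unquoted value contains ': ' or ends with ':'."""
    flagged = []
    for line in frontmatter.splitlines():
        match = KEY_VALUE.match(line)
        if not match:
            continue
        key, value = match.groups()
        if value[:1] in {'"', "'"}:
            continue  # already quoted
        if ": " in value or value.endswith(":"):
            flagged.append(key)
    return flagged
```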

4. Curly braces in LaTeX

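LaTeX \{...\} outside math delimiters is still parsed as JSX by MDX. Ensure it is inside $...$ or $$...$$ delimiters; if an inline $...$ formula still fails to build, switch to a $$...$$ block: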

❌ $[a]_R = \{x \in A \mid (a,x) \in R\}$   ← single $ may fail
✅ $$[a]_R = \{x \in A \mid (a,x) \in R\}$$  ← $$ is safer

5. Chinese comments inside code blocks

The code itself is never translated, but Chinese comments inside code blocks should be:

// ❌ int visited[MAX]; // 访问标记数组
// ✅ int visited[MAX]; // Visited marker array

Custom Pages and i18n

Docusaurus does not automatically translate custom React pages or theme overrides. These pages must read useDocusaurusContext().i18n.currentLocale and maintain their own localized copy.

Currently affected pages:

Docs intro page (src/components/DocsIntro/)
Travel home page
Blog overview page
Site homepage

If you modify or add visible text to these pages, you must handle both Chinese and English.

Build Verification

After every new translation, verify:

npm run build       # Build all locales (zh-Hans + en)
npm run build:en    # Build English only, for faster verification

The build must complete with zero errors. Common error sources:

Unescaped colons in YAML frontmatter
Unescaped < and { in MDX prose
Missing _category_.json in the English mirror path
Filename case mismatches (Windows is case-insensitive, but CI environments are not)
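Case mismatches are easy to miss on Windows because the local build still succeeds. The sketch below lists mirror files whose relative path differs from the source only by letter case; the function name and the two-directory signature are assumptions for illustration.

```python
from pathlib import Path


def case_mismatches(source_dir: Path, mirror_dir: Path) -> list[tuple[str, str]]:
    """Return (source, mirror) relative paths that differ only by case."""
    mirror_by_lower = {}
    for path in mirror_dir.rglob("*"):
        if path.is_file():
            relative = path.relative_to(mirror_dir).as_posix()
            mirror_by_lower[relative.lower()] = relative
    mismatches = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_dir).as_posix()
        mirrored = mirror_by_lower.get(relative.lower())
        if mirrored is not None and mirrored != relative:
            mismatches.append((relative, mirrored))
    return mismatches
```

For docs, the call would compare `docs/` against `i18n/en/docusaurus-plugin-content-docs/current/`.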

Pre-publish Checklist

After adding new content with synchronized translations, verify at minimum:

Docs and blog general:

Chinese files are under docs/ or blog/
English files are under the corresponding i18n/en/ mirror path
Filenames match exactly (including extension)
sidebar_position is identical in both locales
title and description are translated
tags, authors, slug are kept identical
Code blocks are not translated
New directories have a corresponding English _category_.json
New tags have a corresponding English tags.yml entry
Colons in YAML description are quoted
Bare < and { in prose are escaped or wrapped in LaTeX

Blog audio:

Chinese audio generated via python generate.py
English audio generated via python generate.py --lang en --blog-dir "..."
Manifest output/blog-manifest.json copied to both src/data/blogAudioManifest.json and static/audio/blog/manifest.json
OSS anti-hotlinking whitelist includes both production domain and localhost

Docs audio:

Chinese audio generated via python generate.py --type docs --lang zh ...
English audio generated via python generate.py --type docs --lang en --blog-dir "..."
Extracted TTS text was inspected with UTF-8 for long or repaired articles
Long Chinese articles use a smaller --chunk-char-limit when needed
Manifest output/docs-manifest.json copied to both src/data/docsAudioManifest.json and static/audio/docs/manifest.json
DocItem/Layout injects the player for the target docs category
Target OSS audio URLs are reachable and return audio content

Build verification:

npm run build passes
npm run build:en passes (quick English-only verification)
