
Internationalization Workflow

This document defines the standard process for completing English translations whenever new documentation or blog posts are added to this repository.

The site uses Docusaurus's built-in i18n: zh-Hans is the default locale, and English content is served from the i18n/en/ directory. The root-level I18N.md serves as the complete reference manual; this document focuses on what to do each time new content is created.

Core Principles

1. Chinese goes in docs/ and blog/, English goes in the mirrored i18n/en/ path

Chinese is the default locale and lives directly under the project root's docs/ and blog/ directories. English translations live under i18n/en/ in the corresponding plugin directories, with a strict 1:1 path mirror.

Docs mirroring rule:

docs/Linux/new-article.md
→ i18n/en/docusaurus-plugin-content-docs/current/Linux/new-article.md

docs/DataStructer/sorting-algorithms/sorting-basics.md
→ i18n/en/docusaurus-plugin-content-docs/current/DataStructer/sorting-algorithms/sorting-basics.md

Blog mirroring rule:

blog/2026-05-01-new-post.md
→ i18n/en/docusaurus-plugin-content-blog/2026-05-01-new-post.md

Filenames must match exactly, including case and extension.
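
The mirroring rules above can be expressed as a small path mapping. The following is a minimal sketch, not part of the repository; the function name english_mirror_path is hypothetical, and it assumes paths are written relative to the project root with forward slashes.

```python
from pathlib import PurePosixPath

# Plugin directories taken from the mirroring rules above.
DOCS_MIRROR = PurePosixPath("i18n/en/docusaurus-plugin-content-docs/current")
BLOG_MIRROR = PurePosixPath("i18n/en/docusaurus-plugin-content-blog")


def english_mirror_path(source: str) -> str:
    """Return the English mirror path for a Chinese file under docs/ or blog/."""
    path = PurePosixPath(source)
    if len(path.parts) < 2:
        raise ValueError(f"expected a file under docs/ or blog/: {source}")
    root, rest = path.parts[0], path.parts[1:]
    if root == "docs":
        return str(DOCS_MIRROR.joinpath(*rest))
    if root == "blog":
        return str(BLOG_MIRROR.joinpath(*rest))
    raise ValueError(f"not under docs/ or blog/: {source}")
```

Because the remainder of the path is copied unchanged, case and extension are preserved automatically, which is what the exact-match rule requires.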

2. Only translate human-readable text, not code or technical identifiers

Translation scope:

| Content | Translate? |
| --- | --- |
| title, description | Yes |
| sidebar_position, tags, authors, slug | Keep identical |
| Prose paragraphs, headings | Yes |
| Code blocks (```) | No |
| Inline code (`) | No |
| Image URLs, JSX components, import / export | No |
| LaTeX formulas | No |

3. New directories must have a mirrored _category_.json

If a new directory is created (whether for docs or a topic), the English mirror path must contain a corresponding _category_.json. Keep position identical; translate label and description.

// docs/NewDir/_category_.json
{
  "label": "新目录",
  "position": 12,
  "link": {
    "type": "generated-index",
    "description": "中文描述。"
  }
}

// i18n/en/.../NewDir/_category_.json
{
  "label": "New Directory",
  "position": 12,
  "link": {
    "type": "generated-index",
    "description": "English description."
  }
}
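
A pair of _category_.json files can be compared mechanically. The sketch below is illustrative only: check_category_pair is a hypothetical helper, and it assumes the files contain plain JSON (the path comments shown in the example above would have to be absent, since the standard json module does not accept comments).

```python
import json


def check_category_pair(zh_json: str, en_json: str) -> list[str]:
    """Compare a Chinese _category_.json with its English mirror."""
    zh, en = json.loads(zh_json), json.loads(en_json)
    problems = []
    # position must be the same in both locales.
    if zh.get("position") != en.get("position"):
        problems.append("position differs")
    # An identical label suggests the English file was copied, not translated.
    if zh.get("label") == en.get("label"):
        problems.append("label is not translated")
    return problems
```

An empty list means the pair satisfies the two rules checked here; the description field is not inspected.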

4. New tags must update tags.yml

If an article uses a new tag, the English tags.yml must also be updated. Keep permalink identical; translate label.

# i18n/en/docusaurus-plugin-content-docs/current/tags.yml
New Tag:
  label: New Tag
  permalink: new-tag

5. Every documentation update must include the English translation

Whether adding or modifying a document under docs/, the corresponding English version under i18n/en/ must be updated in sync. This applies to all .md and .mdx files under docs/, including workflow documents, LabNotes, and technical guides.

Specific requirements:

  • If you modify a paragraph in the Chinese document, the corresponding English paragraph must be updated
  • If you add a new section, the English version must add the same section
  • If you delete content, the English version must also delete it
  • Do not update only the Chinese version and leave the English version stale. That is worse than not translating, because readers will see inconsistent content across locales

Workflow documents (like this file and life-blog-writing-workflow.md) are especially critical, because they serve as references for future work. If the Chinese version updates a process but the English version does not, English readers will follow an outdated workflow.
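
The synchronization requirement in Core Principle 5 can be partially checked by script. The following sketch lists Chinese docs that have no English mirror file at all; it cannot detect a mirror whose content is stale. The function name missing_english_docs and the repo_root parameter are assumptions made for illustration.

```python
from pathlib import Path

DOCS_MIRROR = Path("i18n/en/docusaurus-plugin-content-docs/current")


def missing_english_docs(repo_root: Path) -> list[str]:
    """List docs/ files (.md or .mdx) that have no English mirror file."""
    docs_dir = repo_root / "docs"
    missing = []
    for source in sorted(docs_dir.rglob("*")):
        if source.suffix not in {".md", ".mdx"} or not source.is_file():
            continue
        relative = source.relative_to(docs_dir)
        # The mirror must exist at the same relative path.
        if not (repo_root / DOCS_MIRROR / relative).is_file():
            missing.append(relative.as_posix())
    return missing
```

On a case-insensitive filesystem such as Windows, is_file() also succeeds when only the letter case differs, so this check does not replace a separate case comparison.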

Standard Workflow: Adding Documentation

Step 1: Create the Chinese document

Create the Chinese .md or .mdx file under docs/ as usual.

Step 2: Create the English translation

Create a file with the same name and path under i18n/en/docusaurus-plugin-content-docs/current/, translating the frontmatter and body.

Frontmatter comparison:

# Chinese
---
title: 自定义镜像与 Dockerfile 实践
sidebar_position: 6
description: 把临时调试、Dockerfile 构建整理成一套更稳妥的工作流。
---

# English
---
title: Custom Images and Dockerfile Practices
sidebar_position: 6
description: "Turning ad-hoc debugging and Dockerfile builds into a more reliable workflow."
---

Note: If description contains a colon (more precisely, a colon followed by a space), the value must be quoted. Otherwise YAML parsing will fail.
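
The frontmatter rules from the translation scope in Core Principle 2 can be checked once both files are parsed. This is a minimal sketch that takes already-parsed dictionaries, so no YAML library is assumed; compare_frontmatter and the two key tuples are hypothetical names.

```python
# Keys that must stay identical across locales, per Core Principle 2.
IDENTICAL_KEYS = ("sidebar_position", "tags", "authors", "slug")
# Keys whose values are expected to be translated.
TRANSLATED_KEYS = ("title", "description")


def compare_frontmatter(zh: dict, en: dict) -> list[str]:
    """Report frontmatter keys that break the translation scope rules."""
    problems = []
    for key in IDENTICAL_KEYS:
        if zh.get(key) != en.get(key):
            problems.append(f"{key} must be identical")
    for key in TRANSLATED_KEYS:
        # Equal values suggest the field was copied rather than translated.
        if key in zh and zh.get(key) == en.get(key):
            problems.append(f"{key} looks untranslated")
    return problems
```

A title that is legitimately the same in both languages (for example a product name) would be flagged as untranslated, so the result is a prompt for review rather than a hard failure.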

Step 3: If it is a new directory, create the English _category_.json

See Core Principle 3 above.

Step 4: If new tags are used, update the English tags.yml

See Core Principle 4 above.

Step 5: If new sidebar labels are in Chinese, refresh sidebar translations

npm run write-translations:en

Then edit i18n/en/docusaurus-plugin-content-docs/current.json to translate the new sidebar entries.

Standard Workflow: Adding Blog Posts

Step 1: Create the Chinese blog post

Create the Chinese .md or .mdx file under blog/.

Step 2: Create the English translation

Create a file with the same name and path under i18n/en/docusaurus-plugin-content-blog/:

blog/2026-05-01-new-post.md
→ i18n/en/docusaurus-plugin-content-blog/2026-05-01-new-post.md

Blog authors.yml and tags.yml do not need translation (already in English or using universal identifiers).

For MDX blog posts, data objects declared with export const need their text values (alt, caption, etc.) translated, but image URLs remain unchanged.

Step 3: Generate Chinese audio

Use the standalone project tts-blog-generator (located in the parent directory D:\Code\tts-blog-generator\):

cd ../tts-blog-generator
python generate.py

The script automatically scans all .md / .mdx files under blog/, extracts prose, calls the TTS API, converts to MP3, and uploads to OSS. Already-generated audio is skipped; use --force to regenerate.

Step 4: Generate English audio

python generate.py --lang en --blog-dir "../Dev-Knowledge-Base/i18n/en/docusaurus-plugin-content-blog"

English audio uses the Chloe voice. Filenames are prefixed with en_ and stored under Audio/blog/en/ on OSS. Manifest keys are prefixed with en/ (e.g., en/agent-harness).

Step 5: Copy the manifest file to the project

cp output/blog-manifest.json ../Dev-Knowledge-Base/src/data/blogAudioManifest.json
cp output/blog-manifest.json ../Dev-Knowledge-Base/static/audio/blog/manifest.json

src/data/blogAudioManifest.json is the static import data source for the player component. static/audio/blog/manifest.json is the fallback access path. Both must be updated. Do not copy output/manifest.json into the blog player. That file is the combined blog/docs manifest; blog playback should use output/blog-manifest.json.
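
The two cp commands can also be wrapped so that both targets are always written together. The sketch below is one possible Python equivalent, not a script that exists in either project; publish_blog_manifest and its parameters are hypothetical, and the directory layout is taken from the commands above.

```python
import shutil
from pathlib import Path


def publish_blog_manifest(generator_dir: Path, site_dir: Path) -> list[Path]:
    """Copy output/blog-manifest.json to both locations the site reads."""
    # Always the blog-only manifest, never the combined output/manifest.json.
    source = generator_dir / "output" / "blog-manifest.json"
    targets = [
        site_dir / "src" / "data" / "blogAudioManifest.json",
        site_dir / "static" / "audio" / "blog" / "manifest.json",
    ]
    for target in targets:
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
    return targets
```

Hard-coding the source filename removes the chance of copying the combined manifest by mistake.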

Step 6: Test

cd ../Dev-Knowledge-Base
npm start

Visit the blog page and confirm:

  • The play button is clickable
  • Audio plays normally
  • Multi-chunk audio transitions seamlessly (if applicable)
  • The progress bar shows the correct total duration
  • Switching locale uses the corresponding voice (Molly / Chloe)

Docs Audio (Docs TTS)

Documentation categories such as docs/LookAround/ also support TTS audio playback. Unlike blog audio, the docs audio player is automatically injected in DocItem/Layout, so MDX files do not need to import a player manually.

See Docs Audio Workflow for the complete maintenance procedure. This section keeps the i18n-specific checklist for adding or updating bilingual docs.

Architecture

  • DocsAudioPlayer reuses BlogAudioPlayer, but reads src/data/docsAudioManifest.json and passes keyPrefix="docs/"
  • Blog audio is also auto-injected via src/theme/BlogPostPage/index.js, so MDX files need no manual import
  • src/theme/DocItem/Layout/index.js injects the player for LookAround docs below breadcrumbs and above content
  • Chinese manifest keys are docs/{slug} and English keys are en/docs/{slug}
  • Docs audio is hosted under Audio/docs/ and Audio/docs/en/, separate from blog audio under Audio/blog/
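
The key scheme described in this document can be summarized in one function. This is an illustrative sketch with a hypothetical name, manifest_key. The docs keys and the en/ prefix for English blog audio come from the text; the assumption that Chinese blog keys are the bare slug is inferred from the en/agent-harness example and is not confirmed here.

```python
def manifest_key(slug: str, content_type: str, locale: str) -> str:
    """Build the audio manifest key for a blog post or docs article."""
    if content_type not in {"blog", "docs"}:
        raise ValueError(f"unknown content type: {content_type}")
    # Docs keys carry a docs/ prefix; blog keys are assumed to be the bare slug.
    key = f"docs/{slug}" if content_type == "docs" else slug
    # English keys are additionally prefixed with en/.
    return f"en/{key}" if locale == "en" else key
```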

Generate docs audio

cd ../tts-blog-generator

# Chinese LookAround
python generate.py --type docs --lang zh --force --article-jobs 2

# English LookAround
python generate.py --type docs --lang en \
--blog-dir "../Dev-Knowledge-Base/i18n/en/docusaurus-plugin-content-docs/current/LookAround" \
--force --article-jobs 2

Concurrency is article-level only. Chunks inside one article always run serially as _001, _002, and so on, so audio order follows text order.
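
The concurrency model can be pictured as a pool of article workers, each looping over its own chunks in order. The sketch below only illustrates that shape and is not the code of generate.py; the function names and the slug_001 naming format are assumptions, and the TTS call is replaced by a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def synthesize_article(slug: str, chunks: list[str]) -> list[str]:
    """Process one article's chunks strictly in order."""
    names = []
    for index, _text in enumerate(chunks, start=1):
        # Placeholder: a real implementation would call the TTS API here.
        names.append(f"{slug}_{index:03d}")
    return names


def synthesize_all(articles: dict[str, list[str]], article_jobs: int = 2) -> dict[str, list[str]]:
    """Run articles in parallel while keeping chunks serial inside each one."""
    with ThreadPoolExecutor(max_workers=article_jobs) as pool:
        futures = {
            slug: pool.submit(synthesize_article, slug, chunks)
            for slug, chunks in articles.items()
        }
        return {slug: future.result() for slug, future in futures.items()}
```

Because each article is handled by a single worker, raising article_jobs changes how many articles run at once but never the order of chunks within an article.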

Single-article repair

python generate.py --type docs --lang zh \
--include mercedes-benz-g-class-history-category-industry-position \
--force --article-jobs 1 --chunk-char-limit 1200

For long Chinese articles with suspiciously short audio, truncation, or garbled-sounding narration at a specific timestamp, inspect the exact UTF-8 text from generate.extract_text() first, then reduce --chunk-char-limit and regenerate. The G-Class Chinese audio was repaired by replacing 7 long chunks with 15 smaller chunks around 1200 characters each.
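
To see why a lower --chunk-char-limit yields more, shorter chunks, consider a greedy sentence packer. This is a sketch of one common approach, not the actual chunking logic of generate.py; split_into_chunks is a hypothetical name, and the sentence-boundary rule is an assumption.

```python
import re


def split_into_chunks(text: str, limit: int = 1200) -> list[str]:
    """Greedily pack sentences into chunks of at most `limit` characters."""
    # Split after Chinese or Western sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[。！？.!?])", text) if s]
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        pieces = [sentence[i:i + limit] for i in range(0, len(sentence), limit)]
        for piece in pieces:
            if current and len(current) + len(piece) > limit:
                chunks.append(current)
                current = ""
            current += piece
    if current:
        chunks.append(current)
    return chunks
```

Joining the returned chunks reproduces the input exactly, so lowering the limit changes only where the text is cut, not what is narrated.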

Copy docs manifest

cp output/docs-manifest.json ../Dev-Knowledge-Base/src/data/docsAudioManifest.json
cp output/docs-manifest.json ../Dev-Knowledge-Base/static/audio/docs/manifest.json

Do not copy output/manifest.json into the docs player. That file is the combined blog/docs manifest; docs playback should use output/docs-manifest.json.

Test docs audio

Visit a LookAround article and confirm:

  • The audio player appears below breadcrumbs
  • The play button works
  • Multi-chunk audio follows the URL order in the manifest
  • The progress bar shows a reasonable total duration
  • Non-LookAround docs do not show the player
  • English locale loads en/docs/{slug} audio

MDX Compatibility Pitfalls

The most common sources of build errors in English translations:

1. Bare < in prose

MDX interprets < as the start of a JSX tag. Mathematical comparisons must be escaped:

❌ Small data (<1M elements)
✅ Small data (≤1M elements)
✅ Small data (&lt;1M elements)

2. Bare { in prose

MDX interprets { as a JSX expression. Set notation must be wrapped in LaTeX:

❌ The set {a, b, c} has 3 elements.
✅ The set $\{a, b, c\}$ has 3 elements.
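
Pitfalls 1 and 2 can be caught before a build with a coarse scan. The following is a heuristic sketch under stated assumptions: find_bare_mdx_chars is a hypothetical helper, it skips fenced code, inline code, and math spans, and it will also flag legitimate JSX tags and import / export lines, which need manual review.

```python
import re

FENCE = re.compile(r"^\s*(```|~~~)")
# Inline code and inline/display math spans are removed before scanning.
PROTECTED = re.compile(r"`[^`]*`|\$\$.*?\$\$|\$[^$\n]+\$")


def find_bare_mdx_chars(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, character) for each '<' or '{' found in prose."""
    hits, in_fence = [], False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if FENCE.match(line):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        prose = PROTECTED.sub("", line)
        for char in re.findall(r"[<{]", prose):
            hits.append((number, char))
    return hits
```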

3. Colons in YAML frontmatter

description containing a colon must be quoted:

❌ description: Three patterns: temporary, persistent, and troubleshooting.
✅ description: "Three patterns: temporary, persistent, and troubleshooting."
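
Unquoted colons can be found with a line-based check that needs no YAML parser. This sketch is a simplification: unquoted_colon_lines is a hypothetical helper that inspects only top-level key: value lines and treats a value as risky when it is unquoted and contains a colon followed by a space, or ends with a colon.

```python
def unquoted_colon_lines(frontmatter: str) -> list[int]:
    """Return 1-based line numbers whose unquoted value contains a risky colon."""
    flagged = []
    for number, line in enumerate(frontmatter.splitlines(), start=1):
        # Skip nested lines, comments, and list items.
        if line.startswith((" ", "#", "-")):
            continue
        _key, separator, value = line.partition(": ")
        if not separator:
            continue
        value = value.strip()
        if value.startswith(('"', "'")):
            continue
        if ": " in value or value.endswith(":"):
            flagged.append(number)
    return flagged
```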

4. Curly braces in LaTeX

Outside math delimiters, LaTeX \{...\} is still parsed as JSX by MDX. Keep it inside $...$ or $$...$$, and switch to $$...$$ if an inline $...$ expression still fails to build:

❌ $[a]_R = \{x \in A \mid (a,x) \in R\}$ ← single $ may fail
✅ $$[a]_R = \{x \in A \mid (a,x) \in R\}$$ ← $$ is safer

5. Chinese comments inside code blocks

Code itself stays untranslated, but Chinese comments inside code blocks should be translated:

// ❌ int visited[MAX]; // 访问标记数组
// ✅ int visited[MAX]; // Visited marker array

Custom Pages and i18n

Docusaurus does not automatically translate custom React pages or theme overrides. These pages must read useDocusaurusContext().i18n.currentLocale and maintain their own localized copy.

Currently affected pages:

  • Docs intro page (src/components/DocsIntro/)
  • Travel home page
  • Blog overview page
  • Site homepage

If you modify or add visible text to these pages, you must handle both Chinese and English.

Build Verification

After every new translation, verify:

npm run build # Build all locales (zh-Hans + en)
npm run build:en # Build English only, for faster verification

The build must complete with zero errors. Common error sources:

  • Unescaped colons in YAML frontmatter
  • Unescaped < and { in MDX prose
  • Missing _category_.json in the English mirror path
  • Filename case mismatches (Windows is case-insensitive, but CI environments are not)
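
Case mismatches are easy to miss on Windows, so a direct string comparison helps. The sketch below assumes two lists of paths that are already relative to docs/ and to the English mirror directory respectively; case_mismatches is a hypothetical helper.

```python
def case_mismatches(chinese_paths: list[str], english_paths: list[str]) -> list[tuple[str, str]]:
    """Pair up relative paths that match only when letter case is ignored."""
    english_by_lower = {path.lower(): path for path in english_paths}
    mismatches = []
    for path in chinese_paths:
        candidate = english_by_lower.get(path.lower())
        # Same path ignoring case, but not byte-for-byte identical.
        if candidate is not None and candidate != path:
            mismatches.append((path, candidate))
    return mismatches
```

Each returned pair is a file that resolves locally on Windows but would be treated as missing in a case-sensitive CI environment.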

Pre-publish Checklist

After adding new content with synchronized translations, verify at minimum:

Docs and blog general:

  • Chinese files are under docs/ or blog/
  • English files are under the corresponding i18n/en/ mirror path
  • Filenames match exactly (including extension)
  • sidebar_position is identical in both locales
  • title and description are translated
  • tags, authors, slug are kept identical
  • Code blocks are not translated
  • New directories have a corresponding English _category_.json
  • New tags have a corresponding English tags.yml entry
  • Colons in YAML description are quoted
  • Bare < and { in prose are escaped or wrapped in LaTeX

Blog audio:

  • Chinese audio generated via python generate.py
  • English audio generated via python generate.py --lang en --blog-dir "..."
  • Manifest output/blog-manifest.json copied to both src/data/blogAudioManifest.json and static/audio/blog/manifest.json
  • OSS anti-hotlinking whitelist includes both production domain and localhost

Docs audio:

  • Chinese audio generated via python generate.py --type docs --lang zh ...
  • English audio generated via python generate.py --type docs --lang en --blog-dir "..."
  • Extracted TTS text was inspected with UTF-8 for long or repaired articles
  • Long Chinese articles use a smaller --chunk-char-limit when needed
  • Manifest output/docs-manifest.json copied to both src/data/docsAudioManifest.json and static/audio/docs/manifest.json
  • DocItem/Layout injects the player for the target docs category
  • Target OSS audio URLs are reachable and return audio content

Build verification:

  • npm run build passes
  • npm run build:en passes (quick English-only verification)