
Hotlink Protection, CORS, and Host Validation Explained

When you put images, CSS, JS, and other static resources on OSS or a CDN, three mechanisms control "who can access" and "how to access," each along a different dimension.

Their names look similar, their use cases overlap, and they are easy to confuse, but each handles its own responsibility and they cannot replace each other.

Three Sentences to Clarify

| Mechanism | What It Prevents | Who Enforces | In One Sentence |
|---|---|---|---|
| Host Validation | Wrong target domain in the request | Server | "Is the name you gave me one that exists on this machine?" |
| Hotlink Protection (Referer) | Other sites using <img> to steal your images | Server / CDN | "Which page did you come from? Are you one of us?" |
| Cross-Origin (CORS) | Other sites using JS to read your resource data | Browser | "Does the browser allow this cross-site JS request?" |

The Host Request Header

What Is Host

Every time a user visits a website, the browser includes a Host header in the HTTP request:

GET / HTTP/1.1
Host: www.baidu.com

A single server may host hundreds or thousands of websites. The server uses the Host header to determine "which site you want to access."
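
The Host-based dispatch described above can be sketched in a few lines. This is a minimal illustration, not how any particular web server implements it: the `SITES` table, its paths, and the choice of 400/403 status codes are assumptions made for the example.

```python
from typing import Optional, Tuple

# Hypothetical site table: Host name -> document root (illustrative values only).
SITES = {
    "www.example.com": "/srv/www",
    "blog.example.com": "/srv/blog",
}


def parse_host(raw_request: str) -> Optional[str]:
    """Return the Host header value from a raw HTTP/1.1 request, or None."""
    # Skip the request line, then read headers until the blank line.
    for line in raw_request.split("\r\n")[1:]:
        if line == "":
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional :port suffix and normalise case.
            return value.strip().split(":")[0].lower()
    return None


def route(raw_request: str) -> Tuple[int, str]:
    """Pick a site by Host; a missing or unknown Host is rejected."""
    host = parse_host(raw_request)
    if host is None:
        return 400, "missing Host header"
    if host not in SITES:
        return 403, "no site named " + host + " on this server"
    return 200, SITES[host]
```

The same IP address and port serve both sites; only the Host value decides which document root is used.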

The Host Problem During CDN Origin Pull

When you use a CDN to proxy an origin:

  1. The user visits your domain
  2. The CDN goes to the origin to fetch content
  3. The CDN sends Host as your domain by default
  4. The origin sees: I don't have a site called your domain -> 403

Solution: In the CDN origin pull configuration, set the origin Host to the origin's domain name. This way the CDN sends the origin's own domain as Host, the origin recognizes it, and returns content normally.
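
As a rough sketch of what the origin Host setting changes, the function below models the headers a CDN would forward to the origin. The function names, the `origin_host` parameter, and the `known_hosts` set are hypothetical; real CDN products expose this as a console option rather than code.

```python
from typing import Dict, Optional, Set


def origin_pull_headers(client_headers: Dict[str, str],
                        origin_host: Optional[str] = None) -> Dict[str, str]:
    """Headers the CDN sends to the origin.

    origin_host=None models the default: the visitor's Host is forwarded as-is.
    """
    headers = dict(client_headers)  # copy, so the visitor's headers stay untouched
    if origin_host is not None:
        headers["Host"] = origin_host  # the "origin Host" override
    return headers


def origin_status(headers: Dict[str, str], known_hosts: Set[str]) -> int:
    """A toy origin that only answers for Host names it knows."""
    return 200 if headers.get("Host") in known_hosts else 403


visitor = {"Host": "yourdomain.com"}
origin_sites = {"xxx.github.io"}  # names the origin actually serves

default_pull = origin_pull_headers(visitor)
fixed_pull = origin_pull_headers(visitor, origin_host="xxx.github.io")
```

With the default pull, `origin_status(default_pull, origin_sites)` is 403; with the override it is 200.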

Why GitHub Pages Origin Pull Works

GitHub Pages uses the Host header only for routing: it dispatches each request to the repository whose Pages address (xxx.github.io) or configured custom domain matches. It adds no IP blocking or risk control on top of that, so as long as the origin Host is a real Pages address, it responds normally.

That is also why a direct CNAME to github.io works: once yourdomain.com is set as the repository's custom domain in the Pages settings, GitHub recognizes Host: yourdomain.com and serves that repository. A Host that matches no Pages site simply gets a 404 page; there is no further interception of the kind large commercial sites apply.

Why Large Commercial Sites Do Not Work

Sites like Baidu, Bilibili, Taobao, and others use more than just Host validation. They have multi-layer combined protection:

| Protection Layer | Can It Be Bypassed | Notes |
|---|---|---|
| Host Validation | Yes, via origin Host config | Set origin Host to the origin's domain |
| IP Blocking | No | CDN node IP pools are already flagged as data center / proxy |
| TLS Fingerprint Check | No | SSL/TLS handshake characteristics reveal non-normal browsers |
| Global Risk Control | No | Traffic patterns, frequency, Cookies/Sessions are all monitored |

So even if you set both the origin and the origin Host to baidu.com, Baidu will still reject the request with a 403. Host validation is only the first gate; three more walls stand behind it.

Which Sites Can Be Proxied Through a CDN

Yes (when all of these hold: Host is used only for routing with no blocklist, there is no IP blocking, there is no hotlink protection or risk control, and the content is purely static):

  • GitHub Pages
  • Vercel / Netlify static sites
  • Personal blogs and open-source documentation sites
  • Servers or virtual hosts you own

No:

  • Baidu, Alibaba, Tencent, NetEase, Bilibili, Douyin, and all other large commercial sites
  • E-commerce, payment, banking, and government websites
  • Any site with login, membership, or API authentication
  • Video, music, or image sites with copyright hotlink protection

Hotlink Protection (Referer)

Principle

When a browser loads images, CSS, JS, and other resources referenced by a page, it automatically includes a Referer header that says "I came from this page":

GET /logo.png HTTP/1.1
Host: cdn.example.com
Referer: https://www.zhihu.com/question/xxx

The server checks whether the Referer is in the whitelist:

  • Yes -> return normally
  • No -> 403 Forbidden
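
The whitelist decision above can be sketched as a small function. This is a simplified model, assuming the server compares only the scheme and host of the Referer; the function names and the `allow_empty` flag are made up for the example and are not any product's API.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def referer_origin(referer: str) -> str:
    """Reduce a Referer URL to scheme://host, the part compared with the whitelist."""
    parts = urlsplit(referer)
    return "{}://{}".format(parts.scheme, parts.hostname)


def check_referer(referer: Optional[str],
                  whitelist: Iterable[str],
                  allow_empty: bool = True) -> int:
    """Return the HTTP status the server would answer with."""
    if not referer:
        # No Referer at all: the outcome depends on a separate setting.
        return 200 if allow_empty else 403
    return 200 if referer_origin(referer) in set(whitelist) else 403
```

For example, with `whitelist = ["https://www.example.com"]` (a placeholder domain), a request carrying `Referer: https://www.zhihu.com/question/xxx` gets 403, while one from `https://www.example.com/post/1` gets 200.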

What it covers:

  • Others using <img>, <video>, <audio>, and similar tags on their sites to reference your resources
  • Others using <link> or @import to reference your CSS files
  • Any resource reference where the browser automatically sends a Referer in the request

What it does not cover:

  • Others opening the image link directly in the browser address bar (the Referer is empty; this is handled by the "empty Referer" setting)
  • Others downloading with command-line tools like curl (no Referer by default unless manually specified)
  • Others referencing from local HTML files (file:// protocol) -- the Referer is usually empty or incomplete

Referer whitelist (one entry per line):

http://nevergpdzy.cn
https://nevergpdzy.cn
http://*.nevergpdzy.cn
https://*.nevergpdzy.cn

http://nevergpdzy.com
https://nevergpdzy.com
http://*.nevergpdzy.com
https://*.nevergpdzy.com

http://nevergpdzy.github.io
https://nevergpdzy.github.io
http://*.nevergpdzy.github.io
https://*.nevergpdzy.github.io

Configuration key points:

  • You must include both http:// and https://; the two protocols are validated independently
  • *.domain is a wildcard that covers all subdomains (and the main domain itself)
  • Multiple domains are separated by line breaks
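
To make the scheme and wildcard rules concrete, here is a sketch that matches a Referer against whitelist entries using Python's `fnmatch` glob semantics. That choice is an assumption of the sketch, not a description of OSS's actual matcher; under glob semantics `*.nevergpdzy.cn` does not match the bare domain, so the sketch relies on the explicit bare-domain entries.

```python
from fnmatch import fnmatchcase
from typing import Sequence
from urllib.parse import urlsplit

# A subset of the whitelist entries shown above.
WHITELIST = [
    "http://nevergpdzy.cn",
    "https://nevergpdzy.cn",
    "http://*.nevergpdzy.cn",
    "https://*.nevergpdzy.cn",
]


def is_whitelisted(referer: str, whitelist: Sequence[str] = WHITELIST) -> bool:
    """True if the Referer's scheme://host matches any whitelist entry."""
    parts = urlsplit(referer)
    origin = "{}://{}".format(parts.scheme, parts.hostname)
    # The scheme is part of the pattern, so http and https are matched separately.
    return any(fnmatchcase(origin, entry) for entry in whitelist)
```

Dropping the `http://` entries from the list would make `is_whitelisted("http://img.nevergpdzy.cn/")` return False even though the https variant still passes.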

Empty Referer setting:

  • Allow (recommended): direct browser access to image URLs, curl, and local debugging all work normally
  • Do not allow: only references from whitelisted domain pages can open images; copying the link directly into the address bar will return 403

For a personal blog, it is recommended to keep it set to "Allow" so it does not affect your daily use and debugging.

Truncate QueryString: keep the default "Allow" setting. OSS hotlink protection only checks the domain part and does not inspect URL parameters.

To verify the configuration:

  1. Wait 1-2 minutes after saving the configuration
  2. Load an OSS image on your own site -> displays normally
  3. Reference it with <img> from a page on a domain that is not in the whitelist -> should return 403 (a local HTML file opened via file:// sends an empty Referer, so it follows the "empty Referer" setting instead)
  4. Open the OSS image link directly in the browser address bar -> depends on the "empty Referer" setting
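
The checks above can also be scripted by sending the same request with different Referer values. The sketch below uses only the standard library; the image URL reuses the cdn.example.com placeholder from earlier and must be replaced with a real object URL, and the helper names are invented for this example.

```python
import urllib.error
import urllib.request
from typing import Optional

IMAGE_URL = "https://cdn.example.com/logo.png"  # placeholder, replace with your own


def build_probe(url: str, referer: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, optionally claiming to come from `referer`."""
    headers = {"Referer": referer} if referer else {}
    return urllib.request.Request(url, headers=headers, method="GET")


def probe(url: str, referer: Optional[str] = None) -> int:
    """Send the request and return the HTTP status code (performs network I/O)."""
    try:
        with urllib.request.urlopen(build_probe(url, referer), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 403 when hotlink protection rejects the Referer
```

Calling `probe(IMAGE_URL, "https://nevergpdzy.cn/")`, `probe(IMAGE_URL, "https://www.zhihu.com/")`, and `probe(IMAGE_URL)` covers the whitelisted, non-whitelisted, and empty-Referer cases. Because the script sets the Referer by hand, it also shows that the header is trivially forgeable outside a browser.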

Cross-Origin (CORS)

Principle

Cross-origin is a browser restriction, not a server restriction.

Browser security policy: when a page on one domain (a.com) uses JS to request resources from another domain (b.com), the browser checks whether the response from b.com includes an Access-Control-Allow-Origin header. If it does not, the browser blocks it directly and the JS code receives a network error.
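
The browser-side decision can be modelled roughly as follows. This is a simplified sketch that assumes a request without credentials (cookies); the function name is made up, and real browsers apply additional rules.

```python
from typing import Optional


def browser_exposes_response(page_origin: str,
                             allow_origin_header: Optional[str]) -> bool:
    """Decide whether JS on `page_origin` may read a cross-origin response.

    `allow_origin_header` is the value of Access-Control-Allow-Origin in the
    response, or None when the server did not send the header.
    """
    if allow_origin_header is None:
        return False  # header missing: the browser hides the response from JS
    # Either a wildcard or an exact match with the requesting page's origin.
    return allow_origin_header == "*" or allow_origin_header == page_origin
```

Note that the server still received and answered the request in every case; the check only governs whether the page's JS gets to see the result.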

For simple requests like GET / HEAD, the browser sends the request directly and then checks the response headers. For non-simple requests with custom headers, PUT / DELETE, etc., the browser first sends an OPTIONS preflight request asking the server "do you allow this cross-origin request?" Only after getting permission does it send the actual request.

// Simple request (e.g., GET):
a.com JS -> fetch('https://b.com/data.json')
-> Browser sends GET directly, checks response for Access-Control-Allow-Origin
-> Missing -> blocked with error

// Non-simple request (e.g., with custom header):
a.com JS -> fetch('https://b.com/data.json', { headers: { 'X-Custom': '1' } })
-> Browser sends OPTIONS preflight first: "Is X-Custom header allowed?"
-> Not allowed -> blocked, the actual GET is never sent
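
Whether a request counts as "simple" can be approximated with the check below. It is a reduced version of the browser rules, assuming only the commonly cited safe methods, headers, and content types; the constant and function names are invented for this sketch.

```python
from typing import Dict, Optional

SAFE_METHODS = {"GET", "HEAD", "POST"}
SAFE_HEADERS = {"accept", "accept-language", "content-language", "content-type"}
SAFE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method: str, headers: Optional[Dict[str, str]] = None) -> bool:
    """True if the browser would send an OPTIONS preflight first."""
    if method.upper() not in SAFE_METHODS:
        return True  # e.g. PUT, DELETE
    for name, value in (headers or {}).items():
        lname = name.lower()
        if lname not in SAFE_HEADERS:
            return True  # any custom header, e.g. X-Custom
        if lname == "content-type":
            # Ignore parameters such as "; charset=utf-8".
            mime = value.split(";")[0].strip().lower()
            if mime not in SAFE_CONTENT_TYPES:
                return True  # e.g. application/json
    return False
```

A plain GET returns False, while the same GET with an `X-Custom` header returns True, matching the two flows shown above.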

What CORS Covers and What It Does Not

What it covers (only JS running in a browser):

  • fetch() / axios / XMLHttpRequest cross-origin requests
  • Canvas drawImage() reading cross-origin image pixels (getImageData / toDataURL)
  • JS downloading, reading, or processing cross-origin files

What it does not cover:

  • <img src="..."> tags loading images -- the browser does not check cross-origin
  • CSS background-image -- no cross-origin check
  • Opening directly in the browser address bar -- no cross-origin check
  • curl / Postman / server-to-server requests -- no browser involved, so cross-origin does not exist

When You Need to Configure CORS

Must configure:

  • Your site uses JS to load or read files from OSS
  • Canvas processes OSS images (watermarking, cropping, color picking, etc.)
  • Frontend previews use JS to read image info before uploading

No need to configure -- if you only use <img> tags to display images, you do not need CORS at all. In this case CORS can be left off, or set to * with no impact.

Aliyun OSS CORS Configuration

Permissive version (allows all domains to make cross-origin requests, suitable for display-only scenarios):

| Setting | Value |
|---|---|
| Origin | * |
| Allowed Methods | GET, HEAD |
| Allowed Headers | * |
| Cache Time | 86400 (seconds) |

Secure version (only allows JS from your own domains to read resources, recommended for future use):

| Setting | Value |
|---|---|
| Origin | Enter domains line by line (see below) |
| Allowed Methods | GET, HEAD |
| Allowed Headers | * |
| Cache Time | 86400 (seconds) |

Secure version Origin list (one per line):

http://nevergpdzy.cn
https://nevergpdzy.cn
http://*.nevergpdzy.cn
https://*.nevergpdzy.cn

http://nevergpdzy.com
https://nevergpdzy.com
http://*.nevergpdzy.com
https://*.nevergpdzy.com

http://nevergpdzy.github.io
https://nevergpdzy.github.io
http://*.nevergpdzy.github.io
https://*.nevergpdzy.github.io

Note: Whether the * wildcard is supported in Aliyun OSS CORS depends on the specific product version. If *.nevergpdzy.cn does not work, you need to write out each main domain and subdomain separately.
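
To illustrate how a rule like the ones above turns into a response header, here is a hedged sketch of server-side rule evaluation. The `RULE` dictionary layout and the function name are invented for the example, and wildcard origins are matched with `fnmatch` glob semantics as an assumption; it does not reproduce OSS's implementation.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Optional

# Hypothetical in-memory form of the "secure version" rule (subset of origins).
RULE: Dict[str, Any] = {
    "origins": ["https://nevergpdzy.cn", "https://*.nevergpdzy.cn"],
    "methods": ["GET", "HEAD"],
    # "Cache Time" tells the browser how long it may cache a preflight result.
    "max_age": 86400,
}


def allow_origin(origin: Optional[str], method: str,
                 rule: Dict[str, Any] = RULE) -> Optional[str]:
    """Value for Access-Control-Allow-Origin, or None to omit the header."""
    if origin is None or method.upper() not in rule["methods"]:
        return None  # not a cross-origin request, or method not allowed
    if "*" in rule["origins"]:
        return "*"  # permissive rule: any origin may read the response
    if any(fnmatchcase(origin, pattern) for pattern in rule["origins"]):
        return origin  # echo the matching origin back
    return None
```

When the function returns None the server sends no Access-Control-Allow-Origin header, and the browser blocks the page's JS from reading the response.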

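How the two mechanisms apply in common scenarios:

| Scenario | Hotlink Protection | CORS |
|---|---|---|
| <img> tag displaying an image | Active | Not triggered |
| CSS background-image | Active | Not triggered |
| JS fetch reading a resource | Active (the request carries a Referer) | Active |
| Canvas reading pixels | Active when the image is loaded | Active |
| Opening a link directly in the browser | Determined by empty Referer setting | Not triggered |
| curl download | Determined by empty Referer setting (no Referer is sent by default) | Not triggered |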

CNAME Direct vs CDN Reverse Proxy

This is another easy-to-confuse concept.

DNS CNAME: Only Changes IP, Not HTTP

Suppose you configure yourdomain.com CNAME -> baidu.com:

  • At the DNS layer: yourdomain.com resolves to Baidu's IP, and TCP can connect
  • At the HTTP layer: the browser still sends Host: yourdomain.com
  • Baidu's server sees an unfamiliar Host -> 403

CNAME only "points the road there"; the name you gave (Host) did not change. GitHub Pages works because the custom domain is registered in the repository's Pages settings, so GitHub recognizes that name.
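
The split between the DNS layer and the HTTP layer can be shown with a toy model. The `CNAME_RECORDS` table stands in for real DNS and both function names are hypothetical; the point is only that the connection target follows the CNAME while the Host header is taken from the URL.

```python
from typing import Dict
from urllib.parse import urlsplit

# Toy DNS zone: alias -> canonical name (stands in for a real CNAME record).
CNAME_RECORDS: Dict[str, str] = {"yourdomain.com": "baidu.com"}


def connect_target(hostname: str, records: Dict[str, str] = CNAME_RECORDS) -> str:
    """Follow CNAME records to the name whose IP the client connects to."""
    seen = set()
    while hostname in records and hostname not in seen:
        seen.add(hostname)  # guard against CNAME loops
        hostname = records[hostname]
    return hostname


def build_request(url: str) -> str:
    """Build the raw HTTP request a client sends for `url`."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Host comes from the URL the user typed, not from the DNS answer.
    return "GET {} HTTP/1.1\r\nHost: {}\r\n\r\n".format(path, parts.netloc)
```

Here `connect_target("yourdomain.com")` yields `baidu.com`, yet `build_request("http://yourdomain.com/")` still contains `Host: yourdomain.com`.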

CDN Reverse Proxy: Can Modify the Host

A CDN can forcibly modify the Host header when forwarding requests:

  • User -> CDN: Host: yourdomain.com
  • CDN -> Origin: Host: baidu.com (changed)

So a CDN reverse proxy can bypass Host validation -- but that is only the first gate. Large commercial sites still have IP blocklists and other defenses behind it.


Combined Configuration: Best Practices

Using a personal blog scenario as an example (GitHub Pages + Aliyun OSS image hosting):

Hotlink Protection (Enable)

Hotlink protection prevents others from using <img> on their sites to steal your images. The whitelist should contain only your own domains. Keep the empty Referer setting as "Allow" for easier debugging.

CORS (Enable as Needed)

If you are only using <img> to display images, CORS does not need to be configured. If you need JS to process images in the future, configure it with the secure whitelist.

Host and Origin Pull

The CDN origin pull Host must match a name the origin actually serves; do not fill in something arbitrary. For GitHub Pages, the xxx.github.io address is enough, so this step is easy to configure.

Final Security Effect

| Attack Scenario | Blocked? | By What |
|---|---|---|
| Another site uses <img> to reference your OSS images | Blocked | Hotlink protection |
| Someone uses JS to read your image data | Blocked (with secure CORS) | Cross-origin |
| Someone uses a CDN to mirror your site | Blocked only if your origin has its own defenses; a GitHub Pages-like origin does not block it | You need to protect it yourself |
| You open it yourself with curl / browser | Allowed | Empty Referer set to "Allow" |
| You use it normally on your own blog | Allowed | Whitelist |