I know that before, Unicode and locale aware systems were supposed to use unicode tags (U+E0000..U+E007F) to invisibly and "for all plaintext purposes" mark text for such han unification handling but that use is now deprecated.
What I am supposed to use those days? HTML-encoded in utf-8, with lang attributes, so <span lang="ja-JA"> and <bdi lang="zh-Hans"> infested text?
The geniuses behind Unicode managed to make it mandatory anyways, at least if you want correct CJK text rendering :)