Skip to product information
1 of 7

PayPal, credit cards. Download editable-PDF & invoice in 1 second!

GB 18030-2022 English PDF (GB18030-2022)

GB 18030-2022 English PDF (GB18030-2022)

Regular price $5,005.00 USD
Regular price Sale price $5,005.00 USD
Sale Sold out
Shipping calculated at checkout.
Delivery: 3 seconds. Download true-PDF + Invoice.
Get QUOTATION in 1-minute: Click GB 18030-2022
Historical versions: GB 18030-2022
Preview True-PDF (Reload/Scroll if blank)

GB 18030-2022: Information technology - Chinese coded character set
GB
NATIONAL STANDARD OF THE
PEOPLE’S REPUBLIC OF CHINA
ICS 35.040
CCS L 71
GB 18030-2022
Replacing GB 18030-2005
Information technology - Chinese coded character set
ISSUED ON: JULY 19, 2022
IMPLEMENTED ON: AUGUST 01, 2023
Issued by: State Administration for Market Regulation;
Standardization Administration of the People's Republic of China.
Table of Contents
Foreword ... i
1 Scope ... 0
2 Normative references ... 0
3 Terms and definitions ... 0
4 Repertoire ... 1
5 Overall structure ... 2
6 Sequence of characters ... 4
7 Code point allocation ... 4
8 Explanation of some characters and codes ... 7
9 Implementation level ... 7
Annex A (normative) Character table of double-byte ... 9
Annex B (normative) Ideographic descriptors ... 91
Annex C (normative) Character table of four-byte ... 92
Annex D (informative) Explanation of some characters and codes ... 546
Annex E (informative) Code positions of Chinese characters in "General Standard
Chinese Character List" ... 549
Bibliography ... 742
ii
Foreword
This document was drafted in accordance with the rules given in GB/T 1.1-2020
"Directives for standardization - Part 1: Rules for the structure and drafting of
standardizing documents".
This document replaces GB 18030-2005 "Information technology - Chinese coded
character set". Compared with GB 18030-2005, in addition to the structural
modifications and editorial changes, the main technical changes in this document are
as follows:
a) Add the applicable objects of this document (see Chapter 1 of this Edition);
b) In the double-byte coding area, change the GB/T 13000 code positions
corresponding to 10 vertical punctuation marks and 8 Chinese character
components. Delete 6 repeated coded Chinese character components and 9
repeated coded Chinese characters (see Annex D of this Edition, Annex A of
Edition 2005);
c) In the four-byte coding area, change 18 GB/T 13000 code positions (see Annex D
of this Edition, Annex D of Edition 2005);
d) In the part of four-byte code 0x82358F33~0x82359636, add 66 new Chinese
characters added by CJK unified Chinese characters (see Annex C of this Edition);
e) In the part of four-byte code 0x9835F738~0x98399E36, add 4149 Chinese
characters of CJK unified Chinese character extension C (see Annex C of this
Edition);
f) In the part of four-byte code 0x98399F38~0x9839B539, add 222 Chinese
characters of CJK unified Chinese character expansion D (see Annex C of this
Edition);
g) In the part of four-byte code 0x9839B632~0x9933FE33, add 5762 Chinese
characters of CJK unified Chinese character extension E (see Annex C of this
Edition);
h) In the part of four-byte code 0x99348138~0x9939F730, add 7473 Chinese
characters of CJK unified Chinese character expansion F (see Annex C of this
Edition);
i) In the part of four-byte code 0x81398B32~0x8139A035, add 214 Kangxi radicals
(see Annex C of this Edition);
j) In the part of four-byte code 0x8134F932~0x81358437, add 83 Xishuangbanna
New Dai characters (see Annex C of this Edition);
iii
Information technology - Chinese coded character set
1 Scope
This document specifies the hexadecimal representation of Chinese graphic characters
and their binary codes used in information technology.
This document applies to the processing, exchange, storage, transmission, presentation,
input and output of Chinese and other graphic character information.
This document is applicable to technical products with information processing and
exchange functions of Chinese and other text and graphic characters, including but not
limited to the software products represented by input methods, optical character
recognition (OCR), editing and proofreading, machine translation, speech synthesis,
text transcription, intelligent writing, etc., as well as the hardware products represented
by computers, communication terminal equipment, e-book readers, learning machines,
etc.
2 Normative references
The following referenced documents are indispensable for the application of this
document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
GB/T 2312-1980, Code of Chinese graphic character set for information
interchange - Primary set
GB/T 11383-1989, Information process in 8-bit code for information interchange -
Structure and rules for implementation
GB/T 13000, Information technology - Universal multiple - Octet coded character
set (UCS)
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1 character
An element in a collection of elements used to organize, control, or represent data.
3.2 coded character
Character (3.1) and its coded representation.
3.3 private use area
An area that can be specified by the user of a product conforming to this document.
3.4 repertoire
A specified set of characters (3.1) represented by a coded character (3.2) set.
3.5 reserved zone
Areas reserved for future specified by this document.
4 Repertoire
4.1 Overview
The characters included in this document are coded in single-byte, double-byte or four-
byte.
4.2 Part of single-byte
In this document, the part of single-byte includes all 128 characters from 0x00 to 0x7F
of GB/T 11383-1989.
4.3 Part of double-byte
The part of double-byte includes all graphic characters in GB/T 2312-1980, CJK unified
Chinese characters and some graphic characters in GB/T 13000. The characters in the
part of double-byte are in accordance with the provisions in Annex A. Among them, the
graphics, code positions and functions of ideographic descriptors shall comply with the
provisions of Annex B.
NOTE: GB/T 13000 uniformly encodes Chinese characters used in China, Japan, South Korea,
Vietnam and other countries and regions. Chinese characters with unique abstract glyphs are
assigned a separate code position. Chinese characters with different sources but the same abstract
glyphs are given a common code position. The encoded Chinese characters are called CJK unified
Chinese characters (CJK Unified Ideographs), where CJK means China, Japan, and Korea.
4.4 Part of four-byte
The part of four-byte includes 66 CJK unified Chinese characters (9FA6~9FEF,
excluding 9FB4~9FBB) in GB/T 13000 other than the above-mentioned double-byte
characters, CJK unified Chinese character extension A, CJK unified Chinese character
extension B, CJK unified Chinese character extension C, CJK unified Chinese character
extension D, CJK unified Chinese character extension E, CJK unified Chinese character
extension F and the characters of ethnic minorities that have been coded in GB/T 13000.
The characters in the part of four-byte follow the provisions of Annex C.
5 Overall structure
In the text, all numbers marked with 0x are in hexadecimal. Those not marked with 0x
are in decimal. All coded representations in the appendix are expressed in hexadecimal.
All other numbers are expressed in decimal.
The part of single-byte adopts the encoding structure of GB/T 11383-1989. Use code
points 0x00~0x7F.
The part of double-byte adopts two octet strings to represent a character. Its first byte
code point is from 0x81~0xFE. The tail byte code points are 0x40~0x7E and
0x80~0xFE respectively.
The part of four-byte adopts 0x30~0x39 not used in GB/T 11383-1989 as the suffix to
expand the double...
View full details