
Appendix C The ASCII and Unicode Character Sets

Java uses the Unicode character set to represent character data. Java's char type stores each character as a 16-bit unsigned integer, so a single char can hold 2\(^{16}\) \(=\) 65,536 different values. This lets Java represent characters not only from English but also from a wide range of international languages. (The Unicode standard has since grown beyond 65,536 characters; Java represents each of the additional characters as a pair of char values called a surrogate pair.)
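The short program below is a sketch that checks these numbers with standard java.lang.Character members (SIZE, MIN_VALUE, MAX_VALUE, charCount). The class name CharRange and its helper methods are hypothetical, chosen only for illustration.

```java
// Hypothetical demo class; it relies only on standard java.lang.Character members.
public class CharRange {
    // Widening a char to int never sign-extends, so the result lies in 0..65535.
    static int codeOf(char c) {
        return c;
    }

    // Number of char values needed to store a Unicode code point:
    // 1 for codes up to 65,535, otherwise 2 (a surrogate pair).
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        System.out.println("Bits per char: " + Character.SIZE);              // 16
        System.out.println("Smallest code: " + codeOf(Character.MIN_VALUE)); // 0
        System.out.println("Largest code:  " + codeOf(Character.MAX_VALUE)); // 65535
        // A non-English letter fits in one char: Greek capital omega, U+03A9.
        System.out.println("Omega code:    " + codeOf('\u03A9'));            // 937
        // A code point above 65,535 (U+1F600) needs two chars.
        System.out.println("chars for U+1F600: " + charsNeeded(0x1F600));    // 2
    }
}
```

Widening to int, rather than printing the char itself, shows the underlying integer code and avoids depending on how the console renders non-English glyphs.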

Unicode supersedes the ASCII (American Standard Code for Information Interchange) character set. ASCII represents each character as a 7-bit unsigned integer, usually stored in an 8-bit byte, so it can represent only 2\(^7\) \(=\) 128 characters. To make Unicode backward compatible with ASCII, the first 128 Unicode characters have the same integer codes as the corresponding ASCII characters.
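One way to see this compatibility in Java is to encode each of the first 128 char values with the standard US-ASCII charset and confirm that every one becomes a single byte with the same numeric value. The class name AsciiCompat and its method are hypothetical; StandardCharsets.US_ASCII and String.getBytes(Charset) are standard library members.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class comparing Unicode codes 0..127 with their ASCII encoding.
public class AsciiCompat {
    // True if, for every code 0..127, encoding that char as US-ASCII
    // yields exactly one byte whose value equals the Unicode code.
    static boolean firstBlockMatchesAscii() {
        for (int code = 0; code < 128; code++) {
            byte[] ascii = String.valueOf((char) code).getBytes(StandardCharsets.US_ASCII);
            if (ascii.length != 1 || ascii[0] != code) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("Unicode 0-127 matches ASCII: " + firstBlockMatchesAscii()); // true
        // A 7-bit code has 2^7 distinct values.
        System.out.println("Values in a 7-bit code: " + (1 << 7));                      // 128
    }
}
```

Since all 128 codes fit in 7 bits, each byte is nonnegative, so comparing the signed byte directly with the int code is safe here.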

The following table shows the integer codes for the printable subset of ASCII characters, codes 32 through 126. The characters with codes 0 through 31 and code 127 are nonprintable control characters, many of which are associated with keys on a standard keyboard. For example, the Delete key is represented by 127, the Backspace key by 8, and the Return key by 13.
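As a sketch, Java's standard Character.isISOControl method can classify the codes 0 through 127; it should report exactly 33 control characters (0 through 31 plus 127), leaving 95 printable ones. The class name ControlChars and its helper are hypothetical. Java has escape sequences for backspace and return but none for delete, so the example assigns code 127 directly.

```java
// Hypothetical demo class; Character.isISOControl is a standard java.lang method.
public class ControlChars {
    // Counts the codes 0..127 that Java classifies as ISO control characters.
    static int countControlCodes() {
        int count = 0;
        for (int code = 0; code < 128; code++) {
            if (Character.isISOControl(code)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        char backspace = '\b'; // escape sequence for code 8
        char ret = '\r';       // carriage return, code 13
        char delete = 127;     // no escape sequence exists, so use the code directly
        System.out.println((int) backspace + " " + (int) ret + " " + (int) delete); // 8 13 127

        int control = countControlCodes();
        System.out.println("Control codes:   " + control);         // 33 (0-31 and 127)
        System.out.println("Printable codes: " + (128 - control)); // 95 (32-126)
    }
}
```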

Code   32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47
Char   SP  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
Code   48  49  50  51  52  53  54  55  56  57
Char   0   1   2   3   4   5   6   7   8   9
Code   58  59  60  61  62  63  64
Char   :   ;   <   =   >   ?   @
Code   65  66  67  68  69  70  71  72  73  74  75  76  77
Char   A   B   C   D   E   F   G   H   I   J   K   L   M
Code   78  79  80  81  82  83  84  85  86  87  88  89  90
Char   N   O   P   Q   R   S   T   U   V   W   X   Y   Z
Code   91  92  93  94  95  96
Char   [   \   ]   ^   _   `
Code   97  98  99  100 101 102 103 104 105 106 107 108 109
Char   a   b   c   d   e   f   g   h   i   j   k   l   m
Code   110 111 112 113 114 115 116 117 118 119 120 121 122
Char   n   o   p   q   r   s   t   u   v   w   x   y   z
Code   123 124 125 126
Char   {   |   }   ~
Figure C.0.1. ASCII codes for selected characters
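Because the codes in Figure C.0.1 are simply integers, a program can reprint the table and do arithmetic with it. The sketch below prints codes 32 through 126 (writing SP for the space, as the figure does) and then uses two regularities visible in the table: lowercase letters come 32 codes after their uppercase forms, and the digits 0 through 9 occupy codes 48 through 57. The class name AsciiTable and its helper methods are hypothetical illustrations, not standard library methods; real programs would normally call Character.toUpperCase and Character.digit instead.

```java
// Hypothetical demo class that reprints Figure C.0.1 and uses its codes for arithmetic.
public class AsciiTable {
    // Label for a code in the table; the space character is written as SP.
    static String label(int code) {
        return code == ' ' ? "SP" : String.valueOf((char) code);
    }

    // Converts a lowercase ASCII letter to uppercase by subtracting the
    // 32-code gap between 'a' (97) and 'A' (65); other characters are unchanged.
    static char toUpperAscii(char c) {
        return (c >= 'a' && c <= 'z') ? (char) (c - ('a' - 'A')) : c;
    }

    // Numeric value of a digit character, using the fact that '0'..'9' are 48..57.
    static int digitValue(char c) {
        return c - '0';
    }

    public static void main(String[] args) {
        // Print every printable code with its character, 16 entries per line.
        for (int code = 32; code <= 126; code++) {
            System.out.printf("%3d %-3s", code, label(code));
            if ((code - 31) % 16 == 0) {
                System.out.println();
            }
        }
        System.out.println();
        System.out.println(toUpperAscii('q')); // Q
        System.out.println(digitValue('7'));   // 7
    }
}
```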