Sunday, December 22, 2013

UCS2 0x81 Encoding


The encoding for the '81' format is as follows:
   1. The first octet is '0x81'
   2. The second octet is the number of UCS2 characters
   3. The third octet holds bits 15 to 8 of a 16-bit Base Pointer of the form 0xxxxxxxx0000000 (bit 16 and bits 7 to 1 are zero)
   4. The following octets are the coded characters, with the following rules:
        - If the MSB (most significant bit) is zero,
          the remaining 7 bits contain a GSM Default Alphabet character.
        - If the MSB is one, the remaining 7 bits are an offset value that is added to the Base Pointer;
          the result is the UCS2 character.
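The rules above can be read back as a decoder. The following Python sketch is illustrative only: the function name `decode_alpha_81` is hypothetical, and, as a simplifying assumption, the GSM Default Alphabet is reduced to letters, digits and space, whose GSM codes equal their ASCII values. A real decoder needs the full GSM 03.38 default alphabet table.

```python
import string

# Simplifying assumption for this sketch: only characters whose GSM 03.38
# default-alphabet code equals their ASCII code (letters, digits, space) are
# supported. A real implementation needs the full GSM default alphabet table.
GSM_SAME_AS_ASCII = set(string.ascii_letters + string.digits + " ")


def decode_alpha_81(data: bytes) -> str:
    """Decode an Alpha field in the UCS2 '81' format into a Python string."""
    if len(data) < 3 or data[0] != 0x81:
        raise ValueError("not a '81' coded Alpha field")
    count = data[1]                # number of characters
    base = data[2] << 7            # octet holds bits 15..8 -> 0xxxxxxxx0000000
    body = data[3:3 + count]
    if len(body) != count:
        raise ValueError("Alpha field is shorter than its character count")
    chars = []
    for octet in body:
        if octet & 0x80:           # MSB set: 7-bit offset from the Base Pointer
            chars.append(chr(base + (octet & 0x7F)))
        else:                      # MSB clear: GSM Default Alphabet character
            ch = chr(octet)
            if ch not in GSM_SAME_AS_ASCII:
                raise ValueError(f"GSM code 0x{octet:02X} not handled in this sketch")
            chars.append(ch)
    return "".join(chars)


result = decode_alpha_81(bytes.fromhex("81 03 13 53 95 A6"))
print([hex(ord(c)) for c in result])  # prints: ['0x53', '0x995', '0x9a6']
```

Printing the code points rather than the characters keeps the output readable on terminals that cannot display Bengali script.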

Example:
We have 3 UCS2 characters: Sকদ
The UCS2 code points are: '0x0053' for "S", '0x0995' for "ক", and '0x09A6' for "দ".
The Alpha field coded in this format is: '81 03 13 53 95 A6'.

How can we get that value?
The first octet is '0x81'.
The second octet is '03', since we have 3 UCS2 characters.
The third octet is the Base Pointer. If we look at all UCS2 characters whose high byte (the first two hex digits) is not '00', we get '0995' and '09A6'. In binary:
              16                1 (bit position)
     '0995' = 0000 1001 1001 0101
     '09A6' = 0000 1001 1010 0110

           >> 0000 1001 1000 0000 : Base pointer '0980' coded as '0x13'
Taking bits 15 to 8 of '0980' (0000 1001 1000 0000), the Base Pointer octet is 0001 0011, or '0x13'.
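The Base Pointer choice can be sketched in Python. The text picks the high bits shared by '0995' and '09A6'; this sketch instead clears the low 7 bits of the smallest code point and then checks that every character lies within 0x7F of that base, which gives the same '0x13' here. The function name `base_pointer_octet` is hypothetical.

```python
def base_pointer_octet(code_points):
    """Pick the third octet for a set of non-default-alphabet UCS2 code points.

    The Base Pointer has the form 0xxxxxxxx0000000, i.e. it is a multiple of
    0x80 no larger than 0x7F80. Clearing the low 7 bits of the smallest code
    point gives the highest such base that is not above any character; every
    character must then lie within 0x7F of it so its offset fits in 7 bits.
    """
    base = min(code_points) & ~0x7F   # keep bits 16..8, zero bits 7..1
    if base > 0x7F80:
        raise ValueError("Base Pointer must fit in 0xxxxxxxx0000000")
    for cp in code_points:
        if cp - base > 0x7F:
            raise ValueError(f"U+{cp:04X} is too far from base 0x{base:04X}")
    return base >> 7                  # bits 15..8 of the Base Pointer


octet = base_pointer_octet([0x0995, 0x09A6])
print(f"base pointer 0x{octet << 7:04X} coded as 0x{octet:02X}")
# prints: base pointer 0x0980 coded as 0x13
```

If the characters are spread over more than 128 code points, no single Base Pointer works and the sketch raises an error.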
The fourth octet is the first character, "S".
Since it is in the GSM Default Alphabet, we set the MSB to zero and fill the remaining 7 bits with the code of "S":
"S" = '0053' = 0000 0000 0101 0011
                                    (0 + 1010011) = 0101 0011 = '53'
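TIP: for a character '00XX' whose GSM Default Alphabet code equals its UCS2 value (basic Latin letters, digits and space, for example), the octet is simply the low byte XX. This does not hold for every '00XX' character: '@', for instance, is coded 0x00 in the GSM Default Alphabet.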
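A Python sketch of this step follows. The function name `default_alphabet_octet` is hypothetical, and the character set is deliberately limited to letters, digits and space, the cases where the tip applies; other characters would need the full GSM 03.38 table.

```python
import string

# Assumption: only letters, digits and space, whose GSM 03.38 default-alphabet
# codes equal their UCS2 code points, are handled in this sketch.
GSM_SAME_AS_ASCII = set(string.ascii_letters + string.digits + " ")


def default_alphabet_octet(ch: str) -> int:
    """Encode one GSM Default Alphabet character as an octet with MSB = 0."""
    if ch not in GSM_SAME_AS_ASCII:
        raise ValueError(f"{ch!r} is not handled by this simplified table")
    return ord(ch) & 0x7F   # low 7 bits of '00XX'; the MSB stays 0


print(f"0x{default_alphabet_octet('S'):02X}")  # prints: 0x53
```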

The fifth octet is for the character "ক" ('0x0995').
To encode this character, we calculate its offset from the Base Pointer.
Offset = '0x0995' - '0x0980' = '0x15' = 001 0101 (7 bits only)
The coded character has its MSB set to 1. Hence the value is (1 + 0010101) = 1001 0101 = '0x95'.
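The offset step is a subtraction followed by setting the MSB. In this Python sketch the function name `offset_octet` is hypothetical; it takes the full 16-bit Base Pointer ('0x0980' here), not the coded third octet.

```python
def offset_octet(code_point: int, base: int) -> int:
    """Encode a UCS2 character as MSB = 1 plus a 7-bit offset from the base."""
    offset = code_point - base
    if not 0 <= offset <= 0x7F:
        raise ValueError(f"U+{code_point:04X} is out of range for base 0x{base:04X}")
    return 0x80 | offset    # set the MSB, keep the 7-bit offset


print(f"0x{offset_octet(0x0995, 0x0980):02X}")  # prints: 0x95
```

Calling `offset_octet(0x09A6, 0x0980)` in the same way gives 0xA6.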

The sixth octet is for the character "দ" ('0x09A6').
Following the same steps as for the fifth octet: '0x09A6' - '0x0980' = '0x26' = 010 0110, so the value is (1 + 0100110) = 1010 0110 = '0xA6'.
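Putting all the steps together, a complete encoder can be sketched in Python. This is not the author's code: the function name `encode_alpha_81` is hypothetical, the GSM Default Alphabet is again reduced to letters, digits and space by assumption, and the Base Pointer is chosen by clearing the low 7 bits of the smallest non-default character.

```python
import string

# Assumption: the GSM Default Alphabet is reduced to letters, digits and space,
# whose GSM 03.38 codes equal their UCS2 code points.
GSM_SAME_AS_ASCII = set(string.ascii_letters + string.digits + " ")


def encode_alpha_81(text: str) -> bytes:
    """Encode text in the UCS2 '81' format: 0x81, count, base, characters."""
    others = [ord(c) for c in text if c not in GSM_SAME_AS_ASCII]
    base = (min(others) & ~0x7F) if others else 0   # 0xxxxxxxx0000000
    if base > 0x7F80 or any(cp - base > 0x7F for cp in others):
        raise ValueError("characters do not fit one '81' Base Pointer")
    if len(text) > 0xFF:
        raise ValueError("too many characters for the count octet")
    out = bytearray([0x81, len(text), base >> 7])
    for c in text:
        if c in GSM_SAME_AS_ASCII:
            out.append(ord(c))                     # MSB 0: default alphabet
        else:
            out.append(0x80 | (ord(c) - base))     # MSB 1: offset from base
    return bytes(out)


print(encode_alpha_81("S\u0995\u09A6").hex(" ").upper())  # prints: 81 03 13 53 95 A6
```

`bytes.hex` with a separator argument requires Python 3.8 or later.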
