SMS Message Encoding

Most text messages use the space-efficient 7-bit GSM character set, as it covers the characters of most Latin-based languages. The GSM character set was extended to cover a few more commonly used characters. These 10 characters are known as the Extended GSM characters and include:

[ \ ] ^ { | } ~

Each character in the Extended GSM character set counts as 2 characters (7-bit GSM) instead of the usual 1, because it must be prefixed with an escape character.
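The counting rule above can be sketched in a few lines of Python. This is a minimal illustration, not an official implementation: the name `gsm_septets` is hypothetical, the function assumes the text contains only GSM-encodable characters, and the euro sign and form feed are included on the assumption that they are the two extended characters not shown in the list above.

```python
# Extended GSM characters: each one is sent as an escape character
# followed by the character itself, so it counts as 2 instead of 1.
# Assumption: the euro sign and form feed complete the set of ten.
GSM_EXTENDED = set("[\\]^{|}~\u20ac\f")


def gsm_septets(text: str) -> int:
    """Count 7-bit GSM characters used by text.

    Assumes every character in text can be encoded in the GSM or
    Extended GSM character set; no validation is done here.
    """
    return sum(2 if ch in GSM_EXTENDED else 1 for ch in text)


print(gsm_septets("hello"))  # 5 plain characters
print(gsm_septets("[ok]"))   # 2 plain characters plus 2 extended ones
```

For example, `"[ok]"` is four characters on screen but counts as six against the message length, because each bracket needs the escape prefix.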

The GSM 7-bit character set is supported by default by all GSM handsets and network elements. However, characters in languages such as Arabic, Chinese, Korean and Japanese, or in languages that use the Cyrillic alphabet (e.g. Russian, Serbian, Bulgarian), must be encoded using the 16-bit UTF-16 character encoding, commonly referred to as Unicode.

📘

GSM character set

See here for more info on the GSM character set

📘

Using SMS with different languages

See the following knowledge base article for more information on how SMS can be used to send messages in different languages

Note: If your message contains any character that is not included in the GSM or Extended GSM character sets, the entire message will be encoded as Unicode (UTF-16). Each character then takes 16 bits instead of 7, which more than halves the number of characters that fit in a single SMS (70 instead of 160) and potentially doubles your costs!
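To check in advance which encoding a message would need, something like the following sketch could be used. It is only an illustration under stated assumptions: the function names `sms_encoding` and `bits_used` are hypothetical, the basic character table is transcribed from the standard GSM 7-bit default alphabet, and the extended set assumes the euro sign and form feed in addition to the eight characters listed earlier. Your provider's actual detection logic may differ.

```python
# Basic GSM 7-bit alphabet (127 characters; the escape code itself is omitted).
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extended GSM characters, each sent with an escape prefix (counts as 2).
GSM_EXTENDED = set("[\\]^{|}~\u20ac\f")


def sms_encoding(text: str) -> str:
    """Return "GSM-7" if every character is GSM-encodable, else "UTF-16"."""
    if all(ch in GSM_BASIC or ch in GSM_EXTENDED for ch in text):
        return "GSM-7"
    return "UTF-16"


def bits_used(text: str) -> int:
    """Bits of payload the text needs under the encoding chosen above."""
    if sms_encoding(text) == "GSM-7":
        return 7 * sum(2 if ch in GSM_EXTENDED else 1 for ch in text)
    # One non-GSM character forces the whole message into UTF-16.
    return 8 * len(text.encode("utf-16-be"))


print(sms_encoding("Hello [world]"))  # only GSM and Extended GSM characters
print(sms_encoding("Hello 世界"))      # contains characters outside both sets
```

Running a check like this before sending makes it easier to spot a stray character, such as a curly quote pasted from a word processor, that would silently switch an otherwise plain message to UTF-16.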