SMS encoding

Most text messages use the space-efficient 7-bit GSM character set, as it covers the characters of most Latin-based languages.

The GSM character set was later extended to cover a few more commonly used characters. These 10 characters are known as Extended GSM characters. Nine of them are printable: € [ \ ] ^ { | } ~ and the tenth is the form feed (page break) control character.

Each character in the Extended GSM character set takes up the space of two 7-bit GSM characters instead of the usual one, because it must be prefixed with an escape character.
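As a rough illustration of the rule above, the following sketch counts how many 7-bit slots a message occupies. The names `GSM_BASIC`, `GSM_EXTENDED` and `gsm7_length` are made up for this example and are not part of any particular SMS API. The character tables are typed out by hand from the standard GSM 7-bit default alphabet.

```python
# Printable characters of the GSM 7-bit basic set, plus line feed and
# carriage return. The escape character itself is deliberately left out.
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

# Extended GSM characters: each one is sent as an escape character followed
# by a second code, so it costs two 7-bit slots. "\f" is the form feed.
GSM_EXTENDED = set("€[\\]^{|}~\f")


def gsm7_length(text: str) -> int:
    """Return the number of 7-bit slots `text` occupies in the GSM character set.

    Raises ValueError if any character is outside the GSM and Extended GSM sets.
    """
    total = 0
    for ch in text:
        if ch in GSM_BASIC:
            total += 1          # ordinary GSM character: one slot
        elif ch in GSM_EXTENDED:
            total += 2          # escape prefix + character: two slots
        else:
            raise ValueError(f"{ch!r} is not in the GSM character set")
    return total


print(gsm7_length("Hello"))      # five basic characters
print(gsm7_length("Price: 5€"))  # eight basic characters plus one extended
```

In this sketch "Price: 5€" counts as 10 slots even though it is only 9 characters long, because the euro sign needs the escape prefix.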

The GSM 7-bit character set is supported by default by all GSM handsets and network elements. However, characters from languages such as Arabic, Chinese, Korean and Japanese, or from languages written in the Cyrillic alphabet (for example, Russian, Serbian, or Bulgarian), must be encoded using the 16-bit UTF-16 character encoding, commonly referred to simply as Unicode.

Learn more about the GSM character set.

Encoding

If your message contains any character that is not included in the GSM or Extended GSM character sets, the entire message is encoded as Unicode (UTF-16). Each character then takes 16 bits instead of 7, so fewer than half as many characters fit in a single SMS (70 instead of 160). A message that previously fit in one SMS may need two or more, which can double your costs or more.
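The effect on message length can be estimated with a short sketch like the one below. It assumes the commonly cited limits of 160 GSM characters or 70 Unicode characters for a single SMS, and 153 or 67 per part once a message is split into several parts. The function name `sms_segments` and the encoding labels are invented for this example. The count is an approximation: it ignores the rule that an escape pair or a surrogate pair cannot be split across two parts, and your provider's billing rules may differ.

```python
import math

# GSM 7-bit basic set (printable characters plus LF and CR), typed out by hand.
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extended GSM characters, each costing two 7-bit slots.
GSM_EXTENDED = set("€[\\]^{|}~\f")


def sms_segments(text: str) -> tuple[str, int, int]:
    """Estimate (encoding, units used, number of SMS parts) for `text`."""
    if all(ch in GSM_BASIC or ch in GSM_EXTENDED for ch in text):
        # Every character is GSM or Extended GSM: count 7-bit slots.
        encoding = "GSM-7"
        units = sum(2 if ch in GSM_EXTENDED else 1 for ch in text)
        single_limit, multipart_limit = 160, 153
    else:
        # One non-GSM character switches the whole message to 16-bit units.
        encoding = "Unicode"
        units = len(text.encode("utf-16-be")) // 2
        single_limit, multipart_limit = 70, 67

    if units <= single_limit:
        parts = 1
    else:
        parts = math.ceil(units / multipart_limit)
    return encoding, units, parts


plain = "a" * 160
print(sms_segments(plain))               # fits in one GSM message
print(sms_segments(plain[:-1] + "я"))    # same length, one Cyrillic letter
```

Under these assumptions, swapping a single letter of a 160-character message for a Cyrillic one turns one SMS into three, which is why it is worth checking for stray non-GSM characters such as curly quotes before sending.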