The default GSM character set is defined in GSM 03.38. Assuming you are looking at decoded text and not at the 7-bit packed format that is actually used for transmission, a regular expression like the one below should limit you to valid characters:
^[@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !"#¤%&'()*+,\-./0-9:;<=>?¡A-ZÄÖÑܧ¿a-zäöñüà\f^{}\\\[~\]|€]*$
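As a minimal sketch of the same check in Python, you can build the character class from the two GSM 03.38 tables (basic and extension) instead of typing the pattern by hand, which avoids escaping mistakes. The names `GSM_BASIC`, `GSM_EXTENSION`, `is_gsm_text` and `non_gsm_chars` are made up for this illustration, and the sketch assumes the input is already-decoded text, not packed septets.

```python
import re

# Basic character set of GSM 03.38 (the ESC code 0x1B is deliberately left out,
# since it only introduces an extension character and is not text itself).
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

# Extension table: form feed, ^ { } \ [ ~ ] | and the euro sign.
GSM_EXTENSION = "\f^{}\\[~]|€"

# re.escape takes care of the characters that are special inside [...].
GSM_RE = re.compile("[" + re.escape(GSM_BASIC + GSM_EXTENSION) + "]*")


def is_gsm_text(text):
    """Return True if every character is in the GSM default alphabet."""
    return GSM_RE.fullmatch(text) is not None


def non_gsm_chars(text):
    """Return the characters that would force a different encoding."""
    allowed = set(GSM_BASIC + GSM_EXTENSION)
    return [ch for ch in text if ch not in allowed]


if __name__ == "__main__":
    for sample in ["Hello, world! £5 {ok}", "naïve", "日本語"]:
        print(repr(sample), is_gsm_text(sample), non_gsm_chars(sample))
```

Keep in mind that characters from the extension table are sent as an escape code plus a second septet, so they take two positions in the message.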
Please note that you can also send messages encoded as Unicode UCS-2. In that case Unicode itself is not a limiting factor; the phone receiving the message just needs suitable glyphs to present the text to the user.