@kaeedo I mean, yeah. Ideographic languages (e.g. CJK: Chinese/Japanese/Korean) have a distinct information density advantage.
Birdsite's limit is based on cell carrier limits*. Cell carrier limits were purely technical (payload size) and not at all about permitting or restricting expression.
* Note that cell carrier limits were octet based, while birdsite's limits are grapheme based. That's important since an English letter is one octet**. An accented Latin character is (typically) two octets. CJK characters are 3 or 4 octets, and that's ignoring combining sequences. Emojis can run to 25 octets or more for a single "character".
** Numbers assume UTF-8. Early cell carriers in CJK regions often used local multi-byte encodings like SJIS, where a CJK character takes two octets.
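To make the octet counts above concrete, here is a minimal sketch using Python's built-in codecs. The sample characters and the `octets` helper are my own illustrative picks, not anything birdsite or a carrier actually ran, and Python's `shift_jis` codec stands in for whatever carrier-specific SJIS variant was really in use.

```python
from typing import Optional

# Illustrative samples (my own choices), one per category mentioned above.
samples = {
    "English letter": "a",
    "Accented Latin letter": "\u00e9",        # precomposed e-acute, U+00E9
    "CJK character": "\u65e5",                # U+65E5, inside the BMP
    "CJK Ext. B character": "\U0002000B",     # outside the BMP
    "Family emoji (ZWJ sequence)": "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466",
}

def octets(text: str, encoding: str) -> Optional[int]:
    """Hypothetical helper: encoded length in octets, or None if the encoding can't represent text."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None

for label, text in samples.items():
    print(f"{label}: UTF-8={octets(text, 'utf-8')}, Shift_JIS={octets(text, 'shift_jis')}")
```

A `None` in the Shift_JIS column just means that codec has no mapping for the character, which is part of why older local encodings struggled with anything outside their home script.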
@kaeedo I should say: birdsite's limits *WERE* based on cell carrier limits. Two things have changed since 2007.
1/ Cell phones have, broadly speaking, all managed to support multi-part text messaging. Also, the rise of native clients has made the text component moot in many areas.
2/ Birdsite's limits are now entirely codepoint based. This fits between octets and graphemes.
A codepoint is a single Unicode element (from U+0000 to U+10FFFF) and thus will take up at most 4 octets when expressed as UTF-8. Combining sequences (which Emojis use generously) allow a single grapheme (visually indivisible character) to be made up of many codepoints. So a "normal" CJK character (which can be expressed as a single codepoint) is the same as an English letter for the sake of birdsite limits. Emojis can cost multiple codepoints, so they are more "expensive" when applying limits.
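A quick way to see the codepoint vs. octet split is Python, whose `str` is a sequence of codepoints, so `len()` counts codepoints rather than graphemes or octets. This is a sketch of raw codepoint counting only, not birdsite's actual counting logic; the helper names are hypothetical. Grapheme counting is left out because Python's standard library has no grapheme segmentation (the family emoji below renders as one grapheme on most modern platforms).

```python
import sys
import unicodedata

def codepoints(text: str) -> int:
    # len() on a Python str counts codepoints.
    return len(text)

def utf8_octets(text: str) -> int:
    return len(text.encode("utf-8"))

# man, woman, girl, boy joined by ZERO WIDTH JOINER (U+200D)
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"
e_precomposed = "\u00e9"                                     # one codepoint
e_decomposed = unicodedata.normalize("NFD", e_precomposed)   # "e" + U+0301 COMBINING ACUTE ACCENT

for label, text in [("a", "a"), ("CJK \u65e5", "\u65e5"), ("e-acute (NFC)", e_precomposed),
                    ("e-acute (NFD)", e_decomposed), ("family emoji", family)]:
    print(f"{label}: {codepoints(text)} codepoint(s), {utf8_octets(text)} UTF-8 octet(s)")

# The highest codepoint Unicode (and Python) allows.
print(hex(sys.maxunicode))
```

The CJK character and the English letter both come out as one codepoint, while the family emoji is seven codepoints (four people plus three joiners) even though it displays as one "character".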
Thanks for the technical explanation. I usually rely on the runtime implementation of strings, and then don't worry about encoding until it forces me to.
@kaeedo I've had to spend entirely too long thinking about encodings so that other people don't have to. :)