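The reason that "é" doesn't work is probably not that it takes up three bytes, but that it is made of two code points (`e` followed by the combining acute accent U+0301), so it needs two `char`s to represent.
For example, "道" also takes three bytes in UTF-8, yet it fits in a single `char`.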
```rust
fn main() {
    assert_eq!(1, String::from("道").chars().count()); // let c = '道'; // compiles
    assert_eq!(2, String::from("e\u{301}").chars().count()); // "é" as e + U+0301; let c = 'e\u{301}'; // does not compile
    assert_eq!(1, String::from("\u{e9}").chars().count()); // "é" as U+00E9; let c = '\u{e9}'; // compiles. Looks identical to the line above but isn't. Sneaky.
    assert_eq!(2, String::from("\u{2764}\u{fe0f}").chars().count()); // "❤️" as U+2764 + U+FE0F; like the second line, does not compile as a char
}
```
You can paste the code above into the Rust Playground and run it yourself.
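To make the bytes-versus-code-points point from the first paragraph concrete, here is a small sketch that prints each sample's UTF-8 byte length (`str::len`) next to its code-point count (`chars().count()`). The helper `as_single_char` is hypothetical (not from the original); it mirrors the compile/no-compile comments at runtime by returning the string's only `char`, or `None` if there is not exactly one code point.

```rust
/// Hypothetical helper: returns the string's only `char`,
/// or `None` if it has zero or more than one code point.
fn as_single_char(s: &str) -> Option<char> {
    let mut chars = s.chars();
    match (chars.next(), chars.next()) {
        (Some(c), None) => Some(c),
        _ => None,
    }
}

fn main() {
    // (label, string) pairs; escapes make the code points explicit.
    let samples = [
        ("道", "\u{9053}"),
        ("é (U+0065 U+0301)", "e\u{301}"),
        ("é (U+00E9)", "\u{e9}"),
        ("❤️ (U+2764 U+FE0F)", "\u{2764}\u{fe0f}"),
    ];
    for (label, s) in samples {
        println!(
            "{label}: {} bytes, {} code points, single char: {:?}",
            s.len(),
            s.chars().count(),
            as_single_char(s)
        );
    }
}
```

For these inputs the byte/code-point pairs come out as 3/1, 3/2, 2/1 and 6/2: the byte length says nothing about whether a string fits in one `char`.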
The Go blog post https://blog.golang.org/strings explains code points better: an accented letter can indeed be represented by two code points.
> But what about the lower case grave-accented letter 'A', à? That's a character, and it's also a code point (U+00E0), but it has other representations. For example we can use the "combining" grave accent code point, U+0300, and attach it to the lower case letter a, U+0061, to create the same character à. In general, a character may be represented by a number of different sequences of code points, and therefore different sequences of UTF-8 bytes.
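The à example from the Go post carries over directly to Rust. The minimal sketch below compares the precomposed U+00E0 with U+0061 followed by the combining grave accent U+0300; `code_points` is a hypothetical display helper. Rust's `==` on strings compares bytes, so the two spellings are unequal. Treating them as the same character would need Unicode normalization, which is not part of the standard library.

```rust
/// Hypothetical helper: formats each code point as "U+XXXX".
fn code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}

fn main() {
    // à as a single precomposed code point: U+00E0.
    let precomposed = "\u{e0}";
    // à as 'a' (U+0061) followed by the combining grave accent (U+0300).
    let combining = "a\u{300}";

    // Both usually render as "à", yet they are different strings.
    println!("{precomposed} vs {combining}");
    assert_ne!(precomposed, combining);

    // Different sequences of code points...
    assert_eq!(code_points(precomposed), ["U+00E0"]);
    assert_eq!(code_points(combining), ["U+0061", "U+0300"]);

    // ...and therefore different sequences of UTF-8 bytes.
    assert_eq!(precomposed.as_bytes(), [0xC3u8, 0xA0]);
    assert_eq!(combining.as_bytes(), [0x61u8, 0xCC, 0x80]);
}
```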