• 刘福洋
    2020-10-31
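    The reason that "é" fails is probably not that it takes three bytes, but that it is made up of two code points, so it would need two chars to represent it. For example, "道" also takes three bytes, yet it fits in a single char.
    assert_eq!(1, String::from("道").chars().count()); // let c = '道'; // compiles
    assert_eq!(2, String::from("e\u{301}").chars().count()); // let c = 'e\u{301}'; // does not compile ('e' followed by combining U+0301)
    assert_eq!(1, String::from("\u{e9}").chars().count()); // let c = '\u{e9}'; // compiles; it renders the same as the one above but is a different sequence (precomposed U+00E9). Deceptive.
    assert_eq!(2, String::from("\u{2764}\u{fe0f}").chars().count()); // ❤️, same as the two-code-point case above: does not compile as a char
    You can paste the code above into the playground and run it.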
    10
  • 刘福洋
    2020-10-31
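    Put another way: anything made of two code points is an error, because a char must be exactly one code point (strictly, one Unicode scalar value). String::len() returns the number of bytes, not the number of code points. To see the number of code points, String::from("❤️").chars().count() is enough.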
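
    A minimal sketch of the bytes-versus-code-points distinction described above. The helper name `byte_and_code_point_counts` is made up for illustration, and the strings are written with `\u{...}` escapes so the exact code points are unambiguous:

```rust
/// Returns (UTF-8 byte length, number of code points) for a string.
/// `len()` counts bytes; `chars().count()` counts Unicode scalar values.
fn byte_and_code_point_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let samples = [
        "道",                // one code point, three bytes
        "\u{e9}",            // precomposed é: one code point
        "e\u{301}",          // 'e' + combining acute accent: two code points
        "\u{2764}\u{fe0f}",  // heavy black heart + variation selector: two code points
    ];

    for s in samples.iter() {
        let (bytes, code_points) = byte_and_code_point_counts(s);
        println!("{:?}: {} bytes, {} code points", s, bytes, code_points);
    }
}
```

    Only the samples with exactly one code point can be written as a `char` literal.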
    
    3
  • icanfly
    2022-09-27, from Chongqing
    The explanation is too obscure.
    
    2
  • Marvichov
    2021-05-11
    https://blog.golang.org/strings The Go blog explains code points more clearly. An accented letter can indeed be represented with two code points.
    > But what about the lower case grave-accented letter 'A', à? That's a character, and it's also a code point (U+00E0), but it has other representations. For example we can use the "combining" grave accent code point, U+0300, and attach it to the lower case letter a, U+0061, to create the same character à. In general, a character may be represented by a number of different sequences of code points, and therefore different sequences of UTF-8 bytes.
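
    The two representations of à from the quoted passage can be checked in Rust as well. This is only a sketch; `code_points` is a hypothetical helper, not a standard library function:

```rust
/// Collects the code points of a string as numeric values.
fn code_points(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}

fn main() {
    let precomposed = "\u{e0}";  // à as the single code point U+00E0
    let combined = "a\u{300}";   // 'a' (U+0061) followed by combining grave accent (U+0300)

    // Both render as à, but the code point sequences differ.
    println!("{:x?}", code_points(precomposed));
    println!("{:x?}", code_points(combined));

    // The UTF-8 byte sequences differ too, so a plain comparison reports them as unequal.
    println!("{:x?}", precomposed.as_bytes());
    println!("{:x?}", combined.as_bytes());
    assert_ne!(precomposed, combined);
}
```

    String comparison in the standard library works on the raw bytes and applies no Unicode normalization, which is why the two forms compare as different.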
    
    