Char datatype and Unicode? Can 8 bit chars

https://learninghub.kx.com/forums/topic/char-datatype-and-unicode-can-8-bit-chars

According to this reference

https://code.kx.com/q/kb/unicode/

the `char` datatype can store Unicode characters.

This seems surprising, since elsewhere the size of `char` is specified to be 1 byte.

https://code.kx.com/q/basics/datatypes/

1 byte is not large enough to store all Unicode characters, although it is large enough to store the ASCII/latin1 subset.

Is anyone able to provide clarification on this?

thanks

Unicode characters outside ASCII take more than one byte when encoded as UTF-8, so each one occupies several elements of the char list:

q)s:"tést" / store text containing a non-ASCII character
q)s / the console shows the UTF-8 bytes of é as octal escapes
"t\303\251st"
q)count s / 5, not 4: é takes 2 bytes
5
q)-1 s; / printing to stdout sends the raw bytes, so the terminal renders é again
tést

Indeed, the doc is inaccurate: it is not the byte or char type itself that can hold Unicode characters, but lists of these types, because a single character can span multiple elements of the list.
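
To make the "spans multiple elements" point concrete, here is a minimal sketch. It assumes the terminal sends UTF-8, so that é arrives as the two bytes 0xc3 0xa9. The name `ucount` is a hypothetical helper, not a built-in: it counts code points by skipping UTF-8 continuation bytes (values 128 to 191, i.e. bit pattern 10xxxxxx).

```q
q)s:"tést"
q)"x"$s       / cast each char to its byte value
0x74c3a97374
q)ucount:{count where not (`int$"x"$x) within 128 191} / ignore continuation bytes
q)count s     / number of list elements (bytes)
5
q)ucount s    / number of code points
4
```

The cast goes through the byte type first so that the integer values are unsigned. Note that `ucount` assumes well-formed UTF-8 and does no validation.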

Thank you, that makes sense.

In this sense, q's `char` is very much like `char` in C or C++: it is a single byte. It may hold a valid ASCII value, or it may be one byte of a multi-byte UTF-8 sequence that encodes a single code point.
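
A small sketch of what that means when indexing, again assuming the input was entered as UTF-8: picking out one element of the list can return a lone byte that is not a character on its own.

```q
q)s:"tést"
q)s 0      / a complete ASCII character
"t"
q)s 1      / only the lead byte of é
"\303"
q)s 1 2    / both bytes of é together
"\303\251"
```

So operations that work element by element, such as indexing or `count`, act on bytes rather than on Unicode characters.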