Char datatype and Unicode? Can 8 bit chars

MilanGill · February 18, 2025, 4:42pm

https://learninghub.kx.com/forums/topic/char-datatype-and-unicode-can-8-bit-chars

According to this reference

https://code.kx.com/q/kb/unicode/

the `char` datatype can store Unicode characters.

This seems surprising, since elsewhere the size of `char` is specified to be 1 byte.

https://code.kx.com/q/basics/datatypes/

1 byte is not large enough to store all Unicode characters, although it is large enough to store the ASCII/latin1 subset.

Is anyone able to provide clarification on this?

thanks

rocuinneagain · February 18, 2025, 5:01pm

Non-ASCII Unicode characters take more than a single byte. The string below holds the UTF-8 encoding, one byte per `char`:

q)s:"tést" / store text containing a non-ASCII character in a variable
q)s / the console shows the two bytes of é as octal escapes
"t\303\251st"
q)count s / 5, not 4: é occupies 2 bytes
5
q)-1 s; / printing to standard out renders the character again
tést
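
For anyone who wants to check the byte values outside q, here is a minimal sketch in Python (not q, and only an illustration). It assumes the text is encoded as UTF-8, which is what the octal escapes `\303\251` in the session above correspond to.

```python
# Illustrative only: reproduce the byte counts from the q session in Python.
s = "t\u00e9st"                 # "tést", with é written as code point U+00E9

data = s.encode("utf-8")        # the bytes a UTF-8 terminal would hand to q
print(len(s))                   # number of code points
print(len(data))                # number of bytes (what q's count reports)

# The two bytes of é, shown in octal to match q's "\303\251" display
octal = [oct(b) for b in "\u00e9".encode("utf-8")]
print(octal)
```

The string has 4 code points but 5 bytes, and the two bytes of é are 0o303 and 0o251, matching the escapes q displays.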

pgyorok · February 18, 2025, 5:32pm

Indeed, the doc is inaccurate: it is not the byte or char type itself that can hold Unicode characters, but lists of these types can, because a single character may span multiple elements of the list.

MilanGill · February 19, 2025, 11:51am

Thank you, that makes sense.

In this sense, `char` is very much like the C or C++ `char`, in that it is a single byte. It may hold a valid ASCII value, or it may be one byte of a longer UTF-8 sequence that encodes a single code point.
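
A rough sketch of that distinction, written in Python rather than q or C purely for illustration. The helper name `classify_byte` is hypothetical, and the byte ranges are the standard UTF-8 ones; the sketch assumes the char list holds UTF-8 and does not validate whole sequences.

```python
def classify_byte(b: int) -> str:
    """Say what role a single byte can play in UTF-8 text."""
    if b <= 0x7F:
        return "ascii"          # a complete character on its own
    if b <= 0xBF:
        return "continuation"   # 10xxxxxx: a trailing byte of a longer sequence
    if 0xC2 <= b <= 0xF4:
        return "lead"           # first byte of a 2- to 4-byte sequence
    return "invalid"            # 0xC0, 0xC1, 0xF5..0xFF never occur in UTF-8


data = "t\u00e9st".encode("utf-8")          # the 5 bytes of "tést"
kinds = [classify_byte(b) for b in data]
print(kinds)

# Each code point starts at a byte that is not a continuation byte,
# so counting those gives the character count for valid UTF-8.
code_points = sum(1 for k in kinds if k != "continuation")
print(code_points)
```

For valid UTF-8 input, counting the non-continuation bytes recovers the number of characters (4 here) from the 5 single-byte elements.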
