DBCS issue
-
When our application is used on a Japanese system, we are encountering a DBCS problem. When entering text into one of the edit controls, there may be several instances of the shift-out/shift-in pair intermingled with "normal" text. For example: SoXXXXXXSiXSoXXXXXXSi
When I call GetTextLength() on this text, it returns 13. If this text is saved to a text file (e.g., using Notepad, since it's all I can understand on a Japanese system), the size is reported as 13 (I guess the two SoSi pairs did not get saved).
Now for the problem: when I send this text and its length (13) to the AS/400 system for processing, it complains about a mismatched SoSi pair. Debugging on the AS/400 end, we can change the length to 17 and it works fine. I've no clue how to handle this. If the above example were doubled, then GetTextLength() would return 26, yet we'd have to change the value to 34 in order for it to work. Any clues?
- DC
"Old age is like a bank account. You withdraw later in life what you have deposited along the way." - Unknown
"The brick walls are there for a reason...to stop the people who don't want it badly enough." - Randy Pausch
-
GetTextLength() is returning the length in characters. How are you determining the number of bytes to write to the file? For DBCS, length in characters is not the same as length in bytes.
DavidCrow wrote:
Any clues?
You've considered using Unicode, I assume? If not, you may have to write your own strlen, hunting from the start to a terminating zero, to get the number of bytes.
-
Graham Bradshaw wrote:
How are you determining the number of bytes to write to the file?
I'm not. I just paste the text into Notepad and save it.
Graham Bradshaw wrote:
You've considered using Unicode, I assume?
Yes, but I've read nothing thus far that says it would solve the problem. The application is 10+ years old so retooling it for Unicode would be no small undertaking.
-
DavidCrow wrote:
save it.
Save it how? Which encoding did you select?
-
Graham Bradshaw wrote:
Save it how?
The Save option from the File menu.
Graham Bradshaw wrote:
Which encoding did you select?
Once as ANSI (13 bytes) and another as UTF-8 (22 bytes).
-
So the text isn't 13 bytes long. It's more than that (UTF-8 encoding is not the same as DBCS, so the length of the file saved from Notepad is not relevant).
DavidCrow wrote:
when I send this text and its length (13) to the AS/400 system for processing, it complains about a mismatched SoSi pair.
And that 13 is surely the problem. You need to send the length in bytes to the AS/400, together with all the text, not the length in characters. This assumes, of course, that the AS/400 understands DBCS encoding.
-
Graham Bradshaw wrote:
You need to send the length in bytes to the AS/400, together with all the text, not the length in characters.
How do I go about doing this (since WM_GETTEXTLENGTH is giving me the latter)?
Graham Bradshaw wrote:
This assumes, of course, that the AS/400 understands DBCS encoding.
It does. That's why it works (i.e., no data is lost) when I manually change the text length during debugging.
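The lengths that work on the host (17 where Windows reports 13, 34 where it reports 26) are larger by exactly two bytes per shift-out/shift-in pair. One possible reading, which nothing in this thread confirms, is that the Windows-side text carries no SO/SI bytes at all and the host expects one of each around every run of double-byte characters. Under that assumption, and further assuming Shift-JIS (code page 932) input whose characters keep their byte width on the host, the length to send could be estimated roughly like this. The names isSjisLeadByte and hostLengthWithShifts are made up for the sketch.

```cpp
#include <cstddef>

// Assumption: the text is Shift-JIS (code page 932), where a byte in
// 0x81-0x9F or 0xE0-0xFC starts a two-byte character.
inline bool isSjisLeadByte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Estimated byte count on the host if every run of double-byte characters
// is wrapped in one SO byte and one SI byte (assumption, not confirmed).
inline std::size_t hostLengthWithShifts(const char* text) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(text);
    std::size_t length = 0;
    bool inDoubleByteRun = false;
    while (*p != 0) {
        // A lead byte followed by a non-zero byte is one two-byte character.
        const bool doubleByte = isSjisLeadByte(*p) && p[1] != 0;
        if (doubleByte && !inDoubleByteRun) {
            length += 2;  // one SO to open this run, one SI to close it
        }
        inDoubleByteRun = doubleByte;
        const std::size_t step = doubleByte ? 2 : 1;
        p += step;
        length += step;
    }
    return length;
}
```

For a buffer laid out like the example in the question (three double-byte characters, one single-byte character, three double-byte characters), this returns 17 for 13 bytes of input. Adjacent double-byte runs are counted as one run, so the result depends on where the single-byte characters fall.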
-
DavidCrow wrote:
How do I go about doing this (since WM_GETTEXTLENGTH is giving me the latter)?
You're working in C++? If so, you must have a pointer to the start of the character buffer, so you can send it to the AS/400. Just hunt through byte by byte until you hit a zero, counting as you go.
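A minimal sketch of that byte-by-byte hunt, assuming the buffer holds zero-terminated Shift-JIS (code page 932) text. It counts bytes and characters side by side so the two figures can be compared. The names TextSize, isSjisLeadByte and measureSjis are made up for the example; on Windows, IsDBCSLeadByteEx could take the place of the hand-written range check.

```cpp
#include <cstddef>

// Assumption: the text is Shift-JIS (code page 932), where a byte in
// 0x81-0x9F or 0xE0-0xFC starts a two-byte character.
inline bool isSjisLeadByte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

struct TextSize {
    std::size_t bytes;       // bytes before the terminating zero
    std::size_t characters;  // single-byte and double-byte characters
};

// Walk the buffer from the start until the terminating zero, counting as we go.
inline TextSize measureSjis(const char* text) {
    TextSize size{0, 0};
    const unsigned char* p = reinterpret_cast<const unsigned char*>(text);
    while (*p != 0) {
        // A lead byte followed by a non-zero byte is one two-byte character;
        // anything else is treated as a single-byte character.
        const std::size_t step = (isSjisLeadByte(*p) && p[1] != 0) ? 2 : 1;
        p += step;
        size.bytes += step;
        ++size.characters;
    }
    return size;
}
```

For a buffer of three double-byte characters, one single-byte character and three more double-byte characters, this reports 13 bytes and 7 characters.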
-
The only way I know to calculate the number of bytes of a DBCS string in a given encoding is:
1. Translate the string to Unicode (MultiByteToWideChar).
2. Translate the Unicode string back to the previous encoding (WideCharToMultiByte); this function returns the size of the DBCS string in bytes.
This works (in all the cases that I know of) :) . Good luck.
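The two steps above might look roughly like this. It is a sketch under assumptions, not tested against the application in question: the function name roundTripByteLength and its codePage parameter are made up, the input is assumed to be zero-terminated, and on non-Windows builds it simply falls back to std::strlen so that it still compiles.

```cpp
#include <cstddef>
#include <cstring>
#ifdef _WIN32
#include <windows.h>
#include <vector>
#endif

// Byte length (terminating zero excluded) of a zero-terminated string in the
// given Windows code page, found by converting to UTF-16 and back again.
inline std::size_t roundTripByteLength(const char* text, unsigned codePage) {
#ifdef _WIN32
    // Step 1: a null output buffer makes the call return the required size
    // in wide characters, terminator included.
    const int wideCount = MultiByteToWideChar(codePage, 0, text, -1, nullptr, 0);
    if (wideCount <= 0) return 0;
    std::vector<wchar_t> wide(static_cast<std::size_t>(wideCount));
    MultiByteToWideChar(codePage, 0, text, -1, wide.data(), wideCount);

    // Step 2: again with a null output buffer, the return value is the size
    // in bytes, terminator included.
    const int byteCount = WideCharToMultiByte(codePage, 0, wide.data(), -1,
                                              nullptr, 0, nullptr, nullptr);
    return byteCount > 0 ? static_cast<std::size_t>(byteCount - 1) : 0;
#else
    // No Windows conversion API here: just count bytes up to the zero.
    (void)codePage;
    return std::strlen(text);
#endif
}
```

A call such as roundTripByteLength(buffer, 932) would measure the text as Shift-JIS. For text that converts cleanly in both directions, the result should match a plain count of bytes up to the terminating zero.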