Hi all,
I need some help again.
Which is the preferred function to use when creating a TQString from a std::string, and how can I make sure that I end up with UTF-8?
The thing is that input in std::string can be either UTF-8 or not UTF-8.
What is the standard way of doing this in TDE (TQt)?
I am really confused, because I was looking in some KDE3/TDE code and I see both used.
My problem is that some older phones would most likely lack UTF-8 support and newer ones would do only UTF-8. So how can I make sure to "speak the right language" with them?
A hint would be appreciated.
regards
Fat-Zer wrote, in reply to deloptes (2016-03-23 2:08 GMT+03:00):
1) If you construct a string from a const char * C string in your code, you had better use fromLatin1(), e.g. TQString::fromLatin1("blabla").
1.1) If the string includes some local symbols or other non-Latin-1 symbols for some reason, but your source code is strictly in UTF-8, you may use TQString::fromUtf8("10°") [note the degree sign]. But this is a kind of dirty practice.
2) If you receive a string from the OS, e.g. a file path from system calls, you should most likely use TQString::fromLocal8Bit(). Note that it will decode from UTF-8 on most modern Linux boxes.
3) If you receive a string from some third-party module or anywhere else, you should follow its documentation. It may return text in some other encoding, and you will have to use TQTextCodec (or whatever it's called).
3.1) If you are not sure whether it will give you a Latin-1 or a UTF-8 string, you are safe to use TQString::fromUtf8().
Note that it's quite safe to use fromUtf8() everywhere instead of fromLatin1(); in most cases you just risk some performance overhead...
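For the original question (input that may or may not be UTF-8), one possible approach is to test the bytes before choosing a decoder. The following is a minimal sketch in plain C++ rather than TQt code; the names isValidUtf8 and demoValidate are made up for illustration. It is only a heuristic: pure ASCII passes either way, and some non-UTF-8 byte sequences happen to be valid UTF-8 too.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8. Rejects stray continuation
// bytes, truncated sequences, overlong forms, surrogates and code points
// above U+10FFFF.
bool isValidUtf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (c < 0x80)                    { len = 1; cp = c; }
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;               // invalid lead byte
        if (i + len > n) return false;   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len == 3 && cp < 0x800) return false;        // overlong 3-byte form
        if (len == 4 && cp < 0x10000) return false;      // overlong 4-byte form
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        i += len;
    }
    return true;
}

// The letter u-umlaut: two bytes in UTF-8, one byte in Latin-1.
void demoValidate() {
    assert(isValidUtf8("\xC3\xBC"));   // UTF-8 encoding is accepted
    assert(!isValidUtf8("\xFC"));      // Latin-1 encoding is rejected
}
```

If the check fails, the bytes would still have to go through a codec for whatever legacy encoding the sender uses, and that encoding has to be known or configured.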
Michele Calgaro wrote, in reply to Fat-Zer (03/23/2016 09:39 AM):
Internally TQString is basically a TQChar array. As Alexander said, TQString::fromUtf8() is probably the safest way to go in most cases. Use TQTextCodec::setCodecForCStrings() if you want to set a specific 8-bit-to-Unicode codec for C strings. The default is Latin-1 anyway, so even a simple TQString(<your c-string>) would work if you know the string is a Latin-1 string.
I disagree with Alex on point 2). I would still go for TQString::fromUtf8() if I am handling strings from the OS, just in case ;-)
Cheers Michele
Fat-Zer wrote, in reply to Michele Calgaro (2016-03-23 5:31 GMT+03:00):
Just a thought experiment: Imagine a spherical user in a vacuum, on a frictionless island far far away. For simplicity, let's say his system has a single imaginary locale (LC_ALL=xx_XX.NON_UTF). The user types in the terminal: "touch 'ટેસ્ટ'", and it creates a file "ટેસ્ટ" with its name encoded in NON_UTF. Exactly the same bytes end up on the filesystem, since neither touch, nor the Linux kernel, nor the extX driver does any encoding conversion. Then he wants to open that file with a TQt program. Somewhere deep inside TQDir it gets a string from readdir() that contains exactly "ટેસ્ટ" encoded in the same NON_UTF encoding... What should come next? TQString::fromUtf8(), which turns the simple string "ટેસ્ટ" into gibberish, causing the angry user to lose his belief in humanity and become a serial killer? Or TQString::fromLocal8Bit(), so that he can be happy and see "ટેસ્ટ" in his TQOpenFileDialog?
Luckily, systems with non-UTF-8 locales are now mostly extinct, at least on Linux and on desktops...
Michele Calgaro wrote, in reply to Fat-Zer (2016/03/23 02:51 PM):
Uhm, makes sense, good point. Interestingly my disagreement with you came from the other way around: what if the filesystem is using a 16bit or 32bit encoding? How would TQString::fromLocal8Bit() interpret that? Anyhow it was just my 2 cents :-) Cheers Michele
deloptes wrote, in reply to Michele Calgaro:
Thank you for the explanations. This confirms my understanding of the matter, but does not explain why I get mangled characters at the end.
The old Nokia phone (5530) seems to use Latin1 (ISO-8859-15). On a sync request I get the data as a std::string, and I do
TQString data = TQString::fromUtf8(item.data(), item.size());
With my N9 or the filesync (TDE filesystem) it works fine, but with the 5530 I get the German ü/ä/ö mangled.
There seems to be something I do not understand correctly, or indeed I should use fromLocal8Bit(). I read about this some time ago and compared it with the way KDE4 handles it. They do something tricky there using QByteArray, and I was wondering if TQByteArray could do the work.
I'll also ask the syncevo team whether one could pass the encoding in the config. This way we'll be able to handle it properly.
regards
Fat-Zer wrote, in reply to deloptes (2016-03-23 10:34 GMT+03:00):
Sorry about that, my mistake... fromUtf8() is of course only safe to use instead of fromAscii(); for the upper part of the Latin-1 table it will give a different result... You are supposed to set the encoding manually (and in this case likely let the user choose it), and use TQTextCodec to decode the strings.
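To illustrate the difference in the upper half of the Latin-1 table: ü is the single byte 0xFC in Latin-1 (and in ISO-8859-15), but the two bytes 0xC3 0xBC in UTF-8, and a lone 0xFC is not valid UTF-8 at all. A small sketch in plain C++ (not TQt; latin1ToUtf8 and demoLatin1 are names invented here) that does the Latin-1 to UTF-8 conversion by hand:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Transcodes ISO-8859-1 (Latin-1) bytes to UTF-8. Bytes below 0x80 are
// copied unchanged; bytes 0x80..0xFF each become a two-byte sequence.
std::string latin1ToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}

void demoLatin1() {
    assert(latin1ToUtf8("abc") == "abc");        // ASCII is identical in both
    assert(latin1ToUtf8("\xFC") == "\xC3\xBC");  // u-umlaut changes representation
}
```

This is why ASCII-only test data cannot reveal the problem: the two encodings only diverge once a byte above 0x7F shows up.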
To unsubscribe, e-mail: trinity-devel-unsubscribe@lists.pearsoncomputing.net For additional commands, e-mail: trinity-devel-help@lists.pearsoncomputing.net Read list messages on the web archive: http://trinity-devel.pearsoncomputing.net/ Please remember not to top-post: http://trinity.pearsoncomputing.net/mailing_lists/#top-posting
Fat-Zer wrote, following up on the previous message (2016-03-23 10:47 GMT+03:00):
Forgot a 1-line snippet:
TQString data = TQTextCodec::codecForName("ISO-8859-15")->toUnicode (item.data(), item.size());
Haven't tested but should work...
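For comparison, roughly what such a codec has to do for ISO-8859-15 can be written out by hand. This is a sketch in plain C++, independent of TQt; the function names are invented here. ISO-8859-15 matches Latin-1 except for eight positions (the euro sign among them), and every byte maps to exactly one Unicode code point.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Maps one ISO-8859-15 byte to its Unicode code point. Only eight
// positions differ from ISO-8859-1.
unsigned long latin9ToCodePoint(unsigned char c) {
    switch (c) {
        case 0xA4: return 0x20AC;  // euro sign
        case 0xA6: return 0x0160;  // S with caron
        case 0xA8: return 0x0161;  // s with caron
        case 0xB4: return 0x017D;  // Z with caron
        case 0xB8: return 0x017E;  // z with caron
        case 0xBC: return 0x0152;  // OE ligature
        case 0xBD: return 0x0153;  // oe ligature
        case 0xBE: return 0x0178;  // Y with diaeresis
        default:   return c;       // same as Latin-1
    }
}

// Appends the UTF-8 encoding of a code point in the Basic Multilingual Plane.
void appendUtf8(std::string& out, unsigned long cp) {
    assert(cp <= 0xFFFF);  // enough for every ISO-8859-15 character
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

std::string latin9ToUtf8(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
        appendUtf8(out, latin9ToCodePoint(static_cast<unsigned char>(in[i])));
    return out;
}

void demoLatin9() {
    assert(latin9ToUtf8("\xFC") == "\xC3\xBC");      // u-umlaut
    assert(latin9ToUtf8("\xA4") == "\xE2\x82\xAC");  // euro sign
}
```

In the TQt one-liner it may also be worth checking that codecForName() did not return a null pointer before calling toUnicode() on the result.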
deloptes wrote, in reply to Fat-Zer:
Thank you, this was a very good pointer in the right direction!
Fat-Zer wrote, in reply to Michele Calgaro (2016-03-23 9:20 GMT+03:00):
Regarding filesystems with a 16-bit or 32-bit encoding: firstly, TQString::fromLocal8Bit() takes a const char * argument, and neither UTF-16 nor UTF-32 can be stored as a NUL-terminated string in a plain char * array, so it's not an issue (unless we are on some very strange platform with 16 or 18 bit chars)... For UTF-16 there is fromUcs2(), but at the system interaction level it's useful only for non-*nix systems... In the Unix world all API calls use char *, and as a consequence no native Unix filesystem uses a wide-character encoding. AFAIK the only semi-supported filesystem on Linux that uses UTF-16 is NTFS, and the symbols are converted by the kernel (or ntfs-3g) to the desired encoding (see the "nls" and "utf8" mount options for the kernel module and "locale" for ntfs-3g).
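A small illustration of why UTF-16 does not fit into char * based APIs, as a sketch in plain C++ (utf16leBytes and demoUtf16 are names invented for this example): for ASCII text every second byte of the UTF-16 encoding is zero, so anything that treats the buffer as a C string stops after the first character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Serializes UTF-16 code units as little-endian bytes.
std::string utf16leBytes(const std::u16string& units) {
    std::string out;
    for (char16_t u : units) {
        out += static_cast<char>(u & 0xFF);         // low byte first
        out += static_cast<char>((u >> 8) & 0xFF);  // then high byte
    }
    return out;
}

void demoUtf16() {
    const std::string bytes = utf16leBytes(u"test");
    assert(bytes.size() == 8);                // 't' 0 'e' 0 's' 0 't' 0
    assert(std::strlen(bytes.c_str()) == 1);  // a char * API sees only "t"
}
```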
PS: Sorry to everybody if we are making too much noise on the mail list...
Michele Calgaro wrote, in reply to Fat-Zer (2016/03/23 04:35 PM):
Thanks for the detailed explanation Alex, always good to learn some more bits of information, since you are quite obviously more knowledgeable than me on this matter ;-) Cheers Michele
Hi all,
after some testing I suspect that the problem is created somewhere in or around parseVCard.
The phones report in vcard2.1. It is converted by syncevolution to v3 and passed to the plugin. The attached file shows the output of the syncevolution backend (tdepim).
addressbook: TDE addressbook ENTRY BEFORE - prints std::string item (the vCard)
SE_LOG_DEBUG(getDisplayName(), "TDE addressbook ENTRY BEFORE \n%s\n",item.c_str() );
Überdrüber OK
addressbook: TDE addressbook ENTRY FROM UTF - is the std::string value converted to TQString via fromUtf8 as discussed in previous posts
TQString input = TQString::fromUtf8(item.data(),item.size());
std::string input_str(input.utf8(),input.utf8().length());
SE_LOG_DEBUG(getDisplayName(), "TDE addressbook ENTRY FROM UTF \n%s\n",input_str.c_str() );
Überdrüber OK
addressbook: TDE addressbook ENTRY AFTER is the output of addressee after converter.parseVCard(input) is called and converted to std::string
TDEABC::Addressee addressee = converter.parseVCard(input);
/* DEBUG */
TQString data;
if (m_type == TDEPIM_CONTACT_V21 )
    data = converter.createVCard(addressee, TDEABC::VCardConverter::v2_1);
else
    data = converter.createVCard(addressee, TDEABC::VCardConverter::v3_0);
std::string data_str(data.utf8(),data.utf8().length());
SE_LOG_DEBUG(getDisplayName(), "TDE addressbook ENTRY AFTER \n%s\n",data_str.c_str() );
I could reproduce something similar in a test program by
std::string teststr(input.utf8(),input.utf8().length());
std::cout << teststr << "\n";
result NOK
This however does not explain the above problem as I see the broken äöü in the AddressBook in TDE. This must be coming from parseVCard as after this addressee is added to the AddressBook.
This is consistent BTW with my experience with KDE3 and opensync, where I had same problems, but never had the balls to confront them.
However this works in my test program
std::string teststr(newItem.ascii());
std::cout << teststr << "\n";
and this contradicts the logic of ascii(): all the äöü are there
regards
deloptes wrote, following up on the previous message:
Looking further into it I solved the issue by passing c_str() to parseVCard
TDEABC::Addressee addressee = converter.parseVCard(item.c_str());
works OK
and when reading an item the same, after converting the TQString into std::string, passing the c_str() to the function.
works OK
so in both directions now encoding is preserved.
Thanks for the hints and advice; without your help I wouldn't have solved it so fast.
Fat-Zer wrote, in reply to deloptes (2016-03-25 3:58 GMT+03:00):
Nope, seeing the äöü after ascii() doesn't contradict its logic... see the ascii() documentation: «If a codec has been set using QTextCodec::codecForCStrings(), it is used to convert Unicode to 8-bit char. Otherwise, this function does the same as latin1().» However, you generally shouldn't use ascii() unless either you are positive that the string contains only ASCII chars, or some other interface accepts strictly those and you don't care about the others...
deloptes wrote, in reply to Fat-Zer:
Hi, this is also how I understand ascii(), but do you have an explanation for how I then see the üöä (UTF?). The above was just an experiment. For the code I wrote, I solved the problem by passing the c_str() to parseVCard. This passes a char array and does not care that much about the content (my understanding).
regards
Fat-Zer wrote, in reply to deloptes (2016-03-26 11:42 GMT+03:00):
Yes, it seems you are right, there is a bug. I haven't tried it myself, but this should do the trick:
diff --git a/tdeabc/vcardparser/vcardparser.cpp b/tdeabc/vcardparser/vcardparser.cpp
index 7ac07ce..db33263 100644
--- a/tdeabc/vcardparser/vcardparser.cpp
+++ b/tdeabc/vcardparser/vcardparser.cpp
@@ -152,7 +152,7 @@ VCard::List VCardParser::parseVCards( const TQString& text )
               KCodecs::quotedPrintableDecode( input, output );
             }
           } else {
-            output = TQCString(value.latin1());
+            output = TQCString(value.utf8());
           }
 
           if ( params.findIndex( "charset" ) != -1 ) { // have to convert the data
Note that VCardParser::parseVCards() is generally encoding-unsafe...
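To see why the choice between a Latin-1 and a UTF-8 byte representation matters, here is a sketch in plain C++ (not the tdeabc code; the function names are invented, and replacing unrepresentable characters with '?' is an assumption of this sketch rather than a statement about TQString::latin1()). Narrowing to Latin-1 cannot keep anything above U+00FF, while UTF-8 keeps every code point.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Narrows code points to Latin-1 bytes. Anything above U+00FF cannot be
// represented and becomes '?' here (an assumption of this sketch).
std::string narrowToLatin1(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text)
        out += (cp <= 0xFF) ? static_cast<char>(cp) : '?';
    return out;
}

// Encodes code points as UTF-8. No surrogate checks; input is trusted.
std::string encodeUtf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        assert(cp <= 0x10FFFF);
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

void demoNarrowing() {
    const std::u32string text = U"\u00FC\u20AC";  // u-umlaut, euro sign
    assert(narrowToLatin1(text) == "\xFC?");      // euro sign is lost
    assert(encodeUtf8(text) == "\xC3\xBC\xE2\x82\xAC");  // both survive
}
```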
PS, some notes about your code:
std::string data_str(data.utf8(),data.utf8().length());
SE_LOG_DEBUG(getDisplayName(), "TDE addressbook ENTRY AFTER \n%s\n",data_str.c_str() );
Note that there is no need to create an intermediate std::string here; the following code should work by itself:
SE_LOG_DEBUG(getDisplayName(), "TDE addressbook ENTRY AFTER\n%s\n",data.utf8() );
If not, just cast it to (const char *).
TDEABC::Addressee addressee = converter.parseVCard(item.c_str());
This is equivalent to fromLatin1(), so it will work only for your locale...
Next time you encounter such issues, try to provide a minimal compilable test example. That will significantly ease testing and understanding what's wrong...
deloptes wrote, in reply to Fat-Zer:
Yes, I also looked into vcardparser.cpp, but I closed the file less than 60 seconds later because I started getting a headache. I don't understand what this diff means: is it how the code should be, or is it something from the history of the file?
Thank you for the note about the intermediate std::string; you speak out some of my thoughts. I also think I have tested that, but I'm not sure anymore.
I'm not sure I understand the remark that parseVCard(item.c_str()) is equivalent to fromLatin1(). fromLatin1 means I use ISO-8859, but I use UTF-8. It is also obvious that the input (item) is received in UTF-8. I had a different experience when using TQString::fromLatin1().
About the minimal test example: yes, you are correct again; however, time constraints and frustration prevented me from doing this, as I would have to clean the test code of older tests that I commented out. I am adding something now; however, I was disappointed that the output of the test program was different from what I saw in the AddressBook. Perhaps it is because you have to convert the addressee back, which would put the original problem somewhere in the converter.
Find attached the code and compile like this
PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/opt/trinity/lib/pkgconfig \
g++ `pkg-config --cflags tqt` -I/opt/trinity/include \
`pkg-config --libs tqt` -L/opt/trinity/lib \
-ltdecore -ltdeabc std-test.cc -o std-abreader
To check the result in the address book you have to uncomment the write and save lines and also cleanup tdeabc lock and cache files prior to executing. I had a very bad experience with those already
rm -f ~/.trinity/share/apps/tdeabc/lock/*.trinity_share_apps_tdeabc_std.vcf* \
~/.trinity/share/apps/tdeabc/std.vcf__*
regards
Fat-Zer wrote, in reply to deloptes (2016-03-27 15:07 GMT+03:00):
Yes, the diff is how it should be: a fix for tdelibs. But note that if a vCard has a field in a different encoding (e.g. the "charset" parameter is set), the code will likely fail... To fix it completely, changes to the whole API are required (pass a TQByteArray to the parser rather than a TQString).
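A rough sketch of that idea in plain C++ (not the tdeabc API; valueToUtf8 and demoCharset are hypothetical names): keep each value as raw bytes until its charset parameter is known, and only then convert. Treating a missing charset as UTF-8 is an assumption made here, only two charset labels are handled, and the comparison is case-sensitive for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: converts the raw bytes of one vCard value to UTF-8
// according to its charset parameter (empty string = not specified).
// Returns false for charsets this sketch does not know.
bool valueToUtf8(const std::string& raw, const std::string& charset,
                 std::string& out) {
    if (charset.empty() || charset == "UTF-8") {
        out = raw;  // assumed to be UTF-8 already, passed through as-is
        return true;
    }
    if (charset == "ISO-8859-1") {
        out.clear();
        for (std::size_t i = 0; i < raw.size(); ++i) {
            const unsigned char c = static_cast<unsigned char>(raw[i]);
            if (c < 0x80) {
                out += static_cast<char>(c);
            } else {
                out += static_cast<char>(0xC0 | (c >> 6));
                out += static_cast<char>(0x80 | (c & 0x3F));
            }
        }
        return true;
    }
    return false;  // unknown charset: the caller has to decide what to do
}

void demoCharset() {
    std::string out;
    assert(valueToUtf8("\xFC", "ISO-8859-1", out) && out == "\xC3\xBC");
    assert(valueToUtf8("\xC3\xBC", "", out) && out == "\xC3\xBC");
    assert(!valueToUtf8("x", "KOI8-R", out));
}
```

The point of working on bytes is that no decoding step happens before the charset is known, so nothing can be mangled on the way in.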
With parseVCard(item.c_str()) you implicitly use QString (const char*), which is equivalent to QString::fromAscii (), which in turn is equivalent to fromLatin1 () as long as you don't set QTextCodec::codecForCStrings(). So the code will likely fail if the input has some other encoding.
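One possible reading of why passing c_str() can still appear to work for UTF-8 input: decoding bytes as Latin-1 and encoding them as Latin-1 again is byte-transparent, so UTF-8 bytes can pass through unchanged even though the intermediate string holds the wrong characters. The sketch below shows this in plain C++ (names invented here); whether the real parser takes such a path depends on the code branch involved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Decodes bytes as Latin-1: each byte becomes the code point of equal value.
std::vector<unsigned> decodeLatin1(const std::string& bytes) {
    std::vector<unsigned> cps;
    for (std::size_t i = 0; i < bytes.size(); ++i)
        cps.push_back(static_cast<unsigned char>(bytes[i]));
    return cps;
}

// Encodes code points as Latin-1; every code point must be <= 0xFF.
std::string encodeLatin1(const std::vector<unsigned>& cps) {
    std::string out;
    for (std::size_t i = 0; i < cps.size(); ++i) {
        assert(cps[i] <= 0xFF);
        out += static_cast<char>(cps[i]);
    }
    return out;
}

void demoRoundTrip() {
    const std::string utf8 = "\xC3\xBC";  // u-umlaut encoded as UTF-8
    const std::vector<unsigned> cps = decodeLatin1(utf8);
    assert(cps.size() == 2);                    // seen as two characters
    assert(cps[0] == 0xC3 && cps[1] == 0xBC);   // U+00C3 U+00BC, i.e. mojibake
    assert(encodeLatin1(cps) == utf8);          // yet the bytes survive
}
```

The intermediate string is wrong for anything that inspects or displays it, which is why this kind of pass-through is fragile.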
deloptes wrote, in reply to Fat-Zer:
You mean a vCard field with a charset other than UTF-8? But the "else" branch in the diff refers to the case where no charset+encoding is specified, so it should really default to UTF-8 (IMO).
PS, some notes about your code:
std::string data_str(data.utf8(),data.utf8().length());
SE_LOG_DEBUG(getDisplayName(), "TDE addressbook ENTRY AFTER \n%s\n",data_str.c_str() );
Note that there is no need to create an intermediate std::string here; the following should work by itself:
SE_LOG_DEBUG(getDisplayName(), "TDE addressbook ENTRY AFTER\n%s\n",data.utf8() );
If not, just cast it to (const char *).
Thank you, you are voicing some of my own thoughts. I think I have also tested the above, but I am not sure anymore.
TDEABC::Addressee addressee = converter.parseVCard(item.c_str());
This is equivalent to fromLatin1(), so it will work only for your locale...
I'm not sure I understand this well. fromLatin1 means I use ISO-8859-1, but I use UTF-8. It is also obvious that the input (item) is received in UTF-8. I had a different experience when using TQString::fromLatin1().
Here you implicitly use QString(const char*), which is equivalent to QString::fromAscii(), which in turn is equivalent to fromLatin1() as long as no codec has been set with QTextCodec::setCodecForCStrings(). So the code will likely fail if the input has some other encoding.
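To make the difference concrete, here is a minimal sketch in plain C++ (no TQt needed) of what happens when the same bytes are read as Latin-1 versus UTF-8. The names decodeLatin1 and decodeUtf8 are hypothetical helpers written for this illustration, not TQt functions, and the UTF-8 decoder is deliberately simplified (it does not reject overlong forms or surrogates).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: read bytes as Latin-1, i.e. every byte becomes the
// code point with the same numeric value.
std::vector<std::uint32_t> decodeLatin1(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    for (unsigned char c : bytes) out.push_back(c);
    return out;
}

// Hypothetical helper: simplified UTF-8 decoder. Malformed sequences are
// replaced by U+FFFD; overlong forms and surrogates are not rejected.
std::vector<std::uint32_t> decodeUtf8(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        int extra = 0;
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // stray byte
        ++i;
        bool ok = true;
        for (int k = 0; k < extra; ++k) {
            // Each continuation byte must look like 10xxxxxx.
            if (i >= bytes.size() ||
                (static_cast<unsigned char>(bytes[i]) & 0xC0) != 0x80) {
                ok = false;
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : 0xFFFD);
    }
    return out;
}
```

For the UTF-8 bytes of "10°" (0x31 0x30 0xC2 0xB0), the Latin-1 reading yields four characters, with the degree sign split into two, while the UTF-8 reading yields three. Pure ASCII input gives the same result either way, which is probably why the problem only shows up with non-ASCII names.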
You mean charset different than UTF-8?
I'm not sure, because I observed some strange behavior; I just played around with the test code until it worked. The most frustrating part was seeing everything look fine in the test program and then finding it mangled in the address book after sync. So I did some testing on parseVCard until I found out it works best when passing c_str(). I tried all the options that were highlighted in this thread or in syncevolution.
Thanks for the explanation of the above. It is a pity I do not have more time to track it further, but I still do not understand what you mean when you say it is equivalent to .... and that the code will fail if an encoding is set.
In vCard 2.1 you have the option to specify charset+encoding. In vCard 3.0 it looks like the default is UTF-8, and I have not tested charset+encoding.
The code I wrote operates on what comes from syncevolution, which offers vCard 3.0, so we automatically receive UTF-8 input. Perhaps I should test with vCard 2.1. Or better, someone else should, but this is a good point for a TODO note.
thanks again, appreciated regards
2016-03-28 1:00 GMT+03:00 deloptes deloptes@gmail.com:
Fat-Zer wrote:
2016-03-27 15:07 GMT+03:00 deloptes deloptes@gmail.com:
Fat-Zer wrote:
Yes, it is how it should be, a fix for tdelibs. But note that if a vCard has a field in a different encoding (e.g. the "charset" parameter is set), the code will likely fail... To fix it completely, API changes are required (pass a TQByteArray to the parser rather than a TQString).
You mean charset different than UTF-8? But this "else" refers to the case when no charset+encoding is specified, so it should really default to UTF (IMO)
This "else" refers to the case when no "encoding" is specified. The charset is handled later. It seems it will be handled correctly if both encoding and charset are specified, but it will be wrong if only charset is set. Here is the general mistake: QString is used in those functions as a container for a sequence of bytes with an undefined charset, which is generally wrong. It works because QString internally consists of two independent data sets: a zero-terminated const char* for fast return from ascii() or latin1() (in case the string is Latin-1) and a QChar[]. But this is a very harmful practice...
So to make the code work in its current state, obscure and unintuitive code is required:
converter.parseVCard( TQString::fromLatin1(str.utf8()) );
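Why such a call can work at all is easier to see without TQt. The sketch below is only an illustration under assumptions: fromLatin1 and toLatin1 are hypothetical stand-ins for TQString::fromLatin1() and TQString::latin1(), modelled as "one 16-bit unit per byte" and back, with '?' substituted for anything above 0xFF (which is how I understand latin1() to behave).

```cpp
#include <cassert>
#include <string>

// Stand-in for TQString::fromLatin1(): one 16-bit unit per input byte.
std::u16string fromLatin1(const std::string& bytes) {
    std::u16string s;
    for (unsigned char c : bytes) s.push_back(static_cast<char16_t>(c));
    return s;
}

// Stand-in for TQString::latin1(): one byte per unit; units that do not
// fit into Latin-1 are replaced by '?' (assumed behaviour).
std::string toLatin1(const std::u16string& s) {
    std::string out;
    for (char16_t u : s) {
        if (u <= 0xFF)
            out.push_back(static_cast<char>(static_cast<unsigned char>(u)));
        else
            out.push_back('?');
    }
    return out;
}
```

Because every byte value 0..255 maps to exactly one unit and back, UTF-8 bytes smuggled in through fromLatin1() come out of latin1() unchanged, so a parser that calls value.latin1() sees the original UTF-8. A properly decoded string is not so lucky: a character such as the euro sign (U+20AC) does not fit into Latin-1 and is lost.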
The correct solution is to change the API so that parseVCard accepts a QByteArray rather than a QString. Also note that this started back in KDE times: see the KABC_VCARD_ENCODING_FIX ifdefs in tdepim...
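As a rough idea of what a byte-oriented parser would need first, here is a small sketch that reads the CHARSET parameter from one unfolded vCard property line while the value is still raw bytes. The function charsetOf is hypothetical and not part of tdeabc; defaulting to UTF-8 when no charset is given is an assumption taken from the discussion above, and quoted parameter values are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: return the upper-cased CHARSET parameter of one
// unfolded vCard property line, or "UTF-8" if none is present (assumed
// default). Only the part before the first ':' is inspected, so the
// property value itself is never interpreted.
std::string charsetOf(const std::string& line) {
    // Property name and parameters live before the first ':'.
    std::string head = line.substr(0, line.find(':'));
    std::transform(head.begin(), head.end(), head.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    const std::string key = ";CHARSET=";
    const std::size_t pos = head.find(key);
    if (pos == std::string::npos) return "UTF-8";
    const std::size_t start = pos + key.size();
    const std::size_t end = head.find(';', start);
    return head.substr(start, end == std::string::npos
                                  ? std::string::npos
                                  : end - start);
}
```

With the charset known, the parser could pick the right decoder for the value bytes (after undoing any quoted-printable or base64 encoding) and only then build a string, instead of carrying undecoded bytes around inside a QString.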
Here you implicitly use QString(const char*), which is equivalent to QString::fromAscii(), which in turn is equivalent to fromLatin1() as long as no codec has been set with QTextCodec::setCodecForCStrings(). So the code will likely fail if the input has some other encoding.
You mean charset different than UTF-8?
I'm not sure, because I observed some strange behavior; I just played around with the test code until it worked. The most frustrating part was seeing everything look fine in the test program and then finding it mangled in the address book after sync. So I did some testing on parseVCard until I found out it works best when passing c_str(). I tried all the options that were highlighted in this thread or in syncevolution.
Thanks for the explanation of the above. It is a pity I do not have more time to track it further, but I still do not understand what you mean when you say it is equivalent to .... and that the code will fail if an encoding is set.
"Equivalent" here means that the following lines will all have exactly the same result:
converter.parseVCard(item.c_str());
converter.parseVCard(TQString(item.c_str()));
converter.parseVCard(TQString::fromAscii(item.c_str()));
// if you haven't set TQTextCodec::codecForCStrings() explicitly
converter.parseVCard(TQString::fromLatin1(item.c_str()));
In vCard 2.1 you have the option to specify charset+encoding. In vCard 3.0 it looks like the default is UTF-8, and I have not tested charset+encoding.
The code I wrote operates on what comes from syncevolution, which offers vCard 3.0, so we automatically receive UTF-8 input. Perhaps I should test with vCard 2.1. Or better, someone else should, but this is a good point for a TODO note.
2016-03-28 1:09 GMT+03:00 deloptes deloptes@gmail.com:
BTW did you raise a bug to fix this?
No, I haven't...
Fat-Zer wrote:
diff --git a/tdeabc/vcardparser/vcardparser.cpp b/tdeabc/vcardparser/vcardparser.cpp
index 7ac07ce..db33263 100644
--- a/tdeabc/vcardparser/vcardparser.cpp
+++ b/tdeabc/vcardparser/vcardparser.cpp
@@ -152,7 +152,7 @@ VCard::List VCardParser::parseVCards( const TQString& text )
             KCodecs::quotedPrintableDecode( input, output );
           }
         } else {
-          output = TQCString(value.latin1());
+          output = TQCString(value.utf8());
         }
         if ( params.findIndex( "charset" ) != -1 ) { // have to convert the data
Note that VCardParser::parseVCards() is generally encoding-unsafe...
BTW did you raise a bug to fix this?
regards