TQString::fromUtf8 vs TQString::fromLatin1


deloptes

22 Mar 2016

11:08 p.m.

Hi all,

I need some help again.

Which is the preferred function to use when creating a TQString from a std::string, and how can I make sure that I end up with UTF-8?

The thing is that input in std::string can be either UTF-8 or not UTF-8.

What is the standard way of doing this in TDE (TQt)?

I am really confused, because I was looking in some KDE3/TDE code and I see both used.

My problem is that some older phones would most likely lack UTF and newer would do only UTF. So how can I make sure to "speak the right language" with them?

A hint would be appreciated.

regards


Fat-Zer

23 Mar 2016

12:39 a.m.

New subject: [trinity-devel] TQString::fromUtf8 vs TQString::fromLatin1


1) If you construct a string from a const char * C string literal in your code, you had better use fromLatin1(), e.g. TQString::fromLatin1("blabla").

1.1) If the string includes some local or non-Latin-1 symbols for some reason, but your source code is strictly in UTF-8, you may use TQString::fromUtf8("10°") [note the degree sign]. But this is kind of a dirty practice.

2) If you receive a string from the OS, e.g. a file path from a system call, you should most likely use TQString::fromLocal8Bit(). Note that it will decode from UTF-8 on most modern Linux boxes.

3) If you receive a string from some third-party module or wherever else, you should follow its documentation. It may return text in some other encoding, and you will have to use TQTextCodec (or whatever it's called).

3.1) If you are not sure whether it will give you a Latin-1 or a UTF-8 string, you are safe to use TQString::fromUtf8().

Note that it's quite safe to use fromUtf8() everywhere instead of fromLatin1(); in most cases you only risk some performance overhead...
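To make points 1 and 1.1 concrete, here is a small sketch in plain C++ (no TQt needed) of what the two decoders do with the bytes of "10°" as they appear in a UTF-8 source file. The helpers decodeLatin1 and decodeUtf8 are hypothetical stand-ins for TQString::fromLatin1() and TQString::fromUtf8(), not the TQt implementation, and the UTF-8 decoder is simplified (it does not reject overlong forms).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for fromLatin1(): every byte is one character
// whose code point equals the byte value.
std::u32string decodeLatin1(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) out.push_back(static_cast<char32_t>(b));
    return out;
}

// Hypothetical, simplified stand-in for fromUtf8(): a multi-byte sequence
// is combined into one code point; malformed bytes become U+FFFD.
std::u32string decodeUtf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // number of continuation bytes expected
        char32_t cp = 0;
        bool ok = true;
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else                            { ok = false; }
        if (i + extra >= bytes.size()) ok = false;   // truncated sequence
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (ok) { out.push_back(cp); i += extra + 1; }
        else    { out.push_back(0xFFFD); ++i; }
    }
    // Every decoded character consumed at least one input byte.
    assert(out.size() <= bytes.size());
    return out;
}
```

Fed the four bytes 0x31 0x30 0xC2 0xB0, the Latin-1 decoder produces four characters (the degree sign is preceded by a stray "Â"), while the UTF-8 decoder produces the intended three.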

Michele Calgaro

2:31 a.m.

New subject: [trinity-devel] TQString::fromUtf8 vs TQString::fromLatin1


Internally TQString is basically a TQChar array. As Alexander said, TQString::fromUtf8() is probably the safest way to go in most cases. Use TQTextCodec::setCodecForCStrings() if you want to set a specific 8-bit-to-Unicode codec for C strings. The default is Latin-1 anyway, so even a simple TQString(<your c-string>) would work if you know the string is Latin-1.

I disagree with Alex on point 2). I would still go for TQString::fromUtf8() if I am handling strings from the OS, just in case ;-)

Cheers Michele

Fat-Zer

5:51 a.m.

New subject: [trinity-devel] TQString::fromUtf8 vs TQString::fromLatin1


Just a thought experiment: Imagine a spherical user in a vacuum, on an island free of friction far far away. For simplicity let's say his system has a single imaginary locale (LC_ALL=xx_XX.NON_UTF). The user types in the terminal: "touch 'ટેસ્ટ'". This creates a file "ટેસ્ટ" whose name is encoded in NON_UTF. Exactly the same bytes end up on the filesystem, since neither touch, nor the Linux kernel, nor the extX driver does any encoding conversion. Then he wants to open that file with a TQt program. Somewhere deep inside TQDir it gets a string from readdir() that contains exactly "ટેસ્ટ" encoded in the same NON_UTF encoding... What should come next? TQString::fromUtf8(), which turns the simple string "ટેસ્ટ" into gibberish, causing the angry user to lose his faith in humanity and become a serial killer? Or TQString::fromLocal8Bit(), so that he is happy and sees "ટેસ્ટ" in his TQOpenFileDialog?

Luckily, now systems with non-utf8 locales are mostly extinct at least on linux and desktops...
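What fromLocal8Bit() is meant to respect in this thought experiment is the character set of the user's locale. As a rough illustration, assuming a POSIX system, the sketch below asks the C library which codeset the current locale uses. localeCodeset is a hypothetical helper name, and this is not a claim about how TQt implements fromLocal8Bit() internally.

```cpp
#include <cassert>
#include <clocale>
#include <langinfo.h>   // POSIX header, not part of standard C++
#include <string>

// Hypothetical helper: returns the codeset name of the environment's
// locale, e.g. "UTF-8" or "ISO-8859-15" (exact spelling is system-specific).
std::string localeCodeset() {
    // Adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG).
    // If that fails, the process simply stays in the default "C" locale.
    std::setlocale(LC_CTYPE, "");
    const char* name = nl_langinfo(CODESET);
    assert(name != nullptr);
    return std::string(name);
}
```

A file name read from readdir() would then be decoded with the codec matching that name rather than with a hard-coded UTF-8 decoder.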

Michele Calgaro

6:20 a.m.

New subject: [trinity-devel] TQString::fromUtf8 vs TQString::fromLatin1


Uhm, makes sense, good point. Interestingly my disagreement with you came from the other way around: what if the filesystem is using a 16bit or 32bit encoding? How would TQString::fromLocal8Bit() interpret that? Anyhow it was just my 2 cents :-) Cheers Michele

deloptes

7:34 a.m.


Thank you for the explanations. This confirms my understanding of the matter, but does not explain why I get mangled characters at the end.

The old Nokia phone (5530) seems to be Latin1 (ISO-8859-15). On a sync request I get the data as a std::string and convert it like this: TQString data = TQString::fromUtf8(item.data(), item.size());

With my N9 or the filesync (TDE filesystem) it works fine, but with the 5530 I get the German ü/ä/ö mangled.

There seems to be something I do not understand correctly - or indeed I should use fromLocal8Bit(). I read about this some time ago and compared it with the way KDE4 handles it. They do something tricky there using QByteArray, and I was wondering if TQByteArray could do the job.

I'll also ask the syncevo team whether the encoding could be passed in via the config. This way we'll be able to handle it properly.

regards

Fat-Zer

7:47 a.m.

New subject: [trinity-devel] Re: TQString::fromUtf8 vs TQString::fromLatin1


Sorry about that, my mistake... fromUtf8() is safe to use instead of fromAscii(), of course; for the upper half of the Latin-1 table it gives a different result... You are supposed to set the encoding manually (and likely let the user choose it in this case), and use TQTextCodec to decode the strings.
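This is why the umlauts from the 5530 break: in Latin-1 style encodings "ü" is the single byte 0xFC, which is not a valid UTF-8 sequence. The recommendation here is to configure the encoding explicitly. Purely to illustrate the difference, the sketch below checks whether a byte string is structurally valid UTF-8 and otherwise re-encodes it from Latin-1. The helpers (isValidUtf8, latin1ToUtf8, toUtf8Guessing) and the fallback heuristic itself are assumptions of this example, not something proposed in the thread, and such guessing can misfire on Latin-1 text that happens to look like UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if `s` consists of well-formed UTF-8 sequences (lead byte followed by
// the right number of continuation bytes). Simplified: overlong forms and
// surrogates are not rejected.
bool isValidUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        if (lead < 0x80)                extra = 0;
        else if ((lead & 0xE0) == 0xC0) extra = 1;
        else if ((lead & 0xF0) == 0xE0) extra = 2;
        else if ((lead & 0xF8) == 0xF0) extra = 3;
        else return false;                        // e.g. 0xFC, Latin-1 "ü"
        if (i + extra >= s.size()) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Re-encodes Latin-1 bytes as UTF-8: bytes >= 0x80 become two bytes.
std::string latin1ToUtf8(const std::string& s) {
    std::string out;
    for (unsigned char b : s) {
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    assert(out.size() >= s.size());
    return out;
}

// Heuristic (assumption of this sketch): keep valid UTF-8, treat the rest as Latin-1.
std::string toUtf8Guessing(const std::string& s) {
    return isValidUtf8(s) ? s : latin1ToUtf8(s);
}
```

An explicit, user-selectable encoding remains the more robust choice; the heuristic only shows where the two interpretations of the same bytes part ways.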


To unsubscribe, e-mail: trinity-devel-unsubscribe@lists.pearsoncomputing.net For additional commands, e-mail: trinity-devel-help@lists.pearsoncomputing.net Read list messages on the web archive: http://trinity-devel.pearsoncomputing.net/ Please remember not to top-post: http://trinity.pearsoncomputing.net/mailing_lists/#top-posting

Fat-Zer

7:56 a.m.

New subject: [trinity-devel] Re: TQString::fromUtf8 vs TQString::fromLatin1


Forgot a 1-line snippet:

TQString data = TQTextCodec::codecForName("ISO-8859-15")->toUnicode (item.data(), item.size());

Haven't tested but should work...
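For orientation: ISO-8859-15 (also known as Latin-9) agrees with Latin-1 (ISO-8859-1) everywhere except eight byte values, so the German umlauts decode the same way under either name, but byte 0xA4, for example, is the euro sign. The sketch below spells out that mapping in plain C++. Both helpers are hypothetical and only illustrate the table; they are not what TQTextCodec does internally.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps one ISO-8859-15 byte to its Unicode code point.
char32_t iso885915ToCodePoint(unsigned char b) {
    switch (b) {
        case 0xA4: return 0x20AC; // EURO SIGN
        case 0xA6: return 0x0160; // LATIN CAPITAL LETTER S WITH CARON
        case 0xA8: return 0x0161; // LATIN SMALL LETTER S WITH CARON
        case 0xB4: return 0x017D; // LATIN CAPITAL LETTER Z WITH CARON
        case 0xB8: return 0x017E; // LATIN SMALL LETTER Z WITH CARON
        case 0xBC: return 0x0152; // LATIN CAPITAL LIGATURE OE
        case 0xBD: return 0x0153; // LATIN SMALL LIGATURE OE
        case 0xBE: return 0x0178; // LATIN CAPITAL LETTER Y WITH DIAERESIS
        default:   return b;      // same as Latin-1 (and ASCII below 0x80)
    }
}

// Hypothetical helper: decodes a whole ISO-8859-15 byte string.
std::u32string decodeIso885915(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) out.push_back(iso885915ToCodePoint(b));
    assert(out.size() == bytes.size());   // single-byte encoding: 1 byte, 1 char
    return out;
}
```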

deloptes

9:55 p.m.


Thank you, this was a very good pointer in the right direction!

Fat-Zer

7:35 a.m.

New subject: [trinity-devel] TQString::fromUtf8 vs TQString::fromLatin1

2016-03-23 9:20 GMT+03:00 Michele Calgaro michele.calgaro@yahoo.it:

Uhm, makes sense, good point. Interestingly my disagreement with you came from the other way around: what if the filesystem is using a 16bit or 32bit encoding? How would TQString::fromLocal8Bit() interpret that?

Firstly, TQString::fromLocal8Bit() takes a const char * argument, and neither UTF-16 nor UTF-32 can be stored in a plain char* string, so it's not an issue (unless we are on some very strange platform with 16 or 18 bit chars)... For UTF-16 there is fromUcs2(), but at the system interaction level it's useful only on non-*nix systems... In the Unix world all API calls use char*, and as a consequence no native Unix filesystem uses a wide-character encoding. AFAIK the only semi-supported filesystem on Linux that uses UTF-16 is NTFS, and there the symbols are converted by the kernel (or ntfs-3g) to the desired encoding (see the "nls" and "utf8" mount options for the kernel module and "locale" for ntfs-3g).
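One practical reason behind this point is that UTF-16 text is full of zero bytes, and a zero byte terminates a C string. A tiny sketch in plain C++ shows the effect; the helper names utf16leBytes and lengthSeenByCApi are made up for this example, and the encoder only handles ASCII input to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: encodes ASCII-only text as UTF-16LE bytes
// (the character byte first, then a zero high byte for every character).
std::string utf16leBytes(const std::string& ascii) {
    std::string out;
    for (unsigned char c : ascii) {
        assert(c < 0x80);   // this sketch is limited to ASCII input
        out.push_back(static_cast<char>(c));
        out.push_back('\0');
    }
    return out;
}

// What a char*-based API would see: everything after the first zero byte is lost.
std::size_t lengthSeenByCApi(const std::string& bytes) {
    return std::strlen(bytes.c_str());
}
```

The UTF-16LE form of "test" is eight bytes long, but a function taking a plain const char * would stop after the first one.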

PS: Sorry to everybody if we are making too much noise on the mail list...

Michele Calgaro

7:41 a.m.

New subject: [trinity-devel] TQString::fromUtf8 vs TQString::fromLatin1


Thanks for the detailed explanation Alex, always good to learn some more bits of information, since you are quite obviously more knowledgeable than me on this matter ;-) Cheers Michele

deloptes

25 Mar 2016

12:58 a.m.

New subject: TQString::fromUtf8 vs TQString::fromLatin1 [possible bug in parseVCard]

Hi all,

After some testing I suspect that the problem is created somewhere in or around parseVCard.

The phones report in vcard2.1. It is converted by syncevolution to v3 and passed to the plugin. The attached file shows the output of the syncevolution backend (tdepim).

addressbook: TDE addressbook ENTRY BEFORE - prints std::string item (the vCard)

SE_LOG_DEBUG(getDisplayName(), "TDE addressbook ENTRY BEFORE \n%s\n",item.c_str() );

Überdrüber OK

addressbook: TDE addressbook ENTRY FROM UTF - is the std::string value converted to TQString via fromUtf8 as discussed in previous posts

    TQString input = TQString::fromUtf8(item.data(),item.size());
    std::string input_str(input.utf8(),input.utf8().length());
    SE_LOG_DEBUG(getDisplayName(), "TDE addressbook ENTRY FROM UTF \n%s\n",input_str.c_str() );

Überdrüber OK

addressbook: TDE addressbook ENTRY AFTER is the output of addressee after converter.parseVCard(input) is called and converted to std::string

TDEABC::Addressee addressee = converter.parseVCard(input);

    /* DEBUG */
    TQString data;
    if (m_type == TDEPIM_CONTACT_V21 )
        data = converter.createVCard(addressee, TDEABC::VCardConverter::v2_1);
    else
        data = converter.createVCard(addressee, TDEABC::VCardConverter::v3_0);
    std::string data_str(data.utf8(),data.utf8().length());

SE_LOG_DEBUG(getDisplayName(), "TDE addressbook ENTRY AFTER \n%s\n",data_str.c_str() );

I could reproduce something similar in a test program by

    std::string teststr(input.utf8(),input.utf8().length());
    std::cout << teststr << "\n";

result NOK

This however does not explain the above problem as I see the broken äöü in the AddressBook in TDE. This must be coming from parseVCard as after this addressee is added to the AddressBook.

This is consistent BTW with my experience with KDE3 and opensync, where I had same problems, but never had the balls to confront them.

However this works in my test program

    std::string teststr(newItem.ascii());
    std::cout << teststr << "\n";

and this contradicts the logic of ascii(): all the äöü are there

regards

deloptes

3:59 p.m.

New subject: [SOLVED] Re: TQString::fromUtf8 vs TQString::fromLatin1 [possible bug in parseVCard]

deloptes wrote:

...

However this works in my test program
    std::string teststr(newItem.ascii());
    std::cout << teststr << "\n";
and this contradicts the logic of ascii all äöü are there

Looking further into it I solved the issue by passing c_str() to parseVCard

TDEABC::Addressee addressee = converter.parseVCard(item.c_str());

works OK

and the same when reading an item: after converting the TQString into a std::string, I pass the c_str() to the function.

works OK

so in both directions now encoding is preserved.

Thanks for the hints and advice; without your help I wouldn't have solved it so fast.

Fat-Zer

26 Mar 2016

6:46 a.m.

New subject: [trinity-devel] Re: TQString::fromUtf8 vs TQString::fromLatin1 [possible bug in parseVCard]

2016-03-25 3:58 GMT+03:00 deloptes deloptes@gmail.com:

...

Hi all,

...

However this works in my test program
    std::string teststr(newItem.ascii());
    std::cout << teststr << "\n";
and this contradicts the logic of ascii all äöü are there

Nope, it doesn't... see the ascii() documentation: «If a codec has been set using QTextCodec::codecForCStrings(), it is used to convert Unicode to 8-bit char. Otherwise, this function does the same as latin1().» However, you generally shouldn't use ascii() unless you are positive that the string contains only ASCII chars, or some other interface accepts strictly those and you don't care about the others...

deloptes

8:42 a.m.

New subject: TQString::fromUtf8 vs TQString::fromLatin1 [possible bug in parseVCard]


Hi, this is also how I understand ascii(), but do you have an explanation for how I then see the üöä (UTF?). The above was just an experiment. For the code I wrote, I solved the problem by passing the c_str() to parseVCard. This passes a char array and does not care that much about the content (my understanding).

regards

Fat-Zer

2:59 p.m.

New subject: [trinity-devel] Re: Re: TQString::fromUtf8 vs TQString::fromLatin1 [possible bug in parseVCard]


Yes, it seems you are right, there is a bug. I haven't tried it myself, but this should do the trick:

diff --git a/tdeabc/vcardparser/vcardparser.cpp b/tdeabc/vcardparser/vcardparser.cpp
index 7ac07ce..db33263 100644
--- a/tdeabc/vcardparser/vcardparser.cpp
+++ b/tdeabc/vcardparser/vcardparser.cpp
@@ -152,7 +152,7 @@ VCard::List VCardParser::parseVCards( const TQString& text )
               KCodecs::quotedPrintableDecode( input, output );
             }
           } else {
-            output = TQCString(value.latin1());
+            output = TQCString(value.utf8());
           }
 
           if ( params.findIndex( "charset" ) != -1 ) { // have to convert the data

Note that VCardParser::parseVCards() is generally encoding-unsafe...
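To see what the one-line change means at the byte level, here is a sketch in plain C++ of the two encoders applied to a single character. encodeLatin1 and encodeUtf8 are hypothetical stand-ins for TQString::latin1() and TQString::utf8(), written for illustration only; replacing unrepresentable characters with '?' is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for latin1(): one byte per character, with '?'
// for anything outside U+0000..U+00FF (assumed replacement behaviour).
std::string encodeLatin1(char32_t cp) {
    return std::string(1, cp <= 0xFF ? static_cast<char>(cp) : '?');
}

// Hypothetical stand-in for utf8(): one to four bytes per character.
std::string encodeUtf8(char32_t cp) {
    assert(cp <= 0x10FFFF);   // highest valid Unicode code point
    std::string out;
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

For "ü" (U+00FC) the Latin-1 encoder emits the single byte 0xFC, while the UTF-8 encoder emits 0xC3 0xBC. Code further down the line that expects UTF-8 would only accept the latter.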

PS, some notes about your code:

...

    std::string data_str(data.utf8(),data.utf8().length());
    SE_LOG_DEBUG(getDisplayName(), "TDE addressbook ENTRY AFTER \n%s\n",data_str.c_str() );

Note that there is no need to create an intermediate std::string here; the following code should work by itself:

    SE_LOG_DEBUG(getDisplayName(), "TDE addressbook ENTRY AFTER\n%s\n",data.utf8() );

if not, just cast it to (const char *).

...

TDEABC::Addressee addressee = converter.parseVCard(item.c_str());

This is equivalent to fromLatin1(), so it will work only for your locale...

Next time you encounter such issues, try to provide a minimal compilable test example. That will significantly ease the testing and the understanding of what's wrong...

deloptes

27 Mar 2016

12:07 p.m.

New subject: TQString::fromUtf8 vs TQString::fromLatin1 [possible bug in parseVCard]


Yes I also looked into this, I closed the file less than 60sec later - because I started having headache. I don't understand what this diff means - do you mean how it should be or is it something from the history of the file?

PS, some notes about your code: [...] there is no need to create an intermediate std::string here [...] if not, just cast it to (const char *).

Thank you - you speak out some of my thoughts. I also think I have tested the above, but not sure anymore.

...

...
TDEABC::Addressee addressee = converter.parseVCard(item.c_str());

This is an equivalent to fromLatin1(), so it will work only for your locale...

I'm not sure if I understand this well. fromLatin1 means I use iso-8859, but I use utf8. It is also obvious that the input (item) is received in utf8. I had different experience when using TQString::fromLatin1 ()

...

Next time you encounter such issues, try to provide a minimal compilable test example. That will significantly ease the testing and the understanding of what's wrong...

Yes, you are correct again; however, time constraints and frustration prevented me from doing this, as I have to clean the test code of older tests that I commented out. I have added something now; however, I was disappointed that the output of the test program was different from what I saw in the AddressBook. Perhaps that is because you have to convert the addressee back, and this places the original problem right somewhere in the converter.

Find attached the code and compile like this

    PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/opt/trinity/lib/pkgconfig \
    g++ `pkg-config --cflags tqt` -I/opt/trinity/include \
        `pkg-config --libs tqt` -L/opt/trinity/lib \
        -ltdecore -ltdeabc std-test.cc -o std-abreader

To check the result in the address book you have to uncomment the write and save lines and also cleanup tdeabc lock and cache files prior to executing. I had a very bad experience with those already

    rm -f ~/.trinity/share/apps/tdeabc/lock/*.trinity_share_apps_tdeabc_std.vcf* \
          ~/.trinity/share/apps/tdeabc/std.vcf__*

regards

Fat-Zer

2:09 p.m.

New subject: [trinity-devel] Re: Re: Re: TQString::fromUtf8 vs TQString::fromLatin1 [possible bug in parseVCard]


Yes, it is how it should be: a fix for tdelibs. But note that if a vCard has a field in a different encoding (e.g. the "charset" parameter is set), the code will likely fail... To fix it completely, API changes are required (pass a TQByteArray to the parser rather than a TQString).

TDEABC::Addressee addressee = converter.parseVCard(item.c_str());

This is an equivalent to fromLatin1(), so it will work only for your locale...

I'm not sure if I understand this well. fromLatin1 means I use iso-8859, but I use utf8. It is also obvious that the input (item) is received in utf8. I had different experience when using TQString::fromLatin1 ()

Here you implicitly use TQString(const char*), which is equivalent to TQString::fromAscii(), which in turn is equivalent to fromLatin1() as long as no codec has been set via TQTextCodec::setCodecForCStrings(). So the code will likely fail if the input has some other encoding.
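A side effect of this equivalence may explain why passing c_str() appeared to work: decoding bytes as Latin-1 and later encoding them as Latin-1 again returns the original bytes unchanged, so UTF-8 input survives the round trip even though the intermediate string holds the wrong characters. The sketch below demonstrates this in plain C++ with the hypothetical helpers fromLatin1Bytes and toLatin1Bytes. Whether this is exactly what happened inside parseVCard is an assumption, not something confirmed in the thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for TQString(const char*) / fromLatin1():
// each byte becomes the character with the same code point.
std::u32string fromLatin1Bytes(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) out.push_back(static_cast<char32_t>(b));
    assert(out.size() == bytes.size());
    return out;
}

// Hypothetical stand-in for latin1(): each character becomes one byte,
// with '?' for anything above U+00FF (assumed replacement behaviour).
std::string toLatin1Bytes(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text)
        out.push_back(cp <= 0xFF ? static_cast<char>(cp) : '?');
    return out;
}
```

With the UTF-8 bytes of "Überdrüber" as input, the intermediate string contains "Ã" followed by a control character instead of "Ü", yet converting back yields the identical byte sequence. Anything that inspects or displays the intermediate string, or input in another encoding, would still go wrong.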

...

...
Next tim if you encounter such issues, try to provide a minimal compiliable test example. That will significantly ease the testing and understanding what's wrong...

Yes, you are correct again, however time constrains and frustration prevented me doing this as I have to clean the test code from older tests, I commented out. I add now something, however I was disappointed that output of the test program was different to what I saw in the AddressBook. Perhaps because you have to convert the addressee back and this makes the original problem right somewhere in the converter.

Find attached the code and compile like this

PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/opt/trinity/lib/pkgconfig \
g++ `pkg-config --cflags tqt` -I/opt/trinity/include \
`pkg-config --libs tqt` -L/opt/trinity/lib \
-ltdecore -ltdeabc std-test.cc -o std-abreader

To check the result in the address book you have to uncomment the write and save lines and also cleanup tdeabc lock and cache files prior to executing. I had a very bad experience with those already

rm -f ~/.trinity/share/apps/tdeabc/lock/*.trinity_share_apps_tdeabc_std.vcf* \
~/.trinity/share/apps/tdeabc/std.vcf__*

regards


deloptes

10 p.m.

New subject: TQString::fromUtf8 vs TQString::fromLatin1 [possible bug in parseVCard]

Fat-Zer wrote:

...

2016-03-27 15:07 GMT+03:00 deloptes deloptes@gmail.com:

...
Fat-Zer wrote:

...

...
...
diff --git a/tdeabc/vcardparser/vcardparser.cpp b/tdeabc/vcardparser/vcardparser.cpp
index 7ac07ce..db33263 100644
--- a/tdeabc/vcardparser/vcardparser.cpp
+++ b/tdeabc/vcardparser/vcardparser.cpp
@@ -152,7 +152,7 @@ VCard::List VCardParser::parseVCards( const TQString& text )
           KCodecs::quotedPrintableDecode( input, output );
         }
       } else {
-        output = TQCString(value.latin1());
+        output = TQCString(value.utf8());
       }
 
       if ( params.findIndex( "charset" ) != -1 ) { // have to convert the data

Note that VCardParser::parseVCards() is generally encoding-unsafe...
Yes I also looked into this, I closed the file less than 60sec later - because I started having headache. I don't understand what this diff means - do you mean how it should be or is it something from the history of the file?
Yes, that is how it should be: it is a fix for tdelibs. But note that if a vCard has a field in a different encoding (e.g. the "charset" parameter is set), the code will likely fail... To fix it completely, API changes are required (pass a TQByteArray to the parser rather than a TQString).

You mean charset different than UTF-8? But this "else" refers to the case when no charset+encoding is specified, so it should really default to UTF (IMO)
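
When nothing declares the charset, one cheap heuristic is to check whether the bytes are well-formed UTF-8 and only otherwise treat them as a legacy 8-bit encoding. The sketch below is plain standard C++; isValidUtf8() is a hypothetical helper written for this illustration, not part of TQt or tdelibs. It is only a heuristic: some Latin-1 byte sequences happen to be valid UTF-8 as well, so it cannot replace an explicit charset declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns true if the bytes form well-formed UTF-8
// (no truncated sequences, overlong forms, surrogates, or values above U+10FFFF).
bool isValidUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        if (lead < 0x80) { ++i; continue; }  // plain ASCII byte

        std::size_t extra = 0;   // continuation bytes expected
        std::uint32_t cp = 0;    // code point being assembled
        std::uint32_t min = 0;   // smallest value allowed for this length
        if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min = 0x10000; }
        else return false;       // stray continuation byte or invalid lead byte

        if (s.size() - i <= extra) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += extra + 1;
    }
    return true;
}

// Usage sketch: the same name encoded as UTF-8 and as Latin-1.
void demo() {
    assert(isValidUtf8("M\xC3\xBCller"));   // UTF-8 bytes for u-umlaut
    assert(!isValidUtf8("M\xFCller"));      // Latin-1 byte for u-umlaut
}
```

A caller could use the result to choose between a UTF-8 decode and a fallback decode for data coming from older devices.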

...

...
...
PS, some notes about your code:

...
std::string data_str(data.utf8(),data.utf8().length());
SE_LOG_DEBUG(getDisplayName(), "TDE addressbook ENTRY AFTER \n%s\n",data_str.c_str() );

Note that there is no need to create an intermediate std::string here; the following code should work by itself:

SE_LOG_DEBUG(getDisplayName(), "TDE addressbook ENTRY AFTER\n%s\n",data.utf8() );

If not, just cast it to (const char *).

Thank you - you speak out some of my thoughts. I also think I have tested the above, but not sure anymore.

...
...
TDEABC::Addressee addressee = converter.parseVCard(item.c_str());

This is an equivalent to fromLatin1(), so it will work only for your locale...

I'm not sure if I understand this well. fromLatin1 means I use iso-8859, but I use utf8. It is also obvious that the input (item) is received in utf8. I had different experience when using TQString::fromLatin1 ()

Here you implicitly use QString (const char*), which is equivalent to QString::fromAscii (), which in turn is equivalent to fromLatin1 () as long as you don't set QTextCodec::codecForCStrings(). So the code will likely fail if the input has some other encoding.

You mean charset different than UTF-8?

I'm not sure, because I observed some strange behavior - I just played around with the test code until it worked. The most frustrating part was to see that all looked fine in the test program, and after sync it was mangled in the address book. So I did some testing on parseVCard until I found out it works best when passing c_str(). I tried all the options that were highlighted in the thread here or in syncevolution.

Thanks for the explanation on the above. I think it is a pity I do not have more time to track it further, but I still do not understand when you say it is equivalent to .... and the code will fail if encoding is set.

In vCard 2.1 you have the option to specify charset+encoding.
In vCard 3.0 it looks like it defaults to UTF, and I've not tested charset+encoding.

The code I produced operates based on what is coming from syncevolution and offers vCard 3.0, so we receive automatically UTF input. Perhaps I should test with vCard 2.1. Or better someone else, but this is good point to make a todo note.
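
For testing with vCard 2.1 input, it may help to see what a declared charset looks like on a content line. The sketch below is plain standard C++ and only illustrates the idea; charsetParam() is a hypothetical helper, not the tdeabc parser. It assumes one unfolded content line and does not handle quoted parameter values that contain ':' or ';'.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: upper-case an ASCII string for case-insensitive comparison.
std::string toUpperAscii(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

// Hypothetical helper: return the value of the CHARSET parameter of one
// unfolded vCard content line, or an empty string if none is declared.
std::string charsetParam(const std::string& line) {
    // Everything before the first ':' is the property name plus parameters.
    const std::string head = line.substr(0, line.find(':'));
    std::size_t start = head.find(';');
    while (start != std::string::npos) {
        const std::size_t end = head.find(';', start + 1);
        const std::string param = (end == std::string::npos)
            ? head.substr(start + 1)
            : head.substr(start + 1, end - start - 1);
        const std::size_t eq = param.find('=');
        if (eq != std::string::npos && toUpperAscii(param.substr(0, eq)) == "CHARSET")
            return param.substr(eq + 1);
        start = end;
    }
    return "";
}

// Usage sketch with an illustrative vCard 2.1 style line.
void demo() {
    assert(charsetParam("N;CHARSET=ISO-8859-1;ENCODING=QUOTED-PRINTABLE:M=FCller;Hans")
           == "ISO-8859-1");
    assert(charsetParam("FN:John Doe").empty());
}
```

A line without the parameter returns an empty string, which is the case where a default has to be chosen.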

thanks again, appreciated regards

Fat-Zer

28 Mar 28 Mar

6:47 a.m.

New subject: [trinity-devel] Re: Re: Re: Re: TQString::fromUtf8 vs TQString::fromLatin1 [possible bug in parseVCard]

2016-03-28 1:00 GMT+03:00 deloptes deloptes@gmail.com:

...

Fat-Zer wrote:

...
2016-03-27 15:07 GMT+03:00 deloptes deloptes@gmail.com:

...
Fat-Zer wrote:

...
Yes, that is how it should be: it is a fix for tdelibs. But note that if a vCard has a field in a different encoding (e.g. the "charset" parameter is set), the code will likely fail... To fix it completely, API changes are required (pass a TQByteArray to the parser rather than a TQString).

You mean charset different than UTF-8? But this "else" refers to the case when no charset+encoding is specified, so it should really default to UTF (IMO)

This "else" refers to the case when no "encoding" is specified. The charset is handled later. It seems it will be handled correctly if both encoding and charset are specified, but it will be wrong if only charset is set. There is a general mistake here: in those functions the QString is used as a container for a sequence of bytes with an undefined charset, which is generally wrong. It works because QString internally consists of two independent data sets: a zero-terminated const char* for fast return from ascii() or latin1() (in case the string is Latin-1) and a QChar[]. But this is a very harmful practice...

So to make the code work in its current state, obscure and unintuitive code is required: converter.parseVCard( TQString::fromLatin1(str.utf8()) );

The correct solution is to change the API so that parseVCard accepts a QByteArray rather than a QString. Also note that this started back in KDE times: see the KABC_VCARD_ENCODING_FIX ifdefs in tdepim...
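
The reason the fromLatin1(str.utf8()) workaround holds together can be sketched in plain standard C++. The two functions below are hypothetical models written for this illustration, not the TQt implementation; in particular, replacing characters above 0xFF with '?' on the way out is an assumption of the model. Decoding bytes as Latin-1 and encoding them back as Latin-1 returns the original bytes untouched, so a string built that way behaves like a byte container. A properly decoded string does not survive the same Latin-1 encode.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a Latin-1 decode: each byte becomes one code point.
std::u32string fromLatin1Model(const std::string& bytes) {
    std::u32string out;
    for (unsigned char c : bytes) out.push_back(static_cast<char32_t>(c));
    return out;
}

// Hypothetical model of a Latin-1 encode. Assumption: code points above 0xFF
// cannot be represented and are replaced by '?'.
std::string toLatin1Model(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        out.push_back(cp <= 0xFF
            ? static_cast<char>(static_cast<unsigned char>(cp))
            : '?');
    }
    return out;
}

// Usage sketch with the Cyrillic letter U+0416, which is D0 96 in UTF-8.
void demo() {
    const std::string utf8 = "\xD0\x96";

    // Byte-container use: Latin-1 in, Latin-1 out, bytes are preserved.
    assert(toLatin1Model(fromLatin1Model(utf8)) == utf8);

    // Properly decoded text: the character is lost on a Latin-1 encode.
    const std::u32string decoded(1, char32_t(0x0416));
    assert(toLatin1Model(decoded) == "?");
}
```

This is also why passing raw bytes in a byte-array type is the cleaner design: the caller no longer has to know which round trip the parser performs internally.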

...

...
Here you implicitly use QString (const char*), which is equivalent to QString::fromAscii (), which in turn is equivalent to fromLatin1 () as long as you don't set QTextCodec::codecForCStrings(). So the code will likely fail if the input has some other encoding.

You mean charset different than UTF-8?

I'm not sure, because I observed some strange behavior - I just played around with the test code until it worked. The most frustrating part was to see that all looked fine in the test program, and after sync it was mangled in the address book. So I did some testing on parseVCard until I found out it works best when passing c_str(). I tried all the options that were highlighted in the thread here or in syncevolution.

Thanks for the explanation on the above. I think it is a pity I do not have more time to track it further, but I still do not understand when you say it is equivalent to .... and the code will fail if encoding is set.

"Equivalent" here means that the following code will have exactly the same results:

converter.parseVCard(item.c_str());
converter.parseVCard(TQString(item.c_str()));
converter.parseVCard(TQString::fromAscii(item.c_str())); // if you haven't set TQTextCodec::codecForCStrings() explicitly
converter.parseVCard(TQString::fromLatin1(item.c_str()));

...

In vCard 2.1 you have the option to specify charset+encoding.
In vCard 3.0 it looks like it defaults to UTF, and I've not tested charset+encoding.

The code I produced operates based on what is coming from syncevolution and offers vCard 3.0, so we receive automatically UTF input. Perhaps I should test with vCard 2.1. Or better someone else, but this is good point to make a todo note.

2016-03-28 1:09 GMT+03:00 deloptes deloptes@gmail.com:

...

BTW did you raise a bug to fix this?

No, I haven't...

deloptes

27 Mar 27 Mar

10:09 p.m.

New subject: TQString::fromUtf8 vs TQString::fromLatin1 [possible bug in parseVCard]

Fat-Zer wrote:

...

diff --git a/tdeabc/vcardparser/vcardparser.cpp b/tdeabc/vcardparser/vcardparser.cpp
index 7ac07ce..db33263 100644
--- a/tdeabc/vcardparser/vcardparser.cpp
+++ b/tdeabc/vcardparser/vcardparser.cpp
@@ -152,7 +152,7 @@ VCard::List VCardParser::parseVCards( const TQString& text )
           KCodecs::quotedPrintableDecode( input, output );
         }
       } else {
-        output = TQCString(value.latin1());
+        output = TQCString(value.utf8());
       }
 
       if ( params.findIndex( "charset" ) != -1 ) { // have to convert the data

Note that VCardParser::parseVCards() is generally encoding-unsafe...

BTW did you raise a bug to fix this?

regards

participants (3)

deloptes
Fat-Zer
Michele Calgaro