Elxis CMS Forum

Support => Elxis 4.x/5.x DEV => Topic started by: maple on June 10, 2013, 13:44:29

Title: charset utf-8 - problems for some countries.
Post by: maple on June 10, 2013, 13:44:29

If Elxis is UTF-8, why don't the forms handle some characters correctly, such as the apostrophe and the middle dot? The caption field of the image plugin has the same problem.

Thanks
Title: Re: charset utf-8 - problems for some countries.
Post by: datahell on June 10, 2013, 20:41:07
It is not a matter of UTF-8. In some cases Elxis filters out characters that may cause problems or are invalid for the type of the input element.
For instance, a middle dot or an apostrophe will be removed in a URL input field. The same goes for an SEO title or an email field.
Tell me the exact form element and I will answer for that element in particular.
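To illustrate this kind of type-dependent filtering, here is a minimal PHP sketch. It is not Elxis's actual code: the function name sanitize_by_type and the character whitelists are assumptions made only for this example.

```php
<?php
// Hypothetical helper, NOT Elxis code: drops characters that are not
// on the whitelist for the given input type.
function sanitize_by_type(string $value, string $type): string
{
    switch ($type) {
        case 'url':
            // Assumed whitelist: ASCII letters, digits and common URL punctuation.
            // The match is byte-wise, so every byte of a multi-byte UTF-8
            // character (such as the middle dot) is removed.
            return preg_replace('/[^a-z0-9\-._~:\/?#@!$&*+,;=%]/i', '', $value);
        case 'seotitle':
            // Assumed whitelist: ASCII letters, digits, underscore and hyphen.
            return preg_replace('/[^a-z0-9_-]/i', '', $value);
        default:
            // Free-text fields are left untouched in this sketch.
            return $value;
    }
}

echo sanitize_by_type("l'article-col·legi", 'seotitle'), "\n"; // larticle-collegi
echo sanitize_by_type("l'article-col·legi", 'text'), "\n";     // l'article-col·legi
```

With a scheme like this, the same string survives intact in a free-text field but loses its apostrophe and middle dot in a URL or SEO title field, which matches the behaviour described above.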
Title: Re: charset utf-8 - problems for some countries.
Post by: maple on June 11, 2013, 12:03:49
datahell:

As you asked, the exact forms are the contact form plugin (message field), the send-to-friend form (when I translate the language config files), and the image plugin (caption field). It also happens with some language file strings, such as the ones used in popups, and with the title phrase shown at the top of the browser. Perhaps the PHP functions htmlentities() or htmlspecialchars() could resolve this. It is annoying for languages such as Catalan (also spoken in Andorra), Italian and French.

Another issue: I can see that the French language files contain cedillas and grave and acute accents, as in "commençant", "résultat", "trouvé", "dernière"... I need these in Catalan too (a new language for Elxis), but when I use cedillas, graves and acutes as in French, they don't work properly. Perhaps something is missing in the core files (I added Catalan to langdb.php, but that alone did not solve the problem).

In Catalan we also have the diaeresis (ü), and hundreds of words with an apostrophe or a middle dot. We cannot avoid them the way English can expand "it's" to "it is"; our language has no such alternative. A few examples:

APOSTROPHE '

l'article

s'apostrofa

s'ha

l'espasa

l'artista

l'Anna

l'única

l'Agustí

l'Oriol

m'està

s'integra

l'obté

d'herba

te'ls


and so on...



MIDDLE DOT ·

al·literació

il·limitat

il·lògic

síl·laba

cal·ligrafia

cal·ligrama

fil·loxera

mil·lenari

mil·lèssim

gal·licisme

gàl·lia

col·leccions

al·lèrgia

al·licient

al·ludir

al·lusió

al·lucinar

al·lot

col·legi

col·lapse

satèl·lit

and so on...
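A minimal sketch of the escaping idea mentioned above, assuming the submitted text is valid UTF-8 and the page is served as UTF-8. The wrapper name escape_for_html is hypothetical; htmlspecialchars() and html_entity_decode() are standard PHP functions. It only shows that escaping on output keeps apostrophes and middle dots instead of removing them; it says nothing about how Elxis handles these fields.

```php
<?php
// Hypothetical wrapper: escape user text for HTML output instead of stripping it.
function escape_for_html(string $text): string
{
    // ENT_QUOTES also escapes the apostrophe; 'UTF-8' keeps multi-byte
    // characters such as the middle dot and accented letters unchanged.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo escape_for_html("l'Anna"), "\n";          // l&#039;Anna
echo escape_for_html("col·legi"), "\n";        // col·legi
echo escape_for_html("<b>il·lògic</b>"), "\n"; // &lt;b&gt;il·lògic&lt;/b&gt;
```

htmlentities() would additionally turn the middle dot and the accented letters into named entities such as &middot;, which is not needed when the page itself is UTF-8.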


Thanks






Title: Re: charset utf-8 - problems for some countries.
Post by: datahell on June 11, 2013, 22:03:28
In your language files you can add whatever you like, middle dots included. The filters I wrote about apply only to input data; they don't affect your language files!
Greek also has many extra needs, even more than yours, but it works like a charm. We also have the diaeresis (dialitika) in Greek, for example: άϋλα, αϊτός, etc.
Ancient Greek, with its much more complex accentuation, also works excellently in Elxis (like τῷ, οἰκοδομεῖν, ἦν, ἤκουσαν, etc.)

So, are you sure you are doing it the proper way? Language files must be saved as UTF-8 without BOM.
Maybe you are using a wrong encoding, or a wrong font in your CSS. Not all fonts support all characters.
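One way to check the encoding points above is a small PHP script like the following sketch. The function name check_language_source is hypothetical and is not part of Elxis; it takes the raw bytes of a file and reports a leading BOM or invalid UTF-8.

```php
<?php
// Hypothetical helper, NOT part of Elxis: reports common encoding problems
// in the raw contents of a language file.
function check_language_source(string $bytes): array
{
    $problems = [];
    // A UTF-8 BOM is the byte sequence EF BB BF at the very start of the file.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        $problems[] = 'starts with a UTF-8 BOM';
    }
    // An empty pattern with the /u modifier matches only if the subject
    // is valid UTF-8.
    if (preg_match('//u', $bytes) !== 1) {
        $problems[] = 'contains bytes that are not valid UTF-8';
    }
    return $problems;
}

// Example: a string saved as ISO-8859-1, where "à" is the single byte E0.
print_r(check_language_source("catal\xE0"));
```

To test a real file, pass it the result of file_get_contents() for that file's path. An empty array means that neither problem was found.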

Elxis already supports the Catalan language! You don't have to modify the langdb.php file!
The Elxis identifier for Catalan language is "ca".

'ca' => array('LANGUAGE' => 'ca', 'REGION' => 'AD', 'DIR' => 'ltr', 'NAME' => 'català', 'NAME_ENG' => 'Catalan'),
Title: Re: charset utf-8 - problems for some countries.
Post by: maple on June 13, 2013, 10:24:42
Hi datahell,

Thanks a lot for your response. I see now that there are no problems with the language files; only the input data needs to be resolved. Surely there is a solution out there. It would be interesting to implement it in a future revision.

thanks