I’ve created a new language (Slovak) by copying the English language files into a new folder called “sk” in the “oxid\out\basic” folder. Everything worked beautifully, except that some Slovak characters (like ‘č’) couldn’t be entered into the lang.php file because it was in ANSI encoding. I tried changing the encoding of the file to UTF-8. In this case the page displayed the characters correctly, but I got an error when I tried to put something into the cart:
Warning: Cannot modify header information - headers already sent by (output started at C:\xampp\htdocs\oxid\out\basic\sk\lang.php:1) in C:\xampp\htdocs\oxid\core\oxutils.php on line 792
I figured out that the UTF8 BOM was causing the problem. So I converted the file to UTF8 without the BOM part. Now I don’t get the error but my strings are all messed up. I think the browser doesn’t recognize that the strings are in UTF8 and tries to read them as ANSI.
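The BOM is three bytes (EF BB BF) at the very start of the file. Because they sit in front of the opening `<?php` tag, PHP sends them to the browser as page output, which is why the headers count as "already sent" at lang.php:1. Below is a minimal Python sketch of detecting and removing it; the function name and the sample file content are made up for illustration and are not part of OXID.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical file content: a BOM followed by the start of a PHP language file.
with_bom = codecs.BOM_UTF8 + "<?php $aLang = array();".encode("utf-8")
cleaned = strip_utf8_bom(with_bom)

print(with_bom[:3])  # the three BOM bytes
print(cleaned[:5])   # the file now starts directly with the PHP tag
```

To apply this to a real file, read it in binary mode, pass the bytes through the function, and write the result back in binary mode.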
So here I have 2 solutions to my problem but each has a bug. Please advise me on how I could bypass this problem. Thank you!
Actually, the shop works with Latin encoding; UTF-8 is not supported yet (but we are already working on it).
One solution could be to enter your special language characters as entities into the lang file. Another one could be to adapt the charset entry in your _header.tpl file. See how it works when you change it from iso-8859-1 to iso-8859-2.
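Both suggestions can be tried out away from the shop. The Python sketch below encodes a string into a target charset and falls back to numeric HTML entities for anything that charset cannot represent. The sample word and the function name are assumptions for illustration only.

```python
def to_charset_with_entities(text: str, charset: str) -> bytes:
    """Encode text into charset; characters it lacks become &#NNN; entities."""
    return text.encode(charset, errors="xmlcharrefreplace")


sample = "Čokoláda"  # hypothetical sample string

# ISO-8859-1 has "á" but no "Č", so "Č" falls back to an entity.
latin1 = to_charset_with_entities(sample, "iso-8859-1")

# ISO-8859-2 contains both letters, so no entities are needed.
latin2 = to_charset_with_entities(sample, "iso-8859-2")

print(latin1)
print(latin2)
```

With iso-8859-2 every Slovak letter fits into one byte, so the entity route is only needed if the page has to stay on iso-8859-1.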
I appreciate your feedback!
Marco Steinhäuser
Community Operator
OXID eSales AG
I’ve found the solution: it seems that Oxid does support UTF-8 after all.
I’ve converted the lang.php file to UTF-8 without BOM and found there is a charset entry in the $aLang array. I changed its value to ‘utf-8’ and it works fine. I hope I won’t find any places in the program where this causes problems. Thanks for your help!
PS: Maybe you can now say on your site that Oxid does support UTF-8.
Unfortunately this has only fixed the hardcoded strings. Strings entered into the database (like the shop name) are wrong. Even characters that can be represented in Latin-1 (á) show up as ? signs with a black background. If I switch the encoding back to latin1, then the database strings are fine (as long as they don’t contain any special chars), but of course the hardcoded strings become corrupted.
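This pattern is what a mismatch between the stored bytes and the declared page charset looks like. A small Python sketch, assuming the database holds Latin-1 bytes while the page is declared as UTF-8 (the sample name is made up):

```python
name = "Korálky"  # hypothetical shop name

utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("iso-8859-1")

# Latin-1 bytes on a page declared as UTF-8: the lone byte for "á" is not
# valid UTF-8, so it is shown as U+FFFD, the "?" in a black diamond.
latin1_read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes on a page declared as Latin-1: "á" turns into two characters.
utf8_read_as_latin1 = utf8_bytes.decode("iso-8859-1")

print(latin1_read_as_utf8)
print(utf8_read_as_latin1)
```

Neither charset setting can be right for both sources at once; the lang file and the database content have to use the same encoding, or the non-ASCII characters have to be written as entities.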
I tried to do this with HTML entities and it works (I’ll need some special text editor so I can create the translation, but that’s OK), but if I put entities inside strings stored in the database (I tried the shop name) they show up on the webpage like ‘&#268;’. Somehow the browser doesn’t convert them. I see that the admin interface automatically converts these characters for me, which is very nice, but then they show up wrong on the page. Any thoughts on what’s causing this?
I’m using the community edition, so the editor is not WYSIWYG. But still, if I enter the shop name in the text field it changes the Č character to &#268;. The only problem is that when I see that string in the shop, the browser shows the entity code instead of the character.
I’ve looked into header.tpl but only found this reference (as in all other tpl files):
I see that charset refers to a constant. I couldn’t find that constant defined anywhere in the program, and didn’t want to change it in only a single file. But when I change the charset value in the aLang array (in lang.php) all webpages will be using that charset. So I figured that that is the place where the [{$charset}] constant is defined for the tpl files.
I’ll try to explain again. I managed to set the codepage, but that didn’t help because it fixed some things and broke others. HTML entities do help and don’t cause problems, but if I enter HTML entities into the shop name field, those entities won’t be converted to their respective characters on the page.
Here is what’s appearing on my page:
“A customer account with #268;orálký has advantages like:”
I omitted the & sign from the entity, because otherwise the forum would display it as the correct character.
I don’t have much time to explain the details, since we are rolling out the RU version right now. But in general, experiment with the advice below. Cyrillic has no ANSI characters at all, so we had to learn a lot, but it worked out in the end.
a) Don’t use UTF-8 as the encoding for “lang.php”; it has some critical problems to overcome. Better to use the appropriate ISO or Windows encoding first. Don’t forget to set the charset= parameter as well in the ./out/admin and ./out/basic folders…
b) Use a good editor like Notepad++ which can re-encode charsets, and change and save the lang files in the appropriate encoding. The same goes for editing e.g. the SQL setup files, but those are better saved as UTF-8 (without the “BOM”, the byte order mark), since their contents are stored as UTF in the DB anyway.
c) Edit the products in the admin in the same language and codepage as the one you use in the basic settings (not UTF-8). This avoids the cross-conversion of the Slovak characters into the “#268;” style.
d) In the admin, extend the SEO translation table with a bunch of letters, or even the complete alphabet, to replace the special characters in the SEO URLs with e.g. English transliterations.
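The idea behind point d) is a plain character-for-character replacement table. Here is a rough Python sketch of that idea; the table entries and the function name are made up for illustration and are not OXID's actual SEO table or format.

```python
# Hypothetical replacement table: each special letter maps to an ASCII stand-in.
SEO_TABLE = {
    "á": "a", "č": "c", "ť": "t", "ý": "y", "š": "s", "ž": "z",
    "Č": "C", "Š": "S", "Ž": "Z",
}


def seo_url_part(text: str) -> str:
    """Replace special letters via the table, then build a hyphenated slug."""
    replaced = text.translate(str.maketrans(SEO_TABLE)).lower()
    # Anything that is not a letter or digit becomes a word separator.
    words = "".join(ch if ch.isalnum() else " " for ch in replaced).split()
    return "-".join(words)


print(seo_url_part("Ako nakupovať?"))
print(seo_url_part("Čorálký"))
```

Letters missing from the table pass through unchanged, which is why the table should cover every special character of the language, or the complete alphabet for Cyrillic.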
Sorry if this has been mentioned somewhere else; I didn’t find a roadmap or anything similar. And yes, I know, it’s finished when it’s finished, but nevertheless: can you already tell us when full UTF-8 support will be available?
I’ve finally figured out a good way to do this, but have run into a problem which I think is a bug in the system.
I (actually my girlfriend Erika, who will be using the shop) have done the lang.php translation using HTML entities and it works fine. We used an editor called Babelfish, which can automatically encode HTML entities.
The problem I ran into is that when I enter an accented character in the admin, it converts it into an HTML entity while saving it into the database. That’s fine, because this way I don’t need to worry about the encoding in the database tables. The problem is that when Oxid reads these strings from the database, it converts every & sign into its HTML entity (&amp;). Because every HTML entity saved in the database begins with an & sign, they all become corrupted on the page.
I get strings like this:
Ako nakupova&amp;#357;? As you can see, the & sign at the start of the entity for the ť character is converted into an HTML entity itself, and the browser cannot process it. (If you cannot see my point because the browser converted the entities, please look at the page source, or at the page directly: http://koralky.getmyip.com/eshop)
I think this may be a bug, because there is no point in converting the characters in the admin and then converting some of them again after they come out of the database. I would suggest disabling the conversion while reading from the database, because it is not needed: only the admin can write to the DB, so no special characters can get into it (because they will be converted at write time).
What do you think? Also can I somehow change this behaviour in the php code on my own? I fear this may be pretty low level stuff…
I don’t think this is related to me using the ISO-8859-15 character set. The same thing happens if I use any character set (I’ve tried a few of them).
The problem is the same: when the program reads text from the database it converts all & signs to an HTML entity, thus corrupting all HTML entities stored in that text, because HTML entities begin with an & sign. Please have a look at my problem and at least tell me where this conversion takes place, so I can have a shot at disabling it. I just want Oxid to put my strings on the page exactly as they are in the database.
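To make the effect concrete, here is a small Python sketch of the double escaping and of one possible way around it: escaping only those & signs that do not already start an entity. This only illustrates the principle; I don't know where Oxid performs the conversion, and the function name is made up. In PHP, htmlspecialchars() has a double_encode parameter (since PHP 5.2.3) that gives similar behaviour when set to false.

```python
import html
import re

stored = "Ako nakupova&#357;?"  # text as saved by the admin, entity included

# Escaping the stored text a second time corrupts the entity.
double_escaped = html.escape(stored, quote=False)

# An "&" that is NOT already the start of a numeric or named entity.
_BARE_AMP = re.compile(r"&(?!#\d+;|#[xX][0-9A-Fa-f]+;|[A-Za-z][A-Za-z0-9]*;)")


def escape_keeping_entities(text: str) -> str:
    """Escape &, < and > for HTML, but leave existing entities untouched."""
    text = _BARE_AMP.sub("&amp;", text)
    return text.replace("<", "&lt;").replace(">", "&gt;")


print(double_escaped)                              # entity is broken
print(escape_keeping_entities(stored))             # entity survives
print(escape_keeping_entities("Tom & Jerry <b>"))  # bare & is still escaped
```

The second function leaves the stored string alone, so the browser receives the intact entity and renders ť.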
I can’t imagine that the character is correct in the database and only gets converted at the moment of output… Please check which character is actually in the database.
Did you try to find an alternative entity in the charset?
Could you wait till 4.1.0 which will come out on April 8th?
As for using a different entity: this happens with all entities, and I can’t replace them all (I would also have to replace them with other entities that would get corrupted in the same way).
And unfortunately I can’t really wait; I promised that the shop would be complete a week from now. Also, there are lots of products already entered into the database whose names and descriptions contain entities, and I would not want to tell the people who are working on that that they have to enter them again.
And the last thing is, that this seems such a small and trivial matter, but it’s causing me great trouble (look how the site looks now), so I really want it resolved even if it means a little hacking into the code.