
Localization in 2009 and the broken C++ standard

08/10/09, by artyom; Posted in: Unicode and Localization; 3 comments

There are many goodies in the upcoming C++0x standard. Both the core language and the standard library have been significantly improved.

However, one important part of the library remains broken: localization.

Let’s write a simple program in C++ that prints a number to a file:

#include <iostream>
#include <fstream>
#include <locale>


int main()
{
        // Set global locale to system default;
        std::locale::global(std::locale(""));

        // open file "number.txt"
        std::ofstream number("number.txt");

        // write a number to file and close it
        number<<13456<<std::endl;
}

And in C:

#include <stdio.h>
#include <locale.h>

int main()
{
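        /* use the locale from the environment (LC_ALL, LANG, ...) */
        setlocale(LC_ALL,"");
        FILE *f=fopen("number.txt","w");
        if(!f)
                return 1;
        /* the ' flag (POSIX, not ISO C) requests thousands grouping;
           %d matches the int argument, %f would be undefined behavior */
        fprintf(f,"%'d\n",13456);
        fclose(f);
        return 0;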
}

Let’s run both programs with the en_US.UTF-8 locale. Both write the following number to the output file:

13,456

Now let’s run the programs with the Russian locale: LC_ALL=ru_RU.UTF-8 ./a.out. As expected, the C version gives:

13 456

But the C++ version produces:

13<?>456

This is not valid UTF-8 text! What happened? Both the C library and the C++ library use the same locale database, so where does the difference come from?
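One way to confirm that the C++ output is actually malformed, and not just displayed oddly by an editor, is to check the bytes against the UTF-8 encoding rules. The sketch below uses a hypothetical helper, is_valid_utf8 (not a standard library function), and assumes for illustration that only one byte of a multi-byte separator ended up between 13 and 456.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a standard function): returns true if `s` is
// well-formed UTF-8: valid lead bytes, continuation bytes of the form
// 10xxxxxx, no overlong forms, no surrogates, nothing above U+10FFFF.
bool is_valid_utf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len;
        std::uint32_t cp;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte

        if (i + len > s.size())
            return false;   // sequence cut off at the end of the string

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(s[i + k]);
            if ((cont & 0xC0) != 0x80)
                return false;  // expected a continuation byte
            cp = (cp << 6) | (cont & 0x3F);
        }

        // Smallest code point that legitimately needs `len` bytes.
        static const std::uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;  // overlong form, surrogate, or out of range

        i += len;
    }
    return true;
}
```

A string made of 13, a lone lead byte such as 0xE2, and 456 is rejected: 0xE2 announces a three-byte sequence, but the next byte is the digit 4 rather than a continuation byte in the range 0x80 to 0xBF.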

According to the locale database, the thousands separator for Russian is U+2002 EN SPACE, a code point that takes more than one byte in UTF-8 (three bytes: E2 80 82). Now look at the C++ number-formatting facet, std::numpunct: its member function thousands_sep() returns a single character, and a single char cannot hold a multi-byte UTF-8 sequence, which is how a broken byte ends up in the output. The C locale API, by contrast, represents the thousands separator as a string (the thousands_sep member of struct lconv is a char*), so it has no single-character limitation like the C++ standard class.
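The type difference can be observed directly by asking both libraries for the separator of the same locale. A minimal sketch follows; the helper names c_thousands_sep and cpp_thousands_sep are hypothetical, and it assumes a POSIX-style system where a locale name such as ru_RU.UTF-8 is understood by both setlocale and std::locale.

```cpp
#include <clocale>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: the thousands separator as the C library sees it.
// lconv::thousands_sep is a char*, so a multi-byte separator fits.
// Returns false if the C library does not know the locale `name`.
bool c_thousands_sep(const char* name, std::string& out)
{
    // Remember the current LC_NUMERIC setting so it can be restored.
    const char* current = std::setlocale(LC_NUMERIC, nullptr);
    const std::string saved = current ? current : "C";

    if (std::setlocale(LC_NUMERIC, name) == nullptr)
        return false;

    out = std::localeconv()->thousands_sep;     // copy the whole string
    std::setlocale(LC_NUMERIC, saved.c_str());  // undo the global change
    return true;
}

// Hypothetical helper: the same separator as std::numpunct<char> reports it.
// thousands_sep() returns exactly one char, whatever the locale data says.
// Returns false if the C++ library cannot construct the locale `name`.
bool cpp_thousands_sep(const char* name, char& out)
{
    try {
        const std::locale loc(name);
        out = std::use_facet<std::numpunct<char>>(loc).thousands_sep();
        return true;
    } catch (const std::runtime_error&) {
        return false;  // locale not available to the C++ library
    }
}
```

With the "C" locale the C library reports an empty separator, while std::numpunct<char> reports ','. On a system with ru_RU.UTF-8 installed, one would expect c_thousands_sep to return a multi-byte string, while cpp_thousands_sep can never return more than one char; the exact bytes depend on the installed locale data.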

This is just one simple, easily reproducible problem with the C++ standard locale facets. There are many more.

It’s very frustrating that in 2009 such annoying, easily reproducible bugs still exist and make the standard localization facilities useless in certain locales.
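Until the standard facet can return a multi-byte separator, one workaround is to do the grouping yourself and take the separator as a string, the way the C API stores it. The sketch below is a hypothetical helper, group_digits, not part of any library; it assumes a fixed group size (three by default), whereas real locales describe group sizes through the lconv grouping string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: format a non-negative integer, inserting `sep`
// (any number of bytes, e.g. a UTF-8 sequence) between groups of
// `group` digits, counted from the right.
std::string group_digits(unsigned long long value, const std::string& sep,
                         std::size_t group = 3)
{
    assert(group > 0);
    const std::string digits = std::to_string(value);

    // The leftmost group may be shorter than `group`.
    std::size_t first = digits.size() % group;
    if (first == 0)
        first = group;

    std::string out(digits, 0, first);
    for (std::size_t i = first; i < digits.size(); i += group) {
        out += sep;                      // the whole separator, not one byte
        out.append(digits, i, group);
    }
    return out;
}
```

For example, after setlocale(LC_ALL, "") one could pass localeconv()->thousands_sep as sep, so the complete multi-byte separator survives. Not every locale groups digits by three, so a fuller version would also read the grouping string.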

All the work I have recently done on localization support in the CppCMS framework has convinced me to make an important decision: ICU will be a mandatory dependency and will provide most of the localization facilities by default, because native C++ localization is a no-go.

The question is: "Will the C++ committee revisit localization support in C++0x?"

Comments

nenTi, at 10:03 25/10/09

I love the idea of your project :) the blog was a little slow last days but I'm soooo cool using this framework :)

artyom, at 21:16 25/10/09

I'm glad to hear ;)

Viet, at 12:32 27/10/09

I'm really looking forward to the growth and development of this project. It’s awesome to write web apps/services in C++ ;)

Keep up the good work!
