C++ Character Encoding Issues on Mac
I'm developing a cross-platform file sync application. On Mac OS X, to get file system events, I read from the /dev/fsevents system buffer and send the events over Unix sockets to another app. I'm not doing any character encoding conversion so far.
This is what the app that receives the FS events prints:
######## File Name ::: ébê123.rtf
######## File Name in WCHAR ::: ébê123.rtf
This is the code I used to convert char to wchar:
int wCharLen1 = mbstowcs(NULL, fName, 0); // fName is the char string received through the Unix socket
WCHAR* fileName = new WCHAR[wCharLen1 + 1];
memset(fileName,'\0',(wCharLen1 + 1) *sizeof(WCHAR));
mbstowcs(fileName, fName, wCharLen1);
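A note on the conversion above: mbstowcs interprets its input according to the current C locale, and a program starts in the "C" locale unless setlocale is called, so the result for non-ASCII bytes may not be what is expected. As a point of comparison, here is a minimal sketch of a locale-independent conversion. It assumes the bytes from the socket are UTF-8 and that wchar_t is 32 bits wide (as on macOS and Linux, not on Windows); utf8ToWide is a hypothetical helper name and it does only basic validation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode a UTF-8 byte string into wchar_t code points without consulting the
// C locale. Assumes a 32-bit wchar_t. Hypothetical helper: it rejects bad lead
// and continuation bytes but does not check for overlong encodings.
std::wstring utf8ToWide(const std::string& in)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned long cp = 0;
        std::size_t extra = 0;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(static_cast<wchar_t>(cp));
        i += extra + 1;
    }
    return out;
}
```

With this helper, a precomposed "é" (bytes C3 A9) becomes one wide character, while a decomposed "é" (bytes 65 CC 81) becomes two: the letter e followed by U+0301. The conversion itself does not change the normalization form.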
I'm sending the file name to my server, and I print it before the DB insert, which shows the exact file name:
######## Recieved File Name ::: ébê123.rtf
But in DB it inserts the file Name as 'ébê123.rtf'
I'm using the same code on Windows, except that I don't have to do the wchar conversion, because Windows directory monitoring already gives the file name as wchar. I don't have any issues with the Windows client, and the file name is inserted correctly in the database as 'ébê123.rtf'. I suspect that I'm missing some encoding step before converting char to wchar on the Mac. I have tried encoding to UTF-8, but then the file names changed to:
######### FileName ::: ébê123.rtf after Encoding TO UTF-8 ::: ébeÌ‚123.rtf [MAC]
Another case:
When I upload a file named 'ébê123.rtf' from Windows, it is downloaded on the Mac with the correct file name. When the file is uploaded from the Mac, the file name seems to be downloaded correctly on Windows, but as soon as I change anything in that file, the file name is sent as 'e%cc%81be%cc%82123.rtf' to the server and then to the Mac. If I originally create the file 'ébê123.rtf' on Windows, it is sent correctly.
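The bytes CC 81 and CC 82 in that name are the UTF-8 encodings of U+0301 (combining acute accent) and U+0302 (combining circumflex accent), which suggests that the Mac side is sending the name in decomposed form (NFD), while the name created on Windows is precomposed (NFC). To check which form a given name is in, a small diagnostic like the following sketch can help. percentEncodeNonAscii is a hypothetical helper, not a full URL encoder.

```cpp
#include <string>

// Percent-encode every non-ASCII byte so the exact byte sequence of a file
// name becomes visible. ASCII bytes are copied through unchanged.
std::string percentEncodeNonAscii(const std::string& name)
{
    static const char hex[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : name) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back('%');
            out.push_back(hex[c >> 4]);
            out.push_back(hex[c & 0x0F]);
        }
    }
    return out;
}
```

For the decomposed name this yields e%cc%81be%cc%82123.rtf, and for the precomposed name it yields %c3%a9b%c3%aa123.rtf, so the two forms are easy to tell apart in a log.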
I suspect I have to encode the file name to a UTF-8 string before converting char to wchar on the Mac. I have tried some open source code like the one below:
void latin1_to_utf8(unsigned char *in, unsigned char *out)
{
    while (*in)
    {
        if (*in<128)
        {
            *out++=*in++;
        }
        else
        {
            *out++=0xc2+(*in>0xbf);
            *out++=(*in++&0x3f)+0x80;
        }
    }
    *out = '\0';
}
But it didn't work. Now I'm looking for a library or some code to convert the string to a UTF-8 string in C++ on the Mac. Of course, this function works when the file name comes from Windows to the Mac.
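One possible reason this routine does not help on the Mac: it assumes every input byte is a Latin-1 character. If the name read from the file system is already UTF-8, each byte of a multi-byte sequence gets encoded again, which produces the kind of garbled output seen in the DB. The sketch below is a std::string version of the same routine (latin1ToUtf8 is a hypothetical name) that makes the effect easy to reproduce.

```cpp
#include <string>

// std::string version of the latin1_to_utf8 routine above: each input byte is
// treated as one Latin-1 character and re-encoded as UTF-8.
std::string latin1ToUtf8(const std::string& in)
{
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            // Latin-1 0x80..0xFF maps to U+0080..U+00FF, a two-byte sequence.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

A real Latin-1 "é" (byte E9) correctly becomes C3 A9. Input that is already UTF-8, such as C3 A9, becomes C3 83 C2 A9, which displays as "Ã©". A decomposed "é" (65 CC 81) becomes 65 C3 8C C2 81, matching the "eÌ" pattern in the garbled names.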
I have also tried converting NFD to NFC using iconv. This is the code I used:
string MacEncode(string _strToEncode)
{
    iconv_t convDes; /* conversion descriptor */
    convDes = iconv_open("UTF-8-MAC", "UTF-8");
    if (convDes == (iconv_t)(-1))
    {
        cout << "Cannot open iconv converter for utf-8-mac to utf-8 \n";
        return "";
    }
    char* inpStr =(char*) _strToEncode.c_str();
    size_t inpLen = strlen(inpStr);
    size_t outLen = (2*inpLen)+1;
    char* outBuf = new char[outLen];
    memset(outBuf,'\0',outLen);
    char* outBuffer = outBuf;
    int retCode;
    retCode = iconv(convDes, &inpStr, &inpLen, &outBuffer, &outLen);
    if(retCode == -1)
    {
        return "";
    }
    string outputbuf = outBuf;
    return outputbuf;
}
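One thing worth checking in the code above: iconv_open takes the target encoding first and the source encoding second, so iconv_open("UTF-8-MAC", "UTF-8") converts toward the decomposed Mac form rather than away from it. To illustrate what an NFD to NFC step is expected to do to the bytes, here is a deliberately tiny sketch that composes only the handful of base letter and combining mark pairs that appear in the file names in this question. composeLatin1 and composeSubset are hypothetical helpers and the table is an assumption for illustration; real normalization needs the full Unicode data, for example via iconv with the arguments in the other order, ICU, or CoreFoundation's CFStringNormalize.

```cpp
#include <cstddef>
#include <string>

// Return the precomposed code point for base + combining mark, or 0 if this
// illustrative table has no entry. Only a few pairs are listed on purpose.
static unsigned composeLatin1(char base, unsigned mark)
{
    struct Pair { char base; unsigned mark; unsigned composed; };
    static const Pair table[] = {
        {'a', 0x0308, 0xE4},  // a + diaeresis  -> U+00E4
        {'a', 0x030A, 0xE5},  // a + ring above -> U+00E5
        {'e', 0x0301, 0xE9},  // e + acute      -> U+00E9
        {'e', 0x0302, 0xEA},  // e + circumflex -> U+00EA
        {'o', 0x0308, 0xF6},  // o + diaeresis  -> U+00F6
    };
    for (const Pair& p : table)
        if (p.base == base && p.mark == mark) return p.composed;
    return 0;
}

// Compose known base+mark pairs in a UTF-8 string; everything else is copied.
std::string composeSubset(const std::string& nfd)
{
    std::string out;
    std::size_t i = 0;
    while (i < nfd.size()) {
        unsigned char c = static_cast<unsigned char>(nfd[i]);
        // ASCII byte followed by CC xx, i.e. a mark in U+0300..U+033F?
        if (c < 0x80 && i + 2 < nfd.size() &&
            static_cast<unsigned char>(nfd[i + 1]) == 0xCC &&
            (static_cast<unsigned char>(nfd[i + 2]) & 0xC0) == 0x80) {
            unsigned mark = 0x300u | (static_cast<unsigned char>(nfd[i + 2]) & 0x3F);
            unsigned composed = composeLatin1(nfd[i], mark);
            if (composed != 0) {
                // All table results lie in U+00C0..U+00FF: UTF-8 is C3 xx.
                out.push_back(static_cast<char>(0xC3));
                out.push_back(static_cast<char>(0x80 | (composed & 0x3F)));
                i += 3;
                continue;
            }
        }
        out.push_back(nfd[i]);
        ++i;
    }
    return out;
}
```

With this sketch, the decomposed bytes 65 CC 81 62 65 CC 82 for "ébê" come out as C3 A9 62 C3 AA, and a string that is already precomposed passes through unchanged.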
File Name : äåéööéåä123.txt
In DB : äaÌŠeÌööeÌaÌŠä123.rtf
But the issue persists. Any ideas?
Comments
http://stackoverflow.com/questions/13592598/difference-and-conversions-between-wchar-t-for-linux-and-for-windows
wchar_t is different on each platform. You might be able to format the data on the server, though: e.g. take the received string, encode it on the server, store it like that in the DB, and send it back out in a standard format.