C++ Character Encoding Issues in Mac

Posted in General Discussion, edited January 2014


I'm developing a cross-platform file sync application. On Mac OS X, to get file system events, I read from the /dev/fsevents system buffer and send them over Unix sockets to another app. Until now I have not been doing any character encoding.


 


 


This is the output printed by the app that receives the FS events:


########          File Name ::: ébê123.rtf


######## File Name in WCHAR ::: ébê123.rtf


 


This is the code I used to convert char to wchar:


 


// fName is the char string which I received through the Unix socket.
// mbstowcs decodes according to the current LC_CTYPE locale and returns (size_t)-1 on error.
size_t wCharLen1 = mbstowcs(NULL, fName, 0);
WCHAR* fileName = new WCHAR[wCharLen1 + 1];
memset(fileName,'\0',(wCharLen1 + 1) *sizeof(WCHAR));
mbstowcs(fileName, fName, wCharLen1);
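Because mbstowcs interprets its input according to the current locale, it can help to inspect the code points in fName without going through the locale at all. The following is a minimal sketch, assuming fName holds UTF-8 (the post does not confirm this). The names decodeUtf8 and countCombiningMarks are hypothetical helpers, not part of any library. A decomposed name such as 'e' followed by U+0301 shows up as two code points, where the precomposed form has the single code point U+00E9.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode a UTF-8 byte string into Unicode code points.
// Minimal sketch: malformed or truncated sequences become U+FFFD,
// and overlong encodings are not rejected.
std::vector<char32_t> decodeUtf8(const std::string& bytes)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;   // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }

        bool ok = (i + extra < bytes.size());
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (ok) { out.push_back(cp); i += extra + 1; }
        else    { out.push_back(0xFFFD); ++i; }
    }
    return out;
}

// Count code points in the Combining Diacritical Marks block (U+0300..U+036F).
std::size_t countCombiningMarks(const std::vector<char32_t>& cps)
{
    std::size_t n = 0;
    for (char32_t cp : cps)
        if (cp >= 0x0300 && cp <= 0x036F) ++n;
    return n;
}
```

If countCombiningMarks returns a non-zero value for a name coming from /dev/fsevents, the name is arriving in decomposed form, which would point at Unicode normalization rather than at the char-to-wchar step.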


 


I'm sending the file name to my server, and I print it there just before the DB insert. The print shows the exact file name:


  ########          Recieved File Name ::: ébê123.rtf


    But in the DB the file name is inserted as 'ébeÌ‚123.rtf'.


 


I'm using the same code on Windows, except that I don't have to do the wchar conversion, because Windows directory monitoring itself gives the file name in wchar. I don't have any issues with the Windows client, and the file name is inserted correctly in the database as 'ébê123.rtf'. I suspect that I'm missing some encoding step before converting char to wchar on Mac. I have tried encoding to UTF-8, but the file names changed to:


######### FileName ::: ébê123.rtf  after Encoding TO UTF-8 :::    ébeÌ‚123.rtf   [MAC]


 


 


Another case:


  When files are uploaded from Windows with the above file name 'ébê123.rtf', the file is downloaded on Mac with the correct file name. When the file is uploaded from Mac, the file name also appears to be downloaded correctly on Windows, but as soon as I change anything in that file, the file name is sent as 'e%cc%81be%cc%82123.rtf' to the server and then to the Mac. If I originally create the file 'ébê123.rtf' on Windows, it is sent correctly.
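The name 'e%cc%81be%cc%82123.rtf' looks like percent-encoded UTF-8 in decomposed form: the bytes CC 81 are the UTF-8 encoding of U+0301 (combining acute accent) and CC 82 that of U+0302 (combining circumflex accent), each following a plain 'e'. A minimal sketch for turning such a string back into raw bytes for inspection, where percentDecode and hexValue are hypothetical helper names:

```cpp
#include <cstddef>
#include <string>

// Value of a single hex digit, or -1 if the character is not a hex digit.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Replace every well-formed "%XX" escape with the byte it encodes.
// Anything else (including a stray '%') is copied through unchanged.
std::string percentDecode(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;   // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}
```

Decoding the name above yields 14 bytes, starting with 'e', 0xCC, 0x81. A precomposed 'é' would instead be the two bytes 0xC3 0xA9, so the name that comes back from Windows appears to be the decomposed form that originated on the Mac.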


 


 


I suspect I have to encode the file name to a UTF-8 string before converting char to wchar on Mac. I have tried some open source code like the one below:


  void latin1_to_utf8(unsigned char *in, unsigned char *out)
  {
    while (*in)
    {
      if (*in<128)
      {
        *out++=*in++;
      }
      else
      {
        *out++=0xc2+(*in>0xbf);
        *out++=(*in++&0x3f)+0x80;
      }
    }
    *out = '\0';
  }


 


And it didn't work. Now I'm looking for a library or some code to convert the string to a UTF-8 string in C++ on Mac. Of course, this function works when the file name is sent from Windows to Mac.
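One possible reason the routine does not help here: it treats every input byte as one Latin-1 character. If the name read from the socket is already UTF-8 (an assumption, the post does not confirm it), each byte of a multi-byte sequence gets encoded a second time. The sketch below uses a std::string variant with the same byte logic, under the hypothetical name latin1ToUtf8, to show the effect.

```cpp
#include <string>

// std::string variant of the Latin-1 to UTF-8 routine (same byte logic):
// bytes below 0x80 are copied, all others become a two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string& in)
{
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC2 + (c > 0xBF)));
            out.push_back(static_cast<char>((c & 0x3F) + 0x80));
        }
    }
    return out;
}
```

A Latin-1 'é' (byte 0xE9) correctly becomes C3 A9. Feeding it bytes that are already UTF-8 doubles the encoding: C3 A9 becomes C3 83 C2 A9, and the combining accent CC 81 becomes C3 8C C2 81, that is, 'Ì' followed by a control character. This resembles the 'eÌ' seen in the database, although the post does not show where such a reinterpretation would happen.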


 


I have also tried converting from NFD to NFC using iconv. This is the code I used:


 


string MacEncode(string _strToEncode)
{
    // Note: iconv_open takes (tocode, fromcode), so as written this call
    // converts UTF-8 to UTF-8-MAC (towards the decomposed form), not the reverse.
    iconv_t convDes = iconv_open("UTF-8-MAC", "UTF-8");   /* conversion descriptor */
    if (convDes == (iconv_t)(-1))
    {
      cout << "Cannot open iconv converter for utf-8-mac to utf-8 \n";
      return "";
    }

    char* inpStr =(char*) _strToEncode.c_str();
    size_t inpLen = strlen(inpStr);
    size_t outLen = (2*inpLen)+1;

    char* outBuf = new char[outLen];
    memset(outBuf,'\0',outLen);

    char* outBuffer = outBuf;
    size_t retCode = iconv(convDes, &inpStr, &inpLen, &outBuffer, &outLen);
    iconv_close(convDes);             // release the descriptor in every case
    if(retCode == (size_t)(-1))
    {
      delete[] outBuf;
      return "";
    }
    string outputbuf = outBuf;
    delete[] outBuf;                  // the buffer was leaked in the first version
    return outputbuf;
}
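To make concrete what an NFD to NFC conversion is expected to do to names like the ones in this post, here is a deliberately tiny sketch. It is not a real normalizer: it only composes a lowercase vowel followed by a single combining grave, acute, circumflex, diaeresis, or (for 'a') ring above, and it ignores canonical reordering and every other composition. The name composeSubset and the table are hypothetical and for illustration only. A real application would rely on a complete normalization implementation.

```cpp
#include <cstddef>
#include <string>

namespace {
// base letter, second UTF-8 byte of the combining mark (first byte is 0xCC),
// and the precomposed code point (all of these fit in one byte).
struct Composition { char base; unsigned char mark; unsigned char composed; };

const Composition kTable[] = {
    {'a', 0x80, 0xE0}, {'a', 0x81, 0xE1}, {'a', 0x82, 0xE2}, {'a', 0x88, 0xE4}, {'a', 0x8A, 0xE5},
    {'e', 0x80, 0xE8}, {'e', 0x81, 0xE9}, {'e', 0x82, 0xEA}, {'e', 0x88, 0xEB},
    {'i', 0x80, 0xEC}, {'i', 0x81, 0xED}, {'i', 0x82, 0xEE}, {'i', 0x88, 0xEF},
    {'o', 0x80, 0xF2}, {'o', 0x81, 0xF3}, {'o', 0x82, 0xF4}, {'o', 0x88, 0xF6},
    {'u', 0x80, 0xF9}, {'u', 0x81, 0xFA}, {'u', 0x82, 0xFB}, {'u', 0x88, 0xFC},
};
}

// Compose "letter + 0xCC + mark" sequences found in kTable into the
// two-byte UTF-8 encoding of the precomposed character; copy the rest.
std::string composeSubset(const std::string& nfd)
{
    std::string out;
    std::size_t i = 0;
    while (i < nfd.size()) {
        bool composed = false;
        if (i + 2 < nfd.size() && static_cast<unsigned char>(nfd[i + 1]) == 0xCC) {
            unsigned char mark = static_cast<unsigned char>(nfd[i + 2]);
            for (const Composition& c : kTable) {
                if (c.base == nfd[i] && c.mark == mark) {
                    out.push_back(static_cast<char>(0xC0 | (c.composed >> 6)));
                    out.push_back(static_cast<char>(0x80 | (c.composed & 0x3F)));
                    i += 3;
                    composed = true;
                    break;
                }
            }
        }
        if (!composed) out.push_back(nfd[i++]);
    }
    return out;
}
```

With this subset, the bytes 'e' CC 81 'b' 'e' CC 82 followed by "123.rtf" come out as C3 A9 'b' C3 AA followed by "123.rtf", which is the precomposed spelling that the Windows client sends.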


 


 


File Name : äåéööéåä123.txt


 


In DB : äåéööéåä123.rtf


 


 


 



 


 


But the issue persists. Any ideas?


 
