How SNEAK deals w/ Unicode . . .


Posted by Cpt SAJChurchey On 2005-02-19 16:57:24
In Reply to RE: Java - ASCII2Bin - First Draft Posted by Ret. Gen D-Cypell On 2005-02-18 14:32:32

Well,

From what I can gather, the PHP version of SNEAK will convert UTF-8 into binary if it is put in as an "ASCII" value. As far as UTF-16 is concerned, each character that doesn't fit in a single byte gets substituted in the text field w/ what I guess is an HTML code (a decimal numeric character reference):

ÀԳՉ is changed to À&#1331;&#1353;

Չ => &#1353;
Գ => &#1331;

Then, SNEAK analyzes each of the seven characters in the substituted code and adds them to the binary string that is reported.

ÀԳՉ => 11000000 00100110 00100011 00110001 00110011 00110011 00110001 00111011 00100110 00100011 00110001 00110011 00110101 00110011 00111011
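
To make the behavior above concrete, here is a rough Java sketch that reproduces it. This is not SNEAK's actual code. The class and method names are made up for illustration, and the cutoff of 0xFF for "gets replaced by an HTML code" is my assumption, based on À coming through as a single byte.

```java
// Hypothetical sketch, not SNEAK's source: mimics the substitution described above.
class SneakStyleAscii2Bin {

    // Assumption: anything above U+00FF is replaced by a decimal
    // numeric character reference such as &#1331;
    static String substituteEntities(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0xFF) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // Emit 8 bits per character of the substituted text, space separated.
    static String toBinary(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : substituteEntities(text).toCharArray()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            String bits = Integer.toBinaryString(c); // c <= 0xFF after substitution
            out.append("00000000".substring(bits.length())).append(bits);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00C0 = À, \u0533 = Գ, \u0549 = Չ
        System.out.println(substituteEntities("\u00C0\u0533\u0549"));
        System.out.println(toBinary("\u00C0\u0533\u0549"));
    }
}
```

Run on ÀԳՉ, this gives the same 15 groups of bits as the string above: one for À, then seven each for the two substituted codes.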

This is incorrect b/c each of these characters takes 2 bytes in UTF-16, not 7, but then again this is an ASCII2Bin function, not a UTF162Bin function. So do we need to handle this special case in the same manner that SNEAK does, or should we improve the conversion capabilities if we can?
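
If we did go the "improve it" route, one possible shape is sketched below. It is only a sketch under assumptions: the class name is hypothetical, and I picked big-endian UTF-16 with no byte order mark, which is a design choice we would still need to agree on.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a "UTF162Bin" style conversion, not existing SNEAK code.
class Utf16ToBin {

    // Encode the text as UTF-16 (big-endian, no BOM: an assumption)
    // and emit 8 bits per byte, space separated.
    static String toBinary(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_16BE);
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            if (out.length() > 0) {
                out.append(' ');
            }
            String bits = Integer.toBinaryString(b & 0xFF); // mask to treat the byte as unsigned
            out.append("00000000".substring(bits.length())).append(bits);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00C0 = À, \u0533 = Գ, \u0549 = Չ
        // prints: 00000000 11000000 00000101 00110011 00000101 01001001
        System.out.println(toBinary("\u00C0\u0533\u0549"));
    }
}
```

With this approach each of the three characters comes out as exactly 2 bytes, so ÀԳՉ is 6 bytes instead of 15. The catch is that plain ASCII input would also double in size, so it would not be a drop-in match for what SNEAK reports today.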


Cpt SAJChurchey
C/O of
Editorial
OSI Staff edit0r
OSI Feedback Representative



