RE: Java - ASCII2Bin - First Draft
Ret. Gen D-Cypell

I want to make a couple of points on this particular function, because it is a little trickier than the solution provided in this example and the subsequent revisions in the thread. One of the stumbling blocks here is that Java's char type (and C#'s, and that of plenty of other modern languages) is 2 bytes, not 1. This is to support UTF-16, the Unicode encoding Java uses for text in memory.

A quick description of character encoding... (if you already know this stuff, please don't get offended, someone else may not...)

Obviously the disk, memory etc. can only hold binary digits, so to store text we need a conversion table... enter ASCII. ASCII is a conversion table from numbers to letters and only uses 7 bits. This is why you need the "0" prefix to bring it up to a full byte. 7 bits were chosen because (when unsigned) we get the range 0-127, which is more than enough for all (Latin alphabet) lowercase letters, uppercase letters, digits, punctuation and specials (like space and carriage return).

Problem is, some characters in use in non-English-speaking countries (and even some in the country that invented the damn language *cough*pound sign*cough*) were not included in the standard 0-127 ASCII table. Enter the 8-bit encodings such as ISO-8859-1 (Latin-1)... these use the whole 8-bit byte. Now we have 0-255 and room for things like the accented characters that appear in languages such as French and German. (UTF-8 is a different beast: it is a variable-length encoding of Unicode that uses one byte for ASCII and two to four bytes for everything else.)

It's still not enough... what about Russian, Chinese, Japanese and all the other languages that don't use the Latin alphabet? Guess what... UTF-16... two whole bytes per code unit and 0-65535 possible values. That covers the characters of nearly every language in common use, and anything beyond it is stored as a pair of 16-bit units (a surrogate pair), so we can accommodate E.T., Predator and Darth Vader as well.

Fortunately, ASCII is a subset of UTF-8 (any UTF-8 byte whose top bit is 0 is the same as the 7-bit ASCII value), and the first 256 values of a Java char match ISO-8859-1 (if the most significant byte of the char is all 0, its value is the same as the Latin-1 code). So...
before writing the code you have to decide what you want to do when the input contains 8-bit characters (128-255), or even full 16-bit ones. It's probably wise to have a fixed binary number length per character, so that a character that fits in one byte (ASCII or the 128-255 range) always gives 8 binary digits and anything larger always gives 16. The Java method Integer.toBinaryString will not give you this: it strips off any leading 0's, so the strategy of just adding a single 0 won't work.

What you will have to do is check the length of the returned binary string and subtract it from how long you want the string to be (8 for a one-byte character, 16 for anything larger). This gives you the number of 0's to prepend. There is probably a better way to do this using the standard Java class library, but I don't have that info in my head... I will look it up when required.

On 2005-02-17 20:09:14, Socrat wrote
>Ok, I managed to find how to change a character to its binary form with this code. The problem is, when I entered a normal character (a or b or ^*&*( ;' ) it's fine, it gives the exact same result as the PHP SNEAK. But then I tried with some ASCII like °╞╧◘↔¼èá4↕ÿ and that messed up, because on the PHP SNEAK version the characters are converted back to #6365 or whatever their value is before they are converted to binary. But with this code they get converted to their real binary value, which is not what we want since they can be 15 chars long (1100101011110). So does anyone know how to modify it to check if it's a "special" char and, if it is, convert to the #5646 value?
>
> public static String string2Bin(String str){
>
>     String binaryString = new String();
>     for(int i=0;i<str.length();i++){
>         int decimal = str.charAt(i);
>         String binary = Integer.toBinaryString(decimal);
>         binaryString += " 0"+binary;
>     }
>     return binaryString;
> }
>
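The zero-padding approach from the reply (take the length of what Integer.toBinaryString returns, then prepend the missing 0's) could be sketched roughly like this. It is only a sketch under assumptions: any character with a value up to 255 is padded to 8 digits and anything larger to 16, and the class name Ascii2Bin and the helper toFixedWidthBinary are made up for illustration, not taken from the thread. It does not attempt the #NNNN conversion that the PHP version performs.

```java
class Ascii2Bin {

    // Hypothetical helper: binary form of one char, left-padded with 0's
    // to 8 digits (value <= 255) or 16 digits (anything larger).
    public static String toFixedWidthBinary(char c) {
        int width = (c <= 0xFF) ? 8 : 16;           // assumed widths, see lead-in
        String binary = Integer.toBinaryString(c);  // leading 0's are stripped here
        StringBuilder padded = new StringBuilder();
        // Prepend (width - length) zeros to restore the fixed width.
        for (int i = binary.length(); i < width; i++) {
            padded.append('0');
        }
        return padded.append(binary).toString();
    }

    // Same idea as the quoted string2Bin, but with fixed-width output
    // and a single space between characters.
    public static String string2Bin(String str) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(toFixedWidthBinary(str.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(string2Bin("ab")); // prints "01100001 01100010"
    }
}
```

With this, plain ASCII input keeps the familiar 8 digits per character, while a character such as the euro sign (value 0x20AC) comes out as a full 16 digits instead of a variable-length string.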