Thursday, 3 March 2011

Listing Character Sets

One of the seemingly most neglected areas of coding is the mystique surrounding character sets, and I guess the reason is that no-one really cares too much about them. Several Europe-wide systems I’ve worked on did specify a character set, and that was UTF-8. I suspect this was mainly because it’s a de facto standard that can encode every character of every European language, plus lots of others. But there are lots of character sets, so do we need them all? I guess not, but I think that for historical reasons we’re stuck with them...


Because this is a largely ignored area of Java, most developers seem to forget that Java will run with a default character set taken from the operating system. The easiest way to override the default is to specify the charset you require on the Java command line using the -Dfile.encoding system property. For example, to specify UTF-8, start Java with the following command line:

java -Dfile.encoding=UTF8 <Other Args Go Here>
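To check whether the flag took effect, something like the following sketch may help. The class name DefaultCharsetCheck and the helper utf8Bytes are made up for illustration, and StandardCharsets assumes Java 7 or later. It prints the charset the JVM settled on, and also shows the alternative of naming the charset explicitly in code so the result does not depend on the default at all. Note that newer JDKs (18 and later) default to UTF-8 regardless of the operating system, so what this prints will vary with the Java version.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetCheck {

    // Hypothetical helper: encode text with an explicitly named charset,
    // so the result is the same whatever the JVM default happens to be.
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The charset the JVM is actually using as its default
        System.out.println("Default charset: " + Charset.defaultCharset().name());

        // The raw system property that -Dfile.encoding sets
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // U+00E9 (e with acute accent) takes two bytes in UTF-8
        System.out.println("UTF-8 byte count: " + utf8Bytes("\u00e9").length);
    }
}
```

Running it once as plain java DefaultCharsetCheck and once with -Dfile.encoding=UTF8 should show whether the override changes the first two lines on your machine. The byte count stays the same either way.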

If you want to know just how many character sets there are on your system, run the following code:

import java.nio.charset.Charset;
import java.util.Iterator;
import java.util.SortedMap;

public class CharacterSets {

  /**
   * Main method that runs the test.
   *
   * @param args
   *            Not used
   */
  public static void main(String[] args) {

    System.out.println("Displaying Available Character Sets");

    // Get hold of the character sets available, sorted by canonical name
    SortedMap<String, Charset> charSets = Charset.availableCharsets();

    Iterator<String> it = charSets.keySet().iterator();

    while (it.hasNext()) {

      String name = it.next();

      System.out.print("Name: " + name + " ");

      // Each charset may be known by zero or more alternative names
      Iterator<String> aliases = charSets.get(name).aliases().iterator();

      if (aliases.hasNext()) {
        System.out.print(": ");
      }

      // Print the aliases as a comma-separated list
      while (aliases.hasNext()) {

        System.out.print(aliases.next());

        if (aliases.hasNext()) {
          System.out.print(", ");
        }
      }

      System.out.println();
    }

    System.out.println("End of Charset List");
    System.out.println("The current charset is: " + System.getProperty("file.encoding"));
  }

}

On my little netbook, it turns out that there are 162 character sets with the default being MacRoman.
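The listing prints every name but leaves the counting to you. A minimal sketch that reports the total directly might look like this. The class name CharsetCount and its two helpers are invented for the example. It also resolves the UTF8 alias used on the command line to its canonical name.

```java
import java.nio.charset.Charset;

class CharsetCount {

    // Number of charsets this JVM reports as available
    static int availableCount() {
        return Charset.availableCharsets().size();
    }

    // Resolve a charset name or alias to the canonical name Java uses for it
    static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) {
        System.out.println("Available charsets: " + availableCount());
        System.out.println("UTF8 resolves to:   " + canonicalName("UTF8"));
        System.out.println("Default charset:    " + Charset.defaultCharset().name());
    }
}
```

The count and the default depend on the JVM and the operating system, so expect different numbers from machine to machine.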
