An Annotation object is used as a wrapper for a text attribute value if the attribute has annotation characteristics. These characteristics are:
The text range that the attribute is applied to is critical to the semantics of the range. That means, the attribute cannot be applied to subranges of the text range that it applies to, and, if two adjacent text ranges have the same value for this attribute, the attribute still cannot be applied to the combined range as a whole with this value. The attribute or its value usually do no longer apply if the underlying text is changed.
An example is grammatical information attached to a sentence: For the previous sentence, you can say that "an example" is the subject, but you cannot say the same about "an", "example", or "exam". When the text is changed, the grammatical information typically becomes invalid. Another example is Japanese reading information (yomi).
Wrapping the attribute value into an Annotation object guarantees that adjacent text runs don't get merged even if the attribute values are equal, and indicates to text containers that the attribute should be discarded if the underlying text is modified.
An Annotation object is used as a wrapper for a text attribute value if the attribute has annotation characteristics. These characteristics are: The text range that the attribute is applied to is critical to the semantics of the range. That means, the attribute cannot be applied to subranges of the text range that it applies to, and, if two adjacent text ranges have the same value for this attribute, the attribute still cannot be applied to the combined range as a whole with this value. The attribute or its value usually do no longer apply if the underlying text is changed. An example is grammatical information attached to a sentence: For the previous sentence, you can say that "an example" is the subject, but you cannot say the same about "an", "example", or "exam". When the text is changed, the grammatical information typically becomes invalid. Another example is Japanese reading information (yomi). Wrapping the attribute value into an Annotation object guarantees that adjacent text runs don't get merged even if the attribute values are equal, and indicates to text containers that the attribute should be discarded if the underlying text is modified.
An AttributedCharacterIterator allows iteration through both text and related attribute information.
An attribute is a key/value pair, identified by the key. No two attributes on a given character can have the same key.
The values for an attribute are immutable, or must not be mutated by clients or storage. They are always passed by reference, and not cloned.
A run with respect to an attribute is a maximum text range for which:
the attribute is undefined or null for the entire range, or the attribute value is defined and has the same non-null value for the entire range.
A run with respect to a set of attributes is a maximum text range for which this condition is met for each member attribute.
When getting a run with no explicit attributes specified (i.e., calling getRunStart() and getRunLimit()), any contiguous text segments having the same attributes (the same set of attribute/value pairs) are treated as separate runs if the attributes have been given to those text segments separately.
The returned indexes are limited to the range of the iterator.
The returned attribute information is limited to runs that contain the current character.
Attribute keys are instances of AttributedCharacterIterator.Attribute and its subclasses, such as TextAttribute.
An AttributedCharacterIterator allows iteration through both text and related attribute information. An attribute is a key/value pair, identified by the key. No two attributes on a given character can have the same key. The values for an attribute are immutable, or must not be mutated by clients or storage. They are always passed by reference, and not cloned. A run with respect to an attribute is a maximum text range for which: the attribute is undefined or null for the entire range, or the attribute value is defined and has the same non-null value for the entire range. A run with respect to a set of attributes is a maximum text range for which this condition is met for each member attribute. When getting a run with no explicit attributes specified (i.e., calling getRunStart() and getRunLimit()), any contiguous text segments having the same attributes (the same set of attribute/value pairs) are treated as separate runs if the attributes have been given to those text segments separately. The returned indexes are limited to the range of the iterator. The returned attribute information is limited to runs that contain the current character. Attribute keys are instances of AttributedCharacterIterator.Attribute and its subclasses, such as TextAttribute.
Defines attribute keys that are used to identify text attributes. These keys are used in AttributedCharacterIterator and AttributedString.
Defines attribute keys that are used to identify text attributes. These keys are used in AttributedCharacterIterator and AttributedString.
An AttributedString holds text and related attribute information. It may be used as the actual data storage in some cases where a text reader wants to access attributed text through the AttributedCharacterIterator interface.
An attribute is a key/value pair, identified by the key. No two attributes on a given character can have the same key.
The values for an attribute are immutable, or must not be mutated by clients or storage. They are always passed by reference, and not cloned.
An AttributedString holds text and related attribute information. It may be used as the actual data storage in some cases where a text reader wants to access attributed text through the AttributedCharacterIterator interface. An attribute is a key/value pair, identified by the key. No two attributes on a given character can have the same key. The values for an attribute are immutable, or must not be mutated by clients or storage. They are always passed by reference, and not cloned.
This class implements the Unicode Bidirectional Algorithm.
A Bidi object provides information on the bidirectional reordering of the text used to create it. This is required, for example, to properly display Arabic or Hebrew text. These languages are inherently mixed directional, as they order numbers from left-to-right while ordering most other text from right-to-left.
Once created, a Bidi object can be queried to see if the text it represents is all left-to-right or all right-to-left. Such objects are very lightweight and this text is relatively easy to process.
If there are multiple runs of text, information about the runs can be accessed by indexing to get the start, limit, and level of a run. The level represents both the direction and the 'nesting level' of a directional run. Odd levels are right-to-left, while even levels are left-to-right. So for example level 0 represents left-to-right text, while level 1 represents right-to-left text, and level 2 represents left-to-right text embedded in a right-to-left run.
This class implements the Unicode Bidirectional Algorithm. A Bidi object provides information on the bidirectional reordering of the text used to create it. This is required, for example, to properly display Arabic or Hebrew text. These languages are inherently mixed directional, as they order numbers from left-to-right while ordering most other text from right-to-left. Once created, a Bidi object can be queried to see if the text it represents is all left-to-right or all right-to-left. Such objects are very lightweight and this text is relatively easy to process. If there are multiple runs of text, information about the runs can be accessed by indexing to get the start, limit, and level of a run. The level represents both the direction and the 'nesting level' of a directional run. Odd levels are right-to-left, while even levels are left-to-right. So for example level 0 represents left-to-right text, while level 1 represents right-to-left text, and level 2 represents left-to-right text embedded in a right-to-left run.
The BreakIterator class implements methods for finding the location of boundaries in text. Instances of BreakIterator maintain a current position and scan over text returning the index of characters where boundaries occur. Internally, BreakIterator scans text using a CharacterIterator, and is thus able to scan text held by any object implementing that protocol. A StringCharacterIterator is used to scan String objects passed to setText.
You use the factory methods provided by this class to create instances of various types of break iterators. In particular, use getWordInstance, getLineInstance, getSentenceInstance, and getCharacterInstance to create BreakIterators that perform word, line, sentence, and character boundary analysis respectively. A single BreakIterator can work only on one unit (word, line, sentence, and so on). You must use a different iterator for each unit boundary analysis you wish to perform.
Line boundary analysis determines where a text string can be broken when line-wrapping. The mechanism correctly handles punctuation and hyphenated words. Actual line breaking needs to also consider the available line width and is handled by higher-level software.
Sentence boundary analysis allows selection with correct interpretation of periods within numbers and abbreviations, and trailing punctuation marks such as quotation marks and parentheses.
Word boundary analysis is used by search and replace functions, as well as within text editing applications that allow the user to select words with a double click. Word selection provides correct interpretation of punctuation marks within and following words. Characters that are not part of a word, such as symbols or punctuation marks, have word-breaks on both sides.
Character boundary analysis allows users to interact with characters as they expect to, for example, when moving the cursor through a text string. Character boundary analysis provides correct navigation through character strings, regardless of how the character is stored. The boundaries returned may be those of supplementary characters, combining character sequences, or ligature clusters. For example, an accented character might be stored as a base character and a diacritical mark. What users consider to be a character can differ between languages.
The BreakIterator instances returned by the factory methods of this class are intended for use with natural languages only, not for programming language text. It is however possible to define subclasses that tokenize a programming language.
Examples: Creating and using text boundaries:
public static void main(String args[]) { if (args.length == 1) { String stringToExamine = args[0]; //print each word in order BreakIterator boundary = BreakIterator.getWordInstance(); boundary.setText(stringToExamine); printEachForward(boundary, stringToExamine); //print each sentence in reverse order boundary = BreakIterator.getSentenceInstance(Locale.US); boundary.setText(stringToExamine); printEachBackward(boundary, stringToExamine); printFirst(boundary, stringToExamine); printLast(boundary, stringToExamine); } }
Print each element in order:
public static void printEachForward(BreakIterator boundary, String source) { int start = boundary.first(); for (int end = boundary.next(); end != BreakIterator.DONE; start = end, end = boundary.next()) { System.out.println(source.substring(start,end)); } }
Print each element in reverse order:
public static void printEachBackward(BreakIterator boundary, String source) { int end = boundary.last(); for (int start = boundary.previous(); start != BreakIterator.DONE; end = start, start = boundary.previous()) { System.out.println(source.substring(start,end)); } }
Print first element:
public static void printFirst(BreakIterator boundary, String source) { int start = boundary.first(); int end = boundary.next(); System.out.println(source.substring(start,end)); }
Print last element:
public static void printLast(BreakIterator boundary, String source) { int end = boundary.last(); int start = boundary.previous(); System.out.println(source.substring(start,end)); }
Print the element at a specified position:
public static void printAt(BreakIterator boundary, int pos, String source) { int end = boundary.following(pos); int start = boundary.previous(); System.out.println(source.substring(start,end)); }
Find the next word:
public static int nextWordStartAfter(int pos, String text) { BreakIterator wb = BreakIterator.getWordInstance(); wb.setText(text); int last = wb.following(pos); int current = wb.next(); while (current != BreakIterator.DONE) { for (int p = last; p < current; p++) { if (Character.isLetter(text.codePointAt(p))) return last; } last = current; current = wb.next(); } return BreakIterator.DONE; } (The iterator returned by BreakIterator.getWordInstance() is unique in that the break positions it returns don't represent both the start and end of the thing being iterated over. That is, a sentence-break iterator returns breaks that each represent the end of one sentence and the beginning of the next. With the word-break iterator, the characters between two boundaries might be a word, or they might be the punctuation or whitespace between two words. The above code uses a simple heuristic to determine which boundary is the beginning of a word: If the characters between this boundary and the next boundary include at least one letter (this can be an alphabetical letter, a CJK ideograph, a Hangul syllable, a Kana character, etc.), then the text between this boundary and the next is a word; otherwise, it's the material between words.)
The BreakIterator class implements methods for finding the location of boundaries in text. Instances of BreakIterator maintain a current position and scan over text returning the index of characters where boundaries occur. Internally, BreakIterator scans text using a CharacterIterator, and is thus able to scan text held by any object implementing that protocol. A StringCharacterIterator is used to scan String objects passed to setText. You use the factory methods provided by this class to create instances of various types of break iterators. In particular, use getWordInstance, getLineInstance, getSentenceInstance, and getCharacterInstance to create BreakIterators that perform word, line, sentence, and character boundary analysis respectively. A single BreakIterator can work only on one unit (word, line, sentence, and so on). You must use a different iterator for each unit boundary analysis you wish to perform. Line boundary analysis determines where a text string can be broken when line-wrapping. The mechanism correctly handles punctuation and hyphenated words. Actual line breaking needs to also consider the available line width and is handled by higher-level software. Sentence boundary analysis allows selection with correct interpretation of periods within numbers and abbreviations, and trailing punctuation marks such as quotation marks and parentheses. Word boundary analysis is used by search and replace functions, as well as within text editing applications that allow the user to select words with a double click. Word selection provides correct interpretation of punctuation marks within and following words. Characters that are not part of a word, such as symbols or punctuation marks, have word-breaks on both sides. Character boundary analysis allows users to interact with characters as they expect to, for example, when moving the cursor through a text string. Character boundary analysis provides correct navigation through character strings, regardless of how the character is stored. The boundaries returned may be those of supplementary characters, combining character sequences, or ligature clusters. For example, an accented character might be stored as a base character and a diacritical mark. What users consider to be a character can differ between languages. The BreakIterator instances returned by the factory methods of this class are intended for use with natural languages only, not for programming language text. It is however possible to define subclasses that tokenize a programming language. Examples: Creating and using text boundaries: public static void main(String args[]) { if (args.length == 1) { String stringToExamine = args[0]; //print each word in order BreakIterator boundary = BreakIterator.getWordInstance(); boundary.setText(stringToExamine); printEachForward(boundary, stringToExamine); //print each sentence in reverse order boundary = BreakIterator.getSentenceInstance(Locale.US); boundary.setText(stringToExamine); printEachBackward(boundary, stringToExamine); printFirst(boundary, stringToExamine); printLast(boundary, stringToExamine); } } Print each element in order: public static void printEachForward(BreakIterator boundary, String source) { int start = boundary.first(); for (int end = boundary.next(); end != BreakIterator.DONE; start = end, end = boundary.next()) { System.out.println(source.substring(start,end)); } } Print each element in reverse order: public static void printEachBackward(BreakIterator boundary, String source) { int end = boundary.last(); for (int start = boundary.previous(); start != BreakIterator.DONE; end = start, start = boundary.previous()) { System.out.println(source.substring(start,end)); } } Print first element: public static void printFirst(BreakIterator boundary, String source) { int start = boundary.first(); int end = boundary.next(); System.out.println(source.substring(start,end)); } Print last element: public static void printLast(BreakIterator boundary, String source) { int end = boundary.last(); int start = boundary.previous(); System.out.println(source.substring(start,end)); } Print the element at a specified position: public static void printAt(BreakIterator boundary, int pos, String source) { int end = boundary.following(pos); int start = boundary.previous(); System.out.println(source.substring(start,end)); } Find the next word: public static int nextWordStartAfter(int pos, String text) { BreakIterator wb = BreakIterator.getWordInstance(); wb.setText(text); int last = wb.following(pos); int current = wb.next(); while (current != BreakIterator.DONE) { for (int p = last; p < current; p++) { if (Character.isLetter(text.codePointAt(p))) return last; } last = current; current = wb.next(); } return BreakIterator.DONE; } (The iterator returned by BreakIterator.getWordInstance() is unique in that the break positions it returns don't represent both the start and end of the thing being iterated over. That is, a sentence-break iterator returns breaks that each represent the end of one sentence and the beginning of the next. With the word-break iterator, the characters between two boundaries might be a word, or they might be the punctuation or whitespace between two words. The above code uses a simple heuristic to determine which boundary is the beginning of a word: If the characters between this boundary and the next boundary include at least one letter (this can be an alphabetical letter, a CJK ideograph, a Hangul syllable, a Kana character, etc.), then the text between this boundary and the next is a word; otherwise, it's the material between words.)
This interface defines a protocol for bidirectional iteration over text. The iterator iterates over a bounded sequence of characters. Characters are indexed with values beginning with the value returned by getBeginIndex() and continuing through the value returned by getEndIndex()-1.
Iterators maintain a current character index, whose valid range is from getBeginIndex() to getEndIndex(); the value getEndIndex() is included to allow handling of zero-length text ranges and for historical reasons. The current index can be retrieved by calling getIndex() and set directly by calling setIndex(), first(), and last().
The methods previous() and next() are used for iteration. They return DONE if they would move outside the range from getBeginIndex() to getEndIndex() -1, signaling that the iterator has reached the end of the sequence. DONE is also returned by other methods to indicate that the current index is outside this range.
Examples:
Traverse the text from start to finish
public void traverseForward(CharacterIterator iter) { for(char c = iter.first(); c != CharacterIterator.DONE; c = iter.next()) { processChar(c); } }
Traverse the text backwards, from end to start
public void traverseBackward(CharacterIterator iter) { for(char c = iter.last(); c != CharacterIterator.DONE; c = iter.previous()) { processChar(c); } }
Traverse both forward and backward from a given position in the text. Calls to notBoundary() in this example represents some additional stopping criteria.
public void traverseOut(CharacterIterator iter, int pos) { for (char c = iter.setIndex(pos); c != CharacterIterator.DONE && notBoundary(c); c = iter.next()) { } int end = iter.getIndex(); for (char c = iter.setIndex(pos); c != CharacterIterator.DONE && notBoundary(c); c = iter.previous()) { } int start = iter.getIndex(); processSection(start, end); }
This interface defines a protocol for bidirectional iteration over text. The iterator iterates over a bounded sequence of characters. Characters are indexed with values beginning with the value returned by getBeginIndex() and continuing through the value returned by getEndIndex()-1. Iterators maintain a current character index, whose valid range is from getBeginIndex() to getEndIndex(); the value getEndIndex() is included to allow handling of zero-length text ranges and for historical reasons. The current index can be retrieved by calling getIndex() and set directly by calling setIndex(), first(), and last(). The methods previous() and next() are used for iteration. They return DONE if they would move outside the range from getBeginIndex() to getEndIndex() -1, signaling that the iterator has reached the end of the sequence. DONE is also returned by other methods to indicate that the current index is outside this range. Examples: Traverse the text from start to finish public void traverseForward(CharacterIterator iter) { for(char c = iter.first(); c != CharacterIterator.DONE; c = iter.next()) { processChar(c); } } Traverse the text backwards, from end to start public void traverseBackward(CharacterIterator iter) { for(char c = iter.last(); c != CharacterIterator.DONE; c = iter.previous()) { processChar(c); } } Traverse both forward and backward from a given position in the text. Calls to notBoundary() in this example represents some additional stopping criteria. public void traverseOut(CharacterIterator iter, int pos) { for (char c = iter.setIndex(pos); c != CharacterIterator.DONE && notBoundary(c); c = iter.next()) { } int end = iter.getIndex(); for (char c = iter.setIndex(pos); c != CharacterIterator.DONE && notBoundary(c); c = iter.previous()) { } int start = iter.getIndex(); processSection(start, end); }
A ChoiceFormat allows you to attach a format to a range of numbers. It is generally used in a MessageFormat for handling plurals. The choice is specified with an ascending list of doubles, where each item specifies a half-open interval up to the next item:
X matches j if and only if limit[j] ≤ X < limit[j+1]
If there is no match, then either the first or last index is used, depending on whether the number (X) is too low or too high. If the limit array is not in ascending order, the results of formatting will be incorrect. ChoiceFormat also accepts \u221E as equivalent to infinity(INF).
Note: ChoiceFormat differs from the other Format classes in that you create a ChoiceFormat object with a constructor (not with a getInstance style factory method). The factory methods aren't necessary because ChoiceFormat doesn't require any complex setup for a given locale. In fact, ChoiceFormat doesn't implement any locale specific behavior.
When creating a ChoiceFormat, you must specify an array of formats and an array of limits. The length of these arrays must be the same. For example,
limits = {1,2,3,4,5,6,7}
formats = {"Sun","Mon","Tue","Wed","Thur","Fri","Sat"}
limits = {0, 1, ChoiceFormat.nextDouble(1)}
formats = {"no files", "one file", "many files"}
(nextDouble can be used to get the next higher double, to
make the half-open interval.)
Here is a simple example that shows formatting and parsing:
double[] limits = {1,2,3,4,5,6,7}; String[] dayOfWeekNames = {"Sun","Mon","Tue","Wed","Thur","Fri","Sat"}; ChoiceFormat form = new ChoiceFormat(limits, dayOfWeekNames); ParsePosition status = new ParsePosition(0); for (double i = 0.0; i <= 8.0; +i) { status.setIndex(0); System.out.println(i " -> " form.format(i) " -> " form.parse(form.format(i),status)); }
Here is a more complex example, with a pattern format:
double[] filelimits = {0,1,2}; String[] filepart = {"are no files","is one file","are {2} files"}; ChoiceFormat fileform = new ChoiceFormat(filelimits, filepart); Format[] testFormats = {fileform, null, NumberFormat.getInstance()}; MessageFormat pattform = new MessageFormat("There {0} on {1}"); pattform.setFormats(testFormats); Object[] testArgs = {null, "ADisk", null}; for (int i = 0; i < 4; +i) { testArgs[0] = new Integer(i); testArgs[2] = testArgs[0]; System.out.println(pattform.format(testArgs)); }
Specifying a pattern for ChoiceFormat objects is fairly straightforward. For example:
ChoiceFormat fmt = new ChoiceFormat( "-1#is negative| 0#is zero or fraction | 1#is one |1.0<is 1+ |2#is two |2<is more than 2."); System.out.println("Formatter Pattern : " fmt.toPattern());
System.out.println("Format with -INF : " fmt.format(Double.NEGATIVE_INFINITY)); System.out.println("Format with -1.0 : " fmt.format(-1.0)); System.out.println("Format with 0 : " fmt.format(0)); System.out.println("Format with 0.9 : " fmt.format(0.9)); System.out.println("Format with 1.0 : " fmt.format(1)); System.out.println("Format with 1.5 : " fmt.format(1.5)); System.out.println("Format with 2 : " fmt.format(2)); System.out.println("Format with 2.1 : " fmt.format(2.1)); System.out.println("Format with NaN : " fmt.format(Double.NaN)); System.out.println("Format with INF : " fmt.format(Double.POSITIVE_INFINITY));
And the output result would be like the following:
Format with -INF : is negative Format with -1.0 : is negative Format with 0 : is zero or fraction Format with 0.9 : is zero or fraction Format with 1.0 : is one Format with 1.5 : is 1+ Format with 2 : is two Format with 2.1 : is more than 2. Format with NaN : is negative Format with INF : is more than 2.
Synchronization
Choice formats are not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
A ChoiceFormat allows you to attach a format to a range of numbers. It is generally used in a MessageFormat for handling plurals. The choice is specified with an ascending list of doubles, where each item specifies a half-open interval up to the next item: X matches j if and only if limit[j] ≤ X < limit[j+1] If there is no match, then either the first or last index is used, depending on whether the number (X) is too low or too high. If the limit array is not in ascending order, the results of formatting will be incorrect. ChoiceFormat also accepts \u221E as equivalent to infinity(INF). Note: ChoiceFormat differs from the other Format classes in that you create a ChoiceFormat object with a constructor (not with a getInstance style factory method). The factory methods aren't necessary because ChoiceFormat doesn't require any complex setup for a given locale. In fact, ChoiceFormat doesn't implement any locale specific behavior. When creating a ChoiceFormat, you must specify an array of formats and an array of limits. The length of these arrays must be the same. For example, limits = {1,2,3,4,5,6,7} formats = {"Sun","Mon","Tue","Wed","Thur","Fri","Sat"} limits = {0, 1, ChoiceFormat.nextDouble(1)} formats = {"no files", "one file", "many files"} (nextDouble can be used to get the next higher double, to make the half-open interval.) Here is a simple example that shows formatting and parsing: double[] limits = {1,2,3,4,5,6,7}; String[] dayOfWeekNames = {"Sun","Mon","Tue","Wed","Thur","Fri","Sat"}; ChoiceFormat form = new ChoiceFormat(limits, dayOfWeekNames); ParsePosition status = new ParsePosition(0); for (double i = 0.0; i <= 8.0; +i) { status.setIndex(0); System.out.println(i " -> " form.format(i) " -> " form.parse(form.format(i),status)); } Here is a more complex example, with a pattern format: double[] filelimits = {0,1,2}; String[] filepart = {"are no files","is one file","are {2} files"}; ChoiceFormat fileform = new ChoiceFormat(filelimits, filepart); Format[] testFormats = {fileform, null, NumberFormat.getInstance()}; MessageFormat pattform = new MessageFormat("There {0} on {1}"); pattform.setFormats(testFormats); Object[] testArgs = {null, "ADisk", null}; for (int i = 0; i < 4; +i) { testArgs[0] = new Integer(i); testArgs[2] = testArgs[0]; System.out.println(pattform.format(testArgs)); } Specifying a pattern for ChoiceFormat objects is fairly straightforward. For example: ChoiceFormat fmt = new ChoiceFormat( "-1#is negative| 0#is zero or fraction | 1#is one |1.0<is 1+ |2#is two |2<is more than 2."); System.out.println("Formatter Pattern : " fmt.toPattern()); System.out.println("Format with -INF : " fmt.format(Double.NEGATIVE_INFINITY)); System.out.println("Format with -1.0 : " fmt.format(-1.0)); System.out.println("Format with 0 : " fmt.format(0)); System.out.println("Format with 0.9 : " fmt.format(0.9)); System.out.println("Format with 1.0 : " fmt.format(1)); System.out.println("Format with 1.5 : " fmt.format(1.5)); System.out.println("Format with 2 : " fmt.format(2)); System.out.println("Format with 2.1 : " fmt.format(2.1)); System.out.println("Format with NaN : " fmt.format(Double.NaN)); System.out.println("Format with INF : " fmt.format(Double.POSITIVE_INFINITY)); And the output result would be like the following: Format with -INF : is negative Format with -1.0 : is negative Format with 0 : is zero or fraction Format with 0.9 : is zero or fraction Format with 1.0 : is one Format with 1.5 : is 1+ Format with 2 : is two Format with 2.1 : is more than 2. Format with NaN : is negative Format with INF : is more than 2. Synchronization Choice formats are not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
The CollationElementIterator class is used as an iterator to walk through each character of an international string. Use the iterator to return the ordering priority of the positioned character. The ordering priority of a character, which we refer to as a key, defines how a character is collated in the given collation object.
For example, consider the following in Spanish:
"ca" → the first key is key('c') and second key is key('a'). "cha" → the first key is key('ch') and second key is key('a').
And in German,
"�b" → the first key is key('a'), the second key is key('e'), and the third key is key('b').
The key of a character is an integer composed of primary order(short), secondary order(byte), and tertiary order(byte). Java strictly defines the size and signedness of its primitive data types. Therefore, the static functions primaryOrder, secondaryOrder, and tertiaryOrder return int, short, and short respectively to ensure the correctness of the key value.
Example of the iterator usage,
String testString = "This is a test"; Collator col = Collator.getInstance(); if (col instanceof RuleBasedCollator) { RuleBasedCollator ruleBasedCollator = (RuleBasedCollator)col; CollationElementIterator collationElementIterator = ruleBasedCollator.getCollationElementIterator(testString); int primaryOrder = CollationElementIterator.primaryOrder(collationElementIterator.next()); : }
CollationElementIterator.next returns the collation order of the next character. A collation order consists of primary order, secondary order and tertiary order. The data type of the collation order is int. The first 16 bits of a collation order is its primary order; the next 8 bits is the secondary order and the last 8 bits is the tertiary order.
Note: CollationElementIterator is a part of RuleBasedCollator implementation. It is only usable with RuleBasedCollator instances.
The CollationElementIterator class is used as an iterator to walk through each character of an international string. Use the iterator to return the ordering priority of the positioned character. The ordering priority of a character, which we refer to as a key, defines how a character is collated in the given collation object. For example, consider the following in Spanish: "ca" → the first key is key('c') and second key is key('a'). "cha" → the first key is key('ch') and second key is key('a'). And in German, "�b" → the first key is key('a'), the second key is key('e'), and the third key is key('b'). The key of a character is an integer composed of primary order(short), secondary order(byte), and tertiary order(byte). Java strictly defines the size and signedness of its primitive data types. Therefore, the static functions primaryOrder, secondaryOrder, and tertiaryOrder return int, short, and short respectively to ensure the correctness of the key value. Example of the iterator usage, String testString = "This is a test"; Collator col = Collator.getInstance(); if (col instanceof RuleBasedCollator) { RuleBasedCollator ruleBasedCollator = (RuleBasedCollator)col; CollationElementIterator collationElementIterator = ruleBasedCollator.getCollationElementIterator(testString); int primaryOrder = CollationElementIterator.primaryOrder(collationElementIterator.next()); : } CollationElementIterator.next returns the collation order of the next character. A collation order consists of primary order, secondary order and tertiary order. The data type of the collation order is int. The first 16 bits of a collation order is its primary order; the next 8 bits is the secondary order and the last 8 bits is the tertiary order. Note: CollationElementIterator is a part of RuleBasedCollator implementation. It is only usable with RuleBasedCollator instances.
A CollationKey represents a String under the rules of a specific Collator object. Comparing two CollationKeys returns the relative order of the Strings they represent. Using CollationKeys to compare Strings is generally faster than using Collator.compare. Thus, when the Strings must be compared multiple times, for example when sorting a list of Strings. It's more efficient to use CollationKeys.
You can not create CollationKeys directly. Rather, generate them by calling Collator.getCollationKey. You can only compare CollationKeys generated from the same Collator object.
Generating a CollationKey for a String involves examining the entire String and converting it to series of bits that can be compared bitwise. This allows fast comparisons once the keys are generated. The cost of generating keys is recouped in faster comparisons when Strings need to be compared many times. On the other hand, the result of a comparison is often determined by the first couple of characters of each String. Collator.compare examines only as many characters as it needs which allows it to be faster when doing single comparisons.
The following example shows how CollationKeys might be used to sort a list of Strings.
// Create an array of CollationKeys for the Strings to be sorted. Collator myCollator = Collator.getInstance(); CollationKey[] keys = new CollationKey[3]; keys[0] = myCollator.getCollationKey("Tom"); keys[1] = myCollator.getCollationKey("Dick"); keys[2] = myCollator.getCollationKey("Harry"); sort(keys);
//...
// Inside body of sort routine, compare keys this way if (keys[i].compareTo(keys[j]) > 0) // swap keys[i] and keys[j]
//...
// Finally, when we've returned from sort. System.out.println(keys[0].getSourceString()); System.out.println(keys[1].getSourceString()); System.out.println(keys[2].getSourceString());
A CollationKey represents a String under the rules of a specific Collator object. Comparing two CollationKeys returns the relative order of the Strings they represent. Using CollationKeys to compare Strings is generally faster than using Collator.compare. Thus, when the Strings must be compared multiple times, for example when sorting a list of Strings. It's more efficient to use CollationKeys. You can not create CollationKeys directly. Rather, generate them by calling Collator.getCollationKey. You can only compare CollationKeys generated from the same Collator object. Generating a CollationKey for a String involves examining the entire String and converting it to series of bits that can be compared bitwise. This allows fast comparisons once the keys are generated. The cost of generating keys is recouped in faster comparisons when Strings need to be compared many times. On the other hand, the result of a comparison is often determined by the first couple of characters of each String. Collator.compare examines only as many characters as it needs which allows it to be faster when doing single comparisons. The following example shows how CollationKeys might be used to sort a list of Strings. // Create an array of CollationKeys for the Strings to be sorted. Collator myCollator = Collator.getInstance(); CollationKey[] keys = new CollationKey[3]; keys[0] = myCollator.getCollationKey("Tom"); keys[1] = myCollator.getCollationKey("Dick"); keys[2] = myCollator.getCollationKey("Harry"); sort(keys); //... // Inside body of sort routine, compare keys this way if (keys[i].compareTo(keys[j]) > 0) // swap keys[i] and keys[j] //... // Finally, when we've returned from sort. System.out.println(keys[0].getSourceString()); System.out.println(keys[1].getSourceString()); System.out.println(keys[2].getSourceString());
The Collator class performs locale-sensitive String comparison. You use this class to build searching and sorting routines for natural language text.
Collator is an abstract base class. Subclasses implement specific collation strategies. One subclass, RuleBasedCollator, is currently provided with the Java Platform and is applicable to a wide set of languages. Other subclasses may be created to handle more specialized needs.
Like other locale-sensitive classes, you can use the static factory method, getInstance, to obtain the appropriate Collator object for a given locale. You will only need to look at the subclasses of Collator if you need to understand the details of a particular collation strategy or if you need to modify that strategy.
The following example shows how to compare two strings using the Collator for the default locale.
// Compare two strings in the default locale Collator myCollator = Collator.getInstance(); if( myCollator.compare("abc", "ABC") < 0 ) System.out.println("abc is less than ABC"); else System.out.println("abc is greater than or equal to ABC");
You can set a Collator's strength property to determine the level of difference considered significant in comparisons. Four strengths are provided: PRIMARY, SECONDARY, TERTIARY, and IDENTICAL. The exact assignment of strengths to language features is locale dependant. For example, in Czech, "e" and "f" are considered primary differences, while "e" and "ě" are secondary differences, "e" and "E" are tertiary differences and "e" and "e" are identical. The following shows how both case and accents could be ignored for US English.
//Get the Collator for US English and set its strength to PRIMARY Collator usCollator = Collator.getInstance(Locale.US); usCollator.setStrength(Collator.PRIMARY); if( usCollator.compare("abc", "ABC") == 0 ) { System.out.println("Strings are equivalent"); }
For comparing Strings exactly once, the compare method provides the best performance. When sorting a list of Strings however, it is generally necessary to compare each String multiple times. In this case, CollationKeys provide better performance. The CollationKey class converts a String to a series of bits that can be compared bitwise against other CollationKeys. A CollationKey is created by a Collator object for a given String.
Note: CollationKeys from different Collators can not be compared. See the class description for CollationKey for an example using CollationKeys.
The Collator class performs locale-sensitive String comparison. You use this class to build searching and sorting routines for natural language text. Collator is an abstract base class. Subclasses implement specific collation strategies. One subclass, RuleBasedCollator, is currently provided with the Java Platform and is applicable to a wide set of languages. Other subclasses may be created to handle more specialized needs. Like other locale-sensitive classes, you can use the static factory method, getInstance, to obtain the appropriate Collator object for a given locale. You will only need to look at the subclasses of Collator if you need to understand the details of a particular collation strategy or if you need to modify that strategy. The following example shows how to compare two strings using the Collator for the default locale. // Compare two strings in the default locale Collator myCollator = Collator.getInstance(); if( myCollator.compare("abc", "ABC") < 0 ) System.out.println("abc is less than ABC"); else System.out.println("abc is greater than or equal to ABC"); You can set a Collator's strength property to determine the level of difference considered significant in comparisons. Four strengths are provided: PRIMARY, SECONDARY, TERTIARY, and IDENTICAL. The exact assignment of strengths to language features is locale dependant. For example, in Czech, "e" and "f" are considered primary differences, while "e" and "ě" are secondary differences, "e" and "E" are tertiary differences and "e" and "e" are identical. The following shows how both case and accents could be ignored for US English. //Get the Collator for US English and set its strength to PRIMARY Collator usCollator = Collator.getInstance(Locale.US); usCollator.setStrength(Collator.PRIMARY); if( usCollator.compare("abc", "ABC") == 0 ) { System.out.println("Strings are equivalent"); } For comparing Strings exactly once, the compare method provides the best performance. When sorting a list of Strings however, it is generally necessary to compare each String multiple times. In this case, CollationKeys provide better performance. The CollationKey class converts a String to a series of bits that can be compared bitwise against other CollationKeys. A CollationKey is created by a Collator object for a given String. Note: CollationKeys from different Collators can not be compared. See the class description for CollationKey for an example using CollationKeys.
No vars found in this namespace.
DateFormat is an abstract class for date/time formatting subclasses which formats and parses dates or time in a language-independent manner. The date/time formatting subclass, such as SimpleDateFormat, allows for formatting (i.e., date → text), parsing (text → date), and normalization. The date is represented as a Date object or as the milliseconds since January 1, 1970, 00:00:00 GMT.
DateFormat provides many class methods for obtaining default date/time formatters based on the default or a given locale and a number of formatting styles. The formatting styles include FULL, LONG, MEDIUM, and SHORT. More detail and examples of using these styles are provided in the method descriptions.
DateFormat helps you to format and parse dates for any locale. Your code can be completely independent of the locale conventions for months, days of the week, or even the calendar format: lunar vs. solar.
To format a date for the current Locale, use one of the static factory methods:
myString = DateFormat.getDateInstance().format(myDate);
If you are formatting multiple dates, it is more efficient to get the format and use it multiple times so that the system doesn't have to fetch the information about the local language and country conventions multiple times.
DateFormat df = DateFormat.getDateInstance(); for (int i = 0; i < myDate.length; +i) { output.println(df.format(myDate[i]) "; "); }
To format a date for a different Locale, specify it in the call to getDateInstance().
DateFormat df = DateFormat.getDateInstance(DateFormat.LONG, Locale.FRANCE);
You can use a DateFormat to parse also.
myDate = df.parse(myString);
Use getDateInstance to get the normal date format for that country. There are other static factory methods available. Use getTimeInstance to get the time format for that country. Use getDateTimeInstance to get a date and time format. You can pass in different options to these factory methods to control the length of the result; from SHORT to MEDIUM to LONG to FULL. The exact result depends on the locale, but generally: SHORT is completely numeric, such as 12.13.52 or 3:30pm MEDIUM is longer, such as Jan 12, 1952 LONG is longer, such as January 12, 1952 or 3:30:32pm FULL is pretty completely specified, such as Tuesday, April 12, 1952 AD or 3:30:42pm PST.
You can also set the time zone on the format if you wish. If you want even more control over the format or parsing, (or want to give your users more control), you can try casting the DateFormat you get from the factory methods to a SimpleDateFormat. This will work for the majority of countries; just remember to put it in a try block in case you encounter an unusual one.
You can also use forms of the parse and format methods with ParsePosition and FieldPosition to allow you to progressively parse through pieces of a string. align any particular field, or find out where it is for selection on the screen.
Synchronization
Date formats are not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
DateFormat is an abstract class for date/time formatting subclasses which formats and parses dates or time in a language-independent manner. The date/time formatting subclass, such as SimpleDateFormat, allows for formatting (i.e., date → text), parsing (text → date), and normalization. The date is represented as a Date object or as the milliseconds since January 1, 1970, 00:00:00 GMT. DateFormat provides many class methods for obtaining default date/time formatters based on the default or a given locale and a number of formatting styles. The formatting styles include FULL, LONG, MEDIUM, and SHORT. More detail and examples of using these styles are provided in the method descriptions. DateFormat helps you to format and parse dates for any locale. Your code can be completely independent of the locale conventions for months, days of the week, or even the calendar format: lunar vs. solar. To format a date for the current Locale, use one of the static factory methods: myString = DateFormat.getDateInstance().format(myDate); If you are formatting multiple dates, it is more efficient to get the format and use it multiple times so that the system doesn't have to fetch the information about the local language and country conventions multiple times. DateFormat df = DateFormat.getDateInstance(); for (int i = 0; i < myDate.length; +i) { output.println(df.format(myDate[i]) "; "); } To format a date for a different Locale, specify it in the call to getDateInstance(). DateFormat df = DateFormat.getDateInstance(DateFormat.LONG, Locale.FRANCE); You can use a DateFormat to parse also. myDate = df.parse(myString); Use getDateInstance to get the normal date format for that country. There are other static factory methods available. Use getTimeInstance to get the time format for that country. Use getDateTimeInstance to get a date and time format. You can pass in different options to these factory methods to control the length of the result; from SHORT to MEDIUM to LONG to FULL. The exact result depends on the locale, but generally: SHORT is completely numeric, such as 12.13.52 or 3:30pm MEDIUM is longer, such as Jan 12, 1952 LONG is longer, such as January 12, 1952 or 3:30:32pm FULL is pretty completely specified, such as Tuesday, April 12, 1952 AD or 3:30:42pm PST. You can also set the time zone on the format if you wish. If you want even more control over the format or parsing, (or want to give your users more control), you can try casting the DateFormat you get from the factory methods to a SimpleDateFormat. This will work for the majority of countries; just remember to put it in a try block in case you encounter an unusual one. You can also use forms of the parse and format methods with ParsePosition and FieldPosition to allow you to progressively parse through pieces of a string. align any particular field, or find out where it is for selection on the screen. Synchronization Date formats are not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
Defines constants that are used as attribute keys in the AttributedCharacterIterator returned from DateFormat.formatToCharacterIterator and as field identifiers in FieldPosition.
The class also provides two methods to map between its constants and the corresponding Calendar constants.
Defines constants that are used as attribute keys in the AttributedCharacterIterator returned from DateFormat.formatToCharacterIterator and as field identifiers in FieldPosition. The class also provides two methods to map between its constants and the corresponding Calendar constants.
DateFormatSymbols is a public class for encapsulating localizable date-time formatting data, such as the names of the months, the names of the days of the week, and the time zone data. SimpleDateFormat uses DateFormatSymbols to encapsulate this information.
Typically you shouldn't use DateFormatSymbols directly. Rather, you are encouraged to create a date-time formatter with the DateFormat class's factory methods: getTimeInstance, getDateInstance, or getDateTimeInstance. These methods automatically create a DateFormatSymbols for the formatter so that you don't have to. After the formatter is created, you may modify its format pattern using the setPattern method. For more information about creating formatters using DateFormat's factory methods, see DateFormat.
If you decide to create a date-time formatter with a specific format pattern for a specific locale, you can do so with:
new SimpleDateFormat(aPattern, DateFormatSymbols.getInstance(aLocale)).
DateFormatSymbols objects are cloneable. When you obtain a DateFormatSymbols object, feel free to modify the date-time formatting data. For instance, you can replace the localized date-time format pattern characters with the ones that you feel easy to remember. Or you can change the representative cities to your favorite ones.
New DateFormatSymbols subclasses may be added to support SimpleDateFormat for date-time formatting for additional locales.
DateFormatSymbols is a public class for encapsulating localizable date-time formatting data, such as the names of the months, the names of the days of the week, and the time zone data. SimpleDateFormat uses DateFormatSymbols to encapsulate this information. Typically you shouldn't use DateFormatSymbols directly. Rather, you are encouraged to create a date-time formatter with the DateFormat class's factory methods: getTimeInstance, getDateInstance, or getDateTimeInstance. These methods automatically create a DateFormatSymbols for the formatter so that you don't have to. After the formatter is created, you may modify its format pattern using the setPattern method. For more information about creating formatters using DateFormat's factory methods, see DateFormat. If you decide to create a date-time formatter with a specific format pattern for a specific locale, you can do so with: new SimpleDateFormat(aPattern, DateFormatSymbols.getInstance(aLocale)). DateFormatSymbols objects are cloneable. When you obtain a DateFormatSymbols object, feel free to modify the date-time formatting data. For instance, you can replace the localized date-time format pattern characters with the ones that you feel easy to remember. Or you can change the representative cities to your favorite ones. New DateFormatSymbols subclasses may be added to support SimpleDateFormat for date-time formatting for additional locales.
DecimalFormat is a concrete subclass of NumberFormat that formats decimal numbers. It has a variety of features designed to make it possible to parse and format numbers in any locale, including support for Western, Arabic, and Indic digits. It also supports different kinds of numbers, including integers (123), fixed-point numbers (123.4), scientific notation (1.23E4), percentages (12%), and currency amounts ($123). All of these can be localized.
To obtain a NumberFormat for a specific locale, including the default locale, call one of NumberFormat's factory methods, such as getInstance(). In general, do not call the DecimalFormat constructors directly, since the NumberFormat factory methods may return subclasses other than DecimalFormat. If you need to customize the format object, do something like this:
NumberFormat f = NumberFormat.getInstance(loc); if (f instanceof DecimalFormat) { ((DecimalFormat) f).setDecimalSeparatorAlwaysShown(true); }
A DecimalFormat comprises a pattern and a set of symbols. The pattern may be set directly using applyPattern(), or indirectly using the API methods. The symbols are stored in a DecimalFormatSymbols object. When using the NumberFormat factory methods, the pattern and symbols are read from localized ResourceBundles.
Patterns
DecimalFormat patterns have the following syntax:
Pattern: PositivePattern PositivePattern ; NegativePattern PositivePattern: Prefixopt Number Suffixopt NegativePattern: Prefixopt Number Suffixopt Prefix: any Unicode characters except \uFFFE, \uFFFF, and special characters Suffix: any Unicode characters except \uFFFE, \uFFFF, and special characters Number: Integer Exponentopt Integer . Fraction Exponentopt Integer: MinimumInteger # # Integer # , Integer MinimumInteger: 0 0 MinimumInteger 0 , MinimumInteger Fraction: MinimumFractionopt OptionalFractionopt MinimumFraction: 0 MinimumFractionopt OptionalFraction: # OptionalFractionopt Exponent: E MinimumExponent MinimumExponent: 0 MinimumExponentopt
A DecimalFormat pattern contains a positive and negative subpattern, for example, "#,##0.00;(#,##0.00)". Each subpattern has a prefix, numeric part, and suffix. The negative subpattern is optional; if absent, then the positive subpattern prefixed with the localized minus sign ('-' in most locales) is used as the negative subpattern. That is, "0.00" alone is equivalent to "0.00;-0.00". If there is an explicit negative subpattern, it serves only to specify the negative prefix and suffix; the number of digits, minimal digits, and other characteristics are all the same as the positive pattern. That means that "#,##0.0#;(#)" produces precisely the same behavior as "#,##0.0#;(#,##0.0#)".
The prefixes, suffixes, and various symbols used for infinity, digits, thousands separators, decimal separators, etc. may be set to arbitrary values, and they will appear properly during formatting. However, care must be taken that the symbols and strings do not conflict, or parsing will be unreliable. For example, either the positive and negative prefixes or the suffixes must be distinct for DecimalFormat.parse() to be able to distinguish positive from negative values. (If they are identical, then DecimalFormat will behave as if no negative subpattern was specified.) Another example is that the decimal separator and thousands separator should be distinct characters, or parsing will be impossible.
The grouping separator is commonly used for thousands, but in some countries it separates ten-thousands. The grouping size is a constant number of digits between the grouping characters, such as 3 for 100,000,000 or 4 for 1,0000,0000. If you supply a pattern with multiple grouping characters, the interval between the last one and the end of the integer is the one that is used. So "#,##,###,####" == "######,####" == "##,####,####".
Special Pattern Characters
Many characters in a pattern are taken literally; they are matched during parsing and output unchanged during formatting. Special characters, on the other hand, stand for other characters, strings, or classes of characters. They must be quoted, unless noted otherwise, if they are to appear in the prefix or suffix as literals.
The characters listed here are used in non-localized patterns. Localized patterns use the corresponding characters taken from this formatter's DecimalFormatSymbols object instead, and these characters lose their special status. Two exceptions are the currency sign and quote, which are not localized.
Symbol
Location
Localized?
Meaning
0
Number
Yes
Digit
#
Number
Yes
Digit, zero shows as absent
.
Number
Yes
Decimal separator or monetary decimal separator
-
Number
Yes
Minus sign
,
Number
Yes
Grouping separator
E
Number
Yes
Separates mantissa and exponent in scientific notation.
Need not be quoted in prefix or suffix.
;
Subpattern boundary
Yes
Separates positive and negative subpatterns
%
Prefix or suffix
Yes
Multiply by 100 and show as percentage
\u2030
Prefix or suffix
Yes
Multiply by 1000 and show as per mille value
¤ (\u00A4)
Prefix or suffix
No
Currency sign, replaced by currency symbol. If
doubled, replaced by international currency symbol.
If present in a pattern, the monetary decimal separator
is used instead of the decimal separator.
'
Prefix or suffix
No
Used to quote special characters in a prefix or suffix,
for example, "'#'#" formats 123 to
"#123". To create a single quote
itself, use two in a row: "# o''clock".
Scientific Notation
Numbers in scientific notation are expressed as the product of a mantissa and a power of ten, for example, 1234 can be expressed as 1.234 x 10^3. The mantissa is often in the range 1.0 ≤ x < 10.0, but it need not be. DecimalFormat can be instructed to format and parse scientific notation only via a pattern; there is currently no factory method that creates a scientific notation format. In a pattern, the exponent character immediately followed by one or more digit characters indicates scientific notation. Example: "0.###E0" formats the number 1234 as "1.234E3".
The number of digit characters after the exponent character gives the minimum exponent digit count. There is no maximum. Negative exponents are formatted using the localized minus sign, not the prefix and suffix from the pattern. This allows patterns such as "0.###E0 m/s".
The minimum and maximum number of integer digits are interpreted together:
If the maximum number of integer digits is greater than their minimum number and greater than 1, it forces the exponent to be a multiple of the maximum number of integer digits, and the minimum number of integer digits to be interpreted as 1. The most common use of this is to generate engineering notation, in which the exponent is a multiple of three, e.g., "##0.#####E0". Using this pattern, the number 12345 formats to "12.345E3", and 123456 formats to "123.456E3".
Otherwise, the minimum number of integer digits is achieved by adjusting the exponent. Example: 0.00123 formatted with "00.###E0" yields "12.3E-4".
The number of significant digits in the mantissa is the sum of the minimum integer and maximum fraction digits, and is unaffected by the maximum integer digits. For example, 12345 formatted with "##0.##E0" is "12.3E3". To show all digits, set the significant digits count to zero. The number of significant digits does not affect parsing.
Exponential patterns may not contain grouping separators.
Rounding
DecimalFormat provides rounding modes defined in RoundingMode for formatting. By default, it uses RoundingMode.HALF_EVEN.
Digits
For formatting, DecimalFormat uses the ten consecutive characters starting with the localized zero digit defined in the DecimalFormatSymbols object as digits. For parsing, these digits as well as all Unicode decimal digits, as defined by Character.digit, are recognized.
Special Values
NaN is formatted as a string, which typically has a single character \uFFFD. This string is determined by the DecimalFormatSymbols object. This is the only value for which the prefixes and suffixes are not used.
Infinity is formatted as a string, which typically has a single character \u221E, with the positive or negative prefixes and suffixes applied. The infinity string is determined by the DecimalFormatSymbols object.
Negative zero ("-0") parses to
BigDecimal(0) if isParseBigDecimal() is true, Long(0) if isParseBigDecimal() is false and isParseIntegerOnly() is true, Double(-0.0) if both isParseBigDecimal() and isParseIntegerOnly() are false.
Synchronization
Decimal formats are generally not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
Example
<strong>// Print out a number using the localized number, integer, currency, // and percent format for each locale</strong> Locale[] locales = NumberFormat.getAvailableLocales(); double myNumber = -1234.56; NumberFormat form; for (int j = 0; j < 4; +j) { System.out.println("FORMAT"); for (int i = 0; i < locales.length; +i) { if (locales[i].getCountry().length() == 0) { continue; // Skip language-only locales } System.out.print(locales[i].getDisplayName()); switch (j) { case 0: form = NumberFormat.getInstance(locales[i]); break; case 1: form = NumberFormat.getIntegerInstance(locales[i]); break; case 2: form = NumberFormat.getCurrencyInstance(locales[i]); break; default: form = NumberFormat.getPercentInstance(locales[i]); break; } if (form instanceof DecimalFormat) { System.out.print(": " ((DecimalFormat) form).toPattern()); } System.out.print(" -> " form.format(myNumber)); try { System.out.println(" -> " form.parse(form.format(myNumber))); } catch (ParseException e) {} } }
DecimalFormat is a concrete subclass of NumberFormat that formats decimal numbers. It has a variety of features designed to make it possible to parse and format numbers in any locale, including support for Western, Arabic, and Indic digits. It also supports different kinds of numbers, including integers (123), fixed-point numbers (123.4), scientific notation (1.23E4), percentages (12%), and currency amounts ($123). All of these can be localized. To obtain a NumberFormat for a specific locale, including the default locale, call one of NumberFormat's factory methods, such as getInstance(). In general, do not call the DecimalFormat constructors directly, since the NumberFormat factory methods may return subclasses other than DecimalFormat. If you need to customize the format object, do something like this: NumberFormat f = NumberFormat.getInstance(loc); if (f instanceof DecimalFormat) { ((DecimalFormat) f).setDecimalSeparatorAlwaysShown(true); } A DecimalFormat comprises a pattern and a set of symbols. The pattern may be set directly using applyPattern(), or indirectly using the API methods. The symbols are stored in a DecimalFormatSymbols object. When using the NumberFormat factory methods, the pattern and symbols are read from localized ResourceBundles. Patterns DecimalFormat patterns have the following syntax: Pattern: PositivePattern PositivePattern ; NegativePattern PositivePattern: Prefixopt Number Suffixopt NegativePattern: Prefixopt Number Suffixopt Prefix: any Unicode characters except \uFFFE, \uFFFF, and special characters Suffix: any Unicode characters except \uFFFE, \uFFFF, and special characters Number: Integer Exponentopt Integer . Fraction Exponentopt Integer: MinimumInteger # # Integer # , Integer MinimumInteger: 0 0 MinimumInteger 0 , MinimumInteger Fraction: MinimumFractionopt OptionalFractionopt MinimumFraction: 0 MinimumFractionopt OptionalFraction: # OptionalFractionopt Exponent: E MinimumExponent MinimumExponent: 0 MinimumExponentopt A DecimalFormat pattern contains a positive and negative subpattern, for example, "#,##0.00;(#,##0.00)". Each subpattern has a prefix, numeric part, and suffix. The negative subpattern is optional; if absent, then the positive subpattern prefixed with the localized minus sign ('-' in most locales) is used as the negative subpattern. That is, "0.00" alone is equivalent to "0.00;-0.00". If there is an explicit negative subpattern, it serves only to specify the negative prefix and suffix; the number of digits, minimal digits, and other characteristics are all the same as the positive pattern. That means that "#,##0.0#;(#)" produces precisely the same behavior as "#,##0.0#;(#,##0.0#)". The prefixes, suffixes, and various symbols used for infinity, digits, thousands separators, decimal separators, etc. may be set to arbitrary values, and they will appear properly during formatting. However, care must be taken that the symbols and strings do not conflict, or parsing will be unreliable. For example, either the positive and negative prefixes or the suffixes must be distinct for DecimalFormat.parse() to be able to distinguish positive from negative values. (If they are identical, then DecimalFormat will behave as if no negative subpattern was specified.) Another example is that the decimal separator and thousands separator should be distinct characters, or parsing will be impossible. The grouping separator is commonly used for thousands, but in some countries it separates ten-thousands. The grouping size is a constant number of digits between the grouping characters, such as 3 for 100,000,000 or 4 for 1,0000,0000. If you supply a pattern with multiple grouping characters, the interval between the last one and the end of the integer is the one that is used. So "#,##,###,####" == "######,####" == "##,####,####". Special Pattern Characters Many characters in a pattern are taken literally; they are matched during parsing and output unchanged during formatting. Special characters, on the other hand, stand for other characters, strings, or classes of characters. They must be quoted, unless noted otherwise, if they are to appear in the prefix or suffix as literals. The characters listed here are used in non-localized patterns. Localized patterns use the corresponding characters taken from this formatter's DecimalFormatSymbols object instead, and these characters lose their special status. Two exceptions are the currency sign and quote, which are not localized. Symbol Location Localized? Meaning 0 Number Yes Digit # Number Yes Digit, zero shows as absent . Number Yes Decimal separator or monetary decimal separator - Number Yes Minus sign , Number Yes Grouping separator E Number Yes Separates mantissa and exponent in scientific notation. Need not be quoted in prefix or suffix. ; Subpattern boundary Yes Separates positive and negative subpatterns % Prefix or suffix Yes Multiply by 100 and show as percentage \u2030 Prefix or suffix Yes Multiply by 1000 and show as per mille value ¤ (\u00A4) Prefix or suffix No Currency sign, replaced by currency symbol. If doubled, replaced by international currency symbol. If present in a pattern, the monetary decimal separator is used instead of the decimal separator. ' Prefix or suffix No Used to quote special characters in a prefix or suffix, for example, "'#'#" formats 123 to "#123". To create a single quote itself, use two in a row: "# o''clock". Scientific Notation Numbers in scientific notation are expressed as the product of a mantissa and a power of ten, for example, 1234 can be expressed as 1.234 x 10^3. The mantissa is often in the range 1.0 ≤ x < 10.0, but it need not be. DecimalFormat can be instructed to format and parse scientific notation only via a pattern; there is currently no factory method that creates a scientific notation format. In a pattern, the exponent character immediately followed by one or more digit characters indicates scientific notation. Example: "0.###E0" formats the number 1234 as "1.234E3". The number of digit characters after the exponent character gives the minimum exponent digit count. There is no maximum. Negative exponents are formatted using the localized minus sign, not the prefix and suffix from the pattern. This allows patterns such as "0.###E0 m/s". The minimum and maximum number of integer digits are interpreted together: If the maximum number of integer digits is greater than their minimum number and greater than 1, it forces the exponent to be a multiple of the maximum number of integer digits, and the minimum number of integer digits to be interpreted as 1. The most common use of this is to generate engineering notation, in which the exponent is a multiple of three, e.g., "##0.#####E0". Using this pattern, the number 12345 formats to "12.345E3", and 123456 formats to "123.456E3". Otherwise, the minimum number of integer digits is achieved by adjusting the exponent. Example: 0.00123 formatted with "00.###E0" yields "12.3E-4". The number of significant digits in the mantissa is the sum of the minimum integer and maximum fraction digits, and is unaffected by the maximum integer digits. For example, 12345 formatted with "##0.##E0" is "12.3E3". To show all digits, set the significant digits count to zero. The number of significant digits does not affect parsing. Exponential patterns may not contain grouping separators. Rounding DecimalFormat provides rounding modes defined in RoundingMode for formatting. By default, it uses RoundingMode.HALF_EVEN. Digits For formatting, DecimalFormat uses the ten consecutive characters starting with the localized zero digit defined in the DecimalFormatSymbols object as digits. For parsing, these digits as well as all Unicode decimal digits, as defined by Character.digit, are recognized. Special Values NaN is formatted as a string, which typically has a single character \uFFFD. This string is determined by the DecimalFormatSymbols object. This is the only value for which the prefixes and suffixes are not used. Infinity is formatted as a string, which typically has a single character \u221E, with the positive or negative prefixes and suffixes applied. The infinity string is determined by the DecimalFormatSymbols object. Negative zero ("-0") parses to BigDecimal(0) if isParseBigDecimal() is true, Long(0) if isParseBigDecimal() is false and isParseIntegerOnly() is true, Double(-0.0) if both isParseBigDecimal() and isParseIntegerOnly() are false. Synchronization Decimal formats are generally not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally. Example <strong>// Print out a number using the localized number, integer, currency, // and percent format for each locale</strong> Locale[] locales = NumberFormat.getAvailableLocales(); double myNumber = -1234.56; NumberFormat form; for (int j = 0; j < 4; +j) { System.out.println("FORMAT"); for (int i = 0; i < locales.length; +i) { if (locales[i].getCountry().length() == 0) { continue; // Skip language-only locales } System.out.print(locales[i].getDisplayName()); switch (j) { case 0: form = NumberFormat.getInstance(locales[i]); break; case 1: form = NumberFormat.getIntegerInstance(locales[i]); break; case 2: form = NumberFormat.getCurrencyInstance(locales[i]); break; default: form = NumberFormat.getPercentInstance(locales[i]); break; } if (form instanceof DecimalFormat) { System.out.print(": " ((DecimalFormat) form).toPattern()); } System.out.print(" -> " form.format(myNumber)); try { System.out.println(" -> " form.parse(form.format(myNumber))); } catch (ParseException e) {} } }
This class represents the set of symbols (such as the decimal separator, the grouping separator, and so on) needed by DecimalFormat to format numbers. DecimalFormat creates for itself an instance of DecimalFormatSymbols from its locale data. If you need to change any of these symbols, you can get the DecimalFormatSymbols object from your DecimalFormat and modify it.
This class represents the set of symbols (such as the decimal separator, the grouping separator, and so on) needed by DecimalFormat to format numbers. DecimalFormat creates for itself an instance of DecimalFormatSymbols from its locale data. If you need to change any of these symbols, you can get the DecimalFormatSymbols object from your DecimalFormat and modify it.
FieldPosition is a simple class used by Format and its subclasses to identify fields in formatted output. Fields can be identified in two ways:
By an integer constant, whose names typically end with _FIELD. The constants are defined in the various subclasses of Format. By a Format.Field constant, see ERA_FIELD and its friends in DateFormat for an example.
FieldPosition keeps track of the position of the field within the formatted output with two indices: the index of the first character of the field and the index of the last character of the field.
One version of the format method in the various Format classes requires a FieldPosition object as an argument. You use this format method to perform partial formatting or to get information about the formatted output (such as the position of a field).
If you are interested in the positions of all attributes in the formatted string use the Format method formatToCharacterIterator.
FieldPosition is a simple class used by Format and its subclasses to identify fields in formatted output. Fields can be identified in two ways: By an integer constant, whose names typically end with _FIELD. The constants are defined in the various subclasses of Format. By a Format.Field constant, see ERA_FIELD and its friends in DateFormat for an example. FieldPosition keeps track of the position of the field within the formatted output with two indices: the index of the first character of the field and the index of the last character of the field. One version of the format method in the various Format classes requires a FieldPosition object as an argument. You use this format method to perform partial formatting or to get information about the formatted output (such as the position of a field). If you are interested in the positions of all attributes in the formatted string use the Format method formatToCharacterIterator.
Format is an abstract base class for formatting locale-sensitive information such as dates, messages, and numbers.
Format defines the programming interface for formatting locale-sensitive objects into Strings (the format method) and for parsing Strings back into objects (the parseObject method).
Generally, a format's parseObject method must be able to parse any string formatted by its format method. However, there may be exceptional cases where this is not possible. For example, a format method might create two adjacent integer numbers with no separator in between, and in this case the parseObject could not tell which digits belong to which number.
Subclassing
The Java Platform provides three specialized subclasses of Format-- DateFormat, MessageFormat, and NumberFormat--for formatting dates, messages, and numbers, respectively.
Concrete subclasses must implement three methods:
format(Object obj, StringBuffer toAppendTo, FieldPosition pos) formatToCharacterIterator(Object obj) parseObject(String source, ParsePosition pos)
These general methods allow polymorphic parsing and formatting of objects and are used, for example, by MessageFormat. Subclasses often also provide additional format methods for specific input types as well as parse methods for specific result types. Any parse method that does not take a ParsePosition argument should throw ParseException when no text in the required format is at the beginning of the input text.
Most subclasses will also implement the following factory methods:
getInstance for getting a useful format object appropriate for the current locale
getInstance(Locale) for getting a useful format object appropriate for the specified locale
In addition, some subclasses may also implement other getXxxxInstance methods for more specialized control. For example, the NumberFormat class provides getPercentInstance and getCurrencyInstance methods for getting specialized number formatters.
Subclasses of Format that allow programmers to create objects for locales (with getInstance(Locale) for example) must also implement the following class method:
public static Locale[] getAvailableLocales()
And finally subclasses may define a set of constants to identify the various fields in the formatted output. These constants are used to create a FieldPosition object which identifies what information is contained in the field and its position in the formatted result. These constants should be named item_FIELD where item identifies the field. For examples of these constants, see ERA_FIELD and its friends in DateFormat.
Synchronization
Formats are generally not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
Format is an abstract base class for formatting locale-sensitive information such as dates, messages, and numbers. Format defines the programming interface for formatting locale-sensitive objects into Strings (the format method) and for parsing Strings back into objects (the parseObject method). Generally, a format's parseObject method must be able to parse any string formatted by its format method. However, there may be exceptional cases where this is not possible. For example, a format method might create two adjacent integer numbers with no separator in between, and in this case the parseObject could not tell which digits belong to which number. Subclassing The Java Platform provides three specialized subclasses of Format-- DateFormat, MessageFormat, and NumberFormat--for formatting dates, messages, and numbers, respectively. Concrete subclasses must implement three methods: format(Object obj, StringBuffer toAppendTo, FieldPosition pos) formatToCharacterIterator(Object obj) parseObject(String source, ParsePosition pos) These general methods allow polymorphic parsing and formatting of objects and are used, for example, by MessageFormat. Subclasses often also provide additional format methods for specific input types as well as parse methods for specific result types. Any parse method that does not take a ParsePosition argument should throw ParseException when no text in the required format is at the beginning of the input text. Most subclasses will also implement the following factory methods: getInstance for getting a useful format object appropriate for the current locale getInstance(Locale) for getting a useful format object appropriate for the specified locale In addition, some subclasses may also implement other getXxxxInstance methods for more specialized control. For example, the NumberFormat class provides getPercentInstance and getCurrencyInstance methods for getting specialized number formatters. Subclasses of Format that allow programmers to create objects for locales (with getInstance(Locale) for example) must also implement the following class method: public static Locale[] getAvailableLocales() And finally subclasses may define a set of constants to identify the various fields in the formatted output. These constants are used to create a FieldPosition object which identifies what information is contained in the field and its position in the formatted result. These constants should be named item_FIELD where item identifies the field. For examples of these constants, see ERA_FIELD and its friends in DateFormat. Synchronization Formats are generally not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
Defines constants that are used as attribute keys in the AttributedCharacterIterator returned from Format.formatToCharacterIterator and as field identifiers in FieldPosition.
Defines constants that are used as attribute keys in the AttributedCharacterIterator returned from Format.formatToCharacterIterator and as field identifiers in FieldPosition.
No vars found in this namespace.
MessageFormat provides a means to produce concatenated messages in a language-neutral way. Use this to construct messages displayed for end users.
MessageFormat takes a set of objects, formats them, then inserts the formatted strings into the pattern at the appropriate places.
Note: MessageFormat differs from the other Format classes in that you create a MessageFormat object with one of its constructors (not with a getInstance style factory method). The factory methods aren't necessary because MessageFormat itself doesn't implement locale specific behavior. Any locale specific behavior is defined by the pattern that you provide as well as the subformats used for inserted arguments.
Patterns and Their Interpretation
MessageFormat uses patterns of the following form:
MessageFormatPattern: String MessageFormatPattern FormatElement String
FormatElement: { ArgumentIndex } { ArgumentIndex , FormatType } { ArgumentIndex , FormatType , FormatStyle }
FormatType: one of number date time choice
FormatStyle: short medium long full integer currency percent SubformatPattern
Within a String, a pair of single quotes can be used to quote any arbitrary characters except single quotes. For example, pattern string "'{0}'" represents string "{0}", not a FormatElement. A single quote itself must be represented by doubled single quotes '' throughout a String. For example, pattern string "'{''}'" is interpreted as a sequence of '{ (start of quoting and a left curly brace), '' (a single quote), and }' (a right curly brace and end of quoting), not '{' and '}' (quoted left and right curly braces): representing string "{'}", not "{}".
A SubformatPattern is interpreted by its corresponding subformat, and subformat-dependent pattern rules apply. For example, pattern string "{1,number,$'#',##}" (SubformatPattern with underline) will produce a number format with the pound-sign quoted, with a result such as: "$#31,45". Refer to each Format subclass documentation for details.
Any unmatched quote is treated as closed at the end of the given pattern. For example, pattern string "'{0}" is treated as pattern "'{0}'".
Any curly braces within an unquoted pattern must be balanced. For example, "ab {0} de" and "ab '}' de" are valid patterns, but "ab {0'}' de", "ab } de" and "''{''" are not.
Warning:The rules for using quotes within message format patterns unfortunately have shown to be somewhat confusing. In particular, it isn't always obvious to localizers whether single quotes need to be doubled or not. Make sure to inform localizers about the rules, and tell them (for example, by using comments in resource bundle source files) which strings will be processed by MessageFormat. Note that localizers may need to use single quotes in translated strings where the original version doesn't have them.
The ArgumentIndex value is a non-negative integer written using the digits '0' through '9', and represents an index into the arguments array passed to the format methods or the result array returned by the parse methods.
The FormatType and FormatStyle values are used to create a Format instance for the format element. The following table shows how the values map to Format instances. Combinations not shown in the table are illegal. A SubformatPattern must be a valid pattern string for the Format subclass used.
FormatType
FormatStyle
Subformat Created
(none)
(none)
null
number
(none)
NumberFormat.getInstance(getLocale())
integer
NumberFormat.getIntegerInstance(getLocale())
currency
NumberFormat.getCurrencyInstance(getLocale())
percent
NumberFormat.getPercentInstance(getLocale())
SubformatPattern
new DecimalFormat(subformatPattern, DecimalFormatSymbols.getInstance(getLocale()))
date
(none)
DateFormat.getDateInstance(DateFormat.DEFAULT, getLocale())
short
DateFormat.getDateInstance(DateFormat.SHORT, getLocale())
medium
DateFormat.getDateInstance(DateFormat.DEFAULT, getLocale())
long
DateFormat.getDateInstance(DateFormat.LONG, getLocale())
full
DateFormat.getDateInstance(DateFormat.FULL, getLocale())
SubformatPattern
new SimpleDateFormat(subformatPattern, getLocale())
time
(none)
DateFormat.getTimeInstance(DateFormat.DEFAULT, getLocale())
short
DateFormat.getTimeInstance(DateFormat.SHORT, getLocale())
medium
DateFormat.getTimeInstance(DateFormat.DEFAULT, getLocale())
long
DateFormat.getTimeInstance(DateFormat.LONG, getLocale())
full
DateFormat.getTimeInstance(DateFormat.FULL, getLocale())
SubformatPattern
new SimpleDateFormat(subformatPattern, getLocale())
choice
SubformatPattern
new ChoiceFormat(subformatPattern)
Usage Information
Here are some examples of usage. In real internationalized programs, the message format pattern and other static strings will, of course, be obtained from resource bundles. Other parameters will be dynamically determined at runtime.
The first example uses the static method MessageFormat.format, which internally creates a MessageFormat for one-time use:
int planet = 7; String event = "a disturbance in the Force";
String result = MessageFormat.format( "At {1,time} on {1,date}, there was {2} on planet {0,number,integer}.", planet, new Date(), event); The output is:
At 12:30 PM on Jul 3, 2053, there was a disturbance in the Force on planet 7.
The following example creates a MessageFormat instance that can be used repeatedly:
int fileCount = 1273; String diskName = "MyDisk"; Object[] testArgs = {new Long(fileCount), diskName};
MessageFormat form = new MessageFormat( "The disk "{1}" contains {0} file(s).");
System.out.println(form.format(testArgs)); The output with different values for fileCount:
The disk "MyDisk" contains 0 file(s). The disk "MyDisk" contains 1 file(s). The disk "MyDisk" contains 1,273 file(s).
For more sophisticated patterns, you can use a ChoiceFormat to produce correct forms for singular and plural:
MessageFormat form = new MessageFormat("The disk "{1}" contains {0}."); double[] filelimits = {0,1,2}; String[] filepart = {"no files","one file","{0,number} files"}; ChoiceFormat fileform = new ChoiceFormat(filelimits, filepart); form.setFormatByArgumentIndex(0, fileform);
int fileCount = 1273; String diskName = "MyDisk"; Object[] testArgs = {new Long(fileCount), diskName};
System.out.println(form.format(testArgs)); The output with different values for fileCount:
The disk "MyDisk" contains no files. The disk "MyDisk" contains one file. The disk "MyDisk" contains 1,273 files.
You can create the ChoiceFormat programmatically, as in the above example, or by using a pattern. See ChoiceFormat for more information.
form.applyPattern( "There {0,choice,0#are no files|1#is one file|1<are {0,number,integer} files}.");
Note: As we see above, the string produced by a ChoiceFormat in MessageFormat is treated as special; occurrences of '{' are used to indicate subformats, and cause recursion. If you create both a MessageFormat and ChoiceFormat programmatically (instead of using the string patterns), then be careful not to produce a format that recurses on itself, which will cause an infinite loop.
When a single argument is parsed more than once in the string, the last match will be the final result of the parsing. For example,
MessageFormat mf = new MessageFormat("{0,number,#.##}, {0,number,#.#}"); Object[] objs = {new Double(3.1415)}; String result = mf.format( objs ); // result now equals "3.14, 3.1" objs = null; objs = mf.parse(result, new ParsePosition(0)); // objs now equals {new Double(3.1)}
Likewise, parsing with a MessageFormat object using patterns containing multiple occurrences of the same argument would return the last match. For example,
MessageFormat mf = new MessageFormat("{0}, {0}, {0}"); String forParsing = "x, y, z"; Object[] objs = mf.parse(forParsing, new ParsePosition(0)); // result now equals {new String("z")}
Synchronization
Message formats are not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
MessageFormat provides a means to produce concatenated messages in a language-neutral way. Use this to construct messages displayed for end users. MessageFormat takes a set of objects, formats them, then inserts the formatted strings into the pattern at the appropriate places. Note: MessageFormat differs from the other Format classes in that you create a MessageFormat object with one of its constructors (not with a getInstance style factory method). The factory methods aren't necessary because MessageFormat itself doesn't implement locale specific behavior. Any locale specific behavior is defined by the pattern that you provide as well as the subformats used for inserted arguments. Patterns and Their Interpretation MessageFormat uses patterns of the following form: MessageFormatPattern: String MessageFormatPattern FormatElement String FormatElement: { ArgumentIndex } { ArgumentIndex , FormatType } { ArgumentIndex , FormatType , FormatStyle } FormatType: one of number date time choice FormatStyle: short medium long full integer currency percent SubformatPattern Within a String, a pair of single quotes can be used to quote any arbitrary characters except single quotes. For example, pattern string "'{0}'" represents string "{0}", not a FormatElement. A single quote itself must be represented by doubled single quotes '' throughout a String. For example, pattern string "'{''}'" is interpreted as a sequence of '{ (start of quoting and a left curly brace), '' (a single quote), and }' (a right curly brace and end of quoting), not '{' and '}' (quoted left and right curly braces): representing string "{'}", not "{}". A SubformatPattern is interpreted by its corresponding subformat, and subformat-dependent pattern rules apply. For example, pattern string "{1,number,$'#',##}" (SubformatPattern with underline) will produce a number format with the pound-sign quoted, with a result such as: "$#31,45". Refer to each Format subclass documentation for details. Any unmatched quote is treated as closed at the end of the given pattern. For example, pattern string "'{0}" is treated as pattern "'{0}'". Any curly braces within an unquoted pattern must be balanced. For example, "ab {0} de" and "ab '}' de" are valid patterns, but "ab {0'}' de", "ab } de" and "''{''" are not. Warning:The rules for using quotes within message format patterns unfortunately have shown to be somewhat confusing. In particular, it isn't always obvious to localizers whether single quotes need to be doubled or not. Make sure to inform localizers about the rules, and tell them (for example, by using comments in resource bundle source files) which strings will be processed by MessageFormat. Note that localizers may need to use single quotes in translated strings where the original version doesn't have them. The ArgumentIndex value is a non-negative integer written using the digits '0' through '9', and represents an index into the arguments array passed to the format methods or the result array returned by the parse methods. The FormatType and FormatStyle values are used to create a Format instance for the format element. The following table shows how the values map to Format instances. Combinations not shown in the table are illegal. A SubformatPattern must be a valid pattern string for the Format subclass used. FormatType FormatStyle Subformat Created (none) (none) null number (none) NumberFormat.getInstance(getLocale()) integer NumberFormat.getIntegerInstance(getLocale()) currency NumberFormat.getCurrencyInstance(getLocale()) percent NumberFormat.getPercentInstance(getLocale()) SubformatPattern new DecimalFormat(subformatPattern, DecimalFormatSymbols.getInstance(getLocale())) date (none) DateFormat.getDateInstance(DateFormat.DEFAULT, getLocale()) short DateFormat.getDateInstance(DateFormat.SHORT, getLocale()) medium DateFormat.getDateInstance(DateFormat.DEFAULT, getLocale()) long DateFormat.getDateInstance(DateFormat.LONG, getLocale()) full DateFormat.getDateInstance(DateFormat.FULL, getLocale()) SubformatPattern new SimpleDateFormat(subformatPattern, getLocale()) time (none) DateFormat.getTimeInstance(DateFormat.DEFAULT, getLocale()) short DateFormat.getTimeInstance(DateFormat.SHORT, getLocale()) medium DateFormat.getTimeInstance(DateFormat.DEFAULT, getLocale()) long DateFormat.getTimeInstance(DateFormat.LONG, getLocale()) full DateFormat.getTimeInstance(DateFormat.FULL, getLocale()) SubformatPattern new SimpleDateFormat(subformatPattern, getLocale()) choice SubformatPattern new ChoiceFormat(subformatPattern) Usage Information Here are some examples of usage. In real internationalized programs, the message format pattern and other static strings will, of course, be obtained from resource bundles. Other parameters will be dynamically determined at runtime. The first example uses the static method MessageFormat.format, which internally creates a MessageFormat for one-time use: int planet = 7; String event = "a disturbance in the Force"; String result = MessageFormat.format( "At {1,time} on {1,date}, there was {2} on planet {0,number,integer}.", planet, new Date(), event); The output is: At 12:30 PM on Jul 3, 2053, there was a disturbance in the Force on planet 7. The following example creates a MessageFormat instance that can be used repeatedly: int fileCount = 1273; String diskName = "MyDisk"; Object[] testArgs = {new Long(fileCount), diskName}; MessageFormat form = new MessageFormat( "The disk \"{1}\" contains {0} file(s)."); System.out.println(form.format(testArgs)); The output with different values for fileCount: The disk "MyDisk" contains 0 file(s). The disk "MyDisk" contains 1 file(s). The disk "MyDisk" contains 1,273 file(s). For more sophisticated patterns, you can use a ChoiceFormat to produce correct forms for singular and plural: MessageFormat form = new MessageFormat("The disk \"{1}\" contains {0}."); double[] filelimits = {0,1,2}; String[] filepart = {"no files","one file","{0,number} files"}; ChoiceFormat fileform = new ChoiceFormat(filelimits, filepart); form.setFormatByArgumentIndex(0, fileform); int fileCount = 1273; String diskName = "MyDisk"; Object[] testArgs = {new Long(fileCount), diskName}; System.out.println(form.format(testArgs)); The output with different values for fileCount: The disk "MyDisk" contains no files. The disk "MyDisk" contains one file. The disk "MyDisk" contains 1,273 files. You can create the ChoiceFormat programmatically, as in the above example, or by using a pattern. See ChoiceFormat for more information. form.applyPattern( "There {0,choice,0#are no files|1#is one file|1<are {0,number,integer} files}."); Note: As we see above, the string produced by a ChoiceFormat in MessageFormat is treated as special; occurrences of '{' are used to indicate subformats, and cause recursion. If you create both a MessageFormat and ChoiceFormat programmatically (instead of using the string patterns), then be careful not to produce a format that recurses on itself, which will cause an infinite loop. When a single argument is parsed more than once in the string, the last match will be the final result of the parsing. For example, MessageFormat mf = new MessageFormat("{0,number,#.##}, {0,number,#.#}"); Object[] objs = {new Double(3.1415)}; String result = mf.format( objs ); // result now equals "3.14, 3.1" objs = null; objs = mf.parse(result, new ParsePosition(0)); // objs now equals {new Double(3.1)} Likewise, parsing with a MessageFormat object using patterns containing multiple occurrences of the same argument would return the last match. For example, MessageFormat mf = new MessageFormat("{0}, {0}, {0}"); String forParsing = "x, y, z"; Object[] objs = mf.parse(forParsing, new ParsePosition(0)); // result now equals {new String("z")} Synchronization Message formats are not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
Defines constants that are used as attribute keys in the AttributedCharacterIterator returned from MessageFormat.formatToCharacterIterator.
Defines constants that are used as attribute keys in the AttributedCharacterIterator returned from MessageFormat.formatToCharacterIterator.
This class provides the method normalize which transforms Unicode text into an equivalent composed or decomposed form, allowing for easier sorting and searching of text. The normalize method supports the standard normalization forms described in
Unicode Standard Annex #15 — Unicode Normalization Forms.
Characters with accents or other adornments can be encoded in several different ways in Unicode. For example, take the character A-acute. In Unicode, this can be encoded as a single character (the "composed" form):
U+00C1 LATIN CAPITAL LETTER A WITH ACUTE
or as two separate characters (the "decomposed" form):
U+0041 LATIN CAPITAL LETTER A
U+0301 COMBINING ACUTE ACCENT
To a user of your program, however, both of these sequences should be treated as the same "user-level" character "A with acute accent". When you are searching or comparing text, you must ensure that these two sequences are treated as equivalent. In addition, you must handle characters with more than one accent. Sometimes the order of a character's combining accents is significant, while in other cases accent sequences in different orders are really equivalent.
Similarly, the string "ffi" can be encoded as three separate letters:
U+0066 LATIN SMALL LETTER F
U+0066 LATIN SMALL LETTER F
U+0069 LATIN SMALL LETTER I
or as the single character
U+FB03 LATIN SMALL LIGATURE FFI
The ffi ligature is not a distinct semantic character, and strictly speaking it shouldn't be in Unicode at all, but it was included for compatibility with existing character sets that already provided it. The Unicode standard identifies such characters by giving them "compatibility" decompositions into the corresponding semantic characters. When sorting and searching, you will often want to use these mappings.
The normalize method helps solve these problems by transforming text into the canonical composed and decomposed forms as shown in the first example above. In addition, you can have it perform compatibility decompositions so that you can treat compatibility characters the same as their equivalents. Finally, the normalize method rearranges accents into the proper canonical order, so that you do not have to worry about accent rearrangement on your own.
The W3C generally recommends to exchange texts in NFC. Note also that most legacy character encodings use only precomposed forms and often do not encode any combining marks by themselves. For conversion to such character encodings the Unicode text needs to be normalized to NFC. For more usage examples, see the Unicode Standard Annex.
This class provides the method normalize which transforms Unicode text into an equivalent composed or decomposed form, allowing for easier sorting and searching of text. The normalize method supports the standard normalization forms described in Unicode Standard Annex #15 — Unicode Normalization Forms. Characters with accents or other adornments can be encoded in several different ways in Unicode. For example, take the character A-acute. In Unicode, this can be encoded as a single character (the "composed" form): U+00C1 LATIN CAPITAL LETTER A WITH ACUTE or as two separate characters (the "decomposed" form): U+0041 LATIN CAPITAL LETTER A U+0301 COMBINING ACUTE ACCENT To a user of your program, however, both of these sequences should be treated as the same "user-level" character "A with acute accent". When you are searching or comparing text, you must ensure that these two sequences are treated as equivalent. In addition, you must handle characters with more than one accent. Sometimes the order of a character's combining accents is significant, while in other cases accent sequences in different orders are really equivalent. Similarly, the string "ffi" can be encoded as three separate letters: U+0066 LATIN SMALL LETTER F U+0066 LATIN SMALL LETTER F U+0069 LATIN SMALL LETTER I or as the single character U+FB03 LATIN SMALL LIGATURE FFI The ffi ligature is not a distinct semantic character, and strictly speaking it shouldn't be in Unicode at all, but it was included for compatibility with existing character sets that already provided it. The Unicode standard identifies such characters by giving them "compatibility" decompositions into the corresponding semantic characters. When sorting and searching, you will often want to use these mappings. The normalize method helps solve these problems by transforming text into the canonical composed and decomposed forms as shown in the first example above. In addition, you can have it perform compatibility decompositions so that you can treat compatibility characters the same as their equivalents. Finally, the normalize method rearranges accents into the proper canonical order, so that you do not have to worry about accent rearrangement on your own. The W3C generally recommends to exchange texts in NFC. Note also that most legacy character encodings use only precomposed forms and often do not encode any combining marks by themselves. For conversion to such character encodings the Unicode text needs to be normalized to NFC. For more usage examples, see the Unicode Standard Annex.
NumberFormat is the abstract base class for all number formats. This class provides the interface for formatting and parsing numbers. NumberFormat also provides methods for determining which locales have number formats, and what their names are.
NumberFormat helps you to format and parse numbers for any locale. Your code can be completely independent of the locale conventions for decimal points, thousands-separators, or even the particular decimal digits used, or whether the number format is even decimal.
To format a number for the current Locale, use one of the factory class methods:
myString = NumberFormat.getInstance().format(myNumber);
If you are formatting multiple numbers, it is more efficient to get the format and use it multiple times so that the system doesn't have to fetch the information about the local language and country conventions multiple times.
NumberFormat nf = NumberFormat.getInstance(); for (int i = 0; i < myNumber.length; +i) { output.println(nf.format(myNumber[i]) "; "); }
To format a number for a different Locale, specify it in the call to getInstance.
NumberFormat nf = NumberFormat.getInstance(Locale.FRENCH);
You can also use a NumberFormat to parse numbers:
myNumber = nf.parse(myString);
Use getInstance or getNumberInstance to get the normal number format. Use getIntegerInstance to get an integer number format. Use getCurrencyInstance to get the currency number format. And use getPercentInstance to get a format for displaying percentages. With this format, a fraction like 0.53 is displayed as 53%.
You can also control the display of numbers with such methods as setMinimumFractionDigits. If you want even more control over the format or parsing, or want to give your users more control, you can try casting the NumberFormat you get from the factory methods to a DecimalFormat. This will work for the vast majority of locales; just remember to put it in a try block in case you encounter an unusual one.
NumberFormat and DecimalFormat are designed such that some controls work for formatting and others work for parsing. The following is the detailed description for each these control methods,
setParseIntegerOnly : only affects parsing, e.g. if true, "3456.78" → 3456 (and leaves the parse position just after index 6) if false, "3456.78" → 3456.78 (and leaves the parse position just after index 8) This is independent of formatting. If you want to not show a decimal point where there might be no digits after the decimal point, use setDecimalSeparatorAlwaysShown.
setDecimalSeparatorAlwaysShown : only affects formatting, and only where there might be no digits after the decimal point, such as with a pattern like "#,##0.##", e.g., if true, 3456.00 → "3,456." if false, 3456.00 → "3456" This is independent of parsing. If you want parsing to stop at the decimal point, use setParseIntegerOnly.
You can also use forms of the parse and format methods with ParsePosition and FieldPosition to allow you to:
progressively parse through pieces of a string align the decimal point and other areas
For example, you can align numbers in two ways:
If you are using a monospaced font with spacing for alignment, you can pass the FieldPosition in your format call, with field = INTEGER_FIELD. On output, getEndIndex will be set to the offset between the last character of the integer and the decimal. Add (desiredSpaceCount - getEndIndex) spaces at the front of the string.
If you are using proportional fonts, instead of padding with spaces, measure the width of the string in pixels from the start to getEndIndex. Then move the pen by (desiredPixelWidth - widthToAlignmentPoint) before drawing the text. It also works where there is no decimal, but possibly additional characters at the end, e.g., with parentheses in negative numbers: "(12)" for -12.
Synchronization
Number formats are generally not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
NumberFormat is the abstract base class for all number formats. This class provides the interface for formatting and parsing numbers. NumberFormat also provides methods for determining which locales have number formats, and what their names are. NumberFormat helps you to format and parse numbers for any locale. Your code can be completely independent of the locale conventions for decimal points, thousands-separators, or even the particular decimal digits used, or whether the number format is even decimal. To format a number for the current Locale, use one of the factory class methods: myString = NumberFormat.getInstance().format(myNumber); If you are formatting multiple numbers, it is more efficient to get the format and use it multiple times so that the system doesn't have to fetch the information about the local language and country conventions multiple times. NumberFormat nf = NumberFormat.getInstance(); for (int i = 0; i < myNumber.length; +i) { output.println(nf.format(myNumber[i]) "; "); } To format a number for a different Locale, specify it in the call to getInstance. NumberFormat nf = NumberFormat.getInstance(Locale.FRENCH); You can also use a NumberFormat to parse numbers: myNumber = nf.parse(myString); Use getInstance or getNumberInstance to get the normal number format. Use getIntegerInstance to get an integer number format. Use getCurrencyInstance to get the currency number format. And use getPercentInstance to get a format for displaying percentages. With this format, a fraction like 0.53 is displayed as 53%. You can also control the display of numbers with such methods as setMinimumFractionDigits. If you want even more control over the format or parsing, or want to give your users more control, you can try casting the NumberFormat you get from the factory methods to a DecimalFormat. This will work for the vast majority of locales; just remember to put it in a try block in case you encounter an unusual one. NumberFormat and DecimalFormat are designed such that some controls work for formatting and others work for parsing. The following is the detailed description for each these control methods, setParseIntegerOnly : only affects parsing, e.g. if true, "3456.78" → 3456 (and leaves the parse position just after index 6) if false, "3456.78" → 3456.78 (and leaves the parse position just after index 8) This is independent of formatting. If you want to not show a decimal point where there might be no digits after the decimal point, use setDecimalSeparatorAlwaysShown. setDecimalSeparatorAlwaysShown : only affects formatting, and only where there might be no digits after the decimal point, such as with a pattern like "#,##0.##", e.g., if true, 3456.00 → "3,456." if false, 3456.00 → "3456" This is independent of parsing. If you want parsing to stop at the decimal point, use setParseIntegerOnly. You can also use forms of the parse and format methods with ParsePosition and FieldPosition to allow you to: progressively parse through pieces of a string align the decimal point and other areas For example, you can align numbers in two ways: If you are using a monospaced font with spacing for alignment, you can pass the FieldPosition in your format call, with field = INTEGER_FIELD. On output, getEndIndex will be set to the offset between the last character of the integer and the decimal. Add (desiredSpaceCount - getEndIndex) spaces at the front of the string. If you are using proportional fonts, instead of padding with spaces, measure the width of the string in pixels from the start to getEndIndex. Then move the pen by (desiredPixelWidth - widthToAlignmentPoint) before drawing the text. It also works where there is no decimal, but possibly additional characters at the end, e.g., with parentheses in negative numbers: "(12)" for -12. Synchronization Number formats are generally not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
Defines constants that are used as attribute keys in the AttributedCharacterIterator returned from NumberFormat.formatToCharacterIterator and as field identifiers in FieldPosition.
Defines constants that are used as attribute keys in the AttributedCharacterIterator returned from NumberFormat.formatToCharacterIterator and as field identifiers in FieldPosition.
Signals that an error has been reached unexpectedly while parsing.
Signals that an error has been reached unexpectedly while parsing.
ParsePosition is a simple class used by Format and its subclasses to keep track of the current position during parsing. The parseObject method in the various Format classes requires a ParsePosition object as an argument.
By design, as you parse through a string with different formats, you can use the same ParsePosition, since the index parameter records the current position.
ParsePosition is a simple class used by Format and its subclasses to keep track of the current position during parsing. The parseObject method in the various Format classes requires a ParsePosition object as an argument. By design, as you parse through a string with different formats, you can use the same ParsePosition, since the index parameter records the current position.
The RuleBasedCollator class is a concrete subclass of Collator that provides a simple, data-driven, table collator. With this class you can create a customized table-based Collator. RuleBasedCollator maps characters to sort keys.
RuleBasedCollator has the following restrictions for efficiency (other subclasses may be used for more complex languages) :
If a special collation rule controlled by a <modifier> is specified it applies to the whole collator object. All non-mentioned characters are at the end of the collation order.
The collation table is composed of a list of collation rules, where each rule is of one of three forms:
<modifier> <relation> <text-argument> <reset> <text-argument> The definitions of the rule elements is as follows:
Text-Argument: A text-argument is any sequence of characters, excluding special characters (that is, common whitespace characters [0009-000D, 0020] and rule syntax characters [0021-002F, 003A-0040, 005B-0060, 007B-007E]). If those characters are desired, you can put them in single quotes (e.g. ampersand => '&'). Note that unquoted white space characters are ignored; e.g. b c is treated as bc. Modifier: There are currently two modifiers that turn on special collation rules.
'@' : Turns on backwards sorting of accents (secondary
differences), as in French.
'!' : Turns on Thai/Lao vowel-consonant swapping. If this
rule is in force when a Thai vowel of the range
\U0E40-\U0E44 precedes a Thai consonant of the range
\U0E01-\U0E2E OR a Lao vowel of the range \U0EC0-\U0EC4
precedes a Lao consonant of the range \U0E81-\U0EAE then
the vowel is placed after the consonant for collation
purposes.
'@' : Indicates that accents are sorted backwards, as in French.
Relation: The relations are the following:
'<' : Greater, as a letter difference (primary)
';' : Greater, as an accent difference (secondary)
',' : Greater, as a case difference (tertiary)
'=' : Equal
Reset: There is a single reset which is used primarily for contractions and expansions, but which can also be used to add a modification at the end of a set of rules. '&' : Indicates that the next rule follows the position to where the reset text-argument would be sorted.
This sounds more complicated than it is in practice. For example, the following are equivalent ways of expressing the same thing:
a < b < c a < b & b < c a < c & a < b
Notice that the order is important, as the subsequent item goes immediately after the text-argument. The following are not equivalent:
a < b & a < c a < c & a < b
Either the text-argument must already be present in the sequence, or some initial substring of the text-argument must be present. (e.g. "a < b & ae < e" is valid since "a" is present in the sequence before "ae" is reset). In this latter case, "ae" is not entered and treated as a single character; instead, "e" is sorted as if it were expanded to two characters: "a" followed by an "e". This difference appears in natural languages: in traditional Spanish "ch" is treated as though it contracts to a single character (expressed as "c < ch < d"), while in traditional German a-umlaut is treated as though it expanded to two characters (expressed as "a,A < b,B ... &ae;\u00e3&AE;\u00c3"). [\u00e3 and \u00c3 are, of course, the escape sequences for a-umlaut.]
Ignorable Characters
For ignorable characters, the first rule must start with a relation (the examples we have used above are really fragments; "a < b" really should be "< a < b"). If, however, the first relation is not "<", then all the all text-arguments up to the first "<" are ignorable. For example, ", - < a < b" makes "-" an ignorable character, as we saw earlier in the word "black-birds". In the samples for different languages, you see that most accents are ignorable.
Normalization and Accents
RuleBasedCollator automatically processes its rule table to include both pre-composed and combining-character versions of accented characters. Even if the provided rule string contains only base characters and separate combining accent characters, the pre-composed accented characters matching all canonical combinations of characters from the rule string will be entered in the table.
This allows you to use a RuleBasedCollator to compare accented strings even when the collator is set to NO_DECOMPOSITION. There are two caveats, however. First, if the strings to be collated contain combining sequences that may not be in canonical order, you should set the collator to CANONICAL_DECOMPOSITION or FULL_DECOMPOSITION to enable sorting of combining sequences. Second, if the strings contain characters with compatibility decompositions (such as full-width and half-width forms), you must use FULL_DECOMPOSITION, since the rule tables only include canonical mappings.
Errors
The following are errors:
A text-argument contains unquoted punctuation symbols
(e.g. "a < b-c < d").
A relation or reset character not followed by a text-argument
(e.g. "a < ,b").
A reset where the text-argument (or an initial substring of the
text-argument) is not already in the sequence.
(e.g. "a < b & e < f")
If you produce one of these errors, a RuleBasedCollator throws a ParseException.
Examples Simple: "< a < b < c < d" Norwegian: "< a, A < b, B < c, C < d, D < e, E < f, F < g, G < h, H < i, I < j, J < k, K < l, L < m, M < n, N < o, O < p, P < q, Q < r, R < s, S < t, T < u, U < v, V < w, W < x, X < y, Y < z, Z < \u00E6, \u00C6 < \u00F8, \u00D8 < \u00E5 = a\u030A, \u00C5 = A\u030A; aa, AA"
To create a RuleBasedCollator object with specialized rules tailored to your needs, you construct the RuleBasedCollator with the rules contained in a String object. For example:
String simple = "< a< b< c< d"; RuleBasedCollator mySimple = new RuleBasedCollator(simple);
Or:
String Norwegian = "< a, A < b, B < c, C < d, D < e, E < f, F < g, G < h, H < i, I" "< j, J < k, K < l, L < m, M < n, N < o, O < p, P < q, Q < r, R" "< s, S < t, T < u, U < v, V < w, W < x, X < y, Y < z, Z" "< \u00E6, \u00C6" // Latin letter ae & AE "< \u00F8, \u00D8" // Latin letter o & O with stroke "< \u00E5 = a\u030A," // Latin letter a with ring above " \u00C5 = A\u030A;" // Latin letter A with ring above " aa, AA"; RuleBasedCollator myNorwegian = new RuleBasedCollator(Norwegian);
A new collation rules string can be created by concatenating rules strings. For example, the rules returned by getRules() could be concatenated to combine multiple RuleBasedCollators.
The following example demonstrates how to change the order of non-spacing accents,
// old rule String oldRules = "=\u0301;\u0300;\u0302;\u0308" // main accents ";\u0327;\u0303;\u0304;\u0305" // main accents ";\u0306;\u0307;\u0309;\u030A" // main accents ";\u030B;\u030C;\u030D;\u030E" // main accents ";\u030F;\u0310;\u0311;\u0312" // main accents "< a , A ; ae, AE ; \u00e6 , \u00c6" "< b , B < c, C < e, E & C < d, D"; // change the order of accent characters String addOn = "& \u0300 ; \u0308 ; \u0302"; RuleBasedCollator myCollator = new RuleBasedCollator(oldRules addOn);
The RuleBasedCollator class is a concrete subclass of Collator that provides a simple, data-driven, table collator. With this class you can create a customized table-based Collator. RuleBasedCollator maps characters to sort keys. RuleBasedCollator has the following restrictions for efficiency (other subclasses may be used for more complex languages) : If a special collation rule controlled by a <modifier> is specified it applies to the whole collator object. All non-mentioned characters are at the end of the collation order. The collation table is composed of a list of collation rules, where each rule is of one of three forms: <modifier> <relation> <text-argument> <reset> <text-argument> The definitions of the rule elements is as follows: Text-Argument: A text-argument is any sequence of characters, excluding special characters (that is, common whitespace characters [0009-000D, 0020] and rule syntax characters [0021-002F, 003A-0040, 005B-0060, 007B-007E]). If those characters are desired, you can put them in single quotes (e.g. ampersand => '&'). Note that unquoted white space characters are ignored; e.g. b c is treated as bc. Modifier: There are currently two modifiers that turn on special collation rules. '@' : Turns on backwards sorting of accents (secondary differences), as in French. '!' : Turns on Thai/Lao vowel-consonant swapping. If this rule is in force when a Thai vowel of the range \U0E40-\U0E44 precedes a Thai consonant of the range \U0E01-\U0E2E OR a Lao vowel of the range \U0EC0-\U0EC4 precedes a Lao consonant of the range \U0E81-\U0EAE then the vowel is placed after the consonant for collation purposes. '@' : Indicates that accents are sorted backwards, as in French. Relation: The relations are the following: '<' : Greater, as a letter difference (primary) ';' : Greater, as an accent difference (secondary) ',' : Greater, as a case difference (tertiary) '=' : Equal Reset: There is a single reset which is used primarily for contractions and expansions, but which can also be used to add a modification at the end of a set of rules. '&' : Indicates that the next rule follows the position to where the reset text-argument would be sorted. This sounds more complicated than it is in practice. For example, the following are equivalent ways of expressing the same thing: a < b < c a < b & b < c a < c & a < b Notice that the order is important, as the subsequent item goes immediately after the text-argument. The following are not equivalent: a < b & a < c a < c & a < b Either the text-argument must already be present in the sequence, or some initial substring of the text-argument must be present. (e.g. "a < b & ae < e" is valid since "a" is present in the sequence before "ae" is reset). In this latter case, "ae" is not entered and treated as a single character; instead, "e" is sorted as if it were expanded to two characters: "a" followed by an "e". This difference appears in natural languages: in traditional Spanish "ch" is treated as though it contracts to a single character (expressed as "c < ch < d"), while in traditional German a-umlaut is treated as though it expanded to two characters (expressed as "a,A < b,B ... &ae;\u00e3&AE;\u00c3"). [\u00e3 and \u00c3 are, of course, the escape sequences for a-umlaut.] Ignorable Characters For ignorable characters, the first rule must start with a relation (the examples we have used above are really fragments; "a < b" really should be "< a < b"). If, however, the first relation is not "<", then all the all text-arguments up to the first "<" are ignorable. For example, ", - < a < b" makes "-" an ignorable character, as we saw earlier in the word "black-birds". In the samples for different languages, you see that most accents are ignorable. Normalization and Accents RuleBasedCollator automatically processes its rule table to include both pre-composed and combining-character versions of accented characters. Even if the provided rule string contains only base characters and separate combining accent characters, the pre-composed accented characters matching all canonical combinations of characters from the rule string will be entered in the table. This allows you to use a RuleBasedCollator to compare accented strings even when the collator is set to NO_DECOMPOSITION. There are two caveats, however. First, if the strings to be collated contain combining sequences that may not be in canonical order, you should set the collator to CANONICAL_DECOMPOSITION or FULL_DECOMPOSITION to enable sorting of combining sequences. Second, if the strings contain characters with compatibility decompositions (such as full-width and half-width forms), you must use FULL_DECOMPOSITION, since the rule tables only include canonical mappings. Errors The following are errors: A text-argument contains unquoted punctuation symbols (e.g. "a < b-c < d"). A relation or reset character not followed by a text-argument (e.g. "a < ,b"). A reset where the text-argument (or an initial substring of the text-argument) is not already in the sequence. (e.g. "a < b & e < f") If you produce one of these errors, a RuleBasedCollator throws a ParseException. Examples Simple: "< a < b < c < d" Norwegian: "< a, A < b, B < c, C < d, D < e, E < f, F < g, G < h, H < i, I < j, J < k, K < l, L < m, M < n, N < o, O < p, P < q, Q < r, R < s, S < t, T < u, U < v, V < w, W < x, X < y, Y < z, Z < \u00E6, \u00C6 < \u00F8, \u00D8 < \u00E5 = a\u030A, \u00C5 = A\u030A; aa, AA" To create a RuleBasedCollator object with specialized rules tailored to your needs, you construct the RuleBasedCollator with the rules contained in a String object. For example: String simple = "< a< b< c< d"; RuleBasedCollator mySimple = new RuleBasedCollator(simple); Or: String Norwegian = "< a, A < b, B < c, C < d, D < e, E < f, F < g, G < h, H < i, I" "< j, J < k, K < l, L < m, M < n, N < o, O < p, P < q, Q < r, R" "< s, S < t, T < u, U < v, V < w, W < x, X < y, Y < z, Z" "< \u00E6, \u00C6" // Latin letter ae & AE "< \u00F8, \u00D8" // Latin letter o & O with stroke "< \u00E5 = a\u030A," // Latin letter a with ring above " \u00C5 = A\u030A;" // Latin letter A with ring above " aa, AA"; RuleBasedCollator myNorwegian = new RuleBasedCollator(Norwegian); A new collation rules string can be created by concatenating rules strings. For example, the rules returned by getRules() could be concatenated to combine multiple RuleBasedCollators. The following example demonstrates how to change the order of non-spacing accents, // old rule String oldRules = "=\u0301;\u0300;\u0302;\u0308" // main accents ";\u0327;\u0303;\u0304;\u0305" // main accents ";\u0306;\u0307;\u0309;\u030A" // main accents ";\u030B;\u030C;\u030D;\u030E" // main accents ";\u030F;\u0310;\u0311;\u0312" // main accents "< a , A ; ae, AE ; \u00e6 , \u00c6" "< b , B < c, C < e, E & C < d, D"; // change the order of accent characters String addOn = "& \u0300 ; \u0308 ; \u0302"; RuleBasedCollator myCollator = new RuleBasedCollator(oldRules addOn);
SimpleDateFormat is a concrete class for formatting and parsing dates in a locale-sensitive manner. It allows for formatting (date → text), parsing (text → date), and normalization.
SimpleDateFormat allows you to start by choosing any user-defined patterns for date-time formatting. However, you are encouraged to create a date-time formatter with either getTimeInstance, getDateInstance, or getDateTimeInstance in DateFormat. Each of these class methods can return a date/time formatter initialized with a default format pattern. You may modify the format pattern using the applyPattern methods as desired. For more information on using these methods, see DateFormat.
Date and Time Patterns
Date and time formats are specified by date and time pattern strings. Within date and time pattern strings, unquoted letters from 'A' to 'Z' and from 'a' to 'z' are interpreted as pattern letters representing the components of a date or time string. Text can be quoted using single quotes (') to avoid interpretation. "''" represents a single quote. All other characters are not interpreted; they're simply copied into the output string during formatting or matched against the input string during parsing.
The following pattern letters are defined (all other characters from 'A' to 'Z' and from 'a' to 'z' are reserved):
Letter
Date or Time Component
Presentation
Examples
G
Era designator
Text
AD
y
Year
Year
1996; 96
Y
Week year
Year
2009; 09
M
Month in year (context sensitive)
Month
July; Jul; 07
L
Month in year (standalone form)
Month
July; Jul; 07
w
Week in year
Number
27
W
Week in month
Number
2
D
Day in year
Number
189
d
Day in month
Number
10
F
Day of week in month
Number
2
E
Day name in week
Text
Tuesday; Tue
u
Day number of week (1 = Monday, ..., 7 = Sunday)
Number
1
a
Am/pm marker
Text
PM
H
Hour in day (0-23)
Number
0
k
Hour in day (1-24)
Number
24
K
Hour in am/pm (0-11)
Number
0
h
Hour in am/pm (1-12)
Number
12
m
Minute in hour
Number
30
s
Second in minute
Number
55
S
Millisecond
Number
978
z
Time zone
General time zone
Pacific Standard Time; PST; GMT-08:00
Z
Time zone
RFC 822 time zone
-0800
X
Time zone
ISO 8601 time zone
-08; -0800; -08:00
Pattern letters are usually repeated, as their number determines the exact presentation:
Text: For formatting, if the number of pattern letters is 4 or more, the full form is used; otherwise a short or abbreviated form is used if available. For parsing, both forms are accepted, independent of the number of pattern letters. Number: For formatting, the number of pattern letters is the minimum number of digits, and shorter numbers are zero-padded to this amount. For parsing, the number of pattern letters is ignored unless it's needed to separate two adjacent fields. Year: If the formatter's Calendar is the Gregorian calendar, the following rules are applied.
For formatting, if the number of pattern letters is 2, the year
is truncated to 2 digits; otherwise it is interpreted as a
number.
For parsing, if the number of pattern letters is more than 2,
the year is interpreted literally, regardless of the number of
digits. So using the pattern "MM/dd/yyyy", "01/11/12" parses to
Jan 11, 12 A.D.
For parsing with the abbreviated year pattern ("y" or "yy"),
SimpleDateFormat must interpret the abbreviated year
relative to some century. It does this by adjusting dates to be
within 80 years before and 20 years after the time the SimpleDateFormat
instance is created. For example, using a pattern of "MM/dd/yy" and a
SimpleDateFormat instance created on Jan 1, 1997, the string
"01/11/12" would be interpreted as Jan 11, 2012 while the string "05/04/64"
would be interpreted as May 4, 1964.
During parsing, only strings consisting of exactly two digits, as defined by
Character.isDigit(char), will be parsed into the default century.
Any other numeric string, such as a one digit string, a three or more digit
string, or a two digit string that isn't all digits (for example, "-1"), is
interpreted literally. So "01/02/3" or "01/02/003" are parsed, using the
same pattern, as Jan 2, 3 AD. Likewise, "01/02/-3" is parsed as Jan 2, 4 BC.
Otherwise, calendar system specific forms are applied.
For both formatting and parsing, if the number of pattern
letters is 4 or more, a calendar specific long form is used. Otherwise, a calendar
specific short or abbreviated form
is used.
If week year 'Y' is specified and the calendar doesn't support any week
years, the calendar year ('y') is used instead. The
support of week years can be tested with a call to getCalendar().isWeekDateSupported().
Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number.
Letter M produces context-sensitive month names, such as the
embedded form of names. If a DateFormatSymbols has been set
explicitly with constructor SimpleDateFormat(String,
DateFormatSymbols) or method setDateFormatSymbols(DateFormatSymbols), the month names given by
the DateFormatSymbols are used.
Letter L produces the standalone form of month names.
General time zone: Time zones are interpreted as text if they have names. For time zones representing a GMT offset value, the following syntax is used:
GMTOffsetTimeZone:
GMT Sign Hours : Minutes
Sign: one of
+ -
Hours:
Digit
Digit Digit
Minutes:
Digit Digit
Digit: one of
0 1 2 3 4 5 6 7 8 9
Hours must be between 0 and 23, and Minutes must be between
00 and 59. The format is locale independent and digits must be taken
from the Basic Latin block of the Unicode standard.
For parsing, RFC 822 time zones are also
accepted.
RFC 822 time zone: For formatting, the RFC 822 4-digit time zone format is used:
RFC822TimeZone:
Sign TwoDigitHours Minutes
TwoDigitHours:
Digit Digit
TwoDigitHours must be between 00 and 23. Other definitions
are as for general time zones.
For parsing, general time zones are also
accepted.
ISO 8601 Time zone: The number of pattern letters designates the format for both formatting and parsing as follows:
ISO8601TimeZone:
OneLetterISO8601TimeZone
TwoLetterISO8601TimeZone
ThreeLetterISO8601TimeZone
OneLetterISO8601TimeZone:
Sign TwoDigitHours
Z
TwoLetterISO8601TimeZone:
Sign TwoDigitHours Minutes
Z
ThreeLetterISO8601TimeZone:
Sign TwoDigitHours : Minutes
Z
Other definitions are as for general time zones or
RFC 822 time zones.
For formatting, if the offset value from GMT is 0, "Z" is
produced. If the number of pattern letters is 1, any fraction of an hour
is ignored. For example, if the pattern is "X" and the time zone is
"GMT+05:30", "+05" is produced.
For parsing, "Z" is parsed as the UTC time zone designator.
General time zones are not accepted.
If the number of pattern letters is 4 or more, IllegalArgumentException is thrown when constructing a SimpleDateFormat or applying a
pattern.
SimpleDateFormat also supports localized date and time pattern strings. In these strings, the pattern letters described above may be replaced with other, locale dependent, pattern letters. SimpleDateFormat does not deal with the localization of text other than the pattern letters; that's up to the client of the class.
Examples
The following examples show how date and time patterns are interpreted in the U.S. locale. The given date and time are 2001-07-04 12:08:56 local time in the U.S. Pacific Time time zone.
Date and Time Pattern
Result
"yyyy.MM.dd G 'at' HH:mm:ss z"
2001.07.04 AD at 12:08:56 PDT
"EEE, MMM d, ''yy"
Wed, Jul 4, '01
"h:mm a"
12:08 PM
"hh 'o''clock' a, zzzz"
12 o'clock PM, Pacific Daylight Time
"K:mm a, z"
0:08 PM, PDT
"yyyyy.MMMMM.dd GGG hh:mm aaa"
02001.July.04 AD 12:08 PM
"EEE, d MMM yyyy HH:mm:ss Z"
Wed, 4 Jul 2001 12:08:56 -0700
"yyMMddHHmmssZ"
010704120856-0700
"yyyy-MM-dd'T'HH:mm:ss.SSSZ"
2001-07-04T12:08:56.235-0700
"yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
2001-07-04T12:08:56.235-07:00
"YYYY-'W'ww-u"
2001-W27-3
Synchronization
Date formats are not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
SimpleDateFormat is a concrete class for formatting and parsing dates in a locale-sensitive manner. It allows for formatting (date → text), parsing (text → date), and normalization. SimpleDateFormat allows you to start by choosing any user-defined patterns for date-time formatting. However, you are encouraged to create a date-time formatter with either getTimeInstance, getDateInstance, or getDateTimeInstance in DateFormat. Each of these class methods can return a date/time formatter initialized with a default format pattern. You may modify the format pattern using the applyPattern methods as desired. For more information on using these methods, see DateFormat. Date and Time Patterns Date and time formats are specified by date and time pattern strings. Within date and time pattern strings, unquoted letters from 'A' to 'Z' and from 'a' to 'z' are interpreted as pattern letters representing the components of a date or time string. Text can be quoted using single quotes (') to avoid interpretation. "''" represents a single quote. All other characters are not interpreted; they're simply copied into the output string during formatting or matched against the input string during parsing. The following pattern letters are defined (all other characters from 'A' to 'Z' and from 'a' to 'z' are reserved): Letter Date or Time Component Presentation Examples G Era designator Text AD y Year Year 1996; 96 Y Week year Year 2009; 09 M Month in year (context sensitive) Month July; Jul; 07 L Month in year (standalone form) Month July; Jul; 07 w Week in year Number 27 W Week in month Number 2 D Day in year Number 189 d Day in month Number 10 F Day of week in month Number 2 E Day name in week Text Tuesday; Tue u Day number of week (1 = Monday, ..., 7 = Sunday) Number 1 a Am/pm marker Text PM H Hour in day (0-23) Number 0 k Hour in day (1-24) Number 24 K Hour in am/pm (0-11) Number 0 h Hour in am/pm (1-12) Number 12 m Minute in hour Number 30 s Second in minute Number 55 S Millisecond Number 978 z Time zone General time zone Pacific Standard Time; PST; GMT-08:00 Z Time zone RFC 822 time zone -0800 X Time zone ISO 8601 time zone -08; -0800; -08:00 Pattern letters are usually repeated, as their number determines the exact presentation: Text: For formatting, if the number of pattern letters is 4 or more, the full form is used; otherwise a short or abbreviated form is used if available. For parsing, both forms are accepted, independent of the number of pattern letters. Number: For formatting, the number of pattern letters is the minimum number of digits, and shorter numbers are zero-padded to this amount. For parsing, the number of pattern letters is ignored unless it's needed to separate two adjacent fields. Year: If the formatter's Calendar is the Gregorian calendar, the following rules are applied. For formatting, if the number of pattern letters is 2, the year is truncated to 2 digits; otherwise it is interpreted as a number. For parsing, if the number of pattern letters is more than 2, the year is interpreted literally, regardless of the number of digits. So using the pattern "MM/dd/yyyy", "01/11/12" parses to Jan 11, 12 A.D. For parsing with the abbreviated year pattern ("y" or "yy"), SimpleDateFormat must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the SimpleDateFormat instance is created. For example, using a pattern of "MM/dd/yy" and a SimpleDateFormat instance created on Jan 1, 1997, the string "01/11/12" would be interpreted as Jan 11, 2012 while the string "05/04/64" would be interpreted as May 4, 1964. During parsing, only strings consisting of exactly two digits, as defined by Character.isDigit(char), will be parsed into the default century. Any other numeric string, such as a one digit string, a three or more digit string, or a two digit string that isn't all digits (for example, "-1"), is interpreted literally. So "01/02/3" or "01/02/003" are parsed, using the same pattern, as Jan 2, 3 AD. Likewise, "01/02/-3" is parsed as Jan 2, 4 BC. Otherwise, calendar system specific forms are applied. For both formatting and parsing, if the number of pattern letters is 4 or more, a calendar specific long form is used. Otherwise, a calendar specific short or abbreviated form is used. If week year 'Y' is specified and the calendar doesn't support any week years, the calendar year ('y') is used instead. The support of week years can be tested with a call to getCalendar().isWeekDateSupported(). Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. Letter M produces context-sensitive month names, such as the embedded form of names. If a DateFormatSymbols has been set explicitly with constructor SimpleDateFormat(String, DateFormatSymbols) or method setDateFormatSymbols(DateFormatSymbols), the month names given by the DateFormatSymbols are used. Letter L produces the standalone form of month names. General time zone: Time zones are interpreted as text if they have names. For time zones representing a GMT offset value, the following syntax is used: GMTOffsetTimeZone: GMT Sign Hours : Minutes Sign: one of + - Hours: Digit Digit Digit Minutes: Digit Digit Digit: one of 0 1 2 3 4 5 6 7 8 9 Hours must be between 0 and 23, and Minutes must be between 00 and 59. The format is locale independent and digits must be taken from the Basic Latin block of the Unicode standard. For parsing, RFC 822 time zones are also accepted. RFC 822 time zone: For formatting, the RFC 822 4-digit time zone format is used: RFC822TimeZone: Sign TwoDigitHours Minutes TwoDigitHours: Digit Digit TwoDigitHours must be between 00 and 23. Other definitions are as for general time zones. For parsing, general time zones are also accepted. ISO 8601 Time zone: The number of pattern letters designates the format for both formatting and parsing as follows: ISO8601TimeZone: OneLetterISO8601TimeZone TwoLetterISO8601TimeZone ThreeLetterISO8601TimeZone OneLetterISO8601TimeZone: Sign TwoDigitHours Z TwoLetterISO8601TimeZone: Sign TwoDigitHours Minutes Z ThreeLetterISO8601TimeZone: Sign TwoDigitHours : Minutes Z Other definitions are as for general time zones or RFC 822 time zones. For formatting, if the offset value from GMT is 0, "Z" is produced. If the number of pattern letters is 1, any fraction of an hour is ignored. For example, if the pattern is "X" and the time zone is "GMT+05:30", "+05" is produced. For parsing, "Z" is parsed as the UTC time zone designator. General time zones are not accepted. If the number of pattern letters is 4 or more, IllegalArgumentException is thrown when constructing a SimpleDateFormat or applying a pattern. SimpleDateFormat also supports localized date and time pattern strings. In these strings, the pattern letters described above may be replaced with other, locale dependent, pattern letters. SimpleDateFormat does not deal with the localization of text other than the pattern letters; that's up to the client of the class. Examples The following examples show how date and time patterns are interpreted in the U.S. locale. The given date and time are 2001-07-04 12:08:56 local time in the U.S. Pacific Time time zone. Date and Time Pattern Result "yyyy.MM.dd G 'at' HH:mm:ss z" 2001.07.04 AD at 12:08:56 PDT "EEE, MMM d, ''yy" Wed, Jul 4, '01 "h:mm a" 12:08 PM "hh 'o''clock' a, zzzz" 12 o'clock PM, Pacific Daylight Time "K:mm a, z" 0:08 PM, PDT "yyyyy.MMMMM.dd GGG hh:mm aaa" 02001.July.04 AD 12:08 PM "EEE, d MMM yyyy HH:mm:ss Z" Wed, 4 Jul 2001 12:08:56 -0700 "yyMMddHHmmssZ" 010704120856-0700 "yyyy-MM-dd'T'HH:mm:ss.SSSZ" 2001-07-04T12:08:56.235-0700 "yyyy-MM-dd'T'HH:mm:ss.SSSXXX" 2001-07-04T12:08:56.235-07:00 "YYYY-'W'ww-u" 2001-W27-3 Synchronization Date formats are not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
An abstract class for service providers that provide concrete implementations of the BreakIterator class.
An abstract class for service providers that provide concrete implementations of the BreakIterator class.
An abstract class for service providers that provide concrete implementations of the Collator class.
An abstract class for service providers that provide concrete implementations of the Collator class.
No vars found in this namespace.
An abstract class for service providers that provide concrete implementations of the DateFormat class.
An abstract class for service providers that provide concrete implementations of the DateFormat class.
An abstract class for service providers that provide instances of the DateFormatSymbols class.
An abstract class for service providers that provide instances of the DateFormatSymbols class.
An abstract class for service providers that provide instances of the DecimalFormatSymbols class.
The requested Locale may contain an extension for specifying the desired numbering system. For example, "ar-u-nu-arab" (in the BCP 47 language tag form) specifies Arabic with the Arabic-Indic digits and symbols, while "ar-u-nu-latn" specifies Arabic with the Latin digits and symbols. Refer to the Unicode Locale Data Markup Language (LDML) specification for numbering systems.
An abstract class for service providers that provide instances of the DecimalFormatSymbols class. The requested Locale may contain an extension for specifying the desired numbering system. For example, "ar-u-nu-arab" (in the BCP 47 language tag form) specifies Arabic with the Arabic-Indic digits and symbols, while "ar-u-nu-latn" specifies Arabic with the Latin digits and symbols. Refer to the Unicode Locale Data Markup Language (LDML) specification for numbering systems.
An abstract class for service providers that provide concrete implementations of the NumberFormat class.
An abstract class for service providers that provide concrete implementations of the NumberFormat class.
StringCharacterIterator implements the CharacterIterator protocol for a String. The StringCharacterIterator class iterates over the entire String.
StringCharacterIterator implements the CharacterIterator protocol for a String. The StringCharacterIterator class iterates over the entire String.
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close