Package org.mozilla.universalchardet
Class UniversalDetector
java.lang.Object
org.mozilla.universalchardet.UniversalDetector
-
Nested Class Summary
Nested Classes -
Field Summary
Fields
Modifier and Type                        Field
private String                           detectedCharset
private boolean                          done
private CharsetProber                    escCharsetProber
private boolean                          gotData
private UniversalDetector.InputState     inputState
private byte                             lastChar
private CharsetListener                  listener
static final float                       MINIMUM_THRESHOLD
private boolean                          onlyPrintableASCII
private CharsetProber[]                  probers
static final float                       SHORTCUT_THRESHOLD
private boolean                          start
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type        Method                                              Description
void                     dataEnd()                                           Marks end of data reading.
static String            detectCharset(File file)                            Gets the charset of a File.
static String            detectCharset(InputStream inputStream)              Gets the charset of content from InputStream.
static String            detectCharset(Path path)                            Gets the charset of a Path.
static String            detectCharsetFromBOM(byte[] buf)
private static String    detectCharsetFromBOM(byte[] buf, int offset)
void                     handleData(byte[] buf)                              Feed the detector with more data.
void                     handleData(byte[] buf, int offset, int length)      Feed the detector with more data.
boolean                  isDone()
final void               reset()                                             Resets detector to be used again.
void                     setListener(CharsetListener listener)
-
Field Details
-
SHORTCUT_THRESHOLD
public static final float SHORTCUT_THRESHOLD
-
MINIMUM_THRESHOLD
public static final float MINIMUM_THRESHOLD
-
inputState
-
done
private boolean done
-
start
private boolean start
-
gotData
private boolean gotData
-
onlyPrintableASCII
private boolean onlyPrintableASCII
-
lastChar
private byte lastChar
-
detectedCharset
-
probers
-
escCharsetProber
-
listener
-
-
Constructor Details
-
UniversalDetector
public UniversalDetector()
-
UniversalDetector
- Parameters:
listener
- a listener object that is notified of the detected encoding. Can be null.
-
-
Method Details
-
isDone
public boolean isDone()
-
getDetectedCharset
- Returns:
- The detected encoding, or null if the detector could not determine which encoding was used.
-
setListener
-
getListener
-
handleData
public void handleData(byte[] buf)
Feed the detector with more data.
- Parameters:
buf
- The buffer containing the data
-
handleData
public void handleData(byte[] buf, int offset, int length)
Feed the detector with more data.
- Parameters:
buf
- Buffer with the data
offset
- initial position of data in buf
length
- length of data
-
detectCharsetFromBOM
-
detectCharsetFromBOM
-
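The page gives no description for detectCharsetFromBOM beyond its signature. As a rough illustration of what byte-order-mark detection generally involves, the following stdlib-only sketch checks a buffer against the standard Unicode BOM byte sequences. The class name BomSniffer, the method name sniff, and the returned charset names are assumptions made for this example; it is not the library's implementation and may differ from it in the encodings covered and the names returned.

```java
public class BomSniffer {

    /**
     * Returns a charset name if buf starts with a well-known Unicode BOM,
     * or null if no BOM is recognized. Hypothetical helper, not library code.
     */
    public static String sniff(byte[] buf) {
        if (buf == null) {
            return null;
        }
        if (startsWith(buf, 0xEF, 0xBB, 0xBF)) {
            return "UTF-8";
        }
        // The UTF-32 checks come before UTF-16 because the UTF-32LE BOM
        // (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
        if (startsWith(buf, 0x00, 0x00, 0xFE, 0xFF)) {
            return "UTF-32BE";
        }
        if (startsWith(buf, 0xFF, 0xFE, 0x00, 0x00)) {
            return "UTF-32LE";
        }
        if (startsWith(buf, 0xFE, 0xFF)) {
            return "UTF-16BE";
        }
        if (startsWith(buf, 0xFF, 0xFE)) {
            return "UTF-16LE";
        }
        return null;
    }

    // Compares the first bytes of buf (as unsigned values) with the prefix.
    private static boolean startsWith(byte[] buf, int... prefix) {
        if (buf.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((buf[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Note that FF FE 00 00 is inherently ambiguous: it could also be UTF-16LE text whose first character is NUL. The sketch resolves this in favor of UTF-32LE, which is a design choice of the example rather than a documented behavior of the library.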
dataEnd
public void dataEnd()
Marks end of data reading. Finish calculations.
-
reset
public final void reset()
Resets detector to be used again.
-
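The instance methods documented above (handleData, isDone, dataEnd, getDetectedCharset, reset) suggest a feed-until-done loop. The sketch below shows one plausible way to drive them from an InputStream. So that it compiles without the library on the classpath, it is written against a small hypothetical interface, ChunkDetector, that mirrors the documented method names; the interface, the class name FeedLoopSketch, and the 4096-byte buffer size are assumptions of this example. With the library available, the same calls would presumably be made on a UniversalDetector instance directly.

```java
import java.io.IOException;
import java.io.InputStream;

public class FeedLoopSketch {

    /** Hypothetical stand-in mirroring the method names documented on this page. */
    public interface ChunkDetector {
        void handleData(byte[] buf, int offset, int length);
        boolean isDone();
        void dataEnd();
        String getDetectedCharset();
        void reset();
    }

    /**
     * Feeds the stream to the detector chunk by chunk and returns the
     * detected charset name, or null if none could be determined.
     */
    public static String detect(InputStream in, ChunkDetector detector) throws IOException {
        byte[] buf = new byte[4096]; // buffer size chosen arbitrarily for the sketch
        int n;
        // Stop early once the detector reports that it has reached a decision.
        while ((n = in.read(buf)) > 0 && !detector.isDone()) {
            detector.handleData(buf, 0, n);
        }
        // Signal end of input so the detector can finish its calculations.
        detector.dataEnd();
        String charset = detector.getDetectedCharset();
        // Reset so the same detector instance can be reused for other input.
        detector.reset();
        return charset;
    }
}
```

The result is read before reset() is called, on the assumption that resetting clears any previously detected charset.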
detectCharset
Gets the charset of a File.
- Parameters:
file
- The file to check charset for
- Returns:
- The charset of the file, or null if it cannot be determined
- Throws:
IOException
- if an I/O error occurs
-
detectCharset
Gets the charset of a Path.
- Parameters:
path
- The path to the file to check charset for
- Returns:
- The charset of the file, or null if it cannot be determined
- Throws:
IOException
- if an I/O error occurs
-
detectCharset
Gets the charset of content from InputStream.
- Parameters:
inputStream
- InputStream containing text file
- Returns:
- The charset of the file, or null if it cannot be determined
- Throws:
IOException
- if an I/O error occurs
-
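Because the detectCharset methods return the charset as a String and return null when detection fails, callers need a fallback before decoding. The following stdlib-only sketch shows one way to turn such a result into a java.nio.charset.Charset. The class name CharsetFallback and the method name resolve are hypothetical. Whether every name the detector reports is supported by the running JVM is not stated on this page, so the sketch also treats unsupported or malformed names as a reason to fall back.

```java
import java.nio.charset.Charset;

public class CharsetFallback {

    /**
     * Converts a detected charset name into a Charset, returning the fallback
     * when the name is null, malformed, or not supported by this JVM.
     */
    public static Charset resolve(String detectedName, Charset fallback) {
        if (detectedName == null) {
            return fallback; // detection could not determine a charset
        }
        try {
            return Charset.isSupported(detectedName)
                    ? Charset.forName(detectedName)
                    : fallback;
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException for syntactically invalid names.
            return fallback;
        }
    }
}
```

A caller might pass the String returned by detectCharset(file) as detectedName and StandardCharsets.UTF_8 as the fallback, then use the resulting Charset when opening a reader on the file.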