Encoding Detector

Detect text file character encoding (UTF-8, UTF-16, ASCII, Latin-1).

How to Use

1. Upload text file. Drop or select a text file to analyze.
2. View encoding result. See the detected encoding, BOM status, and confidence level.
3. Preview content. View a preview of the decoded text content.

What Is Encoding Detector?

Encoding Detector analyzes text files to determine their character encoding. It first checks for a Byte Order Mark (BOM), which identifies the encoding with high confidence, and then falls back to heuristic analysis for files without one. The tool detects UTF-8, UTF-16 (LE/BE), UTF-32 (LE/BE), ASCII, and ISO-8859-1/Windows-1252. Results include the detected encoding, a confidence level, BOM details, an explanation of the analysis, and a preview of the decoded content.

Why Use Our Encoding Detector?

  • Detects encoding via BOM and heuristic byte analysis.
  • Supports UTF-8, UTF-16, UTF-32, ASCII, and Latin-1/Windows-1252.
  • Shows confidence level and detection method details.
  • Includes decoded content preview to verify detection accuracy.

Common Use Cases

Character Issues

Diagnose mojibake and character display issues by identifying the correct file encoding.

Data Import

Determine a file's encoding before importing its text data, so that characters are handled correctly.
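For example, a UTF-8 BOM that is not stripped on import can end up glued to the first column name. A minimal Python sketch, assuming CSV input: `read_csv_rows` is a hypothetical helper, while `utf-8-sig` is Python's standard codec that drops a leading UTF-8 BOM when one is present.

```python
import codecs
import csv
import io


def read_csv_rows(raw):
    """Parse CSV bytes, dropping a leading UTF-8 BOM if there is one (hypothetical helper)."""
    # "utf-8-sig" strips an initial EF BB BF and otherwise decodes like plain "utf-8".
    return list(csv.DictReader(io.StringIO(raw.decode("utf-8-sig"))))


raw = codecs.BOM_UTF8 + b"name,city\nAna,Lisboa\n"

# Decoding as plain UTF-8 keeps the BOM as U+FEFF at the start of the first header.
naive_header = next(csv.reader(io.StringIO(raw.decode("utf-8"))))
print(naive_header[0] == "\ufeffname")  # True

rows = read_csv_rows(raw)
print(rows[0]["name"])  # Ana
```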

Legacy Files

Identify the encoding of legacy text files that may use non-UTF-8 encodings.
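Legacy Windows files are a common place where ISO-8859-1 and Windows-1252 disagree. The two share most byte values but differ in 0x80-0x9F: Windows-1252 maps most of those bytes to printable characters such as curly quotes and the euro sign, while ISO-8859-1 maps them to invisible C1 control codes. A short Python illustration using made-up sample bytes:

```python
# Made-up sample bytes, as a legacy Windows editor might have saved them.
legacy = b"\x93Caf\xe9\x94 costs \x80 3"

as_cp1252 = legacy.decode("cp1252")   # 0x93/0x94 -> curly quotes, 0x80 -> euro sign
as_latin1 = legacy.decode("latin-1")  # same bytes -> C1 control characters

print(as_cp1252)        # “Café” costs € 3
print(repr(as_latin1))  # shows the control characters as '\x93', '\x94', '\x80' escapes
```

Because both decodings succeed on these bytes, byte-level checks often cannot tell the two apart; the decoded preview is the quickest way to judge which one fits.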

Development

Verify the encoding of source code files, CSV data, and configuration files.

Technical Guide

The detector uses a multi-stage approach:

1. BOM detection: Checks the first 4 bytes for known BOM sequences (UTF-8: EF BB BF; UTF-16 LE: FF FE; UTF-16 BE: FE FF; UTF-32 LE: FF FE 00 00; UTF-32 BE: 00 00 FE FF). Because the UTF-32 LE BOM begins with the UTF-16 LE BOM, the 4-byte sequences need to be checked first. A BOM provides high-confidence detection.
2. UTF-16 heuristic: Analyzes null-byte patterns. When ASCII characters are encoded in UTF-16, each takes two bytes, one of which is 0x00. Counting from 0, that null byte falls at odd offsets in UTF-16 LE and at even offsets in UTF-16 BE, so a high share of nulls in one position suggests UTF-16 and its byte order.
3. UTF-8 validation: Validates multi-byte sequences. Valid UTF-8 follows fixed bit patterns: 110xxxxx 10xxxxxx for 2-byte sequences, 1110xxxx 10xxxxxx 10xxxxxx for 3-byte sequences, and 11110xxx followed by three 10xxxxxx bytes for 4-byte sequences.
4. ASCII detection: If all bytes are in the 0x00-0x7F range, the file is pure ASCII (which is also valid UTF-8).
5. Latin-1 fallback: If bytes in the 0x80-0xFF range do not form valid UTF-8 sequences, ISO-8859-1/Windows-1252 is likely.

For performance, only the first 8 KB (8,192 bytes) of the file is analyzed.
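The five stages above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: the function names, the 30% null-byte threshold, and the result labels are made up for the example, and the UTF-8 check only tests the bit patterns listed above (it does not reject overlong forms or surrogates). Since pure ASCII also passes UTF-8 validation, the sketch tests for ASCII once the bytes are known to be valid UTF-8.

```python
import codecs

SAMPLE_SIZE = 8192  # only the first 8 KB is examined

# 4-byte BOMs come first: the UTF-32 LE BOM (FF FE 00 00) starts with
# the UTF-16 LE BOM (FF FE), so checking UTF-16 first would misreport it.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def guess_utf16(sample, threshold=0.3):
    """Stage 2: infer UTF-16 byte order from null-byte positions (assumed threshold)."""
    pairs = len(sample) // 2
    if pairs == 0:
        return None
    even = sum(1 for i in range(0, 2 * pairs, 2) if sample[i] == 0) / pairs
    odd = sum(1 for i in range(1, 2 * pairs, 2) if sample[i] == 0) / pairs
    # ASCII in UTF-16 LE puts the zero high byte at odd offsets; BE at even ones.
    if odd > threshold and even < 0.05:
        return "UTF-16 LE"
    if even > threshold and odd < 0.05:
        return "UTF-16 BE"
    return None


def is_valid_utf8(sample, truncated):
    """Stage 3: check lead/continuation bit patterns (overlong forms not rejected)."""
    i, n = 0, len(sample)
    while i < n:
        lead = sample[i]
        if lead < 0x80:
            i += 1
            continue
        if lead >> 5 == 0b110:
            length = 2
        elif lead >> 4 == 0b1110:
            length = 3
        elif lead >> 3 == 0b11110:
            length = 4
        else:
            return False  # stray continuation byte or invalid lead byte
        tail = sample[i + 1:i + length]
        if any(b >> 6 != 0b10 for b in tail):
            return False
        if len(tail) < length - 1:
            # Sequence cut off at the end: acceptable only if the 8 KB limit caused it.
            return truncated
        i += length
    return True


def detect_encoding(data):
    """Return (encoding, method) for a byte string, following stages 1-5."""
    sample = data[:SAMPLE_SIZE]
    truncated = len(data) > SAMPLE_SIZE
    for bom, name in BOMS:  # Stage 1
        if sample.startswith(bom):
            return name, "BOM"
    utf16 = guess_utf16(sample)  # Stage 2
    if utf16:
        return utf16, "null-byte heuristic"
    if is_valid_utf8(sample, truncated):  # Stage 3
        if all(b < 0x80 for b in sample):  # Stage 4: ASCII is also valid UTF-8
            return "ASCII", "byte range"
        return "UTF-8", "multi-byte validation"
    return "ISO-8859-1/Windows-1252", "fallback"  # Stage 5


print(detect_encoding("héllo".encode("utf-8")))  # ('UTF-8', 'multi-byte validation')
```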

Tips & Best Practices

  • BOM detection provides the highest confidence: files with a BOM are identified directly from it.
  • UTF-8 without a BOM is detected by validating multi-byte sequences.
  • ISO-8859-1 and Windows-1252 are reported as a fallback when UTF-8 validation fails.
  • The content preview helps verify that the detection is correct: look for garbled characters.

Frequently Asked Questions

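How accurate is the detection?
BOM-based detection is very reliable, because a file in another encoding rarely begins with one of these byte sequences by chance. Heuristic UTF-8 detection is also reliable, since non-UTF-8 text containing bytes in the 0x80-0xFF range seldom forms valid UTF-8 sequences by accident. Latin-1/Windows-1252 is only a fallback, so treat that result as a best guess and confirm it in the preview.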
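The gap in confidence comes from how strict each encoding is. A quick Python check (the `decodes_as` helper is hypothetical): Latin-1 text with accented letters is rejected by a strict UTF-8 decoder, whereas every one of the 256 byte values is a valid ISO-8859-1 character, so a Latin-1 decode never fails and proves little on its own.

```python
def decodes_as(data, encoding):
    """Return True if `data` decodes strictly with `encoding` (hypothetical helper)."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


latin1_text = "Crème brûlée".encode("latin-1")
print(decodes_as(latin1_text, "utf-8"))  # False: 0xE8 is not followed by continuation bytes

# Every byte value maps to a character in ISO-8859-1, so this always succeeds.
print(decodes_as(bytes(range(256)), "latin-1"))  # True
```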
What is a BOM?
A Byte Order Mark (BOM) is the character U+FEFF encoded at the very start of a file. Its bytes identify the encoding and, for UTF-16 and UTF-32, the byte order (little-endian or big-endian).
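Because a BOM is just U+FEFF, its bytes for each encoding can be derived by encoding that one character. A short Python illustration: `bom_for` is a hypothetical helper, while the codec names and the `codecs.BOM_*` constants are standard Python.

```python
import codecs


def bom_for(codec):
    """Return the BOM bytes for a codec by encoding U+FEFF (hypothetical helper)."""
    return "\ufeff".encode(codec)


for label, codec in [
    ("UTF-8", "utf-8"),
    ("UTF-16 LE", "utf-16-le"),
    ("UTF-16 BE", "utf-16-be"),
    ("UTF-32 LE", "utf-32-le"),
    ("UTF-32 BE", "utf-32-be"),
]:
    # Print each BOM as space-separated uppercase hex, e.g. for comparison with a hex dump.
    print(f"{label:<10}{bom_for(codec).hex(' ').upper()}")

# The derived bytes match the constants in Python's codecs module.
print(bom_for("utf-16-le") == codecs.BOM_UTF16_LE)  # True
```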
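Can it detect Shift-JIS or GB2312?
Not currently. The detector focuses on Unicode encodings and Latin-1, and East Asian encodings are not specifically detected. Files in those encodings will most likely be reported as the Latin-1/Windows-1252 fallback, with a garbled preview.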
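How much of the file is analyzed?
The first 8 KB (8,192 bytes), which is usually enough for reliable detection. One exception: if the first 8 KB are pure ASCII and non-ASCII characters only appear later, the file may be reported as ASCII.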
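Cutting a file at a fixed size can also split a multi-byte UTF-8 character at the boundary, so a prefix should not be treated as invalid just because its last bytes are incomplete. A minimal Python sketch, assuming the same 8,192-byte limit: `decode_prefix` is a hypothetical helper, and `codecs.getincrementaldecoder` is standard Python.

```python
import codecs

SAMPLE_SIZE = 8192


def decode_prefix(data, limit=SAMPLE_SIZE):
    """Decode at most `limit` bytes of UTF-8, tolerating a character cut at the end."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    # final=False makes the decoder hold back an incomplete trailing sequence
    # instead of raising; if nothing was cut off, decode strictly.
    return decoder.decode(data[:limit], final=len(data) <= limit)


data = ("x" * 8191 + "é more text").encode("utf-8")  # "é" straddles the 8 KB boundary
print(len(decode_prefix(data)))  # 8191: the first byte of "é" is held back, not rejected

try:
    data[:SAMPLE_SIZE].decode("utf-8")
except UnicodeDecodeError:
    print("a plain strict decode rejects the same prefix")
```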
What about mixed encoding files?
The detector assumes a single encoding per file. Mixed encoding files will show the dominant encoding.

About Encoding Detector

Encoding Detector is a free online tool from FreeToolkit.ai. All processing happens directly in your browser, so your data never leaves your device. No registration required. No ads. Just fast, reliable tools.