Subtitle encoding: UTF-8, BOM & garbled text

Last updated: 2026-06-11

Garbled subtitles — Ã© instead of é, or ï»¿ at the top of the file — almost always come from one cause: the file was saved in one text encoding and read in another. The fix is to save your .srt or .vtt as UTF-8 and make sure the player reads UTF-8. UTF-8 represents every script in a single file, which is why it is the recommended encoding for all subtitles. WebVTT requires it; SRT works best as UTF-8 without a byte-order mark.

SRT and VTT are plain text, so the bytes on disk are just characters under some encoding rule. Get the rule right and accents, Cyrillic, Arabic, and CJK all show correctly.

What encoding actually is

An encoding is the agreement that maps characters to bytes. When a file is written, each character becomes one or more bytes; when it is read, those bytes are turned back into characters. If the reader assumes a different encoding than the writer used, the bytes are misinterpreted and the wrong characters appear. Plain ASCII characters (unaccented A–Z and a–z, digits, basic punctuation) have the same bytes in most encodings, which is why only accented and non-Latin characters tend to break.
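The mismatch can be reproduced in a few lines. This is a minimal sketch using only Python's built-in string methods; the sample word "café" is just an illustration, and "cp1252" is Python's name for Windows-1252.

```python
# Writer: turn the characters into bytes using UTF-8.
text = "café"
data = text.encode("utf-8")          # é becomes the two bytes C3 A9

# Reader that uses the same rule gets the original text back.
same_rule = data.decode("utf-8")

# Reader that assumes Windows-1252 treats C3 and A9 as two separate characters.
wrong_rule = data.decode("cp1252")

print(same_rule)    # café
print(wrong_rule)   # cafÃ©
```

Note that the ASCII letters "caf" come through intact in both cases; only the accented character is damaged.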

Why UTF-8 is the right choice

UTF-8 can encode every Unicode character, so one file holds Latin, Cyrillic, Greek, Arabic, Hebrew, Devanagari (used for Hindi), and CJK text without switching encodings. It is backward-compatible with ASCII, so plain English subtitles look identical whether read as ASCII or UTF-8. The WebVTT spec mandates UTF-8. For SRT, UTF-8 is the safe default across VLC, mobile players, and editors. Single-byte encodings like Windows-1252 cannot hold non-Western scripts and are the usual source of breakage.
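Both properties are easy to check. The sketch below is illustrative only: `fits_in` is a hypothetical helper name, and the sample strings are made up for the demonstration.

```python
english = "Hello, world"
# ASCII-only text produces identical bytes under ASCII and UTF-8.
assert english.encode("ascii") == english.encode("utf-8")

def fits_in(text: str, encoding: str) -> bool:
    """Return True if every character of text can be stored in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

mixed = "Привет, 世界, مرحبا"          # Cyrillic, CJK, and Arabic in one string
print(fits_in(mixed, "utf-8"))         # True
print(fits_in(mixed, "cp1252"))        # False: Windows-1252 has no slots for these
print(fits_in("café", "cp1252"))       # True: Western accents do fit
```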

The byte-order mark (BOM)

A BOM is an invisible marker some editors place at the very start of a file. In UTF-8 it is the three bytes EF BB BF. WebVTT parsers strip it, so it is harmless in .vtt. In .srt it is safer to leave it out: a leading BOM can make a strict player read the first cue index (1) as text and drop cue 1, or show ï»¿ on screen. Choose "UTF-8" rather than "UTF-8 with BOM" when your editor offers both.
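If an editor has already written a BOM, it can be detected and removed at the byte level. A minimal sketch, assuming the file content is already in memory as bytes; `strip_utf8_bom` is a hypothetical helper name and the one-cue SRT sample is invented for the example. `codecs.BOM_UTF8` and the "utf-8-sig" codec are part of Python's standard library.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# A tiny SRT file saved as "UTF-8 with BOM".
sample = codecs.BOM_UTF8 + "1\n00:00:01,000 --> 00:00:02,000\nHi\n".encode("utf-8")

clean = strip_utf8_bom(sample)
print(clean[:1])                    # b'1'  (the cue index is now the first byte)
print(sample.decode("cp1252")[:3])  # ï»¿  (what a single-byte reader shows)
print(sample.decode("utf-8-sig")[:1])  # 1  ("utf-8-sig" skips the BOM when decoding)
```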

Garbled-text reference table

Use this to diagnose what you are seeing.

You see	Should be	Cause
Ã©	é	UTF-8 read as Latin-1 / 1252
ï»¿	(nothing)	UTF-8 BOM read as text
Ã±	ñ	UTF-8 read as Latin-1 / 1252
�	any character	Byte invalid in the assumed encoding
????	CJK / Arabic text	Saved in a non-Unicode encoding
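Each row of the table can be reproduced deliberately, which helps confirm a diagnosis. The sketch below is a hedged illustration: `misread` and `lossy_save` are hypothetical helper names, and the replacement behavior shown is Python's `errors="replace"` handling, which a given player or editor may or may not match exactly.

```python
def misread(text: str, written_as: str = "utf-8", read_as: str = "cp1252") -> str:
    """Encode text one way and decode it another, as a mismatched reader would."""
    return text.encode(written_as).decode(read_as, errors="replace")

def lossy_save(text: str, encoding: str = "cp1252") -> str:
    """Save text in an encoding that cannot hold it; unsupported characters become '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)

print(misread("é"))                        # Ã©
print(misread("ñ"))                        # Ã±
print(misread("\ufeff"))                   # ï»¿  (U+FEFF is the BOM character)
print(misread("é", "cp1252", "utf-8"))     # �   (byte E9 alone is invalid UTF-8)
print(lossy_save("日本語"))                 # ???
```

The last row is the only one that loses information: once characters have been saved as literal question marks, no encoding setting brings them back.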

How to fix garbled subtitles

First try setting the player to UTF-8; VLC has a subtitle encoding option under preferences. If the file itself is wrong, open it in a text editor that shows encoding, confirm the characters look right in some encoding, then re-save as UTF-8 without a BOM. If no setting makes it readable, the file may have been double-encoded: its UTF-8 bytes were misread as a single-byte encoding and the result was saved again as UTF-8. That damage has to be reversed, or the file recreated from its original source, rather than converted a second time. Translating the file through a tool that always outputs UTF-8 also normalizes it. After fixing encoding, confirm the format is still valid against the SRT format or the WebVTT format.
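The re-save step can also be scripted. The following is a sketch under stated assumptions, not a definitive converter: the function names are hypothetical, and the fallback to Windows-1252 is a guess that only suits legacy Western files (a file in another legacy encoding would need a different `fallback`, and a few byte values are undefined in Windows-1252 and will raise an error).

```python
from pathlib import Path

def to_utf8_no_bom(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return raw re-encoded as UTF-8 without a BOM."""
    try:
        text = raw.decode("utf-8-sig")   # valid UTF-8; a leading BOM is dropped
    except UnicodeDecodeError:
        text = raw.decode(fallback)      # assumption: legacy single-byte file
    return text.encode("utf-8")

def undo_double_encoding(text: str) -> str:
    """Reverse UTF-8 text that was misread as Windows-1252 and re-saved."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text                      # not damaged in this particular way

def convert_file(src: str, dst: str) -> None:
    """Read src as bytes and write the UTF-8, BOM-free version to dst."""
    Path(dst).write_bytes(to_utf8_no_bom(Path(src).read_bytes()))

print(to_utf8_no_bom(b"\xef\xbb\xbfcaf\xc3\xa9"))  # b'caf\xc3\xa9'
print(to_utf8_no_bom(b"caf\xe9"))                  # b'caf\xc3\xa9'
print(undo_double_encoding("cafÃ©"))               # café
```

Working on bytes and writing with `write_bytes` keeps line endings untouched. `undo_double_encoding` is a heuristic: it would also alter text that legitimately contains a sequence such as Ã©, so check the result by eye before overwriting anything.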

Key facts

Save subtitles as UTF-8 — it covers every script in one file.
WebVTT requires UTF-8; SRT works best as UTF-8 without a BOM.
BOM in UTF-8 = bytes EF BB BF; harmless in VTT, drop it for SRT.
Ã© means UTF-8 was read as Latin-1 / Windows-1252.
ï»¿ on screen is a BOM shown as text.
SRT and VTT are plain text; what matters is the encoding used to save and to read them.

Definitions

Encoding: The rule mapping characters to bytes. The reader must use the same rule as the writer or text garbles.
UTF-8: A Unicode encoding covering every script. The recommended encoding for all subtitle files.
BOM: Byte-order mark, bytes EF BB BF in UTF-8. Marks the file as Unicode; best omitted for SRT.
Mojibake: Garbled text from an encoding mismatch, e.g. Ã© for é or ï»¿ for a BOM.
Windows-1252: A single-byte Western encoding. A common wrong default that mangles accents and non-Latin scripts.
Latin-1 (ISO-8859-1): Another single-byte encoding. Reading UTF-8 as Latin-1 produces the classic Ã© mojibake.

FAQ

Why are my subtitles showing garbled characters?

The file was saved in one encoding and read in another. Accented and non-Latin characters then display as wrong symbols (mojibake) like Ã© instead of é. Saving the .srt or .vtt as UTF-8 and telling the player to read UTF-8 fixes it.

What encoding should subtitle files use?

UTF-8. It can represent every script — Latin, Cyrillic, Greek, Arabic, Hebrew, CJK — in one file. WebVTT requires UTF-8; SRT works best as UTF-8 without a byte-order mark (BOM).

What is a BOM and should subtitle files have one?

A byte-order mark (BOM) is an invisible marker (EF BB BF in UTF-8) at the start of a file. WebVTT parsers strip it, so it is harmless there. For SRT it is safer to omit it, because a leading BOM can make some players read the first cue index as text and drop cue 1.

What does ï»¿ at the start of a subtitle mean?

Those three characters are a UTF-8 BOM being shown as text because the player read the file as a single-byte encoding like Windows-1252. Remove the BOM or set the player to UTF-8.

Why do I see Ã© instead of é?

The file is UTF-8 but the player is reading it as Latin-1 or Windows-1252. In UTF-8, é is two bytes (C3 A9); read one byte at a time as Latin-1, they become Ã©. Tell the player to use UTF-8, or re-save the file in the encoding the player expects.

Are SRT and VTT files binary or text?

Both are plain text. You can open and edit them in any text editor. The only catch is choosing the right encoding when you save — UTF-8 — so characters survive.

Does translating a subtitle file change its encoding?

A good translator outputs UTF-8 regardless of the input encoding, so the translated file is clean even if the original was mislabeled. The text changes; the encoding is normalized to UTF-8.