UTF-8

From IndieWeb


UTF-8 is a way to encode Unicode characters using a variable number of bytes per character (between one and four). This is known as a multi-byte encoding scheme. UTF-8 is the most widely used encoding scheme for HTML pages on the web.[1]
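As a quick illustration of the variable byte count, here is a minimal Python sketch. The helper name utf8_length and the sample characters are chosen for this example only; they are not part of any standard API.

```python
# Sample characters, written as escapes so the source file's own encoding
# does not matter: "A", e-acute, the euro sign, and an emoji.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001f600"]


def utf8_length(char: str) -> int:
    """Return the number of bytes needed to encode one character in UTF-8."""
    return len(char.encode("utf-8"))


for char in SAMPLES:
    # Print the code point rather than the character itself, so the output
    # is readable even on terminals that cannot display every glyph.
    print(f"U+{ord(char):04X}: {utf8_length(char)} byte(s)")
```

ASCII characters such as "A" take a single byte, while characters further along in the Unicode range take two, three, or four bytes.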

Using UTF-8

When writing your HTML, or code in a scripting language that generates the HTML (PHP, Python, etc.), set the encoding in your text editor to UTF-8. You then need to tell the browser, when it receives the HTML, that you are using UTF-8. There are two ways of doing this. Firstly, set the Content-Type HTTP response header, e.g.

Content-Type: text/html; charset=utf-8
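If your pages are generated by a script, the header is usually set in code. The following is a minimal sketch, assuming a Python WSGI application; the function name app and the page content are placeholders for illustration, and your own framework may set this header for you.

```python
def app(environ, start_response):
    """Minimal WSGI application that serves an HTML page encoded as UTF-8."""
    html = (
        "<!DOCTYPE html>"
        '<html><head><meta charset="UTF-8"><title>Caf\u00e9</title></head>'
        "<body><p>Caf\u00e9</p></body></html>"
    )
    # Encode the text to bytes with the same encoding the header declares.
    payload = html.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ]
    start_response("200 OK", headers)
    return [payload]
```

The important point is that the bytes sent and the charset declared agree. For local testing, an application like this can be served with the standard library's wsgiref.simple_server.make_server.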

Secondly, include the charset within the HTML document itself. The recommended way to do this in an HTML5 document is to use a meta tag early on, like so:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    ...

Warning: if these charset values don’t match, the browser will prioritise the charset defined in the HTTP header over any charset defined within the document itself.
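The precedence described above can be sketched in Python. This is a deliberately simplified model, not the algorithm browsers actually implement: real browsers also consider other signals such as a byte order mark. The function names below are hypothetical and the regular expressions only handle the common forms shown on this page.

```python
import re


def charset_from_header(content_type):
    """Extract the charset parameter from a Content-Type header value."""
    match = re.search(
        r"charset\s*=\s*\"?([^\s;\"]+)", content_type or "", re.IGNORECASE
    )
    return match.group(1).lower() if match else None


def charset_from_meta(html):
    """Extract the charset from a <meta charset="..."> tag, if present."""
    match = re.search(
        r"<meta\s+charset\s*=\s*[\"']?([^\s\"'/>]+)", html, re.IGNORECASE
    )
    return match.group(1).lower() if match else None


def effective_charset(content_type, html):
    """Prefer the HTTP header's charset and fall back to the meta tag."""
    return charset_from_header(content_type) or charset_from_meta(html)


# A mismatch: the header wins over the meta tag in the document.
print(effective_charset("text/html; charset=iso-8859-1", '<meta charset="UTF-8">'))
```

A check like this can help spot a server that sends one charset while the document declares another.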

References