language may refer to human (or natural) languages or computer (often programming) languages.
Why marking up
Before you mark up your page with the appropriate language tags, consider why you are marking it up. Don't mark up just because you can.
When you mark up the h-entry of a post with a lang attribute, you enable users of a reader to filter out languages they don't speak. This makes it possible to follow someone only in the languages you understand.
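As a rough sketch of what this could look like, assume a page that carries one Swedish and one English post. The post texts below are made up for illustration, and whether a particular reader actually filters on the attribute is an assumption:

```html
<!-- Illustrative only: two posts on one page, each an h-entry with its own lang -->
<article class="h-entry" lang="sv">
  <p class="e-content">Hej världen!</p>
</article>
<article class="h-entry" lang="en">
  <p class="e-content">Hello world!</p>
</article>
```

A reader that honors the attribute could then hide the first entry from someone who has indicated they only read English.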
Pelle Wessman on chat: "on Twitter I often don't follow people that tweet too much in a language I don't understand and I hold back on tweeting in swedish because I know it might likewise annoy others"
Twitter does filter on language in Search, but not on the timeline.
Screen readers / text to speech
When someone uses a screen reader, the marked up language can be used to select the right pronunciation rules.
- This post by Sebastiaan Andeweg is a Dutch transcription of English and would thus be best marked up as 'nl', to guide screen readers toward the right pronunciation.
- Martijn van der Ven used to mark up his name with lang="nl" to guide screen readers towards the right pronunciation of his name.
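Marking up a single name inside text of another language might look roughly like this. The surrounding sentence is hypothetical and this is not his actual markup:

```html
<!-- Illustrative only: an English sentence containing a Dutch name -->
<p lang="en">
  This post was written by <span lang="nl">Martijn van der Ven</span>.
</p>
```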
Translation software can translate certain posts or texts if it knows the language.
- Most translation software can probably detect the language too?
How to mark up
You can specify the language of an HTML document, or a part of it, by using the lang="??" attribute, where ?? is the language code for your language. For English, this is en.
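A minimal sketch of both uses, for a whole document and for a part of it. The page content is invented for illustration:

```html
<!DOCTYPE html>
<!-- The lang on the html element sets the default for the whole document -->
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Example</title>
  </head>
  <body>
    <p>This paragraph inherits English from the html element.</p>
    <!-- A lang on a nested element overrides the default for that part only -->
    <p lang="nl">Deze alinea is in het Nederlands.</p>
  </body>
</html>
```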
HTML also allows you to mark the language of the target of a hyperlink using the hreflang attribute.
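For hyperlinks, HTML provides the hreflang attribute for this purpose. A small sketch, with example.com standing in as a placeholder URL:

```html
<!-- hreflang describes the language of the linked page, not of the link text -->
<a href="https://example.com/artikel" hreflang="nl">An article in Dutch</a>
```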
HTML 5 has also introduced a translate attribute that allows you to specify that a piece of text should not be automatically translated.
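A short sketch of how this might be used. The sentences are invented, and whether a given translation tool respects the attribute is an assumption:

```html
<!-- translate="no" asks translation tools to leave the marked text as it is -->
<p>Run <code translate="no">git status</code> to see what changed.</p>
<p>Many people first meet the <span translate="no">IndieWeb</span> through a blog post.</p>
```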
- There are thoughts on how to parse
- PHP: http://pear.php.net/package/Text_LanguageDetect/
- Python: https://pypi.python.org/pypi/langdetect/
- JS: https://github.com/wooorm/franc
Christian Weiske uses language detection on each blog post's title to automatically generate the lang attribute of <html lang="??"> for that post.
Q: Why detect instead of adding manually?
- Less tedious, less prone to errors
Q: Why detect yourself, if others can detect too?
- Because some consumers don't detect the language themselves, but still do things with the lang attribute.
- Detect once while publishing vs. detect again and again and again and again