Czech is rumoured to be fantastically difficult to learn, and that is supposedly why invasions of the Czech Republic (and Czechoslovakia) have always failed. That may or may not be true. However, elements of the Czech language, or Česky or Čeština, are often easy to learn compared to other supposedly simpler languages. This page addresses just the characters of the Czech alphabet, or Česká abeceda.
To quote WikiTravel about the Czech Republic ...
The language is very difficult for English-speakers to grasp, and Czech is considered to be one of the most difficult languages in the world to master. However, if you can learn the alphabet (and the corresponding letters with accents), then pronunciation is easy as it is always the same - Czechs pronounce every letter of a word, with the stress always falling on the first syllable. The combination of consonants in some words may seem mind-bogglingly hard, but it is worth the effort!
The nice thing about the characters in the Czech alphabet is that they always represent the same sound. This makes things easy to pronounce, because you can just say the characters and the word is spoken correctly. Admittedly, there are some special rules on some adjacent characters which make this slightly more difficult... but it is straightforward and rule-based.
In addition to the normal Roman (Latin) characters, Czech adds several more characters to get its unique sounds. It uses the character combination 'ch' for one of these characters. For the remainder it uses diacritical marks over other characters to show that each is a different character. The diacritical marks tend to either lengthen the sound of a vowel or soften the sound of a consonant. However, those are actually different sounds in Czech.
Czech has 3 diacritical marks.
Diacritical marks "change" the pronunciation of characters in
In Czech the diacritical marks indicate different characters.
As another change, the character combination 'ch' is its
own single character, the 'ch' character.
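Treating 'ch' as one character matters as soon as you count or split letters. The following is only a minimal sketch of that idea in Python; the function name `split_czech_letters` is made up for this page, and it assumes every 'c' followed by 'h' is the 'ch' character.

```python
import re
import unicodedata

# 'ch' (in any letter case) is tried first, then any single character.
_LETTER = re.compile(r"ch|.", re.IGNORECASE | re.DOTALL)


def split_czech_letters(word: str) -> list[str]:
    """Split a word into Czech alphabet characters, keeping 'ch' together."""
    # NFC stores each accented letter as one code point, not base + mark.
    composed = unicodedata.normalize("NFC", word)
    return _LETTER.findall(composed)


print(split_czech_letters("chléb"))      # ['ch', 'l', 'é', 'b']
print(len(split_czech_letters("ucho")))  # 3
```

So "ucho" is a three character word in Czech, even though it takes four keystrokes to type.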
The 3 marks, more or less in order of frequency of occurrence on
characters of the alphabet, are the háček (ˇ), the čárka (´) and the kroužek (˚).
XXX find HTML char equivalents for markups and return
Wikipedia links to explain this stuff in detail.
Some Czech characters are too tall to have a mark drawn above them,
so the mark follows the character and looks like an apostrophe. Although it resembles a čárka, it is really a háček:
d' -> ď
t' -> ť
The kroužek is used only with "U" in Czech. It is actually the same sound as a "U" with a čárka. The difference is that the ů is used inside or at the end of words, while Ú / ú is used at the start of words.
I've put together the following table from a bunch of resources I found and conglomerated into one place.
RFC 1345 is a standardized ASCII (simple text) representation of accented characters. In theory, I could write an RFC 1345 converter to make my ASCII text into ISO 8859 or Unicode Czech text. The only problem is that of distinguishing "'" accents from single "'" quote characters. Because of that problem, I tend to just process the characters to remove their diacritical marks. This is simple and it works with ASCII browsers. It can also make the Czech words unreadable... either way you lose.
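Removing the diacritical marks can be sketched in a few lines of Python. This is not the processing actually used for these pages, just one common way to do it with the standard `unicodedata` module; the function name `strip_diacritics` is made up for the example.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with the diacritical marks removed from its characters."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Čeština"))              # Cestina
print(strip_diacritics("kroužek, ďábel, kůň"))  # krouzek, dabel, kun
```

The loss is easy to see: "být" (to be) and "byt" (a flat) both come out as "byt".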
In the standards world ...
The ISO 639 abbreviation for the Czech language is "cs". The two
letter ISO 3166 country code for Czech Republic is "CZ".
Note that the ISO 639/ISO 3166 convention is that language names are written in lower case and country codes are written in upper case.
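As a small illustration of that case convention, here is a sketch that joins the two codes into one tag. The helper name `language_tag` is made up, and the hyphen separator is an assumption borrowed from web-style language tags; POSIX locale names use an underscore instead ("cs_CZ").

```python
def language_tag(language: str, country: str) -> str:
    """Join an ISO 639 language code and an ISO 3166 country code."""
    # Language code in lower case, country code in upper case.
    return f"{language.lower()}-{country.upper()}"


print(language_tag("CS", "cz"))  # cs-CZ
```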
You can find more information about Czech and Slovak character encoding in the documents "The Czech and Slovak Character Encoding Mess Explained" and "Czech and Slovak Accent Marks".
The Bolo encoding is a pre-processing step I am working on for my Czech language source web pages. I use it to avoid having to remember numbers or text such as "&#345;" aka "ř" aka "&rhacek;". I thought of adding čárka and kroužek names, but that just gets too complicated. The only goofy ones are ď and ť, which should probably be called háček names ... but I'm staying with appearance when possible for now. OK, so that didn't hold long because of the upper and lowercase T háček characters. So yes, my formatter converts 'ť' to '&thacek;' now. The document processor converts the markup to &name; or &#code; phrases as needed, so you shouldn't see them in my web pages. Except here!
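To give a rough idea of what such a conversion might look like, here is a sketch in Python. It is not the actual formatter: the function names are made up, and it assumes the markup is always '&', one ASCII letter, 'hacek' and ';', as in the two names mentioned above.

```python
import re
import unicodedata

COMBINING_CARON = "\u030c"  # the háček as a combining mark

# Assumed markup shape: '&' + one ASCII letter + 'hacek' + ';'
_HACEK_NAME = re.compile(r"&([A-Za-z])hacek;")


def _to_numeric_reference(match: re.Match) -> str:
    # NFC fuses letter + háček into one character when such a character exists.
    composed = unicodedata.normalize("NFC", match.group(1) + COMBINING_CARON)
    if len(composed) != 1:
        # No single character exists (for example 'q'); leave the markup alone.
        return match.group(0)
    return f"&#{ord(composed)};"


def expand_hacek_names(text: str) -> str:
    """Replace every &Xhacek; name with a numeric &#code; reference."""
    return _HACEK_NAME.sub(_to_numeric_reference, text)


print(expand_hacek_names("&rhacek;"))           # &#345;
print(expand_hacek_names("&thacek; &Thacek;"))  # &#357; &#356;
```

Building the character from the letter plus the mark avoids keeping a hand-typed table of numbers, which is the very thing the names are meant to spare me.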