TCS(1)
NAME
tcs – translate character sets
SYNOPSIS
tcs [ -slcv ] [ -f ics ] [ -t ocs ] [ file ... ]
DESCRIPTION
Tcs interprets the named file(s) (standard input default) as a stream of characters from the ics character set or format, converts them to runes, and then converts them into a stream of characters from the ocs character set or format on the standard output. The default value for ics and ocs is utf, the UTF encoding described in utf(6).
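The two-stage conversion described above (input bytes to runes, then runes to output bytes) can be sketched in Python. This is only an illustration and not how tcs is implemented: the function name translate is hypothetical, and the codec names (latin-1, utf-8) are Python's, not the names tcs prints with -l.

```python
def translate(data: bytes, ics: str = "utf-8", ocs: str = "utf-8") -> bytes:
    """Decode data from ics into code points, then encode them as ocs."""
    runes = data.decode(ics)    # stage 1: input character set -> runes
    return runes.encode(ocs)    # stage 2: runes -> output character set


# Roughly the effect of `tcs -f 8859-1` on the Latin-1 bytes of "café".
latin1_bytes = b"caf\xe9"
utf_bytes = translate(latin1_bytes, ics="latin-1")
```

As in tcs, both character sets default to UTF, so calling translate with no ics or ocs passes well-formed UTF input through unchanged.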
The -l option lists the character sets known to tcs. Processing continues in the face of conversion errors (the -s option prevents reporting of these errors).
The -c option forces the output to contain only correctly converted characters; otherwise, Runeerror (0xFFFD) characters will be substituted for UTF encoding errors and unknown characters.
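The difference between the default behavior and -c can be imitated with the error handlers of Python's decoder. This is an analogy only: errors="replace" and errors="ignore" are Python features, and nothing here implies that tcs works this way internally.

```python
RUNEERROR = "\ufffd"  # the 0xFFFD replacement character

# 0xE9 starts a multi-byte UTF sequence that is never completed.
bad_utf = b"caf\xe9"

# Default-like behavior: the bad sequence becomes Runeerror.
substituted = bad_utf.decode("utf-8", errors="replace")

# -c-like behavior: only correctly converted characters are kept.
clean_only = bad_utf.decode("utf-8", errors="ignore")
```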
The -v option generates various diagnostic and summary information on standard error, or makes the -l output more verbose.
Tcs recognizes an ever-changing list of character sets. In particular, it supports a variety of Russian and Japanese encodings. Some of the supported encodings are:
utf        The Plan 9 UTF encoding, known by ISO as UTF-8
utf1       The deprecated original UTF encoding from ISO 10646
ascii      7-bit ASCII
8859-1     Latin-1 (Western European)
8859-2     Latin-2 (Czech .. Slovak)
8859-3     Latin-3 (Dutch .. Turkish)
8859-4     Latin-4 (Scandinavian)
8859-5     Part 5 (Cyrillic)
8859-6     Part 6 (Arabic)
8859-7     Part 7 (Greek)
8859-8     Part 8 (Hebrew)
8859-9     Latin-5 (Finnish .. Portuguese)
html       Unicode as encoded by HTML
koi8       KOI-8 (GOST 19768-74)
jis-kanji  ISO 2022-JP
ujis       EUC-JX: JIS 0208
ms-kanji   Microsoft, or Shift-JIS
jis        (input only, with -f) guesses among ISO 2022-JP, EUC, or Shift-JIS
gb         Chinese national standard (GB2312-80)
big5       Big 5 (HKU version)
unicode    Unicode Standard 1.0
tis        Thai character set plus ASCII (TIS 620-1986)
msdos      IBM PC: CP 437
atari      Atari-ST character set
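
The guessing done by the jis input format can be approximated by trying candidate decoders in turn. The sketch below rests on several assumptions: the function guess_jis is hypothetical, the codec names are Python's, and the trial order is a simple heuristic, not the method tcs actually uses.

```python
CANDIDATES = ("iso2022_jp", "euc_jp", "shift_jis")


def guess_jis(data: bytes):
    """Return (codec name, decoded text) for the first codec that decodes data."""
    for codec in CANDIDATES:
        try:
            return codec, data.decode(codec)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next candidate
    raise ValueError("input is not valid in any candidate encoding")


text = "日本語"
results = {codec: guess_jis(text.encode(codec)) for codec in CANDIDATES}
```

ISO 2022-JP is tried first because it is a 7-bit encoding: its byte stream would otherwise be accepted by the other two decoders as plain ASCII with stray escape sequences. Short or ambiguous input can still be misidentified by a scheme like this.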
EXAMPLES
tcs -f 8859-1
Convert 8859-1 (Latin-1) characters into UTF format.
tcs -s -f jis
Convert characters encoded in one of several JIS encodings (ISO 2022-JP, EUC, or Shift-JIS) into UTF format, without reporting conversion errors. Unknown Kanji will be converted into 0xFFFD characters.
tcs -t html
Convert UTF into character set-independent HTML.
tcs -lv
Print an up-to-date, verbose list of the supported character sets.
SOURCE
/sys/src/cmd/tcs
SEE ALSO
ascii(1), rune(2), utf(6).