<!-- ------------------------------------------------------------
$Id: index.html,v 1.29 2004/01/16 12:29:21 nrt Exp $
Copyright: NARITA Tomio
------------------------------------------------------------ -->
<!-- ------------------------------------------------------------ -->
<TITLE> LV Homepage </TITLE>
<!-- ------------------------------------------------------------ -->
<BODY BGCOLOR=#ffffe0 TEXT=#c00090 LINK=#0090c0 VLINK=#e000a8 ALINK=#00c090>
<FONT SIZE=-2>All rights reserved. Copyright (C) 1996-2004 by NARITA Tomio</FONT> <BR>
Last modified on Jan. 16th, 2004.
<H1> <IMG SRC="/~nrt/icons/redball.gif" ALT="">
<FONT SIZE=+2>lv - <I>a Powerful Multilingual File Viewer / Grep</I></FONT>
<FONT SIZE=+1> The latest version is 4.51:
<A HREF="#download"> Download </A> </FONT>
<A NAME="tableofcontents">
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
Table of Contents </H2>
<LI> <A HREF="#copyright"> Copyright </A>
<LI> <A HREF="#feature"> Feature </A>
<LI> <A HREF="#download"> Download lv </A>
<LI> <A HREF="#install"> Installation </A>
<LI> <A HREF="#usage"> Usage </A>
<LI> <A HREF="#execution"> How to run lv? </A>
<LI> <A HREF="#option"> Command line options </A>
<LI> <A HREF="#configuration"> Configuration </A>
<LI> <A HREF="#command"> Run-time commands </A>
<LI> <A HREF="#search"> How to input search strings? </A>
<LI> <A HREF="#regexp"> Regular expressions </A>
<LI> <A HREF="#limitations"> Limitations </A>
<LI> <A HREF="#codingSystem"> Coding systems </A>
<LI> <A HREF="#iso2022"> ISO 2022 based coding systems </A>
<LI> <A HREF="#iso2022cn"> iso-2022-cn </A>
<LI> <A HREF="#iso2022jp"> iso-2022-jp </A>
<LI> <A HREF="#iso2022kr"> iso-2022-kr </A>
<LI> <A HREF="#euc"> Extended Unix Code </A>
<LI> <A HREF="#eucchina"> euc-china </A>
<LI> <A HREF="#eucjapan"> euc-japan </A>
<LI> <A HREF="#euckorea"> euc-korea </A>
<LI> <A HREF="#euctaiwan"> euc-taiwan </A>
<LI> <A HREF="#utf"> UCS transformation format </A>
<LI> <A HREF="#utf7"> UTF-7 </A>
<LI> <A HREF="#utf8"> UTF-8 </A>
<LI> <A HREF="#otherCodingsystem"> Other coding systems </A>
<LI> <A HREF="#iso8859"> iso-8859-* </A>
<LI> <A HREF="#shiftjis"> shift-jis </A>
<LI> <A HREF="#big5"> big5 </A>
<LI> <A HREF="#hz"> HZ </A>
<LI> <A HREF="#raw"> raw mode </A>
<LI> <A HREF="#aboutCodingSystem"> Annotation about encoding/decoding scheme </A>
<LI> <A HREF="#invalid"> Handling of invalid codes </A>
<LI> <A HREF="#backspace"> Backspace </A>
<LI> <A HREF="#binaryFile"> How to look in a binary file? </A>
<LI> <A HREF="#autoSelect"> Auto selection of a coding system </A>
<LI> <A HREF="#defaultCodingSystem"> Default coding system </A>
<LI> <A HREF="#selectionMethod"> How does lv select a coding system? </A>
<LI> <A HREF="#color"> Extension for text decoration </A>
<LI> <A HREF="#customize"> Customization </A>
<!-- <LI> <A HREF="#bug"> Known bugs </A> -->
<LI> <A HREF="#bugreport"> Bug report </A>
<LI> <A HREF="relnote.html"> Release note </A>
<LI> <A HREF="#acknowledgment"> Acknowledgement </A>
<LI> <A HREF="#ref"> Reference </A>
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
All rights reserved. Copyright (C) 1996-2004 by NARITA Tomio.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
See also <A HREF="GPL.txt">GNU General Public License Version 2</A>.
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
<LI> <H3> Multilingual file viewer </H3>
<I>lv</I> is a powerful multilingual file viewer.
On the surface, lv looks like <I>less</I> (1),
the well-known file viewer on UNIX,
so UNIX users (and <I>less</I> users on other OSs)
don't have to learn a burdensome new interface.
lv can be used on MSDOS ANSI terminals and almost all UNIX platforms.
lv is still growing,
so your feedback is welcome
and will help us refine future versions of lv.
<LI> <H3> Multiple coding systems </H3>
lv can decode and encode multilingual streams
through many coding systems, for example,
ISO 2022 based coding systems such as iso-2022-jp,
and EUC (Extended Unix Code) such as euc-japan.
Localized coding systems
such as shift-jis, big5 and HZ are also supported.
lv can be used not only as a file viewer
but also as a coding-system translation filter
like <I>nkf</I> (1) and <I>tcs</I> (1).
<LI> <H3> Multilingual regular expressions / Multilingual grep </H3>
lv can recognize multi-byte patterns in regular expressions,
and lv also provides multilingual <I>grep</I> (1) functionality
when invoked under another name, <I>lgrep</I>.
Pattern matching is conducted at the charset level,
so an EUC fragment, for example,
can of course be found in ISO 2022 encoded streams.
<LI> <H3> Supporting the Unicode standard </H3>
lv provides Unicode facilities
which enable you to handle Unicode streams encoded in UTF-7 or UTF-8,
and lv can also convert code-points
between Unicode and other charsets.
So you can display Unicode or foreign texts on your terminal
by converting them, via Unicode,
into your favorite charsets.
(However, the MSDOS version of lv has no Unicode facilities.)
<LI> <H3> ANSI escape sequence pass-through </H3>
lv can recognize ANSI escape sequences for text decoration.
So you can view ANSI-decorated streams,
such as colored source code generated by other software,
just as intended on ANSI terminals.
<LI> <H3> Completely original </H3>
lv is completely original software
and includes no code drawn from <I>less</I>, <I>grep</I>,
or any other program.
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
<LI> Multilingual sample image <BR>
<A HREF="hello.sample.gif"> <B>``Hello''s</B> on <I> kterm </I> with lv (gif 15Kbytes) </A> <A HREF="hello.sample"> (Original text from Mule demo) </A>
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
You can download the lv archive below.
Changes from older versions are described in the
<A HREF="relnote.html">release note</A>.
<LI> <A HREF="/~nrt/freeware/lv451.tar.gz">
lv v.4.51 (tar and gzip compressed) </A> <BR>
<LI> <A HREF="/~nrt/freeware/lv450.tar.gz">
lv v.4.50 (tar and gzip compressed) </A> <BR>
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
Standard installation:
<LI> Extract the lv archive using gunzip and tar.
<LI> Change your working directory to ``(extracted subdirectory)/build''.
<LI> Execute ``../src/configure'' to configure compiler flags.
<LI> Run ``make''.
<LI> Then run ``make install'' as root.
<A HREF="http://www.tokyoweb.or.jp/lsi-j/freesoft/lsic330c.lzh">
(a limited freeware version of <I>LSI C-86</I>, as an example).
<LI> Extract the lv archive using gunzip and tar.
<LI> Change your working directory to ``(extracted subdirectory)/src''.
<LI> Run ``make -f Makefile.dos''.
<LI> Copy ``lv.hlp'', a brief help description, to the same directory
The MSDOS version of lv outputs ANSI escape sequences directly,
without regard to termcap and terminfo.
You probably need an ANSI escape sequence driver such as ``ANSI.SYS''
(or a more sophisticated one) on MSDOS,
including the DOS prompt on MS-Windoze.
Since Windoze-NT does not seem to provide such a driver
for the DOS prompt by default,
please check the driver configuration
when lv fails to handle the terminal capability correctly.
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
<LI> <H3> How to launch lv? </H3>
To simply display a file on a terminal,
launch lv from the command line like this:
% lv [options] files ... <BR>
Or, using a redirect or a pipeline:
% another_command | lv [options] <BR>
% lv [options] &lt; file
Compressed files with the suffix ``gz'', ``z'', ``GZ'' or ``Z'' are
expanded by lv using <I>zcat</I> (1),
and those with ``bz2'' or ``BZ2'' using <I>bzcat</I> (1).
Please install versions of <I>zcat</I> and <I>bzcat</I> that can expand all of them.
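The suffix rule above can be sketched in C as follows. This is a minimal illustration, not lv's actual source: the function name decompressor_for is hypothetical, and it only chooses the command name; actually running zcat or bzcat and reading their output is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not lv's code): choose the decompressor for a file
 * name from its suffix, following the rules described above.
 * Returns NULL when the file should be read as it is. */
const char *decompressor_for(const char *filename)
{
    const char *dot = strrchr(filename, '.');
    if (dot == NULL)
        return NULL;
    dot++; /* skip the '.' itself */

    if (strcmp(dot, "gz") == 0 || strcmp(dot, "z") == 0 ||
        strcmp(dot, "GZ") == 0 || strcmp(dot, "Z") == 0)
        return "zcat";
    if (strcmp(dot, "bz2") == 0 || strcmp(dot, "BZ2") == 0)
        return "bzcat";
    return NULL;
}
```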
When standard output is connected not to an ordinary terminal
but to a redirect or pipeline,
lv works as a coding-system or code-points conversion filter
like <I>nkf</I> (1) and <I>tcs</I> (1).
lv also works like <I>grep</I> (1)
when invoked under another name, <I>lgrep</I>.
Please install a symbolic (or hard) link
named <I>lgrep</I> that points to <I>lv</I> (1).
Alternatively, <I>lgrep</I> functionality can be turned on with the option '-g'.
lgrep is used as follows:
% lgrep [options] <B>grep_pattern</B> files ... <BR>
% another_command | lgrep [options] <B>grep_pattern</B> <BR>
% lgrep [options] <B>grep_pattern</B> &lt; file
The coding system of <B>grep_pattern</B> can be specified
as the ``keyboard coding system'' (see below).
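The two ways of turning on lgrep mode (the program name and option -g) could be detected roughly as below. This is a hypothetical sketch: lgrep_mode is not an lv function, it assumes UNIX-style `/' path separators, and it recognizes -g only as a separate argument (combined option letters are not handled).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: decide whether to run in lgrep mode, either because
 * the program was invoked through a link named "lgrep" or because -g was
 * given before the "-" that ends option processing. */
bool lgrep_mode(int argc, char **argv)
{
    const char *name = strrchr(argv[0], '/');
    name = (name != NULL) ? name + 1 : argv[0]; /* basename of argv[0] */

    if (strcmp(name, "lgrep") == 0)
        return true;

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-") == 0)
            break;              /* following arguments are file names */
        if (strcmp(argv[i], "-g") == 0)
            return true;
    }
    return false;
}
```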
<LI> <H3> Command line options </H3>
<DT> -A&lt;coding-system&gt;
<DD> Set all coding systems to coding-system.
<DT> -I&lt;coding-system&gt;
<DD> Set input coding system to coding-system.
<DT> -K&lt;coding-system&gt;
<DD> Set keyboard coding system to coding-system.
If it is not set, the output coding system is used for the keyboard.
<DT> -O&lt;coding-system&gt;
<DD> Set output coding system to coding-system.
<DT> -P&lt;coding-system&gt;
<DD> Set pathname coding system to coding-system.
<DT> -D&lt;coding-system&gt;
<DD> Set default EUC coding system to coding-system.
<DL> <DT> <H3> coding-system </H3> <DD>
<LI> a: auto-select <BR>
It behaves as iso-2022-kr
until an 8-bit code is found.
<LI> e: Extended Unix Code
<LI> u: UCS transformation format
<LI> l: iso-8859-1..9
<LI> l1..9: iso-8859-1..9
<LI> lb,ld,le,lf,lg: iso-8859-11,13,14,15,16
<LI> r: raw mode <BR>
No decoding or encoding is performed.
<H3> Coding-system translations / Code-points conversions: </H3>
iso-2022-cn, -jp and -kr can be converted into euc-china or euc-taiwan,
euc-japan and euc-korea, respectively (and vice versa).
shift-jis uses the same internal code-points
as iso-2022-jp and euc-japan.
Since big5 characters can be converted into CNS 11643-1992
with only negligible gaps,
big5 streams can be translated into iso-2022-cn or euc-taiwan
(and vice versa) with code-points conversion.
Note that the iso-2022-cn referred to here is not the GB sequence,
Remember that lv cannot translate big5 into GB directly.
The search function of lv may not work correctly when lv additionally
performs ``code-points'' conversion
(not ``coding-system'' translation),
because the visible code and the internal code then differ.
lv tries to avoid this problem by
converting the charsets of search patterns automatically,
but this does not always work perfectly.
<DT> -W&lt;number&gt; <DD> Screen width
<DT> -H&lt;number&gt; <DD> Screen height
<DT> -E'&lt;editor&gt;' <DD> Editor name (default 'vi -c %d') <BR>
``%d'' is replaced by the line number of the current position in the file.
<DT> -q <DD> Assert that the terminal has delete/insert-lines control <BR>
Please set this option on an MSDOS ANSI terminal
that can delete and/or insert lines.
In the termcap and terminfo versions,
it is set automatically.
<DT> -Ss&lt;seq&gt; <DD> Set ANSI Standout sequence to &lt;seq&gt; (default "7")
<DT> -Sr&lt;seq&gt; <DD> Set ANSI Reverse sequence to &lt;seq&gt; (default "7")
<DT> -Sb&lt;seq&gt; <DD> Set ANSI Blink sequence to &lt;seq&gt; (default "5")
<DT> -Su&lt;seq&gt; <DD> Set ANSI Underline sequence to &lt;seq&gt; (default "4")
<DT> -Sh&lt;seq&gt; <DD> Set ANSI Highlight sequence to &lt;seq&gt; (default "1") <BR>
These sequences are inserted
between ``<TT>ESC [</TT>'' and ``<TT>m</TT>''
to construct full ANSI escape sequences.
<DT> -T&lt;number&gt; <DD>
Set the threshold code that divides Unicode code-points into
two regions. Characters belonging to the lower region are
assumed to be one column wide, and characters in the higher region
two columns wide. (Default: 12288 = 0x3000)
Force Unicode code-points which have the same glyphs as
iso-8859-* to be Mapped to iso-8859-* in a conversion from
Unicode to another character set which also has the
corresponding code-points, in particular, Asian charsets.
<DT> -a <DD> Adjust character set for search pattern (default)
<DT> -c <DD> Allow ANSI escape sequences for text decoration (Color)
<DT> -d, -i <DD> Make regexp-searches ignore case (case folD search)
<DT> -f <DD> Substitute Fixed strings for regular expressions
<DT> -k <DD> Convert X0201 Katakana to X0208
<DT> -l <DD> Allow physical lines of each logical line printed
on the screen to be concatenated for cut and paste
<DT> -s <DD> Force old pages to be swept out from the screen Smoothly
<DT> -u <DD> Unify several character sets, e.g. JIS X0208 and C6226.
In addition, lv equates ISO 646 variants,
and unknown charsets with ASCII.
<DT> -g <DD> Turn on lgrep mode.
<DT> -n <DD> In lgrep mode, prefix each line of output with its line number within its input file.
<DT> -v <DD> In lgrep mode, invert the sense of matching.
<DT> -z <DD> Enable HZ auto-detection (also enabled by run-time C-t).
<DT> -+ <DD> Clear all options <BR>
You can also turn OFF individual options
using ``+&lt;option&gt;'', e.g. +c, +d, ... +z.
<DT> - <DD> Treat the following arguments as filenames
<DT> -V <DD> Show lv version
<DT> -h <DD> Show this help
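As a rough illustration of option -T, the following sketch computes screen widths from Unicode code-points. It assumes that code-points strictly below the threshold are one column wide and all others two columns wide; how lv treats the threshold value itself is an assumption, and char_columns and string_columns are hypothetical names.

```c
#include <stddef.h>

#define DEFAULT_THRESHOLD 0x3000UL /* 12288, the documented -T default */

/* Width in columns of one code-point (assumption: below threshold = 1). */
int char_columns(unsigned long code_point, unsigned long threshold)
{
    return code_point < threshold ? 1 : 2;
}

/* Total width in columns of a sequence of code-points. */
size_t string_columns(const unsigned long *cps, size_t n,
                      unsigned long threshold)
{
    size_t cols = 0;
    for (size_t i = 0; i < n; i++)
        cols += (size_t)char_columns(cps[i], threshold);
    return cols;
}
```

With the default threshold, "Hi" followed by two CJK-range characters occupies 2 + 2 * 2 = 6 columns.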
<A NAME="configuration">
<LI> <H3> Configuration </H3>
Options can be written in the configuration file ``.lv''
(``_lv'' on MSDOS) located in your home directory. On MSDOS only,
you can also place ``_lv'' in the current working directory.
Options can also be given in the environment variable LV.
Configurations are read in the following order, each one (if present)
overriding the earlier ones. Command line options are always read last.
<LI> .lv located in your home directory
<LI> (_lv located in the current working directory: MSDOS only)
<LI> Environment variable LV
<LI> Command line options
<LI> MSDOS (Input is shift-jis, Screen height is 25 lines, Highlight seq is "1;45", Underline seq is "1")<BR>
<TT> set LV=-Is -H25 -Sh1;45 -Su1 </TT>
<LI> UNIX csh (Input is HZ-enabled auto-select, Output and Keyboard are both iso-2022-cn) <BR>
<TT> setenv LV '-z -Oc -Dec' </TT>
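The override order can be pictured with the sketch below, which splits an option string (such as the value of LV) into arguments and applies one source after another, so that a later -H setting replaces an earlier one. split_options, apply_options and struct settings are hypothetical names; only -H is handled, and quoting is ignored.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 32

struct settings { int height; }; /* only -H is modeled in this sketch */

/* Split buf in place at blanks; returns the number of arguments stored. */
int split_options(char *buf, char *args[], int max_args)
{
    int n = 0;
    for (char *tok = strtok(buf, " \t\n"); tok != NULL && n < max_args;
         tok = strtok(NULL, " \t\n"))
        args[n++] = tok;
    return n;
}

/* Apply one source of options; calling this again for a later source
 * (.lv, then LV, then the command line) overrides earlier values. */
void apply_options(struct settings *s, char *args[], int n)
{
    for (int i = 0; i < n; i++)
        if (strncmp(args[i], "-H", 2) == 0)
            s->height = atoi(args[i] + 2);
}
```

With ``-H24'' in .lv and ``-H25'' in LV, the sketch ends with a height of 25, as the documented order implies.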
<LI> <H3> Run-time commands </H3>
<DT> 0-9: <DD> Argument
<DT> g, &lt;: <DD> Jump to the line number (default: top of the file)
<DT> G, &gt;: <DD> Jump to the line number (default: bottom of the file)
<DT> p: <DD> Jump to the percentage position in line numbers (0-100)
<DT> b, C-b: <DD> Previous page
<DT> u, C-u: <DD> Previous half page
<DT> k, w, C-k, y, C-y, C-p: <DD> Previous line
<DT> j, C-j, e, C-e, C-n, CR: <DD> Next line
<DT> d, C-d: <DD> Next half page
<DT> f, C-f, C-v, SP: <DD> Next page
<DT> F: <DD> Jump to the end of the file, and wait for data to be
appended to the file until interrupted.
<DT> /&lt;string&gt;: <DD> Find a string in the forward direction (regular expression)
<DT> ?&lt;string&gt;: <DD> Find a string in the backward direction (regular expression)
<DT> n: <DD> Repeat previous search in the forward direction
<DT> N: <DD> Repeat previous search in the backward direction (not REVERSE)
<DT> C-l: <DD> Redisplay all lines
<DT> r, C-r: <DD> Refresh screen and memory
<DT> R: <DD> Reload the current file
<DT> :n: <DD> Examine the next file
<DT> :p: <DD> Examine the previous file
<DT> t: <DD> Toggle input coding systems
<DT> T: <DD> Toggle input coding systems in reverse order
<DT> C-t: <DD> Toggle HZ decoding mode
<DT> v: <DD> Launch the editor defined by option -E
<DT> C-g, =: <DD> Show file information (filename, position, coding system)
<DT> V: <DD> Show lv version
<DT> C-z: <DD> Suspend (call SHELL or ``command.com'' under MSDOS)
<DT> UP/DOWN: <DD> Previous/Next line
<DT> LEFT/RIGHT: <DD> Previous/Next half page
<DT> PageUp/PageDown: <DD> Previous/Next page
<LI> <H3> How to input search strings? </H3>
You can input a string consisting of multi-byte characters
and search for it as a regular expression.
lv's regular expressions are similar to Mule's.
The following keys have special meanings during keyboard input:
<DT> C-m, Enter <DD> Enter the current string
<DT> C-h, BS, DEL <DD> Delete one character (backspace)
<DT> C-u <DD> Cancel the current string and try again
<DT> C-p <DD> Recall older strings one by one (history)
<LI> <H3> Regular expressions </H3>
<LI> `. (period)' <BR>
matches any single character.
``a.b'' matches any three-character string which begins with
`a' and ends with `b'.
repeats the preceding expression zero or more times.
``ab*'' matches `a', `ab', `abb', etc.
repeats the preceding expression one or more times.
``ab+'' matches `ab', `abb', but not `a'.
matches the preceding expression either once or not at all.
``ca?r'' matches `car' or `cr'; nothing else.
defines a character set.
``[ab]+'' matches any string composed of just `a's and `b's.
You can also include character ranges in a character set
by writing two characters with a `-' between them.
``[a-z]'' matches any lower-case letter.
If the characters belong to a multi-byte charset,
lv makes a multi-byte range,
ordering code-points as unsigned integers.
Mutually overlapping ranges (or charsets) are not guaranteed to work.
defines a complemented character set.
``[^a-z0-9A-Z]'' matches all characters
*except* letters and digits.
matches the empty string at the beginning of a line.
is similar to `^' but matches only at the end of a line.
quotes special characters.
matches characters each of which has a width of 1 column.
matches characters each of which has a width of 2 columns.
specifies an alternative.
``foo\|bar'' matches either `foo' or `bar' but no other string.
<LI> `\( ... \)' <BR>
\( and \) form a grouping construct.
``ba\(na\)*'' matches `ba', `bana', `banana', etc.
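For a range such as [X-Y] whose ends are two-byte characters, ``ordering code-points as unsigned integers'' can be sketched like this, assuming the bytes of each character are combined in big-endian order. The helper names code_point_of and in_range are hypothetical, and the sketch does not check which charset a character belongs to.

```c
#include <stdbool.h>
#include <stddef.h>

/* Combine the bytes of one character into an unsigned integer,
 * first byte most significant (assumption: big-endian ordering). */
unsigned int code_point_of(const unsigned char *bytes, size_t len)
{
    unsigned int value = 0;
    for (size_t i = 0; i < len; i++)
        value = (value << 8) | bytes[i];
    return value;
}

/* Range membership is then a plain unsigned comparison. */
bool in_range(unsigned int c, unsigned int low, unsigned int high)
{
    return low <= c && c <= high;
}
```

For example, an euc-japan range from HIRAGANA A (0xA4 0xA2) to HIRAGANA N (0xA4 0xF3) contains HIRAGANA I (0xA4 0xA4) but not KATAKANA KA (0xA5 0xAB).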
<A NAME="limitations">
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
<LI> <H3> Up to 8192 bytes per logical line </H3>
lv manages file location pointers logically,
separating LOGICAL lines by LF (line feed) or CR (carriage return),
The length of a logical line is limited to 8192 bytes,
and lv forcibly inserts an LF when a line is longer than 8192 bytes.
Note that all CRs and CR/LF pairs are replaced with a single LF on UNIX
CRs are inserted before every LF unconditionally.
<LI> <H3> Physical lines per logical line </H3>
A logical line is divided into PHYSICAL lines
to fit the screen width.
For management purposes, lv limits the number of physical lines
per logical line to "characters / 16".
Note that when a logical line needs more physical lines,
the lines beyond the limit are truncated and not displayed at all.
<LI> <H3> Limitation of encoding space </H3>
Encoding space is limited to "characters * 4" bytes
for each decoded string.
If an encoded string would be longer than that,
encoding stops at the limit.
<LI> <H3> Limitation of the number of logical lines </H3>
The number of logical lines is also limited.
lv can handle up to about 2 giga lines on UNIX
(65000 lines on MSDOS).
Note that lines beyond this limit cannot be displayed at all.
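The logical-line rules above can be approximated by the following hypothetical line counter: CR, LF and CR/LF each end a line, and a line that already holds 8192 bytes is cut before the next byte. It is an illustration under these assumptions, not lv's file-handling code.

```c
#include <stddef.h>

#define LOGICAL_LINE_MAX 8192 /* bytes per logical line, as documented */

/* Hypothetical sketch: count the logical lines in a buffer.
 * LF, CR and CR/LF each end a line; a full line is broken before the
 * next byte is added, as if an LF had been inserted there. */
size_t count_logical_lines(const unsigned char *buf, size_t len)
{
    size_t lines = 0;
    size_t cur = 0; /* bytes in the current logical line */

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n' || buf[i] == '\r') {
            if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
                i++; /* CR/LF counts as a single terminator */
            lines++;
            cur = 0;
        } else {
            if (cur == LOGICAL_LINE_MAX) {
                lines++; /* forced break */
                cur = 0;
            }
            cur++;
        }
    }
    if (cur > 0)
        lines++; /* final line without a terminator */
    return lines;
}
```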
<A NAME="codingSystem">
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
<LI> <H3> ISO 2022 based coding systems </H3>
lv handles ISO 2022 based coding systems as if
they were stateless at the logical line level.
So you have to specify a coding system before decoding,
and lv may add redundant codes during encoding.
<LI> iso-2022-cn <BR>
RFC 1922 tailored coding system.
<TABLE BORDER="2" CELLSPACING="2" CELLPADDING="2">
<TR> <TH> <TH> G0 <TH> G1 <TH> G2 <TH> G3
<TR> <TD> Designation <TD> ASCII <TD> GB 2312-80, CNS 11643-1992 Plane 1, ISO-IR-165 <TD> CNS 11643-1992 Plane 2 <TD> CNS 11643-1992 Plane 3..7
<LI> iso-2022-jp <BR>
RFC 1468 and 1554 tailored coding system.
All 94-charsets use G0, and all 96-charsets use G2 with single shift
<LI> iso-2022-kr <BR>
RFC 1557 tailored coding system.
All charsets except ASCII use only G1 with locking shift
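Since lv treats each logical line as stateless, converting an iso-2022-kr line to euc-korea amounts to tracking the SO/SI locking shift within the line and setting the high bit of KS C 5601 bytes. The sketch below assumes the designation ESC $ ) C (KS C 5601 to G1) and simply drops it; the function name is hypothetical and error handling is omitted.

```c
#include <stddef.h>

#define ESC 0x1B
#define SO  0x0E /* shift out: invoke G1 (KS C 5601) */
#define SI  0x0F /* shift in: invoke G0 (ASCII) */

/* Hypothetical sketch: convert one logical line of iso-2022-kr into
 * euc-korea. out must have room for len bytes; returns the output length. */
size_t iso2022kr_line_to_euckr(const unsigned char *in, size_t len,
                               unsigned char *out)
{
    size_t o = 0;
    int shifted = 0; /* every line starts in G0 (ASCII) */

    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c == ESC && i + 3 < len && in[i + 1] == '$' &&
            in[i + 2] == ')' && in[i + 3] == 'C') {
            i += 3;                                /* drop ESC $ ) C */
        } else if (c == SO) {
            shifted = 1;
        } else if (c == SI) {
            shifted = 0;
        } else if (shifted && c >= 0x21 && c <= 0x7E) {
            out[o++] = (unsigned char)(c | 0x80);  /* GL byte -> GR byte */
        } else {
            out[o++] = c;                          /* ASCII and controls */
        }
    }
    return o;
}
```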
<LI> <H3> Extended Unix Code </H3>
lv can decode texts that mix euc-* and iso-2022-*
when you select euc-* as the input coding system.
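As an example of the EUC structure, a syntax check for euc-japan (G1 JIS X 0208 as two GR bytes, G2 JIS X 0201 Katakana after SS2 = 0x8E, G3 JIS X 0212 after SS3 = 0x8F) might look like the hypothetical function below. 7-bit bytes are accepted as they are, so iso-2022-jp escape sequences mixed into the text pass the check.

```c
#include <stdbool.h>
#include <stddef.h>

#define SS2 0x8E /* single shift 2: next byte is from G2 */
#define SS3 0x8F /* single shift 3: next two bytes are from G3 */

/* A byte in the GR range of a 94-charset. */
static bool is_gr94(unsigned char c) { return c >= 0xA1 && c <= 0xFE; }

/* Hypothetical euc-japan syntax check; false on the first error. */
bool euc_japan_ok(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        if (c < 0x80) {
            i += 1;                          /* G0: ASCII, ESC, ... */
        } else if (c == SS2) {               /* G2: JIS X 0201 Katakana */
            if (i + 1 >= len || s[i + 1] < 0xA1 || s[i + 1] > 0xDF)
                return false;
            i += 2;
        } else if (c == SS3) {               /* G3: JIS X 0212 */
            if (i + 2 >= len || !is_gr94(s[i + 1]) || !is_gr94(s[i + 2]))
                return false;
            i += 3;
        } else if (is_gr94(c)) {             /* G1: JIS X 0208 */
            if (i + 1 >= len || !is_gr94(s[i + 1]))
                return false;
            i += 2;
        } else {
            return false;                    /* other 0x80-0xA0, or 0xFF */
        }
    }
    return true;
}
```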
<TABLE BORDER="2" CELLSPACING="2" CELLPADDING="2">
<TR> <TH> <TH> G0 <TH> G1 <TH> G2 <TH> G3
<TR> <TD> Designation <TD> ASCII <TD> GB 2312-80 <TD> not used <TD> not used
<TABLE BORDER="2" CELLSPACING="2" CELLPADDING="2">
<TR> <TH> <TH> G0 <TH> G1 <TH> G2 <TH> G3
<TR> <TD> Designation <TD> ASCII <TD> JIS X 0208 <TD> JIS X 0201 Katakana <TD> JIS X 0212
<TABLE BORDER="2" CELLSPACING="2" CELLPADDING="2">
<TR> <TH> <TH> G0 <TH> G1 <TH> G2 <TH> G3
<TR> <TD> Designation <TD> ASCII <TD> KS C 5601-1987 <TD> not used <TD> not used
<TABLE BORDER="2" CELLSPACING="2" CELLPADDING="2">
<TR> <TH> <TH> G0 <TH> G1 <TH> G2 <TH> G3
<TR> <TD> Designation <TD> ASCII <TD> CNS 11643 Plane 1 <TD> CNS 11643 Plane 2-7 <TD> not used
<LI> <H3> UCS transformation format </H3>
A Mail-Safe Transformation Format of Unicode.
See RFC 1642 (Experimental) and
<A HREF="http://www.cm.spyglass.com/unicode/standard/utf7.html">
8-bit Unicode encoding.
<A HREF="http://www.cm.spyglass.com/unicode/standard/wg2n1036.html">
UCS Transformation Format 8 (UTF-8).
lv can convert character codes
between Unicode and the following charsets:
GB 2312-80, JIS X 0208, JIS X 0212, KSC 5601-1987,
Big Five, CNS 11643-1992 Plane 1-2,
Currently lv's mapping table is based on Unicode 1.1.
<TABLE BORDER="2" CELLSPACING="2" CELLPADDING="2">
<TR> <TH> Encoding <TH> Charset used for mapping from Unicode
<TR> <TD> iso-2022-cn <TD> GB 2312-80 (primary), CNS 11643-1992 (secondary), (ISO 8859-*)
<TR> <TD> iso-2022-jp <TD> JIS X0208, JIS X0212, JIS X0201, (ISO 8859-*)
<TR> <TD> iso-2022-kr <TD> KSC 5601-1987, (ISO 8859-*)
<TR> <TD> euc-china <TD> GB 2312-80
<TR> <TD> euc-japan <TD> JIS X0208, JIS X0212, JIS X0201
<TR> <TD> euc-korea <TD> KSC 5601-1987
<TR> <TD> euc-taiwan <TD> CNS 11643-1992 Plane 1-2
<TR> <TD> shift-jis <TD> JIS X0208, JIS X0201
<TR> <TD> big5 <TD> Big Five
When you output Unicode CJK unified ideographs through iso-2022-cn,
GB 2312-80 is tried first,
and the characters which are not included in GB
are mapped into CNS 11643-1992.
<A NAME="otherCodingsystem">
<LI> <H3> Other coding systems </H3>
ASCII and one of ISO 8859/1-16 are designated to G0 and G1,
which are invoked into GL and GR, respectively.
lv can decode texts that mix shift-jis and iso-2022-jp
when you select shift-jis as the input coding system.
Note that euc-japan and shift-jis are mutually exclusive for decoding.
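Because shift-jis shares the JIS X 0208 code-points of iso-2022-jp and euc-japan, translation between them is pure arithmetic. The sketch below uses the well-known JIS-to-Shift_JIS formula; jis_to_sjis is a hypothetical name, and JIS X 0201 Katakana and JIS X 0212 (which shift-jis cannot carry) are not covered.

```c
/* Hypothetical sketch: convert a JIS X 0208 character, given as its two
 * 7-bit bytes (0x21-0x7E each, as in iso-2022-jp; clear the high bits of
 * an euc-japan pair to get them), into its two shift-jis bytes. */
void jis_to_sjis(unsigned char j1, unsigned char j2,
                 unsigned char *s1, unsigned char *s2)
{
    /* two JIS rows share one shift-jis lead byte */
    *s1 = (unsigned char)(((j1 + 1) >> 1) + (j1 <= 0x5E ? 0x70 : 0xB0));
    if (j1 & 1) /* odd row: trail bytes 0x40-0x9E, skipping 0x7F */
        *s2 = (unsigned char)(j2 + (j2 >= 0x60 ? 0x20 : 0x1F));
    else        /* even row: trail bytes 0x9F-0xFC */
        *s2 = (unsigned char)(j2 + 0x7E);
}
```

For example, HIRAGANA A (JIS 0x24 0x22, euc-japan 0xA4 0xA2) becomes 0x82 0xA0 in shift-jis.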
Since big5 characters can be partially converted
into CNS 11643-1992 Plane 1-2,
lv can load big5 streams
and output them through ISO 2022 based coding systems or euc-taiwan.
A few big5 characters that have no counterpart in CNS
are output as ``?'' (question mark).
HZ is defined in RFC 1843.
It consists of four escape sequences, ~~, ~{, ~}, and ~\n,
but lv does not support the last one, the ~\n sequence,
Remember that lv does not fully conform to RFC 1843.
lv decodes HZ as euc-china.
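HZ decoding into euc-china can be sketched as follows: ~{ and ~} switch between ASCII and GB 2312 modes, ~~ stands for a literal tilde, and GB byte pairs get their high bits set. Like lv, the sketch does not support ~\n; the function name is hypothetical, and the mode is reset for every line.

```c
#include <stddef.h>

/* Hypothetical sketch: decode one line of HZ into euc-china.
 * out must have room for len bytes; returns the output length. */
size_t hz_line_to_euc_china(const char *in, size_t len, unsigned char *out)
{
    size_t o = 0;
    int gb = 0; /* 0: ASCII mode, 1: GB mode; every line starts in ASCII */

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        int has_next = (i + 1 < len);

        if (!gb) {
            if (c == '~' && has_next && in[i + 1] == '~') {
                out[o++] = '~';  /* ~~ is a literal tilde */
                i++;
            } else if (c == '~' && has_next && in[i + 1] == '{') {
                gb = 1;          /* ~{ enters GB mode */
                i++;
            } else {
                out[o++] = c;    /* ASCII (~\n is not treated specially) */
            }
        } else {
            if (c == '~' && has_next && in[i + 1] == '}') {
                gb = 0;          /* ~} returns to ASCII mode */
                i++;
            } else if (c >= 0x21 && c <= 0x7E && has_next) {
                /* a GB 2312 pair: set the high bit of both bytes */
                out[o++] = (unsigned char)(c | 0x80);
                out[o++] = (unsigned char)((unsigned char)in[i + 1] | 0x80);
                i++;
            } else {
                out[o++] = c;    /* anything else passes through */
            }
        }
    }
    return o;
}
```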
No decoding or encoding is performed.
<A NAME="aboutCodingSystem">
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
Annotation about encoding/decoding scheme </H2>
<LI> <H3> Handling of invalid codes </H3>
Characters belonging to invalid character sets, for example,
JIS X 0212 in shift-jis,
are printed as ASCII at their code-points,
up to their originally intended width.
Invalid characters which cause an error state
under the specified coding system
may be partially ignored.
it will be output as a control character.
<LI> <H3> Backspace </H3>
BS (backspace) characters included in files
are interpreted as follows:
<LI> &lt;char&gt; BS &lt;char&gt; <BR>
Highlighted &lt;char&gt;
<LI> ``_'' BS &lt;char&gt; <BR>
Underlined &lt;char&gt;
<LI> ``o'' BS ``+'' <BR>
BS deletes the character on its left.
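The first two backspace rules can be pictured with this hypothetical classifier for single-byte characters. It assumes that &lt;char&gt; BS &lt;char&gt; means the same character on both sides of BS (as in nroff-style bold) and leaves all other cases unclassified.

```c
#include <stddef.h>

#define BS_CHAR 0x08

enum overstrike { OS_NONE, OS_HIGHLIGHT, OS_UNDERLINE };

/* Hypothetical sketch: classify the triple starting at s[i]. */
enum overstrike classify_overstrike(const unsigned char *s, size_t len,
                                    size_t i)
{
    if (i + 2 >= len || s[i + 1] != BS_CHAR)
        return OS_NONE;
    if (s[i] == '_')
        return OS_UNDERLINE;   /* "_" BS <char> */
    if (s[i] == s[i + 2])
        return OS_HIGHLIGHT;   /* <char> BS <char> (same character) */
    return OS_NONE;            /* e.g. "o" BS "+", not modeled here */
}
```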
<A NAME="binaryFile">
<LI> <H3> How to look in a binary file? </H3>
lv's decoding is robust even for binary files.
You can look into a binary file and decode the strings embedded in it.
there may be characters that are ignored when you decode binary files
through a particular coding system.
Option -Ir (raw decoding) preserves such ignored characters, except CRs.
<A NAME="autoSelect">
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
Auto selection of a coding system </H2>
<A NAME="defaultCodingSystem">
<LI> <H3> Default coding system </H3>
The default input coding system is auto-select, described below.
In the auto-selection state,
lv decodes an input stream as iso-2022-kr.
The default output coding system is iso-2022-jp on UNIX,
or shift-jis on MSDOS (in the Japanese version of lv).
If you don't specify any input coding system,
that is, when auto-select is specified,
lv selects the input coding system automatically.
<A NAME="selectionMethod">
<LI> <H3> How does lv select a coding system? </H3>
The auto-selection state continues until an 8-bit code is found,
and the auto-selection of the input coding system is performed on demand.
When an 8-bit code is found during file loading
and the input coding system is auto-select (which behaves as iso-2022-kr),
lv examines ``the first line that contains the first 8-bit code''.
Then lv tries several 8-bit decodings as follows:
<LI> simple euc decoding test (covering euc-china and euc-korea)
<LI> euc-japan (or euc-taiwan) decoding test
<LI> big5 decoding test
<LI> shift-jis decoding test
<LI> utf-8 decoding test (only on platforms other than MSDOS)
The coding system checking results are examined in the following order:
<LI> Only when there is no error state in simple euc decoding,
lv assumes the input coding system is the
default EUC coding system,
which is defined by option -D.
<LI> Only when there is no error state in euc-japan (or euc-taiwan) decoding,
lv assumes the input coding system is euc-japan
Since there is no syntactical difference
between euc-taiwan and euc-japan,
this behavior should be changed in a Taiwanese environment.
<LI> Only when there is no error state in big5 decoding,
lv assumes the input coding system is big5.
Since big5 sequences are similar to EUC,
big5 streams are sometimes misidentified as EUC.
<LI> Only when there is no error state in shift-jis decoding,
lv assumes the input coding system is shift-jis.
Since shift-jis partially shares code-points with EUC,
shift-jis streams may be misidentified as EUC.
<LI> Only when there is no error state in utf-8 decoding,
lv assumes the input coding system is utf-8.
Like big5 and shift-jis streams,
utf-8 streams are sometimes misinterpreted
as another coding system.
lv assumes the input coding system is
ISO 8859-1 (latin-1).
If a text contains only EUC code points,
it is hard to identify the language
the EUC coding system represents.
So lv provides a default EUC coding system,
which is used when lv chooses the input coding system among the EUCs.
The default EUC coding system is set by option -D
(euc-japan in the Japanese version of lv).
You can toggle coding systems even while viewing a file
with the run-time commands `t' and `T',
which cycle through all coding systems implemented in lv.
you can toggle HZ decoding mode on demand with C-t.
Remember that
the auto-selection mechanism of lv fails in some cases.
if a text contains only JIS X 0201 Katakana in shift-jis,
it will be misinterpreted as euc-japan.
If the result of auto-selection is incorrect
and you know the input coding system,
please set it with the option -I,
which disables auto-selection.
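A much simplified version of the selection cascade described above is sketched below. Only the simple EUC test and a UTF-8 test are implemented; the euc-japan, big5 and shift-jis tests are omitted, the UTF-8 check skips overlong and surrogate forms, and all names are hypothetical. As in the documented order, the first test that reports no error wins, and latin-1 is the fallback.

```c
#include <stdbool.h>
#include <stddef.h>

enum coding { DEFAULT_EUC, UTF_8, LATIN_1 };

/* Simple EUC test: every byte >= 0x80 must be part of a 0xA1-0xFE pair. */
static bool simple_euc_ok(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (s[i] < 0x80)
            continue;
        if (s[i] < 0xA1 || s[i] == 0xFF || i + 1 >= len ||
            s[i + 1] < 0xA1 || s[i + 1] == 0xFF)
            return false;
        i++; /* skip the second byte of the pair */
    }
    return true;
}

/* UTF-8 test: well-formed 1- to 4-byte sequences (simplified). */
static bool utf8_ok(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; ) {
        unsigned char c = s[i];
        size_t n;
        if (c < 0x80)                n = 1;
        else if ((c & 0xE0) == 0xC0) n = 2;
        else if ((c & 0xF0) == 0xE0) n = 3;
        else if ((c & 0xF8) == 0xF0) n = 4;
        else return false;
        if (i + n > len)
            return false;
        for (size_t k = 1; k < n; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
        i += n;
    }
    return true;
}

/* Examine the first line containing an 8-bit code. */
enum coding guess_coding(const unsigned char *line, size_t len)
{
    if (simple_euc_ok(line, len))
        return DEFAULT_EUC; /* the coding system set by -D */
    if (utf8_ok(line, len))
        return UTF_8;
    return LATIN_1;
}
```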
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
Extension for text decoration </H2>
Option -c enables ANSI escape sequences
in the form of ESC [ ps ; ... ; ps m,
where <B>ps</B> takes the following values:
<LI> 40-47: Reverse of 30-37
<LI> Every sequence is independent of the others.
lv resets all values before new values are set.
multiple <B>ps</B>s are accepted within one sequence.
<LI> Every sequence is effective only within a logical line.
When crossing logical lines,
all attributes are reset automatically.
Please recall that lv handles each logical line as stateless.
<LI> You can specify only one color at a time.
When multiple colors are specified,
the last one takes effect.
<LI> For reversed characters,
the specified color is applied to the ``reversed background color''.
You cannot specify the color of ``out-clipped characters''.
<LI> You can customize the actual sequences output to the screen.
Please specify them with option -S.
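The following sketch shows how a -S value becomes a full sequence (it is put between ESC [ and m) and how ``the last color takes effect'' can be applied to a parameter string. make_sgr and last_color are hypothetical names, not lv functions.

```c
#include <stdio.h>

/* Build ESC [ params m into buf, e.g. params "1;45" as in the MSDOS
 * configuration example. Returns the length, or -1 if buf is too small. */
int make_sgr(char *buf, size_t size, const char *params)
{
    int n = snprintf(buf, size, "\033[%sm", params);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Return the last color parameter (30-37 or 40-47) in a string such as
 * "1;31;42", or -1 if there is none. */
int last_color(const char *params)
{
    int color = -1;
    while (*params != '\0') {
        int ps = 0;
        while (*params >= '0' && *params <= '9')
            ps = ps * 10 + (*params++ - '0');
        if ((ps >= 30 && ps <= 37) || (ps >= 40 && ps <= 47))
            color = ps;      /* a later color replaces an earlier one */
        if (*params != ';')
            break;           /* end of string or unexpected character */
        params++;            /* skip the ';' separator */
    }
    return color;
}
```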
1015
<A NAME="customize">
1016
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
1020
<LI> Customization for command key bindings <BR>
1021
Please modify the keybind table in keybind.h.
1023
<LI> Customization for terminal controls <BR>
1024
When you add a new terminal control,
1025
please add codes to console.c.
1026
When you wish to change interpretation of escape sequences,
1027
please modify console.c and escape.c.
1028
However, some ANSI escape sequences are configurable through options.
1030
<LI> Changing default screen size of MSDOS ANSI terminals <BR>
1031
Default screen size is 80 columns by 24 rows.
1033
please modify console.c.
1034
However, screen size can be specified through options.
1036
<LI> Changing default coding systems <BR>
1037
Currently, Japanese version of lv uses following values:
1040
<TABLE BORDER="2" CELLSPACING="2" CELLPADDING="2">
1041
<TR> <TH> <TH> MSDOS <TH> UNIX
1042
<TR> <TD> Input: <TD> auto-select <TD> auto-select
1043
<TR> <TD> Keyboard: <TD> shift-jis <TD> iso-2022-jp
1044
<TR> <TD> Output: <TD> shift-jis <TD> iso-2022-jp
1045
<TR> <TD> Pathname: <TD> shift-jis <TD> iso-2022-jp
1046
<TR> <TD> Default EUC: <TD> euc-japan <TD> euc-japan
1053
those coding systems can be specified through options.
1055
<LI> Customization for coding systems <BR>
1057
an ISO 2022 universal decoder,
1058
and EUC, HZ, shift-jis, big5, UTF-7, UTF-8 decoders are implemented.
1059
When you wish to add another coding systems,
1060
please add source codes,
1061
referencing ctable_t.h, ctable.c, encode.c, decode.c, iso2022.c, etc.
<LI> Customization for character sets <BR>
To add your own character sets,
please refer to itable_t.h, itable.c, etc.
The currently recognized character sets are listed below.
For each character set, you must specify as attributes
the code length (in bytes) and the display width (in columns) of its characters.
The code length and the display width need not be equal.
The current implementation does not support per-character lengths,
but specifying the maximum length in itable should not cause problems.
Character sets whose code length exceeds 3 bytes cannot be added.
(If you need them, only a small modification to lv
would allow character sets of up to 4 bytes to be supported.)
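<P>To make the two attributes concrete, here is a hypothetical C sketch of a charset attribute table with a validity check for the 3-byte limit stated above. The struct, field, and function names are assumptions and do not reflect lv's itable_t.h; the entries simply pair a few of the listed charsets with their usual code length and column width.

```c
#include <stddef.h>

/* Hypothetical sketch: not the layout of lv's itable_t.h or itable.c. */
struct charset_attr {
    const char *name;
    int code_length;  /* bytes in one character code of this charset */
    int width;        /* screen columns occupied by one character */
};

/* This page states that charsets longer than 3 bytes cannot be added. */
#define MAX_CODE_LENGTH 3

static const struct charset_attr charsets[] = {
    { "ISO 646 United States",            1, 1 },
    { "JIS X0201-1976 Japanese Katakana", 1, 1 },
    { "ISO 8859/1 Right part",            1, 1 },
    { "JIS X 0208-1983 Japanese kanji",   2, 2 },
    { "KS C 5601-1987 Korean",            2, 2 },
    { "Big5 Traditional Chinese",         2, 2 },
};

#define NCHARSETS (sizeof charsets / sizeof charsets[0])

/* Returns 1 if the attributes fit the limits (width >= 1 is this sketch's assumption). */
int charset_attr_valid(const struct charset_attr *cs)
{
    return cs->code_length >= 1 && cs->code_length <= MAX_CODE_LENGTH
        && cs->width >= 1;
}

/* Screen columns taken by a run of `bytes` code bytes from charset cs. */
size_t run_columns(const struct charset_attr *cs, size_t bytes)
{
    return bytes / (size_t)cs->code_length * (size_t)cs->width;
}
```

Keeping the code length and the width as separate fields is what allows the two to differ for a given charset, as the text above permits.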
ISO 646 United States (ANSI X3.4-1968) <BR>
JIS X0201-1976 Japanese Roman <BR>
JIS X0201-1976 Japanese Katakana <BR>
ISO 8859/1 Latin alphabet No.1 Right part <BR>
ISO 8859/2 Latin alphabet No.2 Right part <BR>
ISO 8859/3 Latin alphabet No.3 Right part <BR>
ISO 8859/4 Latin alphabet No.4 Right part <BR>
ISO 8859/5 Cyrillic alphabet <BR>
ISO 8859/6 Arabic alphabet <BR>
ISO 8859/7 Greek alphabet <BR>
ISO 8859/8 Hebrew alphabet <BR>
ISO 8859/9 Latin alphabet No.5 Right part <BR>
ISO 8859/10 Latin alphabet No.6 Right part (Nordic) <BR>
ISO 8859/11 Thai alphabet <BR>
ISO 8859/13 Latin alphabet No.7 Right part (Baltic Rim) <BR>
ISO 8859/14 Latin alphabet No.8 Right part (Celtic) <BR>
ISO 8859/15 Latin alphabet No.9 Right part <BR>
ISO 8859/16 Latin alphabet No.10 Right part <BR>
JIS C 6226-1978 Japanese kanji <BR>
GB 2312-80 Chinese hanzi <BR>
JIS X 0208-1983 Japanese kanji <BR>
KS C 5601-1987 Korean graphic charset <BR>
JIS X 0212-1990 Supplementary charset <BR>
CNS 11643-1992 Plane 1..7 <BR>
JIS X 0213-2000 Plane 1..2 <BR>
Big5 Traditional Chinese <BR>
These charsets are only recognized by lv;
whether they can actually be displayed
depends on your terminal's capabilities.
Charsets not listed above can still be handled as latin-1
when an 8-bit coding system is displayed on an 8-bit terminal
(provided that no code conversion is performed and each character occupies one column).
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
<LI> No bugs have been reported.
<A NAME="bugreport">
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
If you find any bugs in lv, please send a bug report to
<I><A HREF="mailto:nrt@ff.iij4u.or.jp">nrt@ff.iij4u.or.jp</A></I>.
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
<A HREF="relnote.html"> Click here.</A> (in Japanese)
<A NAME="acknowledgment">
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
Acknowledgement </H2>
I would like to express my gratitude to everyone
who has worked with me on lv,
especially the package maintainers,
and the early beta-testing members:
Goto (後藤さん), gotom@debian.or.jp <BR>
Nomura (野村さん), nomu@ipl.mech.nagoya-u.ac.jp <BR>
Ishizuka (石塚さん), ishizuka@db.is.kyushu-u.ac.jp <BR>
Nonaka (野中さん), nona@in.it.okayama-u.ac.jp <BR>
Matsubara (松原さん), moody@osk.threewebnet.or.jp <BR>
Murai (村井さん), murai@geophys.hokudai.ac.jp <BR>
<H2> <IMG SRC="/~nrt/icons/petit.blueball.gif" ALT="">
<LI> JIS X 0202-1991 情報交換用符号の拡張法 <BR>
Information processing - ISO 7-bit and 8-bit coded character sets
- Code extension techniques
<LI> JIS X 0208-1990 情報交換用漢字符号 <BR>
Code of the Japanese graphic character set for information interchange
<LI> JIS X 0212-1990 情報交換用漢字符号 - 補助漢字 <BR>
Code of the supplementary Japanese graphic character set for
information interchange
<LI> RFC 1468 Japanese Character Encoding for Internet Messages
<LI> RFC 1554 ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP
<LI> RFC 1557 Korean Character Encoding for Internet Messages
<LI> RFC 1843 HZ - A Data Format for Exchanging Files of Arbitrarily Mixed Chinese and ASCII characters
<LI> RFC 1922 Chinese Character Encoding for Internet Messages
<LI> RFC 2152 UTF-7 A Mail-Safe Transformation Format of Unicode <BR>
<LI> RFC 2279 UTF-8, a transformation format of ISO 10646
<LI> Understanding Japanese Information Processing (『日本語情報処理』) <BR>
<I> Ken Lunde </I> O'Reilly &amp; Associates, Inc. ISBN 1-56592-043-0
<LI> <A HREF="ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf">CJK.INF Version 2.1</A> (July 12, 1996) <BR>
Online Companion to "Understanding Japanese Information Processing" <BR>
<LI> <A HREF="http://www.unicode.org/unicode/onlinedat/online.html"> Unicode Mapping Data </A> at the Unicode Consortium web site.
<LI> Compilers - Principles, Techniques, and Tools <BR>
<I> Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman </I>
Addison-Wesley, ISBN 0-201-10088-6
<IMG SRC="/~nrt/icons/homepage.gif" ALIGN=right ALT="Back to ">
email: nrt@ff.iij4u.or.jp <BR>
Homepage: http://www.ff.iij4u.or.jp/~nrt/ <BR CLEAR=all>