~ubuntu-branches/ubuntu/trusty/lv/trusty-proposed

Viewing changes to .pc/20051030_%src%_~use_off_t.diff/index.html

Committer: Package Import Robot
Author(s): HIGUCHI Daisuke (VDR dai)
Date: 2014-03-15 03:42:28 UTC
Revision ID: package-import@ubuntu.com-20140315034228-6lcwgjiug8rym3wc

Tags: 4.51-2.2

* Non-maintainer upload.
* debian/control, debian/rules, debian/compat: use dh9.
* debian/control
  - add Vcs-* tags.
  - add Homepage: tag.
  - add ${misc:Depends} to Depends:.
  - add xz-utils to Recommends:.
* debian/source/format: set 3.0 (quilt).
* debian/patches/*: rename from debian/patch* and add DEP-3 headers.
* debian/copyright: convert to DEP-5.
* debian/patches/fix-hyphen-used-as-minus-sign.diff: new file.
* debian/lv.doc-base: new file.
* bump up Standards-Version 3.9.5

files added:
.pc

.pc/.quilt_patches

.pc/.quilt_series

.pc/.version

.pc/20050502_~+num-pat-option.diff

.pc/20050502_~+num-pat-option.diff/lv.1

.pc/20050502_~+num-pat-option.diff/src

.pc/20050502_~+num-pat-option.diff/src/command.c

.pc/20050502_~+num-pat-option.diff/src/command.h

.pc/20050502_~+num-pat-option.diff/src/conf.c

.pc/20050506_%src%file.c_~enable-fastio-use-fread.2.diff

.pc/20050506_%src%file.c_~enable-fastio-use-fread.2.diff/src

.pc/20050506_%src%file.c_~enable-fastio-use-fread.2.diff/src/command.c

.pc/20050506_%src%file.c_~enable-fastio-use-fread.2.diff/src/configure.in

.pc/20050506_%src%file.c_~enable-fastio-use-fread.2.diff/src/fetch.c

.pc/20050506_%src%file.c_~enable-fastio-use-fread.2.diff/src/file.c

.pc/20050506_%src%file.c_~enable-fastio-use-fread.2.diff/src/file.h

.pc/20051030_%src%_~use_off_t.diff

.pc/20051030_%src%_~use_off_t.diff/README

.pc/20051030_%src%_~use_off_t.diff/index.html

.pc/20051030_%src%_~use_off_t.diff/relnote.html

.pc/20051030_%src%_~use_off_t.diff/src

.pc/20051030_%src%_~use_off_t.diff/src/command.c

.pc/20051030_%src%_~use_off_t.diff/src/configure

.pc/20051030_%src%_~use_off_t.diff/src/configure.in

.pc/20051030_%src%_~use_off_t.diff/src/fetch.c

.pc/20051030_%src%_~use_off_t.diff/src/file.c

.pc/20051030_%src%_~use_off_t.diff/src/file.h

.pc/20051030_%src%_~use_off_t.diff/src/version.c

.pc/20051030_%src%_~use_off_t.diff/src/version.h

.pc/applied-patches

.pc/bts.660358.diff

.pc/bts.660358.diff/src

.pc/bts.660358.diff/src/stream.c

.pc/fix-hyphen-used-as-minus-sign.diff

.pc/fix-hyphen-used-as-minus-sign.diff/lv.1

.pc/misc.diff

.pc/misc.diff/lv.hlp

.pc/misc.diff/src

.pc/misc.diff/src/configure

debian/compat

debian/lv.doc-base

debian/patches

debian/patches/20050502_~+num-pat-option.diff

debian/patches/20050506_%src%file.c_~enable-fastio-use-fread.2.diff

debian/patches/20051030_%src%_~use_off_t.diff

debian/patches/bts.660358.diff

debian/patches/fix-hyphen-used-as-minus-sign.diff

debian/patches/misc.diff

debian/patches/series

debian/source

debian/source/format

files removed:
debian/@patch.lv.20050502_~+num-pat-option

debian/@patch.lv.20050506_%src%file.c_~enable-fastio-use-fread.2

debian/@patch.lv.20051030_%src%_~use_off_t

debian/@patch.lv.660358.diff

debian/patch.lv.misc

files modified:
README

debian/changelog

debian/control

debian/copyright

debian/rules

index.html

lv.1

lv.hlp

relnote.html

src/command.c

src/command.h

src/conf.c

src/configure

src/configure.in

src/fetch.c

src/file.c

src/file.h

src/stream.c

src/version.c

src/version.h


.pc/20051030_%src%_~use_off_t.diff/index.html

<!-- ------------------------------------------------------------

$Id: index.html,v 1.29 2004/01/16 12:29:21 nrt Exp $

------------------------------------------------------------ -->

<HTML>

<HEAD>

<TITLE> LV Homepage </TITLE>

</HEAD>

Last modified at Jan.16th,2004.

<HR>

<H1> LV Homepage

</H1>

<P>

<FONT SIZE=+2>lv - <I>a Powerful Multilingual File Viewer / Grep</I></FONT>

<P>

<FONT SIZE=+1> The latest version is ver 4.51:

<A HREF="#download"> Download </A> </FONT>

</DL>

<HR>

<H2> Table of Contents </H2>

</A>

<P>

<OL>

<LI> <A HREF="#copyright"> Copyright </A>

<LI> <A HREF="#feature"> Feature </A>

<LI> <A HREF="#download"> Download lv </A>

<LI> <A HREF="#install"> Installation </A>

<LI> <A HREF="#usage"> Usage </A>

<UL>

<LI> <A HREF="#option"> Command line options </A>

<LI> <A HREF="#configuration"> Configuration </A>

<LI> <A HREF="#command"> Run-time commands </A>

<LI> <A HREF="#search"> How to input search strings? </A>

<LI> <A HREF="#regexp"> Regular expressions </A>

</UL>

<LI> <A HREF="#limitations"> Limitations </A>

<LI> <A HREF="#codingSystem"> Coding systems </A>

<UL>

<LI> <A HREF="#iso2022"> ISO 2022 based coding systems </A>

<UL>

</UL>

<LI> <A HREF="#euc"> Extended Unix Code </A>

<UL>

<LI> <A HREF="#eucchina"> euc-china </A>

<LI> <A HREF="#eucjapan"> euc-japan </A>

<LI> <A HREF="#euckorea"> euc-korea </A>

<LI> <A HREF="#euctaiwan"> euc-taiwan </A>

</UL>

<LI> <A HREF="#utf"> UCS transformation format </A>

<UL>

</UL>

<LI> <A HREF="#otherCodingsystem"> Other coding systems </A>

<UL>

<LI> <A HREF="#shiftjis"> shift-jis </A>

</UL>

<LI> <A HREF="#aboutCodingSystem"> Annotation about encoding/decoding scheme </A>

<UL>

<LI> <A HREF="#invalid"> Handling of invalid codes </A>

<LI> <A HREF="#backspace"> Backspace </A>

<LI> <A HREF="#binaryFile"> How to look in a binary file? </A>

</UL>

<LI> <A HREF="#autoSelect"> Auto selection of a coding system </A>

<UL>

<LI> <A HREF="#defaultCodingSystem"> Default coding system </A>

<LI> <A HREF="#selectionMethod"> How does lv select a coding system? </A>

</UL>

<LI> <A HREF="#color"> Extension for text decoration </A>

<LI> <A HREF="#customize"> Customization </A>

<LI> <A HREF="#bugreport"> Bug report </A>

<LI> <A HREF="relnote.html"> Release note </A>
<LI> <A HREF="#acknowledgment"> Acknowledgement </A>
<LI> <A HREF="#ref"> Reference </A>
</OL>
</DL>

<HR>

</A>
<P>
<PRE>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
</PRE>
<P>
See also <A HREF="GPL.txt">GNU General Public License Version 2</A>.
</DL>

<HR>

<H2> Feature </H2>
</A>

<UL>
<LI> <H3> Multilingual file viewer </H3>
<I>lv</I> is a powerful multilingual file viewer.
lv looks and behaves much like <I>less</I> (1),
the well-known file viewer on UNIX,
so UNIX users (and <I>less</I> users on other OSs)
don't have to learn a burdensome new interface.
lv can be used on MSDOS ANSI terminals and almost all UNIX platforms.
lv is still evolving,
so your feedback is welcome
and helps us refine future versions of lv.
<P>
<LI> <H3> Multiple coding systems </H3>
lv can decode and encode multilingual streams
through many coding systems, for example,
ISO 2022 based coding systems such as iso-2022-jp,
and EUC (Extended Unix Code) like euc-japan.
Furthermore,
localized coding systems
such as shift-jis, big5 and HZ are also supported.
lv can be used not only as a file viewer
but also as a coding-system translation filter
like <I>nkf</I> (1) and <I>tcs</I> (1).
<P>
<LI> <H3> Multilingual regular expressions / Multilingual grep </H3>
lv can recognize multi-byte patterns as regular expressions,
and lv also provides multilingual <I>grep</I> (1) functionality
when it is invoked under another name, <I>lgrep</I>.
Pattern matching is performed at the charset level,
so an EUC fragment, for example,
can also be found in ISO 2022 encoded streams.
<P>
<LI> <H3> Supporting the Unicode standard </H3>
lv provides Unicode facilities
which enable you to handle Unicode streams encoded in UTF-7 or UTF-8,
and lv can also convert code-points
between Unicode and other charsets.
So you can display Unicode or foreign texts on your terminal
by using the code conversion function
to convert them to your favorite charsets via Unicode.
(However, the MSDOS version of lv has no Unicode facilities.)
<P>
<LI> <H3> ANSI escape sequence pass-through </H3>
lv can recognize ANSI escape sequences for text decoration.
So you can view ANSI-decorated streams,
such as colored source code generated by other software,
just as they are intended to appear on ANSI terminals.
<P>
<LI> <H3> Completely original </H3>
lv is completely original software
that includes no code drawn from <I>less</I>, <I>grep</I>,
or any other program.
</UL>

<HR>

<H2> Sample Images </H2>
</A>

<UL>
<LI> Multilingual sample image <BR>
<A HREF="hello.sample.gif"> <B>``Hello''s</B> on <I> kterm </I> with lv (gif 15Kbytes) </A> <A HREF="hello.sample"> (Original text from Mule demo) </A>
</UL>

<HR>

<H2> Download lv </H2>
</A>

You can download the lv archive.
Changes from older versions are described in the
<A HREF="relnote.html">release note</A>
(in Japanese).
</DL>

<UL>
lv v.4.51 (tar and gzip compressed) </A> <BR>
lv v.4.50 (tar and gzip compressed) </A> <BR>
</UL>

<HR>

<H2> Installation </H2>
</A>

<UL>
Standard installation:
<P>
<OL>
<LI> Expand the lv archive, using gunzip/tar.
<LI> Change your working directory to ``(extracted sub directory)/build''.
<LI> Execute ``../src/configure'' to configure compiler flags.
<LI> Run ``make''.
<LI> Then, run ``make install'' as root.
</OL>
<P>
MSDOS installation:
<P>
Before building lv,
you need to install the
LSI C-86 Compiler
</A>
(a limited freeware version of <I>LSI C-86</I> for trial use).
<P>
<OL>
<LI> Expand the lv archive, using gunzip/tar.
<LI> Change your working directory to ``(extracted sub directory)/src''.
<LI> Run ``make -f Makefile.dos''.
<LI> Copy ``lv.hlp'', a brief help description, to the directory
where lv.exe is installed.
</OL>
<P>
The MSDOS version of lv outputs ANSI escape sequences directly,
without regard to termcap and terminfo.
You may need an ANSI escape sequence driver named ``ANSI.SYS''
(or a more sophisticated one) on MSDOS,
including the DOS prompt on MS-Windows.
Since Windows NT does not seem to provide such drivers
for the DOS prompt by default,
please check the driver configuration
when lv fails to handle the terminal capability correctly.
</UL>

<HR>

<H2> Usage </H2>
</A>

<UL>

<LI> <H3> How to launch lv? </H3>
</A>
When you just wish to display a file on a terminal,
launch lv from the command line like this:
<P>
% lv [options] files ... <BR>
</DL>
<P>
Or, using redirection or a pipeline:
<P>
% another_command | lv [options] <BR>
% lv [options] < file
</DL>
<P>
Compressed files with the suffix ``gz'', ``z'', ``GZ'', or ``Z'' are
extracted by lv using <I>zcat</I> (1),
and those with ``bz2'' or ``BZ2'' using <I>bzcat</I> (1).
Please install versions of <I>zcat</I> and <I>bzcat</I> that can expand all of them.
<P>
When standard output is not connected to an ordinary terminal
but to a redirection or pipeline,
lv works as a coding-system or code-points conversion filter
like <I>nkf</I> (1) and <I>tcs</I> (1).
<P>
lv also works like <I>grep</I> (1)
when it is invoked under another name, <I>lgrep</I>.
Please install a symbolic (or hard) link
named <I>lgrep</I> that points to <I>lv</I> (1).
Alternatively, <I>lgrep</I> functionality can be turned on with the option '-g'.
lgrep is used as follows:
<P>
% lgrep [options] <B>grep_pattern</B> files ... <BR>
% another_command | lgrep [options] <B>grep_pattern</B> <BR>
% lgrep [options] <B>grep_pattern</B> < file
</DL>
<P>
The coding system of <B>grep_pattern</B> can be specified
as the ``keyboard coding system'' (see below).

<P>
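<P>
The launch rules in this section (suffix-based decompression, filter mode when standard output is not a terminal, and lgrep mode by name or by '-g') can be sketched in Python as follows. This is only an illustration of the rules as described here, not lv's actual C code: the names <TT>decompressor_for</TT> and <TT>run_mode</TT> are hypothetical, and the assumption that lgrep mode takes precedence over the terminal check is ours.

```python
import os

# Suffixes taken from the description above; lv's real matching may differ.
ZCAT_SUFFIXES = {"gz", "z", "GZ", "Z"}
BZCAT_SUFFIXES = {"bz2", "BZ2"}


def decompressor_for(filename):
    """Return the external command used to expand the file, or None."""
    _, dot, suffix = filename.rpartition(".")
    if not dot:
        return None  # no suffix at all
    if suffix in ZCAT_SUFFIXES:
        return "zcat"
    if suffix in BZCAT_SUFFIXES:
        return "bzcat"
    return None


def run_mode(program_name, options, stdout_is_terminal):
    """Pick one of 'grep', 'viewer' or 'filter' (hypothetical labels)."""
    # Assumption: being called as lgrep, or with -g, wins over the tty check.
    if os.path.basename(program_name) == "lgrep" or "-g" in options:
        return "grep"
    return "viewer" if stdout_is_terminal else "filter"
```

For example, <TT>run_mode("lv", [], False)</TT> yields the filter mode, which corresponds to ``% another_command | lv [options] > out''.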

<LI> <H3> Command line options </H3>
</A>
<P>
<DL>
<DD> Set all coding systems to coding-system.
<DD> Set input coding system to coding-system.
<DD> Set keyboard coding system to coding-system.
If it is not set, the output coding system is used for it.
<DD> Set output coding system to coding-system.
<DD> Set pathname coding system to coding-system.
<DD> Set default EUC coding system to coding-system.
<P>
<DL> <DT> <H3> coding-system </H3> <DD>
<UL>
<LI> a: auto-select <BR>
It behaves as iso-2022-kr
until an 8bit code is found.
<LI> c: iso-2022-cn
<LI> j: iso-2022-jp
<LI> k: iso-2022-kr
<LI> e: Extended Unix Code
<UL>
<LI> ec: euc-china
<LI> ej: euc-japan
<LI> ek: euc-korea
<LI> et: euc-taiwan
</UL>
<LI> u: UCS transformation format
<UL>
<LI> u7: UTF-7
<LI> u8: UTF-8
</UL>
<LI> l: iso-8859-1..9
<UL>
<LI> l1..9: iso-8859-1..9
<LI> l0: iso-8859-10
<LI> lb,ld,le,lf,lg: iso-8859-11,13,14,15,16
</UL>
<LI> s: shift-jis
<LI> b: big5
<LI> h: HZ
<LI> r: raw mode <BR>
No decoding and encoding are performed.
</UL>

</DL>
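<P>
The coding-system abbreviations listed above form a simple lookup table. The following Python sketch shows one way to expand an abbreviation such as ``ej'' into its full name. The table contents come from the list above; the names <TT>CODING_SYSTEMS</TT> and <TT>coding_system_name</TT> are hypothetical and do not appear in lv.

```python
# Abbreviation -> coding system, copied from the list above.
CODING_SYSTEMS = {
    "a": "auto-select",
    "c": "iso-2022-cn",
    "j": "iso-2022-jp",
    "k": "iso-2022-kr",
    "ec": "euc-china",
    "ej": "euc-japan",
    "ek": "euc-korea",
    "et": "euc-taiwan",
    "u7": "UTF-7",
    "u8": "UTF-8",
    "l0": "iso-8859-10",
    "s": "shift-jis",
    "b": "big5",
    "h": "HZ",
}

# l1..l9 map to iso-8859-1..9.
for n in range(1, 10):
    CODING_SYSTEMS[f"l{n}"] = f"iso-8859-{n}"

# lb, ld, le, lf, lg map to iso-8859-11, 13, 14, 15, 16.
for letter, part in zip("bdefg", (11, 13, 14, 15, 16)):
    CODING_SYSTEMS[f"l{letter}"] = f"iso-8859-{part}"


def coding_system_name(abbrev):
    """Return the full coding-system name, or None if it is not listed."""
    return CODING_SYSTEMS.get(abbrev)
```

The group letters ``e'', ``u'' and ``l'' are left out of this sketch because the list above does not say which member each of them selects on its own.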

<P>
<H3> Coding-system translations / Code-points conversions: </H3>
<P>
iso-2022-cn, -jp, and -kr can be converted into euc-china or euc-taiwan,
euc-japan, and euc-korea, respectively (and vice versa).
shift-jis uses the same internal code-points
as iso-2022-jp and euc-japan.
<P>
Since big5 characters can be converted into CNS 11643-1992
with negligible incompleteness,
big5 streams can be translated into iso-2022-cn or euc-taiwan
(and vice versa) with code-points conversion.
Note that the iso-2022-cn referred to here is not the GB sequence
but only the CNS one.
Remember that lv cannot translate big5 into GB directly.
<P>
The search function of lv may not work correctly when lv additionally
performs ``code-points'' conversion
(not ``coding-system'' translation),
because the visible code and the internal code differ from each other.
lv tries to avoid this problem by
converting the charsets of search patterns automatically,
but this function is not always perfect.
<P>
<DT> -W<number> <DD> Screen width
<DT> -H<number> <DD> Screen height
<DT> -E'<editor>' <DD> Editor name (default 'vi -c %d') <BR>
``%d'' stands for the line number of the current position in a file.
<DT> -q <DD> Assert that delete/insert-lines control is available <BR>
Please set this option on an MSDOS ANSI terminal
that is capable of deleting and/or inserting lines.
In the termcap and terminfo versions,
it is set automatically.
<P>
<DT> -Ss<seq> <DD> Set ANSI Standout sequence to <seq> (default "7")
<DT> -Sr<seq> <DD> Set ANSI Reverse sequence to <seq> (default "7")
<DT> -Sb<seq> <DD> Set ANSI Blink sequence to <seq> (default "5")
<DT> -Su<seq> <DD> Set ANSI Underline sequence to <seq> (default "4")
<DT> -Sh<seq> <DD> Set ANSI Highlight sequence to <seq> (default "1") <BR>
These sequences are inserted
between ``<TT>ESC [</TT>'' and ``<TT>m</TT>''
to construct full ANSI escape sequences.
<P>

Set Threshold-code which divides Unicode code-points into
two regions. Characters belonging to the lower region are
assumed to have a width of one, and characters in the higher region
are assumed to have a width of two. (Default: 12288, = 0x3000)

Force Unicode code-points which have the same glyphs as
iso-8859-* to be mapped to iso-8859-* in a conversion from
Unicode to another character set which also has the
corresponding code-points, in particular, Asian charsets.
<P>
<DT> -a <DD> Adjust character set for search pattern (default)
<DT> -c <DD> Allow ANSI escape sequences for text decoration (Color)
<DT> -d, -i <DD> Make regexp-searches ignore case (case folD search)
(default)
<DT> -f <DD> Substitute Fixed strings for regular expressions
<DT> -k <DD> Convert X0201 Katakana to X0208
<DT> -l <DD> Allow physical lines of each logical line printed
on the screen to be concatenated for cut and paste
after screen refresh
<DT> -s <DD> Force old pages to be swept out from the screen Smoothly
<DT> -u <DD> Unify several character sets, e.g. JIS X0208 and C6226.
In addition, lv equates ISO 646 variants,
e.g. JIS X0201-Roman,
and unknown charsets with ASCII.
<DT> -g <DD> Turn on lgrep mode.
<DT> -n <DD> Prefix each line of output with the line number within its input file in lgrep mode.
<DT> -v <DD> Invert the sense of matching in lgrep mode.
<DT> -z <DD> Enable HZ auto-detection (also enabled by run-time C-t).
<P>
<DT> -+ <DD> Clear all options <BR>
You can also turn OFF specified options,
using ``+<option>'' like +c, +d, ... +z.
<P>
<DT> - <DD> Treat the following arguments as filenames
<P>
<DT> -V <DD> Show lv version
<DT> -h <DD> Show this help

</DL>
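<P>
Two of the options above lend themselves to a small worked example: the -S sequences, which are placed between ``ESC ['' and ``m'', and the Threshold-code, which splits Unicode code-points into width-one and width-two regions. The Python sketch below illustrates both. The function names are hypothetical, and treating a code-point equal to the threshold as wide is an assumption, since the text above does not say which region the boundary value belongs to.

```python
ESC = "\x1b"

# Default sequences as listed for -Ss, -Sr, -Sb, -Su and -Sh above.
DEFAULT_SEQUENCES = {
    "standout": "7",
    "reverse": "7",
    "blink": "5",
    "underline": "4",
    "highlight": "1",
}

DEFAULT_THRESHOLD = 0x3000  # 12288, the documented default


def ansi_sequence(seq):
    """Wrap <seq> between ESC [ and m to form a full escape sequence."""
    return f"{ESC}[{seq}m"


def char_width(ch, threshold=DEFAULT_THRESHOLD):
    """Width 1 below the threshold, width 2 otherwise (boundary assumed wide)."""
    return 1 if ord(ch) < threshold else 2


def display_width(text, threshold=DEFAULT_THRESHOLD):
    """Total number of columns the text would occupy under this rule."""
    return sum(char_width(ch, threshold) for ch in text)
```

With the MSDOS example setting of "1;45" for Highlight, <TT>ansi_sequence("1;45")</TT> produces ESC followed by ``[1;45m''.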

<P>

<LI> <H3> Configuration </H3>
</A>
Options can be written in the configuration file ``.lv''
(``_lv'' on MSDOS) located in your home directory. On MSDOS only,
you can also place ``_lv'' in the current working directory.
Options can also be given in the environment variable LV.
<P>
Configuration sources are read in the following order, if present,
and later settings override earlier ones.
Command line options are always read last.
<P>
<OL>
<LI> .lv located in your home directory
<LI> (_lv located in the current working directory: MSDOS only)
<LI> Environment variable LV
<LI> Command line options

</OL>
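<P>
The reading order above means that a setting given later replaces the same setting given earlier. The following Python sketch illustrates that idea for the UNIX case (the MSDOS-only ``_lv'' in the working directory is left out). It assumes that the file and the LV variable hold whitespace-separated options, and the helper names <TT>effective_options</TT> and <TT>last_setting</TT> are hypothetical.

```python
import shlex


def effective_options(home_file_text, env_lv, command_line):
    """Concatenate options in reading order: ~/.lv, then LV, then argv."""
    options = []
    for text in (home_file_text, env_lv):
        if text:
            # Assumption: options are separated by whitespace, shell-style.
            options.extend(shlex.split(text))
    options.extend(command_line)  # command line options are read last
    return options


def last_setting(options, prefix):
    """Return the value of the last option starting with prefix, or None."""
    value = None
    for opt in options:
        if opt.startswith(prefix):
            value = opt[len(prefix):]
    return value
```

For instance, with ``-Oc'' in the LV variable and ``-Oej'' on the command line, <TT>last_setting(options, "-O")</TT> returns ``ej'', so the command line wins.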

<P>
Examples:
<P>
<UL>
<LI> MSDOS (Input is shift-jis, Screen height is 25 lines, Highlight seq is "1;45", Underline seq is "1")<BR>
<P>
<LI> UNIX csh (Input is HZ-enabled auto-select, Output and Keyboard are both iso-2022-cn) <BR>
<TT> setenv LV '-z -Oc -Dec' </TT>
</UL>
<P>

<LI> <H3> Run-time commands </H3>
</A>
<P>
<DL>
<DT> 0-9: <DD> Argument
<DT> g, <: <DD> Jump to the line number (default: top of the file)
<DT> G, >: <DD> Jump to the line number (default: bottom of the file)
<DT> p: <DD> Jump to the percentage position in line numbers (0-100)
<DT> b, C-b: <DD> Previous page
<DT> u, C-u: <DD> Previous half page
<DT> k, w, C-k, y, C-y, C-p: <DD> Previous line
<DT> j, C-j, e, C-e, C-n, CR: <DD> Next line
<DT> d, C-d: <DD> Next half page
<DT> f, C-f, C-v, SP: <DD> Next page
<DT> F: <DD> Jump to the end of file, and wait for data to be
appended to the file until interrupted.
<DT> /<string>: <DD> Find a string in the forward direction (regular expression)
<DT> ?<string>: <DD> Find a string in the backward direction (regular expression)
<DT> n: <DD> Repeat previous search in the forward direction
<DT> N: <DD> Repeat previous search in the backward direction (not REVERSE)
<DT> C-l: <DD> Redisplay all lines
<DT> r, C-r: <DD> Refresh screen and memory
<DT> R: <DD> Reload the current file
<DT> :n: <DD> Examine the next file
<DT> :p: <DD> Examine the previous file
<DT> t: <DD> Toggle input coding systems
<DT> T: <DD> Toggle input coding systems in reverse order
<DT> C-t: <DD> Toggle HZ decoding mode
<DT> v: <DD> Launch the editor defined by option -E
<DT> C-g, =: <DD> Show file information (filename, position, coding system)
<DT> V: <DD> Show LV version
<DT> C-z: <DD> Suspend (call SHELL or ``command.com'' under MSDOS)
<DT> q, Q: <DD> Quit
<DT> UP/DOWN: <DD> Previous/Next line
<DT> LEFT/RIGHT: <DD> Previous/Next half page
<DT> PageUp/PageDown: <DD> Previous/Next page
</DL>
<P>

<LI> <H3> How to input search strings? </H3>
</A>
You can input a string which consists of multi-byte characters
and search for the string as a regular expression.
lv's regular expressions are similar to Mule's.
<P>
The following keys have special meanings in the keyboard input:
<P>
<DL>
<DT> C-m, Enter <DD> Enter the current string
<DT> C-h, BS, DEL <DD> Delete one character (backspace)
<DT> C-u <DD> Cancel the current string and try again
<DT> C-p <DD> Restore a few old strings incrementally (history)
<DT> C-g <DD> Quit
</DL>
<P>

<LI> <H3> Regular expressions </H3>
</A>
<UL>
<LI> `. (period)' <BR>
matches any single character.
For example,
``a.b'' matches any three-character string which begins with
`a' and ends with `b'.
<LI> `*' <BR>
repeats the preceding expression zero or more times.
For example,
``ab*'' matches `a', `ab', `abb', etc.
<LI> `+' <BR>
repeats the preceding expression one or more times.
For example,
``ab+'' matches `ab', `abb', but not `a'.
<LI> `?' <BR>
matches the preceding expression either once or not at all.
For example,
``ca?r'' matches `car' or `cr'; nothing else.
<LI> `[ ... ]' <BR>
makes a character set.
For example,
``[ab]+'' matches any string composed of just `a's and `b's.
You can also include character ranges in a character set,
by writing two characters with a `-' between them.
For example,
``[a-z]'' matches any lower-case letter.
If the characters belong to a multi-byte charset,
lv makes a multi-byte range,
ordering code-points as unsigned integers.
The behavior of mutually overlapping ranges (or charsets) is not guaranteed.
<LI> `[^ ... ]' <BR>
makes a complemented character set.
For example,
``[^a-z0-9A-Z]'' matches all characters
*except* letters and digits.
<LI> `^' <BR>
matches the empty string at the beginning of a line.
<LI> `$' <BR>
is similar to `^' but matches only at the end of a line.
<LI> `\' <BR>
quotes the special characters.
<P>
matches characters each of which has a width of 1 column.
<P>
matches characters each of which has a width of 2 columns.
<LI> `\|' <BR>
specifies an alternative.
For example,
``foo\|bar'' matches either `foo' or `bar' but no other string.
<LI> `\( ... \)' <BR>
is a grouping construct.
For example,
``ba\(na\)*'' matches `ba', `bana', `banana', etc.

</UL>
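<P>
The examples in the list above can be checked mechanically. The sketch below does so with Python's <TT>re</TT> module, which is not lv's matcher and uses slightly different syntax: alternation is written ``|'' rather than the Mule-style ``\|'' shown above, and grouping uses plain parentheses. It uses <TT>re.fullmatch</TT> so that each test string must match as a whole, whereas a viewer search looks for a match anywhere in a line. The table <TT>EXAMPLES</TT> and the function <TT>check_examples</TT> are hypothetical names introduced for this illustration.

```python
import re

# (Python pattern, strings that match in full, strings that do not)
EXAMPLES = [
    (r"a.b", ["axb", "a-b"], ["ab", "axxb"]),
    (r"ab*", ["a", "ab", "abb"], ["b"]),
    (r"ab+", ["ab", "abb"], ["a"]),
    (r"ca?r", ["car", "cr"], ["caar"]),
    (r"[ab]+", ["a", "abba"], ["abc"]),
    (r"[a-z]", ["q"], ["Q", "7"]),
    (r"[^a-z0-9A-Z]", ["-", " "], ["x", "7"]),
    (r"foo|bar", ["foo", "bar"], ["foobar"]),
    (r"ba(na)*", ["ba", "bana", "banana"], ["ban"]),
]


def check_examples(examples=EXAMPLES):
    """Return a list of (pattern, text) pairs that behave unexpectedly."""
    failures = []
    for pattern, should_match, should_not_match in examples:
        for text in should_match:
            if re.fullmatch(pattern, text) is None:
                failures.append((pattern, text))
        for text in should_not_match:
            if re.fullmatch(pattern, text) is not None:
                failures.append((pattern, text))
    return failures
```

An empty result from <TT>check_examples()</TT> means every example behaves as the list describes.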

</UL>

<HR>

<H2> Limitations </H2>
</A>

<UL>
<LI> <H3> Up to 8192 bytes per logical line </H3>
lv manages file location pointers logically,
separating LOGICAL lines by LF (line feed), CR (carriage return),
or CR/LF.
The length of a logical line is limited to 8192 bytes.
lv forcibly inserts an LF when a line is longer than 8192 bytes.
Note that every CR or CR/LF is replaced with a single LF on UNIX
during decoding.
On MSDOS,
a CR is inserted before every LF unconditionally.
<P>
<LI> <H3> Physical lines per logical line </H3>
A logical line is divided into PHYSICAL lines
to fit the screen width.
lv limits the number of physical lines to "characters / 16"
per logical line in order to manage them.
Note that when a logical line needs more physical lines,
the part beyond the limit is truncated and not displayed at all.
<P>
<LI> <H3> Limitation of encoding space </H3>
Encoding space is limited to "characters * 4" bytes
for each decoded string.
If the encoded string would be longer than that,
the encoding process stops at the limit.
<P>
<LI> <H3> Limitation of the number of logical lines </H3>
The number of logical lines is also limited.
Currently,
lv can handle up to about 2 Giga lines on UNIX
(65000 lines on MSDOS).
Note that lines beyond this limit cannot be displayed at all.

</UL>
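<P>
The logical-line rule described above (CR, LF and CR/LF all end a line, and an over-long line is broken forcibly at 8192 bytes) can be sketched in Python as follows. This is an illustration of the UNIX behavior as described, not lv's implementation; the name <TT>logical_lines</TT> is hypothetical, and whether the real limit counts the line terminator is not stated above, so this sketch counts only the line content.

```python
MAX_LOGICAL_LINE = 8192  # bytes, as stated above


def logical_lines(data, limit=MAX_LOGICAL_LINE):
    """Split raw bytes into logical lines, breaking over-long lines."""
    # On UNIX, CR/LF and lone CR are both treated as a single LF.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    pieces = normalized.split(b"\n")
    if pieces and pieces[-1] == b"":
        pieces.pop()  # a trailing terminator does not start a new line
    lines = []
    for piece in pieces:
        if not piece:
            lines.append(piece)  # keep empty lines as they are
            continue
        # Emulate the forced LF: cut the line every `limit` bytes.
        for start in range(0, len(piece), limit):
            lines.append(piece[start:start + limit])
    return lines
```

A single 20000-byte line therefore comes out as three logical lines of 8192, 8192 and 3616 bytes.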


<HR>

<H2> Coding systems </H2>
</A>
<UL>

<LI> <H3> ISO 2022 based coding systems </H3>
</A>
lv handles ISO 2022 based coding systems as if
they were stateless at the logical line level.
So you have to specify a coding system before decoding,
and lv may add redundant codes during encoding.
<P>
<UL>
<LI> iso-2022-cn <BR>
</A>
RFC 1922 tailored coding system.
<P>
<TABLE>
<TR> <TH> <TH> G0 <TH> G1 <TH> G2 <TH> G3
<TR> <TD> Designation <TD> ASCII <TD> GB 2312-80, CNS 11643-1992 Plane 1, ISO-IR-165 <TD> CNS 11643-1992 Plane 2 <TD> CNS 11643-1992 Plane 3..7
</TABLE>
<P>
<LI> iso-2022-jp <BR>
</A>
RFC 1468 and 1554 tailored coding system.
All 94charsets use G0, and all 96charsets use G2 with single shift
inside lv.
<P>
<LI> iso-2022-kr <BR>
</A>
RFC 1557 tailored coding system.
All charsets except ASCII use only G1 with locking shift
inside lv.
</UL>
<P>

<LI> <H3> Extended Unix Code </H3>
</A>
lv can decode texts that mix euc-* and iso-2022-*
when you select euc-* as the input coding system.
<P>
<UL>
<LI> euc-china <BR>
</A>
<TABLE>
<TR> <TH> <TH> G0 <TH> G1 <TH> G2 <TH> G3
<TR> <TD> Designation <TD> ASCII <TD> GB 2312-80 <TD> not used <TD> not used
</TABLE>
<P>
<LI> euc-japan <BR>
</A>
<TABLE>
<TR> <TH> <TH> G0 <TH> G1 <TH> G2 <TH> G3
<TR> <TD> Designation <TD> ASCII <TD> JIS X 0208 <TD> JIS X 0201 Katakana <TD> JIS X 0212
</TABLE>
<P>
<LI> euc-korea <BR>
</A>
<TABLE>
<TR> <TH> <TH> G0 <TH> G1 <TH> G2 <TH> G3
<TR> <TD> Designation <TD> ASCII <TD> KS C 5601-1987 <TD> not used <TD> not used
</TABLE>
<P>
<LI> euc-taiwan <BR>
</A>
<TABLE>
<TR> <TH> <TH> G0 <TH> G1 <TH> G2 <TH> G3
<TR> <TD> Designation <TD> ASCII <TD> CNS 11643 Plane 1 <TD> CNS 11643 Plane 2-7 <TD> not used
</TABLE>
</UL>
<P>

<LI> <H3> UCS transformation format </H3>
</A>
<UL>
<LI> UTF-7 <BR>
</A>
A Mail-Safe Transformation Format of Unicode.
See RFC 1642 (Experimental) and
UTF-7 Encoding Form
</A>.
<P>
<LI> UTF-8 <BR>
</A>
8bit Unicode encoding.
See
UCS Transformation Format 8 (UTF-8).
</A>
</UL>
<P>
lv can convert character sets
between Unicode and the following charsets:
GB 2312-80, JIS X 0208, JIS X 0212, KSC 5601-1987,
Big Five, CNS 11643-1992 Plane 1-2,
and ISO 8859-1..16.
<P>
Currently lv's mapping table is based on Unicode 1.1.
<P>
<TABLE>
<TR> <TH> Encoding <TH> Charset used for mapping from Unicode
<TR> <TD> iso-2022-cn <TD> GB 2312-80 (primary), CNS 11643-1992 (secondary), (ISO 8859-*)
<TR> <TD> iso-2022-jp <TD> JIS X0208, JIS X0212, JIS X0201, (ISO 8859-*)
<TR> <TD> iso-2022-kr <TD> KSC 5601-1987, (ISO 8859-*)
<TR> <TD> euc-china <TD> GB 2312-80
<TR> <TD> euc-japan <TD> JIS X0208, JIS X0212, JIS X0201
<TR> <TD> euc-korea <TD> KSC 5601-1987
<TR> <TD> euc-taiwan <TD> CNS 11643-1992 Plane 1-2
<TR> <TD> shift-jis <TD> JIS X0208, JIS X0201
<TR> <TD> big5 <TD> Big Five
</TABLE>
<P>
When you output Unicode CJK unified ideographs through iso-2022-cn,
GB 2312-80 is used primarily,
and the characters not included in GB
are mapped into CNS 11643-1992.
<P>

<LI> <H3> Other coding systems </H3>
</A>
<UL>
<LI> iso-8859-* <BR>
</A>
ASCII and one of ISO 8859-1..16 are designated to G0 and G1,
which are invoked to GL and GR, respectively.
<P>
<LI> shift-jis <BR>
</A>
lv can decode texts that mix shift-jis and iso-2022-jp
when you select shift-jis as the input coding system.
<P>
Note that euc-japan and shift-jis are mutually exclusive for decoding.
<P>
<LI> big5 <BR>
</A>
Since big5 characters can be partially converted
into CNS 11643-1992 Plane 1-2,
lv can load big5 streams
and output them through ISO 2022 based coding systems or euc-taiwan.
The big5 characters which have no counterpart in CNS
are output as ``?'' (question mark).
<P>
<LI> HZ <BR>
</A>
HZ is defined in RFC 1843.
It specifies four escape sequences, ~~, ~{, ~}, and ~\n,
but lv does not support the last one, the ~\n sequence,
and leaves it alone.
Remember that lv does not fully conform to RFC 1843.
HZ is decoded as euc-china in lv.
<P>
<LI> raw mode <BR>
</A>
No decoding and encoding is performed.
</UL>

</UL>
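<P>
The HZ handling described above (~~, ~{ and ~} are interpreted, ~\n is left alone, and the result is treated as euc-china) can be illustrated with a short Python sketch. It is not lv's code: the function name <TT>hz_to_euc_china</TT> is hypothetical, and the sketch simply sets the high bit of each byte between ~{ and ~}, which is how a GB 2312 pair is written in euc-china.

```python
def hz_to_euc_china(data):
    """Rewrite HZ-encoded bytes as euc-china bytes (sketch only)."""
    out = bytearray()
    in_gb = False  # True between ~{ and ~}
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if not in_gb and pair == b"~~":
            out.append(ord("~"))  # ~~ stands for a literal tilde
            i += 2
        elif not in_gb and pair == b"~{":
            in_gb = True  # start of a two-byte GB section
            i += 2
        elif in_gb and pair == b"~}":
            in_gb = False  # back to ASCII
            i += 2
        elif in_gb:
            # Each GB character is two 7bit bytes; set the high bits.
            out.extend(byte | 0x80 for byte in pair)
            i += 2
        else:
            # Plain ASCII, including an unsupported ~\n, is copied as is.
            out.append(data[i])
            i += 1
    return bytes(out)
```

The output can then be decoded with any euc-china (GB 2312) decoder, for example <TT>hz_to_euc_china(data).decode("gb2312")</TT> in Python.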


<HR>

<H2> Annotation about encoding/decoding scheme </H2>
</A>
<UL>

<LI> <H3> Handling of invalid codes </H3>
</A>
Characters belonging to character sets that are invalid for the coding system, for example,
JIS X 0212 for shift-jis,
are printed as ASCII at their code-points
up to the originally supposed width.
<P>
Invalid characters which cause an error state
under the specified coding system
may be partially ignored.
If such a character is printable,
it will be output as a control character.
<P>

<LI> <H3> Backspace </H3>
</A>
BS (backspace) characters included in files
are interpreted as follows:
<P>
<UL>
Highlighted <char>
Underlined <char>
Highlighted ``o''
<LI> Otherwise <BR>
BS deletes the character to its left.
</UL>
<P>

<LI> <H3> How to look in a binary file? </H3>
</A>
lv's decoding is robust even for binary files.
You can look in a binary file and decode the strings embedded in it.
However,
some characters may be ignored if you decode binary files
through a particular coding system.
Option -Ir, raw decoding, preserves such ignored characters other than CRs.
</UL>

<HR>


Auto selection of a coding system </H2>

876

</A>

877

<UL>
<LI> <H3> Default coding system </H3>
</A>
The default input coding system is auto-select, described below.
In the auto-selection state,
lv decodes an input stream as iso-2022-kr.
The default output coding system is iso-2022-jp on UNIX,
or shift-jis on MSDOS (in the Japanese version of lv).
<P>
If you don't specify any input coding system,
that is, when auto-select is in effect,
lv will select the input coding system automatically.
<P>

<LI> <H3> How does lv select a coding system? </H3>
</A>
The auto-selection state continues until an 8bit code is found,
and the auto selection of the input coding system is performed on demand.
<P>
When an 8bit code is found during file loading
and the input coding system is auto-select (internally, iso-2022-kr),
lv examines ``the first line that contains the first 8bit code''.
Then lv tries several 8bit decodings, as follows:
<P>
<UL>
<LI> simple euc decoding test (including euc-china and euc-korea)
<LI> euc-japan (or euc-taiwan) decoding test
<LI> big5 decoding test
<LI> shift-jis decoding test
<LI> utf-8 decoding test (only on platforms other than MSDOS)
</UL>
<P>
The results of these coding system checks are examined in the following order:
<P>

<OL>
<LI> If there is no error state in simple euc decoding,
lv assumes the input coding system is
the default EUC coding system,
which is defined by option -D.
<LI> If there is no error state in euc-japan (or euc-taiwan) decoding,
lv assumes the input coding system is euc-japan
(in the Japanese version).
Since there is no syntactical difference
between euc-taiwan and euc-japan,
this behavior should be changed for a Taiwanese environment.
<LI> If there is no error state in big5 decoding,
lv assumes the input coding system is big5.
Since big5 sequences are similar to those of EUCs,
big5 streams are sometimes mistaken for EUC.
<LI> If there is no error state in shift-jis decoding,
lv assumes the input coding system is shift-jis.
Since shift-jis partially shares code points with EUCs,
shift-jis streams may also be mistaken for EUC.
<LI> If there is no error state in utf-8 decoding,
lv assumes the input coding system is utf-8.
As with big5 and shift-jis,
utf-8 streams are sometimes misinterpreted
as another coding system.
<LI> Otherwise,
lv assumes the input coding system is
ISO 8859-1 (latin-1).
</OL>
<P>

If a text contains only EUC code points,
it is hard to identify the language
that the EUC coding system represents.
So lv provides a default EUC coding system,
used when lv chooses the input coding system from among the EUCs.
The default EUC coding system is set by option -D
(euc-japan in the Japanese version of lv).
<P>
You can switch coding systems even while viewing a file
with the run-time commands `t' and `T',
which cycle through all coding systems implemented in lv.
In addition,
you can toggle HZ decoding mode with C-t on demand.
<P>
You should remember that
the auto-selection mechanism of lv works incorrectly in some cases.
In particular,
if a text contains only JIS X 0201 Katakana in shift-jis,
it will be misinterpreted as euc-japan.
<P>
If the result of auto selection is incorrect
and you know the input coding system,
please set it with option -I,
which disables auto selection.

</UL>
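<P>
The selection order described above can be sketched in Python as follows.
This is a minimal illustration under stated assumptions, not lv's implementation:
each decoding test is approximated by a simplified lead/trail byte-range check,
the names <B>fits</B> and <B>guess_coding_system</B> are hypothetical,
and the MSDOS restriction on the utf-8 test is not modelled.

```python
# Each rule is (lead_low, lead_high, total_length, allowed_trail_ranges).
# The ranges are simplified assumptions, not lv's actual decoding tables.
ASCII = (0x00, 0x7F, 1, [])
EUC_PAIR = (0xA1, 0xFE, 2, [(0xA1, 0xFE)])
SJIS_TRAIL = [(0x40, 0x7E), (0x80, 0xFC)]

# Tests listed in the order in which their results are examined.
TESTS = [
    ("default-euc", [ASCII, EUC_PAIR]),
    ("euc-japan", [ASCII, EUC_PAIR,
                   (0x8E, 0x8E, 2, [(0xA1, 0xDF)]),    # SS2 + JIS X 0201 Katakana
                   (0x8F, 0x8F, 3, [(0xA1, 0xFE)])]),  # SS3 + JIS X 0212
    ("big5", [ASCII, (0xA1, 0xFE, 2, [(0x40, 0x7E), (0xA1, 0xFE)])]),
    ("shift-jis", [ASCII, (0xA1, 0xDF, 1, []),
                   (0x81, 0x9F, 2, SJIS_TRAIL),
                   (0xE0, 0xFC, 2, SJIS_TRAIL)]),
]


def fits(line, rules):
    """Return True if every byte of line is covered by one of the rules."""
    i = 0
    while i < len(line):
        for lo, hi, length, trails in rules:
            if lo <= line[i] <= hi:
                tail = line[i + 1:i + length]
                ok = len(tail) == length - 1 and all(
                    any(t_lo <= t <= t_hi for t_lo, t_hi in trails)
                    for t in tail)
                if not ok:
                    return False  # "error state" for this decoding test
                i += length
                break
        else:
            return False  # no rule accepts this lead byte
    return True


def guess_coding_system(line, default_euc="euc-japan"):
    """Guess the coding system from the first line containing an 8bit code."""
    if all(b < 0x80 for b in line):
        return None  # no 8bit code yet: stay in the auto-selection state
    for name, rules in TESTS:
        if fits(line, rules):
            return default_euc if name == "default-euc" else name
    try:
        line.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"


if __name__ == "__main__":
    print(guess_coding_system(b"\xb1\xb2"))
```

The call in the last line passes two shift-jis Katakana bytes.
Because they also form a valid EUC byte pair, the sketch reports the default EUC coding system,
which is the misinterpretation warned about above;
a caller that knows the real coding system would bypass the guess entirely, as option -I does.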

<HR>
<H2> Extension for text decoration </H2>
</A>

<UL>
<LI>
Option -c enables ANSI escape sequences
of the form ESC [ ps ; ... ; ps m,
where <B>ps</B> takes the following values:
<P>
<UL>
<LI> 1: Highlight
<LI> 4: Underline
<LI> 5: Blink
<LI> 7: Reverse
<LI> 30: Black
<LI> 31: Red
<LI> 32: Green
<LI> 33: Yellow
<LI> 34: Blue
<LI> 35: Magenta
<LI> 36: Cyan
<LI> 37: White
<LI> 40-47: Reverse of 30-37
</UL>
<P>

<LI> Every sequence is independent of the others.
lv resets all values before a new value is set.
Meanwhile,
multiple <B>ps</B> values are accepted within one sequence.
<LI> Every sequence is only effective within a logical line.
On crossing logical lines,
all attributes are reset automatically.
Please recall that lv handles each logical line as stateless.
<LI> You can specify only one color at a time.
When multiple colors are specified,
the last one takes effect.
<LI> For reversed characters,
the specified color is applied to the ``reversed background color''.
You cannot specify the color of ``out-clipped characters''.
<LI> You can customize the actual sequences that are output to the screen.
Please specify them with option -S.

</UL>
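<P>
The rules above (each sequence resets the previous values, several <B>ps</B> values may share one sequence,
and the last color wins) can be sketched in Python as follows.
This is an illustration, not lv's parser: the names <B>resolve</B> and <B>decorate</B> are hypothetical,
and treating 40-47 as ``the same color, shown reversed'' is an assumption about what ``Reverse of 30-37'' means.

```python
import re

# ESC [ ps ; ... ; ps m
SGR = re.compile(r"\x1b\[([0-9;]*)m")

ATTRS = {1: "highlight", 4: "underline", 5: "blink", 7: "reverse"}
COLORS = ["black", "red", "green", "yellow", "blue", "magenta", "cyan", "white"]


def resolve(params):
    """Turn the ps values of ONE sequence into (attributes, color)."""
    attrs, color = set(), None
    for ps in (int(p) for p in params.split(";") if p):
        if ps in ATTRS:
            attrs.add(ATTRS[ps])
        elif 30 <= ps <= 37:
            color = COLORS[ps - 30]      # a later color replaces an earlier one
        elif 40 <= ps <= 47:
            color = COLORS[ps - 40]      # assumption: same color, reversed
            attrs.add("reverse")
    return frozenset(attrs), color


def decorate(logical_line):
    """Split one logical line into (text, attributes, color) runs."""
    runs = []
    attrs, color = frozenset(), None     # every logical line starts clean
    pos = 0
    for m in SGR.finditer(logical_line):
        if m.start() > pos:
            runs.append((logical_line[pos:m.start()], attrs, color))
        # A new sequence replaces, rather than extends, the previous state.
        attrs, color = resolve(m.group(1))
        pos = m.end()
    if pos < len(logical_line):
        runs.append((logical_line[pos:], attrs, color))
    return runs


if __name__ == "__main__":
    for run in decorate("\x1b[1;31;34mab\x1b[4mcd"):
        print(run)
```

Because <B>decorate</B> keeps no state between calls, calling it once per logical line
mirrors the stateless handling of logical lines described above.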

<HR>
<H2> Customization </H2>
</A>

<UL>
<LI> Customization for command key bindings <BR>
Please modify the keybind table in keybind.h.
<P>
<LI> Customization for terminal controls <BR>
When you add a new terminal control,
please add code to console.c.
When you wish to change the interpretation of escape sequences,
please modify console.c and escape.c.
However, some ANSI escape sequences are configurable through options.
<P>
<LI> Changing default screen size of MSDOS ANSI terminals <BR>
The default screen size is 80 columns by 24 rows.
To change this,
please modify console.c.
However, the screen size can also be specified through options.
<P>

<LI> Changing default coding systems <BR>
Currently, the Japanese version of lv uses the following values:
<P>
<DL>
<TABLE>
<TR> <TH> <TH> MSDOS <TH> UNIX
<TR> <TD> Input: <TD> auto-select <TD> auto-select
<TR> <TD> Keyboard: <TD> shift-jis <TD> iso-2022-jp
<TR> <TD> Output: <TD> shift-jis <TD> iso-2022-jp
<TR> <TD> Pathname: <TD> shift-jis <TD> iso-2022-jp
<TR> <TD> Default EUC: <TD> euc-japan <TD> euc-japan
</TABLE>
</DL>
<P>
To change the above,
please modify lv.c.
However,
those coding systems can also be specified through options.
<P>

<LI> Customization for coding systems <BR>
Currently,
an ISO 2022 universal decoder
and EUC, HZ, shift-jis, big5, UTF-7, and UTF-8 decoders are implemented.
When you wish to add another coding system,
please add source code,
referring to ctable_t.h, ctable.c, encode.c, decode.c, iso2022.c, etc.
<P>

<LI> Customization for character sets <BR>
Please add your favorite character sets,
referring to itable_t.h, itable.c, etc.
The currently recognized character sets are itemized below.
You have to specify the code length (bytes) and graphical width (columns)
of each character as attributes.
The code length and graphical width need not be equal.
The current implementation does not support per-character lengths,
but if you specify their maximum length in itable,
it should not cause problems.
You cannot add charsets whose code length is more than 3 bytes.
(If you need to,
only a small modification to lv is required
to support charsets of up to 4 bytes.)
<P>

<DL>
ISO 646 United States (ANSI X3.4-1968) <BR>
JIS X0201-1976 Japanese Roman <BR>
JIS X0201-1976 Japanese Katakana <BR>
ISO 8859/1 Latin alphabet No.1 Right part <BR>
ISO 8859/2 Latin alphabet No.2 Right part <BR>
ISO 8859/3 Latin alphabet No.3 Right part <BR>
ISO 8859/4 Latin alphabet No.4 Right part <BR>
ISO 8859/5 Cyrillic alphabet <BR>
ISO 8859/6 Arabic alphabet <BR>
ISO 8859/7 Greek alphabet <BR>
ISO 8859/8 Hebrew alphabet <BR>
ISO 8859/9 Latin alphabet No.5 Right part <BR>
ISO 8859/10 Latin alphabet No.6 Right part (Nordic) <BR>
ISO 8859/11 Thai alphabet <BR>
ISO 8859/13 Latin alphabet No.7 Right part (Baltic Rim) <BR>
ISO 8859/14 Latin alphabet No.8 Right part (Celtic) <BR>
ISO 8859/15 Latin alphabet No.9 Right part <BR>
ISO 8859/16 Latin alphabet No.10 Right part <BR>
JIS C 6226-1978 Japanese kanji <BR>
GB 2312-80 Chinese hanzi <BR>
JIS X 0208-1983 Japanese kanji <BR>
KS C 5601-1987 Korean graphic charset <BR>
JIS X 0212-1990 Supplementary charset <BR>
ISO-IR-165 <BR>
CNS 11643-1992 Plane 1..7 <BR>
JIS X 0213-2000 Plane 1..2 <BR>
Big5 Traditional Chinese <BR>
Unicode 1.1 <BR>
</DL>
<P>

These charsets are merely recognized by lv;
whether they can actually be displayed
depends on your terminal's capability.
<P>
Conversely,
you can handle charsets not listed above as latin-1
when an 8bit coding system is displayed on an 8bit terminal
(provided there is no code conversion and each character occupies one column).

</UL>
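<P>
The two per-charset attributes mentioned above, code length (bytes) and graphical width (columns),
can be pictured with the following Python sketch.
It does not reproduce itable.c: the table layout, the names <B>register</B> and <B>measure</B>,
the listed lengths and widths, and the ``demo'' entry are all assumptions made for illustration.

```python
from collections import namedtuple

Charset = namedtuple("Charset", ["name", "code_length", "width"])

MAX_CODE_LENGTH = 3   # limit stated in the text above
ITABLE = {}           # hypothetical stand-in for the charset table


def register(key, name, code_length, width):
    """Add a charset with its code length (bytes) and width (columns)."""
    if not 1 <= code_length <= MAX_CODE_LENGTH:
        raise ValueError("code length must be 1..%d bytes" % MAX_CODE_LENGTH)
    ITABLE[key] = Charset(name, code_length, width)


# Assumed attribute values, chosen as typical ones for these charsets.
register("ascii", "ISO 646 United States", 1, 1)
register("x0201kana", "JIS X0201-1976 Japanese Katakana", 1, 1)
register("x0208", "JIS X 0208-1983 Japanese kanji", 2, 2)
# Purely hypothetical entry showing that length and width may differ.
register("demo", "hypothetical 2-byte, 1-column charset", 2, 1)


def measure(keys):
    """Return (bytes, columns) for a run of characters given by charset key."""
    entries = [ITABLE[k] for k in keys]
    return (sum(e.code_length for e in entries),
            sum(e.width for e in entries))


if __name__ == "__main__":
    print(measure(["ascii", "x0208", "x0201kana"]))
```

Keeping both numbers per charset, rather than per character, matches the restriction described above
that per-character lengths are not supported.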

<!--
<HR>
<H2> Known bugs </H2>
</A>
<UL>
<LI> No bugs are reported.
</UL>
-->

<HR>
<H2> Bug report </H2>
<DL>
Please send a bug report to the author (see the address at the end of this page)
when you find any bugs in lv.
</DL>

<HR>
<H2> Release note </H2>
</A>
<DL>
<A HREF="relnote.html"> Click here.</A> (in Japanese)
</DL>

<HR>
<H2> Acknowledgement </H2>
</A>
<DL>
I would like to express my gratitude to everybody
who has worked together in connection with lv,
especially package maintainers,
bug reporters,
and early beta testers:
<P>
後藤さん (gotom@debian.or.jp) <BR>
<P>
野村さん (nomu@ipl.mech.nagoya-u.ac.jp) <BR>
石塚さん (ishizuka@db.is.kyushu-u.ac.jp) <BR>
野中さん (nona@in.it.okayama-u.ac.jp) <BR>
松原さん (moody@osk.threewebnet.or.jp) <BR>
村井さん (murai@geophys.hokudai.ac.jp) <BR>
</DL>

<HR>
<H2> Reference </H2>
</A>

<UL>
<LI> Information processing - ISO 7-bit and 8-bit coded character sets
- Code extension techniques
<LI> Code of the Japanese graphic character set for information interchange
<LI> Code of the supplementary Japanese graphic character set for
information interchange
<LI> RFC 1468 Japanese Character Encoding for Internet Messages
<LI> RFC 1554 ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP
<LI> RFC 1557 Korean Character Encoding for Internet Messages
<LI> RFC 1843 HZ - A Data Format for Exchanging Files of Arbitrarily Mixed Chinese and ASCII characters
<LI> RFC 1922 Chinese Character Encoding for Internet Messages
<LI> RFC 2152 UTF-7 A Mail-Safe Transformation Format of Unicode
<LI> RFC 2279 UTF-8, a transformation format of ISO 10646
<LI> Understanding Japanese Information Processing (『日本語情報処理』) <BR>
<I> Ken Lunde </I> O'Reilly & Associates, Inc. ISBN 1-56592-043-0
<LI> <A HREF="ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf">CJK.INF Version 2.1</A> (July 12, 1996) <BR>
Online Companion to "Understanding Japanese Information Processing" <BR>
<I> Ken Lunde </I>
<LI> <A HREF="http://www.unicode.org/unicode/onlinedat/online.html"> Unicode Mapping Data </A> at the Unicode Consortium web site.
<LI> Compilers - Principles, Techniques, and Tools <BR>
<I> Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman </I>
Addison-Wesley, ISBN 0-201-10088-6
</UL>

<HR>
<ADDRESS>
NARITA Tomio
</A> <BR>
email: nrt@ff.iij4u.or.jp <BR>
Homepage: http://www.ff.iij4u.or.jp/~nrt/ <BR CLEAR=all>
</ADDRESS>
</BODY>
</HTML>
