~ubuntu-branches/ubuntu/trusty/librep/trusty

Viewing changes to src/README.regexp

Committer: Bazaar Package Importer
Author(s): Christian Marillat
Date: 2001-11-13 15:06:22 UTC
Revision ID: james.westby@ubuntu.com-20011113150622-vgmgmk6srj3kldr3

Tags: upstream-0.15.2

Import upstream version 0.15.2

src/README.regexp

This is a version of Henry Spencer's famous regexp implementation. I've
modified it to meet my needs; this is what I've done:

2) added a new function regsublen(), which performs a dry run of the
   regsub() function, returning the length of the string needed to hold
   the output from regsub().
3) changed regexec(prog,str) to regexec2(prog,str,eflags), with a macro for
   regexec(). This is so I can have the flag REG_NOTBOL, which signifies
   that the string passed to regexec[2]() is not actually the start of a
   line.
4) support for case-insensitive matching (with the flag REG_NOCASE)
5) split the definition of a compiled regexp from regexp.c into
   a new file regprog.h
6) created a new file regjade.c, which uses the regexec() structure to
   match regexps against editor buffers in place.
7) altered the regexp structure to allow storing of subexpressions as
   positions in a Jade buffer. Also altered the calling conventions of
   regsub() and regsublen() to support this.
8) support \w, \W, \s, \S, \d, \D, \b, \B, *?, +?, ?? syntax (as in Perl)
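
As an illustration of the dry-run idea behind regsublen() in item 2, here
is a minimal sketch in C. It does not use the real regsub()/regsublen()
signatures, which are not shown in this file; expand() and expand_alloc()
are hypothetical helpers that understand only `&` (the whole match) and
`\&` (a literal ampersand). The point is the pattern: one routine both
measures and writes, so the two passes cannot disagree about the length.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of this package.
 * Expands tmpl, replacing each '&' with match and "\&" with a literal '&'.
 * If out is NULL nothing is written (the dry run).
 * Returns the number of bytes needed, including the terminating NUL. */
size_t expand(const char *tmpl, const char *match, char *out)
{
    size_t n = 0;
    size_t mlen = strlen(match);
    const char *p;

    for (p = tmpl; *p != '\0'; p++) {
        if (*p == '\\' && p[1] == '&') {
            if (out != NULL)
                out[n] = '&';
            n++;
            p++;                      /* skip the escaped '&' */
        } else if (*p == '&') {
            if (out != NULL)
                memcpy(out + n, match, mlen);
            n += mlen;
        } else {
            if (out != NULL)
                out[n] = *p;
            n++;
        }
    }
    if (out != NULL)
        out[n] = '\0';
    return n + 1;
}

/* Measure first, then allocate exactly that much and expand for real.
 * Returns NULL if malloc() fails; the caller frees the result. */
char *expand_alloc(const char *tmpl, const char *match)
{
    size_t need = expand(tmpl, match, NULL);
    char *buf = malloc(need);

    if (buf != NULL)
        expand(tmpl, match, buf);
    return buf;
}
```

A caller would write something like expand_alloc("<&>", "abc") and free()
the returned string when done with it.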
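
The effect of REG_NOTBOL (item 3) and of case-insensitive matching (item 4)
can be tried with the POSIX <regex.h> interface. That is a different
library from this one, but its regexec() takes an eflags argument with the
same REG_NOTBOL name; its case flag is called REG_ICASE rather than
REG_NOCASE. The wrapper matches() below is a hypothetical helper written
for this sketch only.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical wrapper around POSIX regcomp()/regexec().
 * Returns 1 on a match, 0 on no match, -1 if the pattern fails to compile. */
int matches(const char *pattern, const char *text, int cflags, int eflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, eflags);
    regfree(&re);
    return rc == 0;
}
```

Suppose a caller is scanning the line "foobar" and resumes the search at
offset 3. The remaining text is "bar", which is not the start of the line,
so "^bar" should not match there. matches("^bar", "bar", REG_EXTENDED, 0)
reports a match, while passing REG_NOTBOL as the last argument does not.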

And probably some other things as well. Obviously all errors are my
responsibility. The original README follows.

John

This is a nearly-public-domain reimplementation of the V8 regexp(3) package.
It gives C programs the ability to use egrep-style regular expressions, and
does it in a much cleaner fashion than the analogous routines in SysV.

    Written by Henry Spencer. Not derived from licensed software.

    Permission is granted to anyone to use this software for any
    purpose on any computer system, and to redistribute it freely,
    subject to the following restrictions:

    1. The author is not responsible for the consequences of use of
       this software, no matter how awful, even if they arise
       from defects in it.

    2. The origin of this software must not be misrepresented, either
       by explicit claim or by omission.

    3. Altered versions must be plainly marked as such, and must not
       be misrepresented as being the original software.

Barring a couple of small items in the BUGS list, this implementation is
believed 100% compatible with V8. It should even be binary-compatible,
sort of, since the only fields in a "struct regexp" that other people have
any business touching are declared in exactly the same way at the same
location in the struct (the beginning).

This implementation is *NOT* AT&T/Bell code, and is not derived from licensed
software. Even though U of T is a V8 licensee. This software is based on
a V8 manual page sent to me by Dennis Ritchie (the manual page enclosed
here is a complete rewrite and hence is not covered by AT&T copyright).
The software was nearly complete at the time of arrival of our V8 tape.
I haven't even looked at V8 yet, although a friend elsewhere at U of T has
been kind enough to run a few test programs using the V8 regexp(3) to resolve
a few fine points. I admit to some familiarity with regular-expression
implementations of the past, but the only one that this code traces any
ancestry to is the one published in Kernighan & Plauger (from which this
one draws ideas but not code).

Simplistically: put this stuff into a source directory, copy regexp.h into
/usr/include, inspect Makefile for compilation options that need changing
to suit your local environment, and then do "make r". This compiles the
regexp(3) functions, compiles a test program, and runs a large set of
regression tests. If there are no complaints, then put regexp.o, regsub.o,
and regerror.o into your C library, and regexp.3 into your manual-pages
directory.

Note that if you don't put regexp.h into /usr/include *before* compiling,
you'll have to add "-I." to CFLAGS before compiling.

The files are:

Makefile        instructions to make everything
regexp.3        manual page
regexp.h        header file, for /usr/include
regexp.c        source for regcomp() and regexec()
regsub.c        source for regsub()
regerror.c      source for default regerror()
regmagic.h      internal header file
try.c           source for test program
timer.c         source for timing program
tests           test list for try and timer

This implementation uses nondeterministic automata rather than the
deterministic ones found in some other implementations, which makes it
simpler, smaller, and faster at compiling regular expressions, but slower
at executing them. In theory, anyway. This implementation does employ
some special-case optimizations to make the simpler cases (which do make
up the bulk of regular expressions actually used) run quickly. In general,
if you want blazing speed you're in the wrong place. Replacing the insides
of egrep with this stuff is probably a mistake; if you want your own egrep
you're going to have to do a lot more work. But if you want to use regular
expressions a little bit in something else, you're in luck. Note that many
existing text editors use nondeterministic regular-expression implementations,
so you're in good company.
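
To make the trade-off above concrete, here is a tiny backtracking matcher in
the style popularized by Kernighan and Pike. It is not Spencer's code and it
handles only literal characters, '.', '*', '^' and '$'. It is meant as a
sketch of how a matcher of this family works: there is almost nothing to
compile, but at run time it tries one alternative, and backs up to try the
next when that fails. The names re_match(), match_here() and match_star()
are chosen for this example only.

```c
int match_here(const char *re, const char *text);

/* Match c*re at text: try zero occurrences of c first, then one more
 * each time the rest of the pattern fails (this is the backtracking). */
int match_star(int c, const char *re, const char *text)
{
    do {
        if (match_here(re, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

/* Match re at the current position of text only. */
int match_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;                               /* pattern exhausted */
    if (re[1] == '*')
        return match_star(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';                   /* anchor at end */
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return match_here(re + 1, text + 1);
    return 0;
}

/* Search for re anywhere in text; a leading '^' anchors at the start. */
int re_match(const char *re, const char *text)
{
    if (re[0] == '^')
        return match_here(re + 1, text);
    do {
        if (match_here(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}
```

A pattern such as "a*a*a*b" against a string of a's with no b shows where the
execution cost comes from: every way of dividing the a's among the three
stars is tried before the matcher gives up.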

This stuff should be pretty portable, given appropriate option settings.
If your chars have less than 8 bits, you're going to have to change the
internal representation of the automaton, although knowledge of the details
of this is fairly localized. There are no "reserved" char values except for
NUL, and no special significance is attached to the top bit of chars.
The string(3) functions are used a fair bit, on the grounds that they are
probably faster than coding the operations in line. Some attempts at code
tuning have been made, but this is invariably a bit machine-specific.
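
As a small illustration of leaning on string(3) rather than hand-written
loops, the sketch below finds a literal substring by letting strchr() jump
to each candidate start and strncmp() verify it. find_literal() is a
hypothetical helper written for this example, not a function from regexp.c;
it only shows the style of use described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the first occurrence of
 * needle in hay, or NULL if there is none. */
const char *find_literal(const char *hay, const char *needle)
{
    size_t n = strlen(needle);
    const char *s;

    if (n == 0)
        return hay;                 /* empty needle matches at the start */

    /* strchr() skips straight to the next possible starting character;
     * strncmp() then checks the whole needle at that position. */
    for (s = strchr(hay, needle[0]); s != NULL; s = strchr(s + 1, needle[0])) {
        if (strncmp(s, needle, n) == 0)
            return s;
    }
    return NULL;
}
```

In ordinary code strstr() does this job directly; the loop is spelled out
here only to show the library routines doing the scanning.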
