~ubuntu-branches/ubuntu/hoary/pcre3/hoary-security

Viewing changes to doc/html/pcretest.html

Committer: Bazaar Package Importer
Author(s): Andreas Metzler
Date: 2004-03-12 13:23:02 UTC
mfrom: (1.1.1 upstream)
Revision ID: james.westby@ubuntu.com-20040312132302-id6ksx1l8dwssbw9

Tags: 4.5-1.1

http://bugs.debian.org/237265

http://bugs.debian.org/237564

* NMU to fix rc-bugs.
* Update libtool related files to fix build-error on mips, keep original
config.in, as it is no generated file. (Closes: #237265)
* pcregrep replaces pgrep. (Closes: #237564)
* Bump shlibs, pcre 4.5 includes two new functions.
* Let pgrep's /usr/share/doc symlink point to the package it depends on,
pcregrep.

files added:
debian/README.Versioning.libtool

debian/compat

debian/dirs

debian/libpcre3-dev.dirs

debian/libpcre3-dev.files

debian/libpcre3-dev.manpages

debian/libpcre3.dirs

debian/libpcre3.docs

debian/libpcre3.files

debian/pcre-config.1

debian/pcregrep.dirs

debian/pcregrep.files

debian/pcregrep.install

debian/pcregrep.links

debian/pgrep.links

debian/zpcregrep

doc/html

doc/html/index.html

doc/html/pcre.html

doc/html/pcre_compile.html

doc/html/pcre_config.html

doc/html/pcre_copy_named_substring.html

doc/html/pcre_copy_substring.html

doc/html/pcre_exec.html

doc/html/pcre_free_substring.html

doc/html/pcre_free_substring_list.html

doc/html/pcre_fullinfo.html

doc/html/pcre_get_named_substring.html

doc/html/pcre_get_stringnumber.html

doc/html/pcre_get_substring.html

doc/html/pcre_get_substring_list.html

doc/html/pcre_info.html

doc/html/pcre_maketables.html

doc/html/pcre_study.html

doc/html/pcre_version.html

doc/html/pcreapi.html

doc/html/pcrebuild.html

doc/html/pcrecallout.html

doc/html/pcrecompat.html

doc/html/pcregrep.html

doc/html/pcrepattern.html

doc/html/pcreperform.html

doc/html/pcreposix.html

doc/html/pcresample.html

doc/html/pcretest.html

doc/pcre_compile.3

doc/pcre_config.3

doc/pcre_copy_named_substring.3

doc/pcre_copy_substring.3

doc/pcre_exec.3

doc/pcre_free_substring.3

doc/pcre_free_substring_list.3

doc/pcre_fullinfo.3

doc/pcre_get_named_substring.3

doc/pcre_get_stringnumber.3

doc/pcre_get_substring.3

doc/pcre_get_substring_list.3

doc/pcre_info.3

doc/pcre_maketables.3

doc/pcre_study.3

doc/pcre_version.3

doc/pcreapi.3

doc/pcrebuild.3

doc/pcrecallout.3

doc/pcrecompat.3

doc/pcrepattern.3

doc/pcreperform.3

doc/pcresample.3

doc/pcretest.1

libpcre.def

libpcreposix.def

makevp.bat

mkinstalldirs

pcredemo.c

printint.c

files removed:
Makefile

RunTest

debian/Makefile

debian/postinst

debian/postinst-lib

debian/prerm

debian/shlibs

dll.mk

doc/pcre-config.1

doc/pcre.7

doc/pcre.html

doc/pcregrep.html

doc/pcreposix.html

doc/pcreposix.txt

ltconfig

pcre-config

pcre.h

pcreposix.c.orig

perltest8

testdata/testinput6

testdata/testoutput6

files modified:
AUTHORS

COPYING

ChangeLog

LICENCE

Makefile.in

NEWS

NON-UNIX-USE

README

RunTest.in

config.guess *

config.in

config.sub *

configure

configure.in

debian/README.Debian

debian/changelog

debian/control

debian/copyright

debian/rules

dftables.c

doc/Tech.Notes

doc/pcre.3

doc/pcre.txt

doc/pcregrep.1

doc/pcregrep.txt

doc/pcreposix.3

doc/pcretest.txt

doc/perltest.txt

get.c

internal.h

ltmain.sh

maketables.c

pcre-config.in

pcre.c

pcre.def

pcre.in

pcregrep.c

pcreposix.c

pcreposix.h

pcretest.c

perltest

study.c

testdata/testinput1

testdata/testinput2

testdata/testinput3

testdata/testinput4

testdata/testinput5

testdata/testoutput1

testdata/testoutput2

testdata/testoutput3

testdata/testoutput4

testdata/testoutput5

doc/html/pcretest.html

<html>

<head>

<title>pcretest specification</title>

</head>

This HTML document has been generated automatically from the original man page.

If there is any nonsense in it, please consult the man page, in case the

conversion went wrong.

<ul>

<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>

<li><a name="TOC2" href="#SEC2">OPTIONS</a>

<li><a name="TOC3" href="#SEC3">DESCRIPTION</a>

<li><a name="TOC4" href="#SEC4">PATTERN MODIFIERS</a>

<li><a name="TOC5" href="#SEC5">CALLOUTS</a>

<li><a name="TOC6" href="#SEC6">DATA LINES</a>

<li><a name="TOC7" href="#SEC7">OUTPUT FROM PCRETEST</a>

<li><a name="TOC8" href="#SEC8">AUTHOR</a>

</ul>

<a name="SEC1" href="#TOC1">SYNOPSIS</a>

pcretest [-C] [-d] [-i] [-m] [-o osize] [-p] [-t] [source] [destination]

pcretest was written as a test program for the PCRE regular expression

library itself, but it can also be used for experimenting with regular

expressions. This document describes the features of the test program; for

details of the regular expressions themselves, see the

<a href="pcrepattern.html">pcrepattern</a>

documentation. For details of PCRE and its options, see the

<a href="pcreapi.html">pcreapi</a>

documentation.

<a name="SEC2" href="#TOC1">OPTIONS</a>

-C

Output the version number of the PCRE library, and all available information

about the optional features that are included, and then exit.

-d

Behave as if each regex had the /D modifier (see below); the internal

form is output after compilation.

-i

Behave as if each regex had the /I modifier; information about the

compiled pattern is given after compilation.

-m

Output the size of each compiled pattern after it has been compiled. This is

equivalent to adding /M to each regular expression. For compatibility with

earlier versions of pcretest, -s is a synonym for -m.

-o osize

Set the number of elements in the output vector that is used when calling PCRE

to be osize. The default value is 45, which is enough for 14 capturing

subexpressions. The vector size can be changed for individual matching calls by

including \O in the data line (see below).
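
The figure of 14 capturing subexpressions follows from simple arithmetic. The sketch below is written in Python purely for illustration (pcretest itself is a C program), and the helper name max_capturing_subexpressions is hypothetical. It assumes the rule given in the pcreapi documentation: only the first two-thirds of the output vector hold offset pairs, and pair 0 is reserved for the whole match.

```python
def max_capturing_subexpressions(osize):
    """Return how many capturing subexpressions an output vector of
    osize integers can report (hypothetical helper, not part of PCRE)."""
    # Two-thirds of the vector hold (start, end) pairs, so each pair
    # effectively costs three integers.
    pairs = osize // 3
    # Pair 0 describes the whole match; the rest are capture groups.
    return max(pairs - 1, 0)


print(max_capturing_subexpressions(45))  # prints 14
```

With the default of 45 elements there is room for 15 pairs, that is, the whole match plus 14 captured substrings.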

-p

Behave as if each regex had the /P modifier; the POSIX wrapper API is used

to call PCRE. None of the other options has any effect when -p is set.

-t

Run each compile, study, and match many times with a timer, and output the

resulting time per compile or match (in milliseconds). Do not set -t with

-m, because you will then get the size output 20000 times and the timing

will be distorted.

<a name="SEC3" href="#TOC1">DESCRIPTION</a>

If pcretest is given two filename arguments, it reads from the first and

writes to the second. If it is given only one filename argument, it reads from

that file and writes to stdout. Otherwise, it reads from stdin and writes to

stdout, and prompts for each line of input, using "re>" to prompt for regular

expressions, and "data>" to prompt for data lines.

The program handles any number of sets of input on a single input file. Each

set starts with a regular expression, and continues with any number of data

lines to be matched against the pattern.

Each line is matched separately and independently. If you want to do

multiple-line matches, you have to use the \n escape sequence in a single line

of input to encode the newline characters. The maximum length of a data line is

30,000 characters.

An empty line signals the end of the data lines, at which point a new regular

expression is read. The regular expressions are given enclosed in any

non-alphameric delimiters other than backslash, for example

<pre>
/(a|bc)x+yz/
</PRE>

White space before the initial delimiter is ignored. A regular expression may
be continued over several input lines, in which case the newline characters are
included within it. It is possible to include the delimiter within the pattern
by escaping it, for example

<pre>
/abc\/def/
</PRE>

If you do so, the escape and the delimiter form part of the pattern, but since
delimiters are always non-alphameric, this does not affect its interpretation.
If the terminating delimiter is immediately followed by a backslash, for
example,

<pre>
/abc/\
</PRE>

then a backslash is added to the end of the pattern. This is done to provide a
way of testing the error condition that arises if a pattern finishes with a
backslash, because

<pre>
/abc\/
</PRE>

is interpreted as the first line of a pattern that starts with "abc/", causing
pcretest to read the next line as a continuation of the regular expression.

<a name="SEC4" href="#TOC1">PATTERN MODIFIERS</a>

The pattern may be followed by i, m, s, or x to set the
PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options,
respectively. For example:

<pre>
/caseless/i
</PRE>

These modifier letters have the same effect as they do in Perl. There are
others that set PCRE options that do not correspond to anything in Perl:
/A, /E, /N, /U, and /X set PCRE_ANCHORED,
PCRE_DOLLAR_ENDONLY, PCRE_NO_AUTO_CAPTURE, PCRE_UNGREEDY, and PCRE_EXTRA
respectively.

Searching for all possible matches within each subject string can be requested
by the /g or /G modifier. After finding a match, PCRE is called
again to search the remainder of the subject string. The difference between
/g and /G is that the former uses the startoffset argument to
pcre_exec() to start searching at a new point within the entire string
(which is in effect what Perl does), whereas the latter passes over a shortened
substring. This makes a difference to the matching process if the pattern
begins with a lookbehind assertion (including \b or \B).
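
The difference between /g and /G can be reproduced outside PCRE. The sketch below uses Python's re module as a stand-in, on the assumption that the pos argument of search() behaves like startoffset (assertions such as \B can still see the text before pos), while slicing the subject hides that text. The function names are hypothetical, and the handling of empty matches is simplified compared with what pcretest does.

```python
import re


def matches_like_g(pattern, subject):
    """Imitate /g: keep the whole subject and move the start offset."""
    regex = re.compile(pattern)
    found, pos = [], 0
    while pos <= len(subject):
        m = regex.search(subject, pos)  # pos plays the role of startoffset
        if m is None:
            break
        found.append(m.group(0))
        # Simplification: step one character past an empty match.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return found


def matches_like_G(pattern, subject):
    """Imitate /G: hand over a shortened substring on each call."""
    regex = re.compile(pattern)
    found, rest = [], subject
    while True:
        m = regex.search(rest)
        if m is None or not rest:
            if m is not None:
                found.append(m.group(0))
            return found
        found.append(m.group(0))
        # Drop everything up to the end of the match (at least one character).
        rest = rest[max(m.end(), 1):]


print(matches_like_g(r"\Bi(\w\w)", "Mississippi"))  # ['iss', 'iss', 'ipp']
print(matches_like_G(r"\Bi(\w\w)", "Mississippi"))  # ['iss', 'ipp']
```

In the /G imitation the second "iss" is lost: once the subject has been cut down to "issippi", the leading "i" sits at the start of the string, so \B no longer sees the preceding "s" and fails there.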

If any call to pcre_exec() in a /g or /G sequence matches an
empty string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
flags set in order to search for another, non-empty, match at the same point.
If this second match fails, the start offset is advanced by one, and the normal
match is retried. This imitates the way Perl handles such cases when using the
/g modifier or the split() function.

There are a number of other modifiers for controlling the way pcretest
operates.

The /+ modifier requests that as well as outputting the substring that
matched the entire pattern, pcretest should in addition output the remainder of
the subject string. This is useful for tests where the subject contains
multiple copies of the same substring.

The /L modifier must be followed directly by the name of a locale, for
example,

<pre>
/pattern/Lfr
</PRE>

For this reason, it must be the last modifier letter. The given locale is set,
pcre_maketables() is called to build a set of character tables for the
locale, and this is then passed to pcre_compile() when compiling the
regular expression. Without an /L modifier, NULL is passed as the tables
pointer; that is, /L applies only to the expression on which it appears.

The /I modifier requests that pcretest output information about the
compiled expression (whether it is anchored, has a fixed first character, and
so on). It does this by calling pcre_fullinfo() after compiling an
expression, and outputting the information it gets back. If the pattern is
studied, the results of that are also output.

The /D modifier is a PCRE debugging feature, which also assumes /I.
It causes the internal form of compiled regular expressions to be output after
compilation. If the pattern was studied, the information returned is also
output.

The /S modifier causes pcre_study() to be called after the
expression has been compiled, and the results used when the expression is
matched.

The /M modifier causes the size of the memory block used to hold the compiled
pattern to be output.

The /P modifier causes pcretest to call PCRE via the POSIX wrapper
API rather than its native API. When this is done, all other modifiers except
/i, /m, and /+ are ignored. REG_ICASE is set if /i is
present, and REG_NEWLINE is set if /m is present. The wrapper functions
force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.

The /8 modifier causes pcretest to call PCRE with the PCRE_UTF8
option set. This turns on support for UTF-8 character handling in PCRE,
provided that it was compiled with this support enabled. This modifier also
causes any non-printing characters in output strings to be printed using the
\x{hh...} notation if they are valid UTF-8 sequences.

If the /? modifier is used with /8, it causes pcretest to
call pcre_compile() with the PCRE_NO_UTF8_CHECK option, to suppress the
checking of the string for UTF-8 validity.

<a name="SEC5" href="#TOC1">CALLOUTS</a>

If the pattern contains any callout requests, pcretest's callout function
will be called. By default, it displays the callout number, and the start and
current positions in the text at the callout time. For example, the output

<pre>
--->pqrabcdef
  0    ^  ^
</PRE>

indicates that callout number 0 occurred for a match attempt starting at the
fourth character of the subject string, when the pointer was at the seventh
character. The callout function returns zero (carry on matching) by default.
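
To make the layout of the marker line concrete, the following Python sketch rebuilds it from a callout number and two zero-based offsets. The function name and the exact column widths are assumptions made for illustration; they approximate the display described above and are not taken from the pcretest source.

```python
def callout_marker_line(number, start, current, prefix="--->"):
    """Build the line printed under the subject: the callout number,
    then a caret under the start position and one under the current
    position (offsets are zero-based)."""
    width = len(prefix) + max(start, current) + 1
    cells = [" "] * width
    label = "%3d" % number
    cells[:len(label)] = label          # callout number on the left
    cells[len(prefix) + start] = "^"    # where this match attempt began
    cells[len(prefix) + current] = "^"  # where the pointer is now
    return "".join(cells)


print("--->pqrabcdef")
print(callout_marker_line(0, 3, 6))
```

The fourth and seventh characters of the subject correspond to offsets 3 and 6, so the carets land under "a" and "d".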

Inserting callouts may be helpful when using pcretest to check
complicated regular expressions. For further information about callouts, see
the
<a href="pcrecallout.html">pcrecallout</a>
documentation.

For testing the PCRE library, additional control of callout behaviour is
available via escape sequences in the data, as described in the following
section. In particular, it is possible to pass in a number as callout data (the
default is zero). If the callout function receives a non-zero number, it
returns that value instead of zero.

<a name="SEC6" href="#TOC1">DATA LINES</a>

Before each data line is passed to pcre_exec(), leading and trailing
whitespace is removed, and it is then scanned for \ escapes. Some of these are
pretty esoteric features, intended for checking out some of the more
complicated features of PCRE. If you are just testing "ordinary" regular
expressions, you probably don't need any of these. The following escapes are
recognized:

<pre>
\a         alarm (= BEL)
\b         backspace
\e         escape
\f         formfeed
\n         newline
\r         carriage return
\t         tab
\v         vertical tab
\nnn       octal character (up to 3 octal digits)
\xhh       hexadecimal character (up to 2 hex digits)
\x{hh...}  hexadecimal character, any number of digits
             in UTF-8 mode
\A         pass the PCRE_ANCHORED option to pcre_exec()
\B         pass the PCRE_NOTBOL option to pcre_exec()
\Cdd       call pcre_copy_substring() for substring dd
             after a successful match (any decimal number
             less than 32)
\Cname     call pcre_copy_named_substring() for substring
             "name" after a successful match (name terminated
             by next non-alphanumeric character)
\C+        show the current captured substrings at callout
             time
\C-        do not supply a callout function
\C!n       return 1 instead of 0 when callout number n is
             reached
\C!n!m     return 1 instead of 0 when callout number n is
             reached for the mth time
\C*n       pass the number n (may be negative) as callout
             data
\Gdd       call pcre_get_substring() for substring dd
             after a successful match (any decimal number
             less than 32)
\Gname     call pcre_get_named_substring() for substring
             "name" after a successful match (name terminated
             by next non-alphanumeric character)
\L         call pcre_get_substring_list() after a
             successful match
\M         discover the minimum MATCH_LIMIT setting
\N         pass the PCRE_NOTEMPTY option to pcre_exec()
\Odd       set the size of the output vector passed to
             pcre_exec() to dd (any number of decimal
             digits)
\S         output details of memory get/free calls during matching
\Z         pass the PCRE_NOTEOL option to pcre_exec()
\?         pass the PCRE_NO_UTF8_CHECK option to
             pcre_exec()
</PRE>
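
The character escapes in the first part of the table can be modelled in a few lines. The Python sketch below is an illustration only: the names decode_data_line, SIMPLE, and OPTION_ESCAPES are hypothetical, \x{hh...} is not handled, and the escapes that set options or request extra calls are rejected rather than acted upon. Keeping octal values to a single byte is also an assumption of this sketch.

```python
import re

# Single-letter character escapes from the table.
SIMPLE = {"a": "\x07", "b": "\x08", "e": "\x1b", "f": "\x0c",
          "n": "\n", "r": "\r", "t": "\t", "v": "\x0b"}

# Escapes that control pcretest rather than produce a character.
OPTION_ESCAPES = set("ABCGLMNOSZ?")

ESCAPE = re.compile(r"\\(?:([0-7]{1,3})|x([0-9A-Fa-f]{1,2})|(.)|$)", re.S)


def decode_data_line(line):
    """Strip surrounding whitespace, then expand character escapes."""
    def replace(match):
        octal, hexa, other = match.groups()
        if octal is not None:
            return chr(int(octal, 8) & 0xFF)  # assumption: keep to one byte
        if hexa is not None:
            return chr(int(hexa, 16))
        if other is None:
            return ""  # a lone backslash at the very end is ignored
        if other in OPTION_ESCAPES:
            raise NotImplementedError("option escape \\%s not modelled" % other)
        # Anything else is just escaped, i.e. taken literally.
        return SIMPLE.get(other, other)

    return ESCAPE.sub(replace, line.strip())


print(repr(decode_data_line(r"  one\ttwo\101\x42  ")))  # 'one\ttwoAB'
```

Note that a data line consisting of a single backslash decodes to the empty string, which is how an empty subject can be supplied.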

If \M is present, pcretest calls pcre_exec() several times, with
different values in the match_limit field of the pcre_extra data
structure, until it finds the minimum number that is needed for
pcre_exec() to complete. This number is a measure of the amount of
recursion and backtracking that takes place, and checking it out can be
instructive. For most simple matches, the number is quite small, but for
patterns with very large numbers of matching possibilities, it can become large
very quickly with increasing length of subject string.

When \O is used, the value given may be higher or lower than the size set by
the -o option (or defaulted to 45); \O applies only to the call of pcre_exec()
for the line in which it appears.

A backslash followed by anything else just escapes the anything else. If the
very last character is a backslash, it is ignored. This gives a way of passing
an empty line as data, since a real empty line terminates the data input.

If /P was present on the regex, causing the POSIX wrapper API to be used,
only \B and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL
to be passed to regexec() respectively.

The use of \x{hh...} to represent UTF-8 characters is not dependent on the use
of the /8 modifier on the pattern. It is recognized always. There may be
any number of hexadecimal digits inside the braces. The result is from one to
six bytes, encoded according to the UTF-8 rules.
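
The "one to six bytes" rule refers to the original UTF-8 scheme, which covers values up to 0x7FFFFFFF. As a rough sketch of that encoding (in Python, with a hypothetical function name; this is not pcretest's own code):

```python
def encode_utf8_extended(code_point):
    """Encode a value with the original 1-to-6-byte UTF-8 scheme."""
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("value out of range for 6-byte UTF-8")
    if code_point < 0x80:
        return bytes([code_point])
    # (exclusive upper limit, lead-byte marker) for 2- to 6-byte forms
    forms = [(0x800, 0xC0), (0x10000, 0xE0), (0x200000, 0xF0),
             (0x4000000, 0xF8), (0x80000000, 0xFC)]
    for extra, (limit, marker) in enumerate(forms, start=1):
        if code_point < limit:
            break
    tail = []
    for _ in range(extra):
        # Each continuation byte carries the next six low-order bits.
        tail.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    # Whatever bits remain go into the lead byte.
    return bytes([marker | code_point] + tail[::-1])


print(encode_utf8_extended(0x20AC).hex())  # e282ac
```

For values that are valid Unicode code points the result agrees with the standard encoding, so \x{20ac} in a data line becomes the three bytes E2 82 AC.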

<a name="SEC7" href="#TOC1">OUTPUT FROM PCRETEST</a>

When a match succeeds, pcretest outputs the list of captured substrings that
pcre_exec() returns, starting with number 0 for the string that matched
the whole pattern. Here is an example of an interactive pcretest run.

<pre>
$ pcretest
PCRE version 4.00 08-Jan-2003
</PRE>

<pre>
  re> /^abc(\d+)/
data> abc123
 0: abc123
 1: 123
data> xyz
No match
</PRE>

If the strings contain any non-printing characters, they are output as \xhh
escapes, or as \x{...} escapes if the /8 modifier was present on the
pattern. If the pattern has the /+ modifier, then the output for
substring 0 is followed by the rest of the subject string, identified by
"0+" like this:

<pre>
  re> /cat/+
data> cataract
 0: cat
 0+ aract
</PRE>
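
The listings above follow a simple pattern: one numbered line per substring, plus a "0+" line when /+ is in effect. The Python sketch below imitates that layout using the re module as a stand-in for PCRE. The function name is hypothetical, the "<unset>" text for groups that did not take part in the match is an assumption, and the escaping of non-printing characters is not modelled.

```python
import re


def pcretest_style_output(pattern, subject, show_rest=False):
    """Return the lines pcretest would roughly print for one data line."""
    m = re.search(pattern, subject)
    if m is None:
        return ["No match"]
    lines = []
    for number in range(m.re.groups + 1):
        text = m.group(number)
        if text is None:
            text = "<unset>"  # assumption: marker for an unset group
        lines.append("%2d: %s" % (number, text))
        if number == 0 and show_rest:
            # /+ adds the remainder of the subject after substring 0.
            lines.append(" 0+ %s" % subject[m.end():])
    return lines


print("\n".join(pcretest_style_output(r"^abc(\d+)", "abc123")))
print("\n".join(pcretest_style_output(r"cat", "cataract", show_rest=True)))
```

Run as written, this reproduces the " 0:", " 1:", and " 0+" lines of the two examples above.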

If the pattern has the /g or /G modifier, the results of successive
matching attempts are output in sequence, like this:

<pre>
  re> /\Bi(\w\w)/g
data> Mississippi
 0: iss
 1: ss
 0: iss
 1: ss
 0: ipp
 1: pp
</PRE>

"No match" is output only if the first match attempt fails.

If any of the sequences \C, \G, or \L are present in a
data line that is successfully matched, the substrings extracted by the
convenience functions are output with C, G, or L after the string number
instead of a colon. This is in addition to the normal full list. The string
length (that is, the return from the extraction function) is given in
parentheses after each string for \C and \G.

Note that while patterns can be continued over several lines (a plain ">"
prompt is used for continuations), data lines may not. However, newlines can be
included in data by means of the \n escape.

<a name="SEC8" href="#TOC1">AUTHOR</a>

Philip Hazel <ph10@cam.ac.uk>
University Computing Service,
Cambridge CB2 3QG, England.

Last updated: 09 December 2003
