
Viewing changes to pod/perlrebackslash.pod

Committer: Bazaar Package Importer
Author(s): Niko Tyni
Date: 2011-02-06 11:31:38 UTC
mto: (8.2.12 experimental) (1.1.12)
mto: This revision was merged to the branch mainline in revision 46.
Revision ID: james.westby@ubuntu.com-20110206113138-lzpm3g6rur7i3eyp

Tags: upstream-5.12.3

Import upstream version 5.12.3

files added:
cpan/CGI/t/headers.t

cpan/CGI/t/multipart_init.t

pod/perl5123delta.pod

files modified:
Cross/config.sh-arm-linux

Cross/config.sh-arm-linux-n770

INSTALL

MANIFEST

META.yml

Makefile.SH

NetWare/Makefile

NetWare/config_H.wc

Porting/config.sh

Porting/config_H

README.aix

README.haiku

README.os2

README.vms

README.vos

cpan/CGI/lib/CGI.pm

cpan/Module-Build/lib/Module/Build/Platform/cygwin.pm

dist/B-Deparse/Deparse.pm

dist/Module-CoreList/Changes

dist/Module-CoreList/lib/Module/CoreList.pm

dist/constant/t/constant.t

epoc/config.sh

epoc/createpkg.pl

ext/B/t/concise-xs.t

ext/Socket/Socket.pm

ext/Socket/Socket.xs

ext/VMS-Stdio/t/vms_stdio.t

gv.c

hints/catamount.sh

hints/vos.sh

lib/utf8_heavy.pl

patchlevel.h

perlio.c

plan9/config.plan9

plan9/config_sh.sample

pod.lst

pod/perl.pod

pod/perl5122delta.pod

pod/perldebguts.pod

pod/perlebcdic.pod

pod/perlhack.pod

pod/perlhist.pod

pod/perlpolicy.pod

pod/perlport.pod

pod/perlre.pod

pod/perlrebackslash.pod

pod/perlrecharclass.pod

pod/perlrepository.pod

pod/perlreref.pod

pod/perlunicode.pod

pod/perluniintro.pod

pod/perlvar.pod

pp_hot.c

t/op/sub_lval.t

t/re/regexp_unicode_prop.t

t/uni/class.t

vms/descrip_mms.template

vms/vms.c

win32/Makefile

win32/Makefile.ce

win32/makefile.mk

win32/pod.mak


pod/perlrebackslash.pod

purpose of this document is to have a quick reference guide describing all
backslash and escape sequences.

=head2 The backslash

In a regular expression, the backslash can perform one of two tasks:

 \A     Beginning of string.  Not in [].
 \b     Word/non-word boundary.  (Backspace in []).
 \B     Not a word/non-word boundary.  Not in [].
 \cX    Control-X
 \C     Single octet, even under UTF-8.  Not in [].
 \d     Character class for digits.
 \D     Character class for non-digits.
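The following is a rough illustration of a few entries above. It is a small Perl sketch that is not part of the original document. It exercises C<\A>, C<\b>, C<\B>, C<\d> and C<\D> on ordinary strings. The sample strings and variable names are invented for the example, and C<\C> is deliberately left out.

```perl
use strict;
use warnings;

# \A anchors at the very beginning of the string; unlike ^ under /m,
# it never matches just after an embedded newline.
my $anchored = ("foo\nbar" =~ /\Abar/)  ? 1 : 0;   # "bar" is not at the start
my $multi    = ("foo\nbar" =~ /^bar/m)  ? 1 : 0;   # ^ matches after "\n" under /m

# \b matches between a word and a non-word character; \B matches elsewhere.
my @words = ("cat scatter" =~ /\bcat\b/g);         # only the standalone "cat"
my $inner = ("scatter" =~ /\Bcat\B/) ? 1 : 0;      # "cat" buried inside "scatter"

# \d is the class of digits, \D its complement.
my $digits     = join '', ("a1b22c333" =~ /\d/g);
my $non_digits = join '', ("a1b22c333" =~ /\D/g);

print "$anchored $multi ", scalar(@words), " $inner $digits $non_digits\n";
```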

A handful of characters have a dedicated I<character escape>. The following
table shows them, along with their ASCII code points (in decimal and hex),
their ASCII name, the control escape on ASCII platforms and a short
description. (For EBCDIC platforms, see L<perlebcdic/OPERATOR DIFFERENCES>.)

 Seq.  Code Point  ASCII   Cntrl   Description.
       Dec    Hex
  \a     7     07  BEL     \cG     alarm or bell
  \b     8     08  BS      \cH     backspace [1]

=item [1]

C<\b> is the backspace character only inside a character class. Outside a
character class, C<\b> is a word/non-word boundary.

=item [2]

C<\n> matches a logical newline. Perl will convert between C<\n> and your
OS's native newline character when reading from or writing to text files.

=back
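The distinction drawn in note [1] is easy to check directly. This minimal sketch assumes an ASCII platform and uses made-up sample strings. It shows that C<\b> matches the backspace character inside a character class, while outside a class it only asserts a word/non-word boundary.

```perl
use strict;
use warnings;

# Character escapes stand for single characters (ASCII code points).
my $bel_ok = (ord("\a") == 7) ? 1 : 0;   # BEL, same as \cG on ASCII
my $bs_ok  = (ord("\b") == 8) ? 1 : 0;   # in a double-quoted string, \b is BS

# Inside a character class, \b is the backspace character ...
my $in_class = ("x\x08y" =~ /x[\b]y/) ? 1 : 0;

# ... but outside a class it is a zero-width word/non-word boundary,
# so it does not consume the backspace between "x" and "y".
my $outside  = ("x\x08y" =~ /x\by/)   ? 1 : 0;
my $boundary = ("x y"    =~ /x\b y/)  ? 1 : 0;   # boundary between "x" and " "
```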

=head3 Control characters

C<\c> is used to denote a control character; the character following C<\c>
determines the value of the construct. For example, the value of C<\cA> is
C<chr(1)>, and the value of C<\cb> is C<chr(2)>, etc.
The gory details are in L<perlop/"Regexp Quote-Like Operators">. A complete
list of what C<chr(1)>, etc. means for ASCII and EBCDIC platforms is in
L<perlebcdic/OPERATOR DIFFERENCES>.

Note that C<\c\> alone at the end of a regular expression (or double-quoted
string) is not valid. The backslash must be followed by another character.
That is, C<\c\I<X>> means C<chr(28) . 'I<X>'> for all characters I<X>.

To write platform-independent code, you must use C<\N{I<NAME>}> instead, like
C<\N{ESCAPE}> or C<\N{U+001B}>; see L<charnames>.

Mnemonic: I<c>ontrol character.
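The rules above can be spot-checked with a short sketch. It assumes an ASCII platform, since on EBCDIC the values differ (see L<perlebcdic/OPERATOR DIFFERENCES>). The sketch shows that C<\cA> and C<\cb> give C<chr(1)> and C<chr(2)>, that the case of the letter does not matter, and that C<\N{U+001B}> names the same character as C<\c[> without depending on the platform.

```perl
use strict;
use warnings;

# ASCII platform assumed: \cX yields the control character for X.
my @codes = map { ord($_) } ("\cA", "\cb", "\cM", "\c[");

# The case of the letter after \c does not matter.
my $same_case = ("\cM" eq "\cm") ? 1 : 0;

# The same construct works inside a regular expression: \cM is CR.
my $cr_match = ("line\r" =~ /\cM\z/) ? 1 : 0;

# \N{U+...} names the code point directly, independently of the platform.
my $esc_same = ("\c[" eq "\N{U+001B}") ? 1 : 0;
```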

=head3 Named or numbered characters

Unicode characters have a Unicode name and numeric ordinal value. Use the
C<\N{}> construct to specify a character by either of these values.

To specify by name, the name of the character goes between the curly braces.

desired character. It is customary (but not required) to use leading zeros to
pad the number to 4 digits. Thus C<\N{U+0041}> means
C<Latin Capital Letter A>, and you will rarely see it written without the two
leading zeros. C<\N{U+0041}> means "A" even on EBCDIC machines (where the
ordinal value of "A" is not 0x41).

It is even possible to give your own names to characters, and also to short
sequences of characters. For details, see L<charnames>.
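Both ways of using C<\N{}> can be seen side by side in the sketch below. It is not part of the original document. It loads the core L<charnames> module with C<:full> so that full Unicode names are available, and it uses C<charnames::viacode> to go from an ordinal back to a name. The sample strings are illustrative only.

```perl
use strict;
use warnings;
use charnames ':full';   # full Unicode names for \N{NAME} and viacode()

# By Unicode ordinal: leading zeros are customary but not required.
my $by_number = ("\N{U+0041}" eq "A" && "\N{U+41}" eq "A") ? 1 : 0;

# By name: the full Unicode name goes between the curly braces.
my $by_name = ("\N{LATIN CAPITAL LETTER A}" eq "A") ? 1 : 0;

# charnames::viacode goes the other way, from ordinal to name.
my $name = charnames::viacode(0x41);

# \N{} works the same way inside a regular expression.
my $in_regex = ("A" =~ /\A\N{LATIN CAPITAL LETTER A}\z/) ? 1 : 0;
```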

=item 3

If the number following the backslash is N (in decimal), and Perl has already
seen N capture groups, Perl will consider this to be a backreference.
Otherwise, it will consider it to be an octal escape. Note that if N has more
than three digits, Perl only takes the first three for the octal escape;
the rest are matched as is.

 my $pat  = "(" x 999;
    $pat .= "a";
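The example above is cut off in this view. A smaller sketch of the same rule follows; the patterns and subject strings are chosen for illustration only, and an ASCII platform is assumed for the octal values. With fewer than ten groups, C<\10> is the octal escape for code point 8. With ten groups, it refers back to the tenth group. A four-digit number with no groups has only its first three digits read as octal.

```perl
use strict;
use warnings;

# No capture groups seen: \10 cannot be a backreference, so it is the
# octal escape \010 (code point 8, backspace).
my $octal_10 = ("\x08" =~ /^\10$/) ? 1 : 0;

# Ten capture groups seen: \10 is now a backreference to group 10 ("j").
my $backref_10 =
    ("abcdefghijj" =~ /^(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)\10$/) ? 1 : 0;

# More than three digits and no groups: only "101" is read as octal
# (code point 65, "A" on ASCII); the trailing "0" is matched as is.
my $four_digits = ("A0" =~ /^\1010$/) ? 1 : 0;
```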

include things like "letter" or "thai character". Capitalizing the
sequence to C<\PP> and C<\P{Property}> makes the sequence match a character
that doesn't match the given Unicode property. For more details, see
L<perlrecharclass/Backslash sequences> and
L<perlunicode/Unicode Character Properties>.

Mnemonic: I<p>roperty.
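The sketch below is a minimal example and not part of the original document; the sample characters are illustrative. It shows the single-letter form C<\pL>, the braced forms C<\p{Letter}> and C<\p{Thai}>, and the negated C<\PL>. U+0E01 is THAI CHARACTER KO KAI.

```perl
use strict;
use warnings;

# \pL is shorthand for \p{L}; \p{Letter} is the long name of the same property.
my $letter    = ("x" =~ /\A\pL\z/)        ? 1 : 0;
my $long_form = ("x" =~ /\A\p{Letter}\z/) ? 1 : 0;

# Capitalizing to \P negates the property: a digit is not a letter.
my $not_letter = ("7" =~ /\A\PL\z/) ? 1 : 0;

# Script properties work the same way.
my $thai = ("\N{U+0E01}" =~ /\A\p{Thai}\z/) ? 1 : 0;

# Collect every letter in a mixed string.
my @letters = ("a1b2" =~ /\p{L}/g);
```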
