
.. _lexical:

****************
Lexical analysis
****************

.. index::
   single: lexical analysis
   single: parser
   single: token

A Python program is read by a *parser*. Input to the parser is a stream of
*tokens*, generated by the *lexical analyzer*. This chapter describes how the
lexical analyzer breaks a file into tokens.

Python uses the 7-bit ASCII character set for program text.

.. versionadded:: 2.3
   An encoding declaration can be used to indicate that string literals and
   comments use an encoding different from ASCII.

For compatibility with older versions, Python only warns if it finds 8-bit
characters; those warnings should be corrected either by declaring an explicit
encoding or, if those bytes are binary data rather than characters, by using
escape sequences.

The run-time character set depends on the I/O devices connected to the program
but is generally a superset of ASCII.

**Future compatibility note:** It may be tempting to assume that the character
set for 8-bit characters is ISO Latin-1 (an ASCII superset that covers most
western languages that use the Latin alphabet), but it is possible that in the
future Unicode text editors will become common. These generally use the UTF-8
encoding, which is also an ASCII superset, but with very different use for the
characters with ordinals 128-255. While there is no consensus on this subject
yet, it is unwise to assume either Latin-1 or UTF-8, even though the current
implementation appears to favor Latin-1. This applies both to the source
character set and the run-time character set.


.. _line-structure:

Line structure
==============

.. index:: single: line structure

A Python program is divided into a number of *logical lines*.


.. _logical:

Logical lines
-------------

.. index::
   single: logical line
   single: physical line
   single: line joining
   single: NEWLINE token

The end of a logical line is represented by the token NEWLINE. Statements
cannot cross logical line boundaries except where NEWLINE is allowed by the
syntax (e.g., between statements in compound statements). A logical line is
constructed from one or more *physical lines* by following the explicit or
implicit *line joining* rules.


.. _physical:

Physical lines
--------------

A physical line is a sequence of characters terminated by an end-of-line
sequence. In source files, any of the standard platform line termination
sequences can be used: the Unix form using ASCII LF (linefeed), the Windows
form using the ASCII sequence CR LF (return followed by linefeed), or the old
Macintosh form using the ASCII CR (return) character. All of these forms can be
used equally, regardless of platform.

When embedding Python, source code strings should be passed to Python APIs using
the standard C conventions for newline characters (the ``\n`` character,
representing ASCII LF, is the line terminator).


.. _comments:

Comments
--------

.. index::
   single: comment
   single: hash character

A comment starts with a hash character (``#``) that is not part of a string
literal, and ends at the end of the physical line. A comment signifies the end
of the logical line unless the implicit line joining rules are invoked. Comments
are ignored by the syntax; they are not tokens.


.. _encodings:

Encoding declarations
---------------------

.. index:: source character set, encoding declarations (source file)

If a comment in the first or second line of the Python script matches the
regular expression ``coding[=:]\s*([-\w.]+)``, this comment is processed as an
encoding declaration; the first group of this expression names the encoding of
the source code file. The encoding declaration must appear on a line of its
own. If it is the second line, the first line must also be a comment-only line.
The recommended forms of an encoding declaration are ::

   # -*- coding: <encoding-name> -*-

which is also recognized by GNU Emacs, and ::

   # vim:fileencoding=<encoding-name>

which is recognized by Bram Moolenaar's VIM. In addition, if the first bytes of
the file are the UTF-8 byte-order mark (``'\xef\xbb\xbf'``), the declared file
encoding is UTF-8 (this is supported, among others, by Microsoft's
:program:`notepad`).

If an encoding is declared, the encoding name must be recognized by Python. The
encoding is used for all lexical analysis, in particular to find the end of a
string, and to interpret the contents of Unicode literals. String literals are
converted to Unicode for syntactical analysis, then converted back to their
original encoding before interpretation starts.
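
As a rough illustration of the matching rule described above, the following
sketch applies the quoted regular expression to the first two lines of a byte
string. The helper name ``declared_encoding`` and the constant names are
hypothetical, and the logic is a simplification (for instance, it treats a blank
first line as ruling out a declaration on the second line); it is not the
interpreter's actual implementation.

```python
import re

# The regular expression quoted in the text above.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")

# The UTF-8 byte-order mark mentioned in the text.
UTF8_BOM = b"\xef\xbb\xbf"


def declared_encoding(source_bytes):
    """Return the declared encoding name, or None (hypothetical helper)."""
    if source_bytes.startswith(UTF8_BOM):
        return "utf-8"
    # Only the first two physical lines are examined.
    for line in source_bytes.splitlines()[:2]:
        text = line.decode("ascii", "replace").strip()
        if not text.startswith("#"):
            # Assumption of this sketch: any non-comment line ends the search.
            return None
        match = CODING_RE.search(text)
        if match:
            return match.group(1)
    return None
```

Note that the pattern is searched for anywhere in the comment, which is why the
``vim:fileencoding=...`` form matches as well.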


.. XXX there should be a list of supported encodings.


.. _explicit-joining:

Explicit line joining
---------------------

.. index::
   single: physical line
   single: line joining
   single: line continuation
   single: backslash character

Two or more physical lines may be joined into logical lines using backslash
characters (``\``), as follows: when a physical line ends in a backslash that is
not part of a string literal or comment, it is joined with the following line,
forming a single logical line; the backslash and the following end-of-line
character are deleted. For example::

   if 1900 < year < 2100 and 1 <= month <= 12 \
      and 1 <= day <= 31 and 0 <= hour < 24 \
      and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
           return 1

A line ending in a backslash cannot carry a comment. A backslash does not
continue a comment. A backslash does not continue a token except for string
literals (i.e., tokens other than string literals cannot be split across
physical lines using a backslash). A backslash is illegal elsewhere on a line
outside a string literal.


.. _implicit-joining:

Implicit line joining
---------------------

Expressions in parentheses, square brackets or curly braces can be split over
more than one physical line without using backslashes. For example::

   month_names = ['Januari', 'Februari', 'Maart',      # These are the
                  'April',   'Mei',      'Juni',       # Dutch names
                  'Juli',    'Augustus', 'September',  # for the months
                  'Oktober', 'November', 'December']   # of the year

Implicitly continued lines can carry comments. The indentation of the
continuation lines is not important. Blank continuation lines are allowed.
There is no NEWLINE token between implicit continuation lines. Implicitly
continued lines can also occur within triple-quoted strings (see below); in that
case they cannot carry comments.
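
The effect of both joining rules can be observed by counting NEWLINE tokens. The
sketch below uses the standard :mod:`tokenize` module and is written for
Python 3, whose tokenizer is assumed here to treat these two cases the same way
as described above; ``count_logical_lines`` is a hypothetical helper name.

```python
import io
import tokenize


def count_logical_lines(source):
    """Count NEWLINE tokens, one per logical line (hypothetical helper)."""
    readline = io.StringIO(source).readline
    return sum(1 for tok in tokenize.generate_tokens(readline)
               if tok[0] == tokenize.NEWLINE)


# Two physical lines joined explicitly with a backslash.
explicit = "total = 1 + \\\n        2\n"

# Three physical lines (one of them blank) joined implicitly by brackets.
implicit = "names = ['a',  # first\n\n         'b']  # second\n"

# Two separate statements, for comparison.
separate = "a = 1\nb = 2\n"
```

Each of the first two strings forms a single logical line, so each yields
exactly one NEWLINE token; the third yields two.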


.. _blank-lines:

Blank lines
-----------

.. index:: single: blank line

A logical line that contains only spaces, tabs, formfeeds and possibly a
comment, is ignored (i.e., no NEWLINE token is generated). During interactive
input of statements, handling of a blank line may differ depending on the
implementation of the read-eval-print loop. In the standard implementation, an
entirely blank logical line (i.e., one containing not even whitespace or a
comment) terminates a multi-line statement.


.. _indentation:

Indentation
-----------

.. index::
   single: indentation
   single: whitespace
   single: leading whitespace
   single: space
   single: tab
   single: grouping
   single: statement grouping

Leading whitespace (spaces and tabs) at the beginning of a logical line is used
to compute the indentation level of the line, which in turn is used to determine
the grouping of statements.

First, tabs are replaced (from left to right) by one to eight spaces such that
the total number of characters up to and including the replacement is a multiple
of eight (this is intended to be the same rule as used by Unix). The total
number of spaces preceding the first non-blank character then determines the
line's indentation. Indentation cannot be split over multiple physical lines
using backslashes; the whitespace up to the first backslash determines the
indentation.
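
The tab replacement rule above can be sketched in a few lines. The function name
``indentation_width`` and its ``tabsize`` parameter are hypothetical, and the
sketch ignores formfeed characters; it only illustrates the arithmetic.

```python
def indentation_width(line, tabsize=8):
    """Return the indentation of ``line`` after tab expansion (a sketch)."""
    column = 0
    for char in line:
        if char == " ":
            column += 1
        elif char == "\t":
            # Advance to the next multiple of ``tabsize``.
            column = (column // tabsize + 1) * tabsize
        else:
            # First non-blank character: indentation ends here.
            break
    return column
```

For instance, a tab that follows two spaces contributes six columns, so the
total is eight, the same as a tab alone.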


**Cross-platform compatibility note:** because of the nature of text editors on
non-UNIX platforms, it is unwise to use a mixture of spaces and tabs for the
indentation in a single source file. It should also be noted that different
platforms may explicitly limit the maximum indentation level.

A formfeed character may be present at the start of the line; it will be ignored
for the indentation calculations above. Formfeed characters occurring elsewhere
in the leading whitespace have an undefined effect (for instance, they may reset
the space count to zero).

.. index::
   single: INDENT token
   single: DEDENT token

The indentation levels of consecutive lines are used to generate INDENT and
DEDENT tokens, using a stack, as follows.

Before the first line of the file is read, a single zero is pushed on the stack;
this will never be popped off again. The numbers pushed on the stack will
always be strictly increasing from bottom to top. At the beginning of each
logical line, the line's indentation level is compared to the top of the stack.
If it is equal, nothing happens. If it is larger, it is pushed on the stack, and
one INDENT token is generated. If it is smaller, it *must* be one of the
numbers occurring on the stack; all numbers on the stack that are larger are
popped off, and for each number popped off a DEDENT token is generated. At the
end of the file, a DEDENT token is generated for each number remaining on the
stack that is larger than zero.
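
The stack procedure just described can be sketched as follows. The function
``indent_tokens`` is a hypothetical helper that takes the already computed
indentation level of each logical line and returns token names, with ``"LINE"``
standing in for the tokens of the line itself.

```python
def indent_tokens(levels):
    """Turn per-line indentation levels into INDENT/DEDENT markers (a sketch)."""
    stack = [0]          # the initial zero is never popped
    tokens = []
    for level in levels:
        if level > stack[-1]:
            stack.append(level)
            tokens.append("INDENT")
        elif level < stack[-1]:
            if level not in stack:
                raise IndentationError("dedent does not match any outer level")
            while stack[-1] > level:
                stack.pop()
                tokens.append("DEDENT")
        tokens.append("LINE")
    # End of file: close every level that is still open.
    while stack[-1] > 0:
        stack.pop()
        tokens.append("DEDENT")
    return tokens
```

A level sequence such as ``[0, 8, 4]`` raises :exc:`IndentationError` in this
sketch, because 4 was never pushed on the stack.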


Here is an example of a correctly (though confusingly) indented piece of Python
code::

   def perm(l):
           # Compute the list of all permutations of l
       if len(l) <= 1:
                     return [l]
       r = []
       for i in range(len(l)):
                s = l[:i] + l[i+1:]
                p = perm(s)
                for x in p:
                 r.append(l[i:i+1] + x)
       return r

The following example shows various indentation errors::

    def perm(l):                       # error: first line indented
   for i in range(len(l)):             # error: not indented
       s = l[:i] + l[i+1:]
           p = perm(l[:i] + l[i+1:])   # error: unexpected indent
           for x in p:
                   r.append(l[i:i+1] + x)
               return r                # error: inconsistent dedent

(Actually, the first three errors are detected by the parser; only the last
error is found by the lexical analyzer: the indentation of ``return r`` does
not match a level popped off the stack.)


.. _whitespace:

Whitespace between tokens
-------------------------

Except at the beginning of a logical line or in string literals, the whitespace
characters space, tab and formfeed can be used interchangeably to separate
tokens. Whitespace is needed between two tokens only if their concatenation
could otherwise be interpreted as a different token (e.g., ``ab`` is one token,
but ``a b`` is two tokens).


.. _other-tokens:

Other tokens
============

Besides NEWLINE, INDENT and DEDENT, the following categories of tokens exist:
*identifiers*, *keywords*, *literals*, *operators*, and *delimiters*. Whitespace
characters (other than line terminators, discussed earlier) are not tokens, but
serve to delimit tokens. Where ambiguity exists, a token comprises the longest
possible string that forms a legal token, when read from left to right.
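
The longest-match rule and the role of whitespace can be checked with the
standard :mod:`tokenize` module. This is a sketch written for Python 3, which is
assumed to tokenize these particular inputs in the same way; ``token_strings``
is a hypothetical helper name.

```python
import io
import tokenize

# Token types that carry no source text of interest for this illustration.
_SKIPPED = (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
            tokenize.INDENT, tokenize.DEDENT, tokenize.COMMENT)


def token_strings(source):
    """Return the text of each significant token in ``source`` (a sketch)."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type not in _SKIPPED]
```

With this helper, ``"a<<=b"`` splits into three tokens rather than five, because
``<<=`` is the longest legal token starting at that position.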


.. _identifiers:

Identifiers and keywords
========================

.. index::
   single: identifier
   single: name

Identifiers (also referred to as *names*) are described by the following lexical
definitions:

.. productionlist::
   identifier: (`letter`|"_") (`letter` | `digit` | "_")*
   letter: `lowercase` | `uppercase`
   lowercase: "a"..."z"
   uppercase: "A"..."Z"
   digit: "0"..."9"

Identifiers are unlimited in length. Case is significant.
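
The ``identifier`` production above translates directly into a regular
expression. The names ``IDENTIFIER_RE`` and ``is_identifier`` are hypothetical;
the sketch checks only the lexical shape and does not reject keywords.

```python
import re

# (letter | "_") (letter | digit | "_")*  with ASCII letters only
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def is_identifier(text):
    """Return True if ``text`` has the shape of an identifier (a sketch)."""
    return IDENTIFIER_RE.match(text) is not None
```

The ``\Z`` anchor is used instead of ``$`` so that a trailing newline is not
silently accepted.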


.. _keywords:

Keywords
--------

.. index::
   single: keyword
   single: reserved word

The following identifiers are used as reserved words, or *keywords* of the
language, and cannot be used as ordinary identifiers. They must be spelled
exactly as written here:

.. sourcecode:: text

   and       del       from      not       while
   as        elif      global    or        with
   assert    else      if        pass      yield
   break     except    import    print
   class     exec      in        raise
   continue  finally   is        return
   def       for       lambda    try

.. versionchanged:: 2.4
   :const:`None` became a constant and is now recognized by the compiler as a name
   for the built-in object :const:`None`. Although it is not a keyword, you cannot
   assign a different object to it.

.. versionchanged:: 2.5
   Using :keyword:`as` and :keyword:`with` as identifiers triggers a warning. To
   use them as keywords, enable the ``with_statement`` future feature.

.. versionchanged:: 2.6
   :keyword:`as` and :keyword:`with` are full keywords.


.. _id-classes:

Reserved classes of identifiers
-------------------------------

Certain classes of identifiers (besides keywords) have special meanings. These
classes are identified by the patterns of leading and trailing underscore
characters:

``_*``
   Not imported by ``from module import *``. The special identifier ``_`` is used
   in the interactive interpreter to store the result of the last evaluation; it is
   stored in the :mod:`__builtin__` module. When not in interactive mode, ``_``
   has no special meaning and is not defined. See section :ref:`import`.

   .. note::

      The name ``_`` is often used in conjunction with internationalization;
      refer to the documentation for the :mod:`gettext` module for more
      information on this convention.

``__*__``
   System-defined names. These names are defined by the interpreter and its
   implementation (including the standard library). Current system names are
   discussed in the :ref:`specialnames` section and elsewhere. More will likely
   be defined in future versions of Python. *Any* use of ``__*__`` names, in
   any context, that does not follow explicitly documented use, is subject to
   breakage without warning.

``__*``
   Class-private names. Names in this category, when used within the context of a
   class definition, are re-written to use a mangled form to help avoid name
   clashes between "private" attributes of base and derived classes. See section
   :ref:`atom-identifiers`.
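
The rewriting of class-private names can be observed directly. In the sketch
below, the class names ``Base`` and ``Derived`` and the attribute ``__token``
are made up for illustration.

```python
class Base(object):
    def __init__(self):
        self.__token = "base"        # stored as _Base__token

    def base_token(self):
        return self.__token          # rewritten to self._Base__token


class Derived(Base):
    def __init__(self):
        Base.__init__(self)
        self.__token = "derived"     # stored as _Derived__token


obj = Derived()
```

Because each class mangles the name with its own class name, the two
assignments do not clash: ``obj.base_token()`` still returns ``"base"``.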


.. _literals:

Literals
========

.. index::
   single: literal
   single: constant

Literals are notations for constant values of some built-in types.


.. _strings:

String literals
---------------

.. index:: single: string literal

String literals are described by the following lexical definitions:

.. index:: single: ASCII@ASCII

.. productionlist::
   stringliteral: [`stringprefix`](`shortstring` | `longstring`)
   stringprefix: "r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"
               : | "b" | "B" | "br" | "Br" | "bR" | "BR"
   shortstring: "'" `shortstringitem`* "'" | '"' `shortstringitem`* '"'
   longstring: "'''" `longstringitem`* "'''"
             : | '"""' `longstringitem`* '"""'
   shortstringitem: `shortstringchar` | `escapeseq`
   longstringitem: `longstringchar` | `escapeseq`
   shortstringchar: <any source character except "\" or newline or the quote>
   longstringchar: <any source character except "\">
   escapeseq: "\" <any ASCII character>

One syntactic restriction not indicated by these productions is that whitespace
is not allowed between the :token:`stringprefix` and the rest of the string
literal. The source character set is defined by the encoding declaration; it is
ASCII if no encoding declaration is given in the source file; see section
:ref:`encodings`.


.. index::
   single: triple-quoted string
   single: Unicode Consortium
   single: string; Unicode
   single: raw string

In plain English: String literals can be enclosed in matching single quotes
(``'``) or double quotes (``"``). They can also be enclosed in matching groups
of three single or double quotes (these are generally referred to as
*triple-quoted strings*). The backslash (``\``) character is used to escape
characters that otherwise have a special meaning, such as newline, backslash
itself, or the quote character. String literals may optionally be prefixed with
a letter ``'r'`` or ``'R'``; such strings are called :dfn:`raw strings` and use
different rules for interpreting backslash escape sequences. A prefix of
``'u'`` or ``'U'`` makes the string a Unicode string. Unicode strings use the
Unicode character set as defined by the Unicode Consortium and ISO 10646. Some
additional escape sequences, described below, are available in Unicode strings.
A prefix of ``'b'`` or ``'B'`` is ignored in Python 2; it indicates that the
literal should become a bytes literal in Python 3 (e.g., when code is
automatically converted with 2to3). A ``'u'`` or ``'b'`` prefix may be followed
by an ``'r'`` prefix.

In triple-quoted strings, unescaped newlines and quotes are allowed (and are
retained), except that three unescaped quotes in a row terminate the string. (A
"quote" is the character used to open the string, i.e., either ``'`` or ``"``.)

.. index::
   single: physical line
   single: escape sequence
   single: Standard C
   single: C

Unless an ``'r'`` or ``'R'`` prefix is present, escape sequences in strings are
interpreted according to rules similar to those used by Standard C. The
recognized escape sequences are:


+-----------------+---------------------------------+-------+
| Escape Sequence | Meaning                         | Notes |
+=================+=================================+=======+
| ``\newline``    | Ignored                         |       |
+-----------------+---------------------------------+-------+
| ``\\``          | Backslash (``\``)               |       |
+-----------------+---------------------------------+-------+
| ``\'``          | Single quote (``'``)            |       |
+-----------------+---------------------------------+-------+
| ``\"``          | Double quote (``"``)            |       |
+-----------------+---------------------------------+-------+
| ``\a``          | ASCII Bell (BEL)                |       |
+-----------------+---------------------------------+-------+
| ``\b``          | ASCII Backspace (BS)            |       |
+-----------------+---------------------------------+-------+
| ``\f``          | ASCII Formfeed (FF)             |       |
+-----------------+---------------------------------+-------+
| ``\n``          | ASCII Linefeed (LF)             |       |
+-----------------+---------------------------------+-------+
| ``\N{name}``    | Character named *name* in the   |       |
|                 | Unicode database (Unicode only) |       |
+-----------------+---------------------------------+-------+
| ``\r``          | ASCII Carriage Return (CR)      |       |
+-----------------+---------------------------------+-------+
| ``\t``          | ASCII Horizontal Tab (TAB)      |       |
+-----------------+---------------------------------+-------+
| ``\uxxxx``      | Character with 16-bit hex value | \(1)  |
|                 | *xxxx* (Unicode only)           |       |
+-----------------+---------------------------------+-------+
| ``\Uxxxxxxxx``  | Character with 32-bit hex value | \(2)  |
|                 | *xxxxxxxx* (Unicode only)       |       |
+-----------------+---------------------------------+-------+
| ``\v``          | ASCII Vertical Tab (VT)         |       |
+-----------------+---------------------------------+-------+
| ``\ooo``        | Character with octal value      | (3,5) |
|                 | *ooo*                           |       |
+-----------------+---------------------------------+-------+
| ``\xhh``        | Character with hex value *hh*   | (4,5) |
+-----------------+---------------------------------+-------+


.. index:: single: ASCII@ASCII

Notes:

(1)
   Individual code units which form parts of a surrogate pair can be encoded using
   this escape sequence.

(2)
   Any Unicode character can be encoded this way, but characters outside the Basic
   Multilingual Plane (BMP) will be encoded using a surrogate pair if Python is
   compiled to use 16-bit code units (the default).

(3)
   As in Standard C, up to three octal digits are accepted.

(4)
   Unlike in Standard C, exactly two hex digits are required.

(5)
   In a string literal, hexadecimal and octal escapes denote the byte with the
   given value; it is not necessary that the byte encodes a character in the source
   character set. In a Unicode literal, these escapes denote a Unicode character
   with the given value.

.. index:: single: unrecognized escape sequence

Unlike Standard C, all unrecognized escape sequences are left in the string
unchanged, i.e., *the backslash is left in the string*. (This behavior is
useful when debugging: if an escape sequence is mistyped, the resulting output
is more easily recognized as broken.) It is also important to note that the
escape sequences marked as "(Unicode only)" in the table above fall into the
category of unrecognized escapes for non-Unicode string literals.

When an ``'r'`` or ``'R'`` prefix is present, a character following a backslash
is included in the string without change, and *all backslashes are left in the
string*. For example, the string literal ``r"\n"`` consists of two characters:
a backslash and a lowercase ``'n'``. String quotes can be escaped with a
backslash, but the backslash remains in the string; for example, ``r"\""`` is a
valid string literal consisting of two characters: a backslash and a double
quote; ``r"\"`` is not a valid string literal (even a raw string cannot end in
an odd number of backslashes). Specifically, *a raw string cannot end in a
single backslash* (since the backslash would escape the following quote
character). Note also that a single backslash followed by a newline is
interpreted as those two characters as part of the string, *not* as a line
continuation.

When an ``'r'`` or ``'R'`` prefix is used in conjunction with a ``'u'`` or
``'U'`` prefix, then the ``\uXXXX`` and ``\UXXXXXXXX`` escape sequences are
processed while *all other backslashes are left in the string*. For example,
the string literal ``ur"\u0062\n"`` consists of three Unicode characters: 'LATIN
SMALL LETTER B', 'REVERSE SOLIDUS', and 'LATIN SMALL LETTER N'. Backslashes can
be escaped with a preceding backslash; however, both remain in the string. As a
result, ``\uXXXX`` escape sequences are only recognized when there are an odd
number of backslashes.
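
A few of the escape and raw-string rules above can be checked with plain
comparisons. The variable names are made up for illustration, and the sketch
deliberately avoids unrecognized escapes and the ``ur`` prefix so that it runs
unchanged on both Python 2 and Python 3.

```python
# Ordinary literals: escape sequences are interpreted.
newline = "\n"            # one character: linefeed
letter_hex = "\x41"       # hex escape, exactly two digits
letter_oct = "\101"       # octal escape, up to three digits

# Raw literals: every backslash stays in the string.
raw_newline = r"\n"       # two characters: backslash and 'n'
raw_quote = r"\""         # two characters: backslash and double quote
```

The raw forms can be written without the prefix by doubling the backslash, so
``r"\n"`` and ``"\\n"`` denote the same string.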


.. _string-catenation:

String literal concatenation
----------------------------

Multiple adjacent string literals (delimited by whitespace), possibly using
different quoting conventions, are allowed, and their meaning is the same as
their concatenation. Thus, ``"hello" 'world'`` is equivalent to
``"helloworld"``. This feature can be used to reduce the number of backslashes
needed, to split long strings conveniently across long lines, or even to add
comments to parts of strings, for example::

   re.compile("[A-Za-z_]"       # letter or underscore
              "[A-Za-z0-9_]*"   # letter, digit or underscore
             )

Note that this feature is defined at the syntactical level, but implemented at
compile time. The '+' operator must be used to concatenate string expressions
at run time. Also note that literal concatenation can use different quoting
styles for each component (even mixing raw strings and triple-quoted strings).
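
The equivalence stated above can be verified with a short sketch; the variable
names are made up for illustration.

```python
# Adjacent literals are joined at compile time.
greeting = "hello" 'world'

# Quoting styles may be mixed, including raw and triple-quoted literals.
mixed = "tab:" r"\t" '''!'''

# Comments may be attached to the individual parts.
pattern = ("[A-Za-z_]"       # letter or underscore
           "[A-Za-z0-9_]*")  # letter, digit or underscore

# String expressions (as opposed to literals) need the + operator.
prefix = "hello"
joined = prefix + "world"
```

Writing ``prefix "world"`` instead of ``prefix + "world"`` would be a syntax
error, since ``prefix`` is a name and not a string literal.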


.. _numbers:

Numeric literals
----------------

.. index::
   single: number
   single: numeric literal
   single: integer literal
   single: plain integer literal
   single: long integer literal
   single: floating point literal
   single: hexadecimal literal
   single: binary literal
   single: octal literal
   single: decimal literal
   single: imaginary literal
   single: complex; literal

There are four types of numeric literals: plain integers, long integers,
floating point numbers, and imaginary numbers. There are no complex literals
(complex numbers can be formed by adding a real number and an imaginary number).

Note that numeric literals do not include a sign; a phrase like ``-1`` is
actually an expression composed of the unary operator '``-``' and the literal
``1``.


.. _integers:

Integer and long integer literals
---------------------------------

Integer and long integer literals are described by the following lexical
definitions:

.. productionlist::
   longinteger: `integer` ("l" | "L")
   integer: `decimalinteger` | `octinteger` | `hexinteger` | `bininteger`
   decimalinteger: `nonzerodigit` `digit`* | "0"
   octinteger: "0" ("o" | "O") `octdigit`+ | "0" `octdigit`+
   hexinteger: "0" ("x" | "X") `hexdigit`+
   bininteger: "0" ("b" | "B") `bindigit`+
   nonzerodigit: "1"..."9"
   octdigit: "0"..."7"
   bindigit: "0" | "1"
   hexdigit: `digit` | "a"..."f" | "A"..."F"

Although both lower case ``'l'`` and upper case ``'L'`` are allowed as a suffix
for long integers, it is strongly recommended to always use ``'L'``, since the
letter ``'l'`` looks too much like the digit ``'1'``.

Plain integer literals that are above the largest representable plain integer
(e.g., 2147483647 when using 32-bit arithmetic) are accepted as if they were
long integers instead. [#]_ There is no limit for long integer literals apart
from what can be stored in available memory.

Some examples of plain integer literals (first row) and long integer literals
(second and third rows)::

   7     2147483647                        0177
   3L    79228162514264337593543950336L    0377L   0x100000000L
         79228162514264337593543950336             0xdeadbeef
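
The values denoted by some of these literals can be checked with a sketch such
as the one below. It restricts itself to spellings that are also accepted by
Python 3 (so no ``L`` suffix and no bare leading-zero octal such as ``0177``);
the variable names are made up for illustration.

```python
# The same value written with each radix prefix from the grammar.
decimal = 255
hexadecimal = 0xff
octal = 0o377
binary = 0b11111111

# Values of some of the example literals above.
largest_32bit = 2147483647                  # 2**31 - 1
big = 79228162514264337593543950336         # 2**96
```

The legacy octal spelling ``0377`` denotes the same value as ``0o377``; this can
be confirmed without using the literal itself by calling ``int("0377", 8)``.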


.. _floating:

Floating point literals
-----------------------

Floating point literals are described by the following lexical definitions:

.. productionlist::
   floatnumber: `pointfloat` | `exponentfloat`
   pointfloat: [`intpart`] `fraction` | `intpart` "."
   exponentfloat: (`intpart` | `pointfloat`) `exponent`
   intpart: `digit`+
   fraction: "." `digit`+
   exponent: ("e" | "E") ["+" | "-"] `digit`+

Note that the integer and exponent parts of floating point numbers can look like
octal integers, but are interpreted using radix 10. For example, ``077e010`` is
legal, and denotes the same number as ``77e10``. The allowed range of floating
point literals is implementation-dependent. Some examples of floating point
literals::

   3.14    10.    .001    1e100    3.14e-10    0e0

Note that numeric literals do not include a sign; a phrase like ``-1`` is
actually an expression composed of the unary operator ``-`` and the literal
``1``.
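
The radix-10 interpretation of floating point literals and the optional parts of
the grammar can be confirmed with simple comparisons. The variable names below
are made up for illustration.

```python
# Leading zeros do not make a floating point literal octal.
leading_zeros = 077e010
plain = 77e10

# The integer part or the fraction digits may be omitted.
no_fraction = 10.
no_intpart = .001

# An exponent may follow an integer part directly.
zero = 0e0
```

All of these are single tokens; a sign in front of any of them would be a
separate unary operator token.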


.. _imaginary:

Imaginary literals
------------------

Imaginary literals are described by the following lexical definitions:

.. productionlist::
   imagnumber: (`floatnumber` | `intpart`) ("j" | "J")

An imaginary literal yields a complex number with a real part of 0.0. Complex
numbers are represented as a pair of floating point numbers and have the same
restrictions on their range. To create a complex number with a nonzero real
part, add a floating point number to it, e.g., ``(3+4j)``. Some examples of
imaginary literals::

   3.14j   10.j    10j     .001j   1e100j  3.14e-10j
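
The following sketch shows how an imaginary literal combines with a real number
to form a complex value; the variable names are made up for illustration.

```python
# An imaginary literal alone has a real part of 0.0.
pure = 4j

# Adding a real number yields a complex number with a nonzero real part.
z = 3 + 4j

# Several spellings of the same imaginary value.
spellings = [10j, 10.j, 1e1j, 10J]
```

Here ``3 + 4j`` is an expression made of two literals and the ``+`` operator,
which is consistent with the statement that there are no complex literals.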


.. _operators:

Operators
=========

.. index:: single: operators

The following tokens are operators:

.. code-block:: none

   +       -       *       **      /       //      %
   <<      >>      &       |       ^       ~
   <       >       <=      >=      ==      !=      <>

The comparison operators ``<>`` and ``!=`` are alternate spellings of the same
operator. ``!=`` is the preferred spelling; ``<>`` is obsolescent.


.. _delimiters:

Delimiters
==========

.. index:: single: delimiters

The following tokens serve as delimiters in the grammar:

.. code-block:: none

   (       )       [       ]       {       }      @
   ,       :       .       `       =       ;
   +=      -=      *=      /=      //=     %=
   &=      |=      ^=      >>=     <<=     **=

The period can also occur in floating-point and imaginary literals. A sequence
of three periods has a special meaning as an ellipsis in slices. The second half
of the list, the augmented assignment operators, serve lexically as delimiters,
but also perform an operation.

The following printing ASCII characters have special meaning as part of other
tokens or are otherwise significant to the lexical analyzer:

.. code-block:: none

   '       "       #       \

.. index:: single: ASCII@ASCII

The following printing ASCII characters are not used in Python. Their
occurrence outside string literals and comments is an unconditional error:

.. code-block:: none

   $       ?

.. rubric:: Footnotes

.. [#] In versions of Python prior to 2.4, octal and hexadecimal literals in the range
   just above the largest representable plain integer but below 4294967296
   (2**32, one more than the largest unsigned 32-bit number, on a machine using
   32-bit arithmetic) were taken as the negative plain integer obtained by
   subtracting 4294967296 from their unsigned value.