
onfocus="entersearch()" onblur="leavesearch()" style="width:24ex;padding:1px 1ex; border:solid white 1px; font-size:0.9em ; font-style:italic;color:#044a64;" value="Search SQLite Docs...">


</form>


</div>


</table>


</div></div></div></div>


</td></tr></table>


<h1> The SQLite Query Planner</h1>

124

This document provides an overview of how the query planner and optimizer
for SQLite work.

126

127

128

Given a single SQL statement, there might be dozens, hundreds, or even

129

thousands of ways to implement that statement, depending on the complexity

130

of the statement itself and of the underlying database schema. The

131

task of the query planner is to select an algorithm from among the many

132

choices that provides the answer with a minimum of disk I/O and CPU

133

overhead.

134

135

136

<h2>1.0 WHERE clause analysis</h2>

137

The WHERE clause on a query is broken up into "terms" where each term

138

is separated from the others by an AND operator.

139

If the WHERE clause is composed of constraints separated by the OR

140

operator then the entire clause is considered to be a single "term"

141

to which the <a href="#or_opt">OR-clause optimization</a> is applied.

142

143

144

All terms of the WHERE clause are analyzed to see if they can be

145

satisfied using indices.

146

Terms that cannot be satisfied through the use of indices become

147

tests that are evaluated against each row of the relevant input

148

tables. No tests are done for terms that are completely satisfied by

149

indices. Sometimes

150

one or more terms will provide hints to indices but still must be

151

evaluated against each row of the input tables.

152

153

154

The analysis of a term might cause new "virtual" terms to

155

be added to the WHERE clause. Virtual terms can be used with

156

indices to restrict a search. But virtual terms never generate code

157

that is tested against input rows.

158

159

160

To be usable by an index a term must be of one of the following

161

forms:

162

163

164

column = expression

165

column > expression

166

column >= expression

167

column < expression

168

column <= expression

169

expression = column

170

expression > column

171

expression >= column

172

expression < column

173

expression <= column

174

column IN (expression-list)

175

column IN (subquery)

176

column IS NULL

177

</pre></blockquote>

178

If an index is created using a statement like this:

179

180

181

CREATE INDEX idx_ex1 ON ex1(a,b,c,d,e,...,y,z);

182

</pre></blockquote>

183

Then the index might be used if the initial columns of the index

184

(columns a, b, and so forth) appear in WHERE clause terms.

185

The initial columns of the index must be used with

186

the <tt><big>=</big></tt> or <tt><big>IN</big></tt> operators.

187

The right-most column that is used can employ inequalities.

188

For the right-most

189

column of an index that is used, there can be up to two inequalities

190

that must sandwich the allowed values of the column between two extremes.

191

192

193

It is not necessary for every column of an index to appear in a

194

WHERE clause term in order for that index to be used.

195

But there can not be gaps in the columns of the index that are used.

196

Thus for the example index above, if there is no WHERE clause term

197

that constrains column c, then terms that constrain columns a and b can
be used with the index but not terms that constrain columns d through z.

199

Similarly, no index column will be used (for indexing purposes)

200

that is to the right of a

201

column that is constrained only by inequalities.

202

203

204

<h3>1.1 Index term usage examples</h3>

205

For the index above and WHERE clause like this:

206

207

208

... WHERE a=5 AND b IN (1,2,3) AND c IS NULL AND d='hello'

209

</pre></blockquote>

210

The first four columns a, b, c, and d of the index would be usable since

211

those four columns form a prefix of the index and are all bound by

212

equality constraints.

213

214

215

For the index above and WHERE clause like this:

216

217

218

... WHERE a=5 AND b IN (1,2,3) AND c>12 AND d='hello'

219

</pre></blockquote>

220

Only columns a, b, and c of the index would be usable. The d column

221

would not be usable because it occurs to the right of c and c is

222

constrained only by inequalities.

223

224

225

For the index above and WHERE clause like this:

226

227

228

... WHERE a=5 AND b IN (1,2,3) AND d='hello'

229

</pre></blockquote>

230

Only columns a and b of the index would be usable. The d column

231

would not be usable because column c is not constrained and there can

232

be no gaps in the set of columns that are usable by the index.

233

234

235

For the index above and WHERE clause like this:

236

237

238

... WHERE b IN (1,2,3) AND c NOT NULL AND d='hello'

239

</pre></blockquote>

240

The index is not usable at all because the left-most column of the

241

index (column "a") is not constrained. Assuming there are no other

242

indices, the query above would result in a full table scan.

243

244

245

For the index above and WHERE clause like this:

246

247

248

... WHERE a=5 OR b IN (1,2,3) OR c NOT NULL OR d='hello'

249

</pre></blockquote>

250

The index is not usable because the WHERE clause terms are connected

251

by OR instead of AND. This query would result in a full table scan.

252

However, if three additional indices were added that contained columns

253

b, c, and d as their left-most columns, then the

254

<a href="#or_opt">OR-clause optimization</a> might apply.
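
The prefix rules illustrated in this section can be checked with EXPLAIN QUERY PLAN. The sketch below is only an illustration: it assumes Python's standard sqlite3 module, and it uses a shortened, hypothetical five-column version of ex1 so that column e is never available from the index. The exact wording of the plan text varies between SQLite versions, so the sketch prints the plans rather than relying on a particular format.

```python
import sqlite3

# Hypothetical stand-in for ex1: the index covers a..d, so selecting
# column e always requires a visit to the table itself.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ex1(a, b, c, d, e)")
conn.execute("CREATE INDEX idx_ex1 ON ex1(a, b, c, d)")


def plan(where):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN SELECT e FROM ex1 WHERE " + where)
    return " | ".join(row[3] for row in rows)


# All four prefix columns are bound by equality-style constraints.
prefix_plan = plan("a=5 AND b IN (1,2,3) AND c IS NULL AND d='hello'")
# c is constrained only by an inequality, so d cannot be used.
range_plan = plan("a=5 AND b IN (1,2,3) AND c>12 AND d='hello'")
# c is unconstrained, which leaves a gap before d.
gap_plan = plan("a=5 AND b IN (1,2,3) AND d='hello'")
# The left-most column a is unconstrained.
no_a_plan = plan("b IN (1,2,3) AND c NOT NULL AND d='hello'")

for label, text in [("prefix", prefix_plan), ("range", range_plan),
                    ("gap", gap_plan), ("no a", no_a_plan)]:
    print(label, "->", text)
```

In the printed plans, the parenthesized list after the index name shows which columns actually constrain the index search.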

255

256

257

<h2>2.0 The BETWEEN optimization</h2>

258

If a term of the WHERE clause is of the following form:

259

260

261

expr1 BETWEEN expr2 AND expr3

262

</pre></blockquote>

263

Then two virtual terms are added as follows:

264

265

266

expr1 >= expr2 AND expr1 <= expr3

267

</pre></blockquote>

268

If both virtual terms end up being used as constraints on an index,

269

then the original BETWEEN term is omitted and the corresponding test

270

is not performed on input rows.

271

Thus if the BETWEEN term ends up being used as an index constraint

272

no tests are ever performed on that term.

273

On the other hand, the
virtual terms themselves never cause tests to be performed on
input rows.

276

Thus if the BETWEEN term is not used as an index constraint and

277

instead must be used to test input rows, the expr1 expression is

278

only evaluated once.
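
As a small check of the equivalence that the BETWEEN optimization relies on, the following sketch (assuming Python's standard sqlite3 module and a hypothetical table t) compares a BETWEEN term with the two inequalities it expands to. It demonstrates only that the results agree, not how the planner codes either form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(v INTEGER)")  # hypothetical table
conn.execute("CREATE INDEX t_v ON t(v)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(20)])

# The original BETWEEN term.
between_rows = conn.execute(
    "SELECT v FROM t WHERE v BETWEEN 5 AND 8 ORDER BY v").fetchall()
# The same constraint written as the two virtual terms.
expanded_rows = conn.execute(
    "SELECT v FROM t WHERE v >= 5 AND v <= 8 ORDER BY v").fetchall()

print(between_rows == expanded_rows)
```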

279

280

281

<h2>3.0 OR optimizations</h2>

282

WHERE clause constraints that are connected by OR instead of AND are
handled in one of two ways.

284

If a term consists of multiple subterms containing a common column

285

name and separated by OR, like this:

286

287

288

column = expr1 OR column = expr2 OR column = expr3 OR ...

289

</pre></blockquote>

290

Then that term is rewritten as follows:

291

292

293

column IN (expr1,expr2,expr3,expr4,...)

294

</pre></blockquote>

295

The rewritten term then might go on to constrain an index using the

296

normal rules for <tt><big>IN</big></tt> operators. Note that column must be

297

the same column in every OR-connected subterm,

298

although the column can occur on either the left or the right side of

299

the <tt><big>=</big></tt> operator.

300

301

302

If and only if the previously described conversion of OR to an IN operator

303

does not work, the second OR-clause optimization is attempted.

304

Suppose the OR clause consists of multiple subterms as follows:

305

306

307

expr1 OR expr2 OR expr3

308

</pre></blockquote>

309

Individual subterms might be a single comparison expression like

310

<tt><big>a=5</big></tt> or <tt><big>x>y</big></tt> or they can be LIKE or BETWEEN expressions, or a subterm

311

can be a parenthesized list of AND-connected sub-subterms.

312

Each subterm is analyzed as if it were itself the entire WHERE clause

313

in order to see if the subterm is indexable by itself.

314

If every subterm of an OR clause is separately indexable

315

then the OR clause might be coded such that a separate index is used

316

to evaluate each term of the OR clause. One way to think about how
SQLite uses separate indices for each OR clause term is to imagine
that the WHERE clause were rewritten as follows:

319

320

321

rowid IN (SELECT rowid FROM table WHERE expr1

322

UNION SELECT rowid FROM table WHERE expr2

323

UNION SELECT rowid FROM table WHERE expr3)

324

</pre></blockquote>

325

The rewritten expression above is conceptual; WHERE clauses containing

326

OR are not really rewritten this way.

327

The actual implementation of the OR clause uses a mechanism that is

328

more efficient than subqueries and which works even

329

for tables where the "rowid" column name has been

330

overloaded for other uses and no longer refers to the real rowid.

331

But the essence of the implementation is captured by the statement

332

above: Separate indices are used to find candidate result rows

333

from each OR clause term and the final result is the union of

334

those rows.
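
Both OR rewrites can be checked for equivalence of results. The sketch below assumes Python's standard sqlite3 module and a hypothetical two-column table t with one index per column. It compares result sets only; as noted above, the real implementation does not literally rewrite the query into subqueries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(a, b)")  # hypothetical table
conn.execute("CREATE INDEX t_a ON t(a)")
conn.execute("CREATE INDEX t_b ON t(b)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i % 5, i % 7) for i in range(35)])


def rowids(sql):
    """Run a query that returns rowids and return them as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# First form: the same column in every subterm, on either side of "=".
or_same_col = rowids("SELECT rowid FROM t WHERE a=1 OR a=3 OR 4=a")
in_form = rowids("SELECT rowid FROM t WHERE a IN (1,3,4)")

# Second form: separately indexable subterms on different columns.
or_diff_col = rowids("SELECT rowid FROM t WHERE a=1 OR b=2")
union_form = rowids(
    "SELECT rowid FROM t WHERE rowid IN ("
    " SELECT rowid FROM t WHERE a=1"
    " UNION SELECT rowid FROM t WHERE b=2)")

print(or_same_col == in_form, or_diff_col == union_form)
```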

335

336

337

Note that in most cases, SQLite will only use a single index for each

338

table in the FROM clause of a query. The second OR-clause optimization

339

described here is the exception to that rule. With an OR-clause,

340

a different index might be used for each subterm in the OR-clause.

341

342

343

For any given query, the fact that the OR-clause optimization described

344

here can be used does not guarantee that it will be used.

345

SQLite uses a cost-based query planner that estimates the CPU and

346

disk I/O costs of various competing query plans and chooses the plan

347

that it thinks will be the fastest. If there are many OR terms in

348

the WHERE clause or if some of the indices on individual OR-clause

349

subterms are not very selective, then SQLite might decide that it is

350

faster to use a different query algorithm, or even a full-table scan.

351

Application developers can use the

352

<a href="lang_explain.html">EXPLAIN QUERY PLAN</a> prefix on a statement to get a

353

high-level overview of the chosen query strategy.

354

355

356

<h2>4.0 The LIKE optimization</h2>

357

Terms that are composed of the <a href="lang_expr.html#like">LIKE</a> or <a href="lang_expr.html#glob">GLOB</a> operator

358

can sometimes be used to constrain indices.

359

There are many conditions on this use:

360

361

362

<ol>

363

<li>The left-hand side of the LIKE or GLOB operator must be the name

364

of an indexed column with <a href="datatype3.html#affinity">TEXT affinity</a>.</li>

365

<li>The right-hand side of the LIKE or GLOB must be either a string literal

366

or a <a href="lang_expr.html#varparam">parameter</a> bound to a string literal

367

that does not begin with a wildcard character.</li>

368

<li>The ESCAPE clause cannot appear on the LIKE operator.</li>

369

<li>The built-in functions used to implement LIKE and GLOB must not
have been overloaded using the sqlite3_create_function() API.</li>

371

<li>For the GLOB operator, the column must be indexed using the

372

built-in BINARY collating sequence.</li>

373

<li>For the LIKE operator, if <a href="pragma.html#pragma_case_sensitive_like">case_sensitive_like</a> mode is enabled then
the column must be indexed using the BINARY collating sequence, or if
<a href="pragma.html#pragma_case_sensitive_like">case_sensitive_like</a> mode is disabled then the column must be indexed
using the built-in NOCASE collating sequence.</li>

377

</ol>

378

379

380

The LIKE operator has two modes that can be set by a

381

<a href="pragma.html#pragma_case_sensitive_like">pragma</a>. The

382

default mode is for LIKE comparisons to be insensitive to differences

383

of case for latin1 characters. Thus, by default, the following

384

expression is true:

385

386

387

'a' LIKE 'A'

388

</pre></blockquote>

389

But if the case_sensitive_like pragma is enabled as follows:

390

391

392

PRAGMA case_sensitive_like=ON;

393

</pre></blockquote>

394

Then the LIKE operator pays attention to case and the example above would

395

evaluate to false. Note that case insensitivity only applies to

396

latin1 characters - basically the upper and lower case letters of English

397

in the lower 127 byte codes of ASCII. International character sets

398

are case sensitive in SQLite unless an application-defined

399

<a href="datatype3.html#collation">collating sequence</a> and <a href="lang_corefunc.html#like">like() SQL function</a> are provided that

400

take non-ASCII characters into account.

401

But if an application-defined collating sequence and/or like() SQL

402

function are provided, the LIKE optimization described here will never

403

be taken.

404

405

406

The LIKE operator is case insensitive by default because this is what

407

the SQL standard requires. You can change the default behavior at

408

compile time by using the <a href="compile.html#case_sensitive_like">SQLITE_CASE_SENSITIVE_LIKE</a> command-line option

409

to the compiler.

410

411

412

The LIKE optimization might occur if the column named on the left of the

413

operator is indexed using the built-in BINARY collating sequence and

414

case_sensitive_like is turned on. Or the optimization might occur if

415

the column is indexed using the built-in NOCASE collating sequence and the

416

case_sensitive_like mode is off. These are the only two combinations

417

under which LIKE operators will be optimized.

418

419

420

The GLOB operator is always case sensitive. The column on the left side

421

of the GLOB operator must always use the built-in BINARY collating sequence

422

or no attempt will be made to optimize that operator with indices.

423

424

425

The LIKE optimization will only be attempted if
the right-hand side of the GLOB or LIKE operator is either a
string literal or a <a href="lang_expr.html#varparam">parameter</a> that has been <a href="c3ref/bind_blob.html">bound</a>
to a string literal. The string literal must not
begin with a wildcard; if the right-hand side begins with a wildcard
character then this optimization is not attempted. If the right-hand side

431

is a <a href="lang_expr.html#varparam">parameter</a> that is bound to a string, then this optimization is

432

only attempted if the <a href="c3ref/stmt.html">prepared statement</a> containing the expression

433

was compiled with <a href="c3ref/prepare.html">sqlite3_prepare_v2()</a> or <a href="c3ref/prepare.html">sqlite3_prepare16_v2()</a>.

434

The LIKE optimization is not attempted if the

435

right-hand side is a <a href="lang_expr.html#varparam">parameter</a> and the statement was prepared using

436

<a href="c3ref/prepare.html">sqlite3_prepare()</a> or <a href="c3ref/prepare.html">sqlite3_prepare16()</a>.

437

The LIKE optimization is not attempted if there is an ESCAPE phrase
on the LIKE operator.

439

440

441

Suppose the initial sequence of non-wildcard characters on the right-hand

442

side of the LIKE or GLOB operator is x. We are using a single

443

character to denote this non-wildcard prefix but the reader should

444

understand that the prefix can consist of more than 1 character.

445

Let y be the smallest string that is the same length as x but which

446

compares greater than x. For example, if x is <tt><big>hello</big></tt> then

447

y would be <tt><big>hellp</big></tt>.

448

The LIKE and GLOB optimizations consist of adding two virtual terms

449

like this:

450

451

452

column >= x AND column < y

453

</pre></blockquote>

454

Under most circumstances, the original LIKE or GLOB operator is still

455

tested against each input row even if the virtual terms are used to

456

constrain an index. This is because we do not know what additional

457

constraints may be imposed by characters to the right

458

of the x prefix. However, if there is only a single

459

global wildcard to the right of x, then the original LIKE or

460

GLOB test is disabled.

461

In other words, if the pattern is like this:

462

463

464

column LIKE x%

465

column GLOB x*

466

</pre></blockquote>

467

then the original LIKE or GLOB tests are disabled when the virtual

468

terms constrain an index because in that case we know that all of the

469

rows selected by the index will pass the LIKE or GLOB test.
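
The construction of y from x, and the equivalence of the two virtual terms with a pure prefix pattern, can be sketched as follows. This assumes Python's standard sqlite3 module, a hypothetical table named words, and a helper prefix_upper_bound() that is not part of SQLite. The helper simply increments the last character, which is only a valid way to build y when that character is not already the largest possible code point.

```python
import sqlite3


def prefix_upper_bound(x):
    """Hypothetical helper: the smallest string of the same length as x
    that compares greater than x, built by incrementing the last character.
    Assumes the last character is not the maximum code point."""
    return x[:-1] + chr(ord(x[-1]) + 1)


conn = sqlite3.connect(":memory:")
# Case-sensitive LIKE pairs with the default BINARY collation of the index.
conn.execute("PRAGMA case_sensitive_like=ON")
conn.execute("CREATE TABLE words(w TEXT)")  # hypothetical table
conn.execute("CREATE INDEX words_w ON words(w)")
conn.executemany("INSERT INTO words VALUES (?)",
                 [("hello",), ("hello world",), ("hellp",),
                  ("help",), ("Hello",), ("hell",)])

x = "hello"
y = prefix_upper_bound(x)

# The original pattern: prefix x followed by a single trailing wildcard.
like_rows = sorted(conn.execute(
    "SELECT w FROM words WHERE w LIKE ?", (x + "%",)).fetchall())
# The two virtual terms: column >= x AND column < y.
range_rows = sorted(conn.execute(
    "SELECT w FROM words WHERE w >= ? AND w < ?", (x, y)).fetchall())

print(y, like_rows == range_rows)
```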

470

471

472

Note that when the right-hand side of a LIKE or GLOB operator is

473

a <a href="lang_expr.html#varparam">parameter</a> and the statement is prepared using <a href="c3ref/prepare.html">sqlite3_prepare_v2()</a>

474

or <a href="c3ref/prepare.html">sqlite3_prepare16_v2()</a> then the statement is automatically reparsed

475

and recompiled on the first <a href="c3ref/step.html">sqlite3_step()</a> call of each run if the binding

476

to the right-hand side parameter has changed since the previous run.

477

This reparse and recompile is essentially the same action that occurs

478

following a schema change. The recompile is necessary so that the query

479

planner can examine the new value bound to the right-hand side of the

480

LIKE or GLOB operator and determine whether or not to employ the

481

optimization described above.

482

483

484

<h2>5.0 Joins</h2>

485

The ON and USING clauses of an inner join are converted into additional

486

terms of the WHERE clause prior to WHERE clause analysis described

487

above in section 1.0. Thus with SQLite, there is no computational
advantage to using the newer SQL92 join syntax

489

over the older SQL89 comma-join syntax. They both end up accomplishing

490

exactly the same thing on inner joins.

491

492

493

For a LEFT OUTER JOIN the situation is more complex. The following

494

two queries are not equivalent:

495

496

497

SELECT * FROM tab1 LEFT JOIN tab2 ON tab1.x=tab2.y;

498

SELECT * FROM tab1 LEFT JOIN tab2 WHERE tab1.x=tab2.y;

499

</pre></blockquote>

500

For an inner join, the two queries above would be identical. But

501

special processing applies to the ON and USING clauses of an OUTER join:

502

specifically, the constraints in an ON or USING clause do not apply if

503

the right table of the join is on a null row, but the constraints do apply

504

in the WHERE clause. The net effect is that putting the ON or USING

505

clause expressions for a LEFT JOIN in the WHERE clause effectively converts

506

the query to an

507

ordinary INNER JOIN - albeit an inner join that runs more slowly.
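
The difference between the two LEFT JOIN queries shows up as soon as tab1 contains a row with no match in tab2. The sketch below assumes Python's standard sqlite3 module and some made-up sample rows; it compares the ON form, the WHERE form, and an ordinary inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1(x)")
conn.execute("CREATE TABLE tab2(y)")
# Sample data (an assumption for illustration): x=1 has no partner in tab2.
conn.executemany("INSERT INTO tab1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO tab2 VALUES (?)", [(2,), (3,), (4,)])

# Constraint in the ON clause: unmatched tab1 rows survive with a NULL row.
on_rows = conn.execute(
    "SELECT * FROM tab1 LEFT JOIN tab2 ON tab1.x=tab2.y "
    "ORDER BY tab1.x").fetchall()
# Constraint in the WHERE clause: the NULL rows are filtered out again.
where_rows = conn.execute(
    "SELECT * FROM tab1 LEFT JOIN tab2 WHERE tab1.x=tab2.y "
    "ORDER BY tab1.x").fetchall()
# Ordinary inner join for comparison.
inner_rows = conn.execute(
    "SELECT * FROM tab1 JOIN tab2 ON tab1.x=tab2.y "
    "ORDER BY tab1.x").fetchall()

print(on_rows)
print(where_rows == inner_rows)
```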

508

509

510

<h3>5.1 Order of tables in a join</h3>

511

The current implementation of

512

SQLite uses only loop joins. That is to say, joins are implemented as

513

nested loops.

514

515

516

The default order of the nested loops in a join is for the left-most

517

table in the FROM clause to form the outer loop and the right-most

518

table to form the inner loop.

519

However, SQLite will nest the loops in a different order if doing so

520

will help it to select better indices.

521

522

523

Inner joins can be freely reordered. However a left outer join is

524

neither commutative nor associative and hence will not be reordered.

525

Inner joins to the left and right of the outer join might be reordered

526

if the optimizer thinks that is advantageous but the outer joins are

527

always evaluated in the order in which they occur.

528

529

530

When selecting the order of tables in a join, SQLite uses a greedy

531

algorithm that runs in polynomial (O(N²)) time. Because of this,

532

SQLite is able to efficiently plan queries with 50- or 60-way joins.

533

534

535

Join reordering is automatic and usually works well enough that

536

programmers do not have to think about it, especially if <a href="lang_analyze.html">ANALYZE</a>

537

has been used to gather statistics about the available indices.

538

But occasionally some hints from the programmer are needed.

539

Consider, for example, the following schema:

540

541

542

CREATE TABLE node(

543

id INTEGER PRIMARY KEY,

544

name TEXT

545

);

546

CREATE INDEX node_idx ON node(name);

547

CREATE TABLE edge(

548

orig INTEGER REFERENCES node,

549

dest INTEGER REFERENCES node,

550

PRIMARY KEY(orig, dest)

551

);

552

CREATE INDEX edge_idx ON edge(dest,orig);

553

</pre></blockquote>

554

The schema above defines a directed graph with the ability to store a

555

name at each node. Now consider a query against this schema:

556

557

558

SELECT *

559

FROM edge AS e,

560

node AS n1,

561

node AS n2

562

WHERE n1.name = 'alice'

563

AND n2.name = 'bob'

564

AND e.orig = n1.id

565

AND e.dest = n2.id;

566

</pre></blockquote>

567

This query asks for all information about edges that go from

568

nodes labeled "alice" to nodes labeled "bob".

569

The query optimizer in SQLite has basically two choices on how to

570

implement this query. (There are actually six different choices, but

571

we will only consider two of them here.)

572

The pseudocode below demonstrates these two choices.

573

574

Option 1:

575

576

foreach n1 where n1.name='alice' do:
  foreach n2 where n2.name='bob' do:
    foreach e where e.orig=n1.id and e.dest=n2.id do:
      return n1.*, n2.*, e.*
    end
  end
end

583

</pre></blockquote>Option 2:

584

585

foreach n1 where n1.name='alice' do:
  foreach e where e.orig=n1.id do:
    foreach n2 where n2.id=e.dest and n2.name='bob' do:
      return n1.*, n2.*, e.*
    end
  end
end

592

</pre></blockquote>

593

The same indices are used to speed up every loop in both implementation

594

options.

595

The only difference in these two query plans is the order in which

596

the loops are nested.

597

598

599

So which query plan is better? It turns out that the answer depends on

600

what kind of data is found in the node and edge tables.

601

602

603

Let the number of alice nodes be M and the number of bob nodes be N.

604

Consider two scenarios. In the first scenario, M and N are both 2 but

605

there are thousands of edges on each node. In this case, option 1 is

606

preferred. With option 1, the inner loop checks for the existence of

607

an edge between a pair of nodes and outputs the result if found.

608

But because there are only 2 alice and bob nodes each, the inner loop

609

only has to run 4 times and the query is very quick. Option 2 would

610

take much longer here. The outer loop of option 2 only executes twice,

611

but because there are a large number of edges leaving each alice node,

612

the middle loop has to iterate many thousands of times. It will be

613

much slower. So in the first scenario, we prefer to use option 1.

614

615

616

Now consider the case where M and N are both 3500. Alice nodes are

617

abundant. But suppose each of these nodes is connected by only one

618

or two edges. In this case, option 2 is preferred. With option 2,

619

the outer loop still has to run 3500 times, but the middle loop only

620

runs once or twice for each outer loop and the inner loop will only

621

run once for each middle loop, if at all. So the total number of

622

iterations of the inner loop is around 7000. Option 1, on the other

623

hand, has to run both its outer loop and its middle loop 3500 times

624

each, resulting in 12 million iterations of the middle loop.

625

Thus in the second scenario, option 2 is nearly 2000 times faster

626

than option 1.
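
The arithmetic behind the two scenarios can be written out as a rough cost model. The function names below are hypothetical, and the figure of 5000 edges per node in the first scenario is an assumed stand-in for "thousands of edges"; the model counts only loop iterations and ignores the cost of each index probe.

```python
def option1_iterations(m, n):
    """Option 1: one edge lookup for every (alice, bob) pair of nodes."""
    return m * n


def option2_iterations(m, edges_per_alice):
    """Option 2: one node lookup for every edge leaving an alice node."""
    return m * edges_per_alice


# Scenario 1: M = N = 2, with an assumed 5000 edges leaving each node.
scenario1 = (option1_iterations(2, 2), option2_iterations(2, 5000))

# Scenario 2: M = N = 3500, with at most 2 edges leaving each node.
scenario2 = (option1_iterations(3500, 3500), option2_iterations(3500, 2))

# How many times more work option 1 does in the second scenario.
ratio = scenario2[0] / scenario2[1]

print(scenario1, scenario2, ratio)
```

Under these assumptions option 1 does far less work in the first scenario, while in the second scenario option 2 comes out ahead by a factor of 1750, which is the "nearly 2000 times" quoted above.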

627

628

629

So you can see that depending on how the data is structured in the table,

630

either query plan 1 or query plan 2 might be better. Which plan does

631

SQLite choose by default? As of version 3.6.18, without running <a href="lang_analyze.html">ANALYZE</a>,

632

SQLite will choose option 2.

633

But if the <a href="lang_analyze.html">ANALYZE</a> command is run in order to gather statistics,

634

a different choice might be made if the statistics indicate that the

635

alternative is likely to run faster.

636

637

638

<h3>5.2 Manual Control Of Query Plans</h3>

639

SQLite provides the ability for advanced programmers to exercise control

640

over the query plan chosen by the optimizer. One method for doing this

641

is to fudge the <a href="lang_analyze.html">ANALYZE</a> results in the sqlite_stat1 and

642

sqlite_stat2 tables. That approach is not recommended except

643

for the one scenario described in the following paragraph.

644

645

646

For a program that uses an SQLite database as its application file

647

format, when a new database instance is first created the <a href="lang_analyze.html">ANALYZE</a>
command is ineffective because the database contains no data from which

649

to gather statistics. In that case, one could construct a large prototype

650

database containing typical data during development and run the

651

<a href="lang_analyze.html">ANALYZE</a> command on this prototype database to gather statistics,

652

then save the prototype statistics as part of the application.

653

After deployment, when the application goes to create a new database file,

654

it can run the <a href="lang_analyze.html">ANALYZE</a> command in order to create the sqlite_stat1

655

and sqlite_stat2 tables, then copy the precomputed statistics obtained

656

from the prototype database into these new statistics tables.

657

In that way, statistics from large working data sets can be preloaded

658

into newly created application files.

659

660

661

If you really must take manual control of join loop nesting order,

662

the preferred method is to use some peculiar (though valid) SQL syntax

663

to specify the join. If you use the keyword CROSS in a join, then

664

the two tables connected by that join will not be reordered.

665

So in the following query, the optimizer is free to reorder the tables of
the FROM clause any way it sees fit:

667

668

669

SELECT *

670

FROM node AS n1,

671

edge AS e,

672

node AS n2

673

WHERE n1.name = 'alice'

674

AND n2.name = 'bob'

675

AND e.orig = n1.id

676

AND e.dest = n2.id;

677

</pre></blockquote>

678

But in the following logically equivalent formulation of the query,

679

the substitution of "CROSS JOIN" for the "," means that the order

680

of tables must be N1, E, N2.

681

682

683

SELECT *

684

FROM node AS n1 CROSS JOIN

685

edge AS e CROSS JOIN

686

node AS n2

687

WHERE n1.name = 'alice'

688

AND n2.name = 'bob'

689

AND e.orig = n1.id

690

AND e.dest = n2.id;

691

</pre></blockquote>

692

Hence, in the second form, the query plan must be option 2. Note that

693

you must use the keyword CROSS in order to disable the table reordering

694

optimization; INNER JOIN, NATURAL JOIN, JOIN, and other similar

695

combinations work just like a comma join in that the optimizer is

696

free to reorder tables as it sees fit. (Table reordering is also

697

disabled on an outer join, but that is because outer joins are not

698

associative or commutative. Reordering tables in outer joins changes

699

the result.)
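
The effect of CROSS JOIN on loop order can be observed through EXPLAIN QUERY PLAN, which lists the loops from outermost to innermost. The sketch below assumes Python's standard sqlite3 module; the loop_order() helper and its regular expression are assumptions made for illustration, written to tolerate both the older "SEARCH TABLE node AS n1" and the newer "SEARCH n1" plan wording.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node(id INTEGER PRIMARY KEY, name TEXT);
CREATE INDEX node_idx ON node(name);
CREATE TABLE edge(
  orig INTEGER REFERENCES node,
  dest INTEGER REFERENCES node,
  PRIMARY KEY(orig, dest)
);
CREATE INDEX edge_idx ON edge(dest, orig);
""")

WHERE = (" WHERE n1.name = 'alice' AND n2.name = 'bob'"
         " AND e.orig = n1.id AND e.dest = n2.id")


def loop_order(sql):
    """Hypothetical helper: table aliases in loop-nesting order."""
    order = []
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        m = re.match(r"(?:SEARCH|SCAN)\s+(?:TABLE\s+\S+\s+AS\s+)?(\w+)",
                     row[3])
        if m:
            order.append(m.group(1))
    return order


# Comma join: the optimizer may pick any nesting order.
free = loop_order(
    "SELECT * FROM node AS n1, edge AS e, node AS n2" + WHERE)
# CROSS JOIN: the order written in the FROM clause is kept.
crossed = loop_order(
    "SELECT * FROM node AS n1 CROSS JOIN edge AS e CROSS JOIN node AS n2"
    + WHERE)

print(free, crossed)
```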

700

701

702

<h2>6.0 Choosing between multiple indices</h2>

703

Each table in the FROM clause of a query can use at most one index

704

(except when the <a href="#or_opt">OR-clause optimization</a> comes into

705

play)

706

and SQLite strives to use at least one index on each table. Sometimes,

707

two or more indices might be candidates for use on a single table.

708

For example:

709

710

711

CREATE TABLE ex2(x,y,z);

712

CREATE INDEX ex2i1 ON ex2(x);

713

CREATE INDEX ex2i2 ON ex2(y);

714

SELECT z FROM ex2 WHERE x=5 AND y=6;

715

</pre></blockquote>

716

For the SELECT statement above, the optimizer can use the ex2i1 index
to look up rows of ex2 that contain x=5 and then test each row against
the y=6 term. Or it can use the ex2i2 index to look up rows
of ex2 that contain y=6 then test each of those rows against the
x=5 term.

721

722

723

When faced with a choice of two or more indices, SQLite tries to estimate

724

the total amount of work needed to perform the query using each option.

725

It then selects the option that gives the least estimated work.

726

727

728

To help the optimizer get a more accurate estimate of the work involved

729

in using various indices, the user may optionally run the <a href="lang_analyze.html">ANALYZE</a> command.

730

The <a href="lang_analyze.html">ANALYZE</a> command scans all indices of the database where there might
be a choice between two or more indices and gathers statistics on the
selectiveness of those indices. The statistics gathered by
this scan are stored in special database tables whose names all
begin with "sqlite_stat".

735

The content of these tables is not updated as the database

736

changes so after making significant changes it might be prudent to

737

rerun <a href="lang_analyze.html">ANALYZE</a>.

738

The results of an ANALYZE command are only available to database connections

739

that are opened after the ANALYZE command completes.

740

741

742

The various sqlite_statN tables contain information on how

743

selective the various indices are. For example, the sqlite_stat1

744

table might indicate that an equality constraint on column x reduces the

745

search space to 10 rows on average, whereas an equality constraint on

746

column y reduces the search space to 3 rows on average. In that case,

747

SQLite would prefer to use index ex2i2 since that index is more selective.
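
The statistics described here can be inspected directly. The sketch below assumes Python's standard sqlite3 module and fills ex2 with made-up data in which x has 10 distinct values and y has 34, so that an equality constraint on y is the more selective of the two. In each sqlite_stat1 row, the first number of the stat column is the number of rows in the index and the second is the average number of rows matched by an equality constraint on the first indexed column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ex2(x, y, z)")
conn.execute("CREATE INDEX ex2i1 ON ex2(x)")
conn.execute("CREATE INDEX ex2i2 ON ex2(y)")
# Made-up data: 100 rows, 10 distinct x values, 34 distinct y values.
conn.executemany("INSERT INTO ex2 VALUES (?, ?, ?)",
                 [(i % 10, i % 34, i) for i in range(100)])
conn.commit()

conn.execute("ANALYZE")

# Map each index name to its raw statistics string.
stats = dict(conn.execute(
    "SELECT idx, stat FROM sqlite_stat1 WHERE tbl='ex2'"))
# Second field: average rows selected by an equality on the first column.
avg_rows = {idx: int(stat.split()[1]) for idx, stat in stats.items()}

for idx in sorted(stats):
    print(idx, stats[idx])
```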

748

749

750

Terms of the WHERE clause can be manually disqualified for use with

751

indices by prepending a unary <tt><big>+</big></tt> operator to the column name. The

752

unary <tt><big>+</big></tt> is a no-op and will not slow down the evaluation of the test

753

specified by the term.

754

But it will prevent the term from constraining an index.

755

So, in the example above, if the query were rewritten as:

756

757

758

SELECT z FROM ex2 WHERE +x=5 AND y=6;

759

</pre></blockquote>

760

The <tt><big>+</big></tt> operator on the <tt><big>x</big></tt> column will prevent that term from

761

constraining an index. This would force the use of the ex2i2 index.
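
The effect of the unary + operator on index selection can be seen in the query plan. The sketch below assumes Python's standard sqlite3 module and an empty ex2 table; the plan() helper is a hypothetical convenience, and the exact plan wording depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ex2(x, y, z)")
conn.execute("CREATE INDEX ex2i1 ON ex2(x)")
conn.execute("CREATE INDEX ex2i2 ON ex2(y)")


def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    return " | ".join(row[3] for row in
                      conn.execute("EXPLAIN QUERY PLAN " + sql))


# Either index is a candidate for the unmodified query.
plain_plan = plan("SELECT z FROM ex2 WHERE x=5 AND y=6")
# The + on x disqualifies that term, leaving only ex2i2 usable.
forced_plan = plan("SELECT z FROM ex2 WHERE +x=5 AND y=6")

print(plain_plan)
print(forced_plan)
```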

762

763

764

Note that the unary <tt><big>+</big></tt> operator also removes

765

<a href="datatype3.html#affinity">type affinity</a> from

766

an expression, and in some cases this can cause subtle changes in

767

the meaning of an expression.

768

In the example above,

769

if column <tt><big>x</big></tt> has <a href="datatype3.html#affinity">TEXT affinity</a>

770

then the comparison "x=5" will be done as text. But the <tt><big>+</big></tt> operator

771

removes the affinity. So the comparison "+x=5" will compare the text

772

in column <tt><big>x</big></tt> with the numeric value 5 and will always be false.
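
The change in meaning caused by removing affinity can be reproduced with a single row. The sketch below assumes Python's standard sqlite3 module and a hypothetical table t whose column x is declared TEXT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(x TEXT)")  # hypothetical table, TEXT affinity
conn.execute("INSERT INTO t VALUES ('5')")

# TEXT affinity is applied to the literal 5, so '5' is compared with '5'.
with_affinity = conn.execute(
    "SELECT count(*) FROM t WHERE x=5").fetchone()[0]
# +x has no affinity, so the text '5' is compared with the integer 5.
without_affinity = conn.execute(
    "SELECT count(*) FROM t WHERE +x=5").fetchone()[0]

print(with_affinity, without_affinity)
```

The first query finds the row and the second does not, even though the two WHERE clauses look almost identical.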

773

774

775

<h3>6.1 Range Queries</h3>

776

Consider a slightly different scenario:

777

778

779

CREATE TABLE ex2(x,y,z);

780

CREATE INDEX ex2i1 ON ex2(x);

781

CREATE INDEX ex2i2 ON ex2(y);

782

SELECT z FROM ex2 WHERE x BETWEEN 1 AND 100 AND y BETWEEN 1 AND 100;

783

</pre></blockquote>

784

Further suppose that column x contains values spread out

785

between 0 and 1,000,000 and column y contains values

786

that span between 0 and 1,000. In that scenario,

787

the range constraint on column x should reduce the search space by

788

a factor of 10,000 whereas the range constraint on column y should

789

reduce the search space by a factor of only 10. So the ex2i1 index

790

should be preferred.

SQLite will make this determination, but only if it has been compiled
with <a href="compile.html#enable_stat3">SQLITE_ENABLE_STAT3</a>. The <a href="compile.html#enable_stat3">SQLITE_ENABLE_STAT3</a> option causes
the <a href="lang_analyze.html">ANALYZE</a> command to collect a histogram of column content in the
sqlite_stat3 table and to use this histogram to make a better
guess at the best index to use for range constraints such as the above.
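Whether a given build collects histograms can be checked at run time. The sketch below assumes Python's sqlite3 module and the ex2 schema above; the sample data is invented. It also looks for SQLITE_ENABLE_STAT4, a related option in later SQLite releases that this document does not cover.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ex2(x, y, z);
    CREATE INDEX ex2i1 ON ex2(x);
    CREATE INDEX ex2i2 ON ex2(y);
""")
# Invented sample data: x is spread widely, y stays within 0..999.
conn.executemany("INSERT INTO ex2 VALUES (?, ?, ?)",
                 [(i * 1000, i % 1000, i) for i in range(1000)])
conn.execute("ANALYZE")

# PRAGMA compile_options lists the options this library was built with.
options = [row[0] for row in conn.execute("PRAGMA compile_options")]
has_histograms = any(
    opt.endswith("ENABLE_STAT3") or opt.endswith("ENABLE_STAT4")
    for opt in options)

# List the statistics tables that ANALYZE created in this build.
stat_tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE name LIKE 'sqlite_stat%' ORDER BY name")]
print(has_histograms, stat_tables)
```

If has_histograms is false, only the sqlite_stat1 summary is likely to be available, and the planner has no histogram with which to compare the two range constraints.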

The histogram data is only useful if the right-hand side of the constraint
is a simple compile-time constant or <a href="lang_expr.html#varparam">parameter</a> and not an expression.

Another limitation of the histogram data is that it only applies to the
left-most column on an index. Consider this scenario:

<blockquote><pre>
CREATE TABLE ex3(w,x,y,z);
CREATE INDEX ex3i1 ON ex3(w, x);
CREATE INDEX ex3i2 ON ex3(w, y);
SELECT z FROM ex3 WHERE w=5 AND x BETWEEN 1 AND 100 AND y BETWEEN 1 AND 100;
</pre></blockquote>

Here the inequalities are on columns x and y, which are not the
left-most index columns. Hence, the histogram data, which is collected only on the
left-most column of indices, is useless in helping to choose between the
range constraints on columns x and y.

<h2>7.0 Avoidance of table lookups</h2>
When doing an indexed lookup of a row, the usual procedure is to
do a binary search on the index to find the index entry, then extract
the <a href="lang_createtable.html#rowid">rowid</a> from the index and use that <a href="lang_createtable.html#rowid">rowid</a> to do a binary search on
the original table. Thus a typical indexed lookup involves two
binary searches.
If, however, all columns that were to be fetched from the table are
already available in the index itself, SQLite will use the values
contained in the index and will never look up the original table
row. This saves one binary search for each row and can make many
queries run twice as fast.
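A query plan shows when the table lookup is skipped. The following sketch assumes a hypothetical table ex4 with a two-column index; the names ex4, ex4i1 and plan_details are made up for illustration. EXPLAIN QUERY PLAN normally reports a "COVERING INDEX" when every needed column comes from the index.

```python
import sqlite3

def plan_details(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ex4(a, b, c);
    CREATE INDEX ex4i1 ON ex4(a, b);
""")

# Both a and b live in the index, so no table lookup is needed.
covered = plan_details(conn, "SELECT b FROM ex4 WHERE a=5")
# Column c is only in the table, so each index hit needs a second search.
not_covered = plan_details(conn, "SELECT c FROM ex4 WHERE a=5")

print(covered)
print(not_covered)
```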

<h2>8.0 ORDER BY optimizations</h2>
SQLite attempts to use an index to satisfy the ORDER BY clause of a
query when possible.
When faced with the choice of using an index to satisfy WHERE clause
constraints or satisfying an ORDER BY clause, SQLite does the same
cost analysis described above
and chooses the index that it believes will result in the fastest answer.
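Whether a sort step was avoided is visible in the query plan. This sketch assumes a hypothetical table ex5 with an index on column a only; the names are invented. When no index can supply the order, the plan normally contains a "USE TEMP B-TREE FOR ORDER BY" step.

```python
import sqlite3

def plan_details(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ex5(a, b);
    CREATE INDEX ex5i1 ON ex5(a);
""")

# The index on a already yields rows in the requested order.
sorted_by_index = plan_details(conn, "SELECT a FROM ex5 ORDER BY a")
# No index on b exists, so a separate sorting pass is required.
needs_sort = plan_details(conn, "SELECT a FROM ex5 ORDER BY b")

print(sorted_by_index)
print(needs_sort)
```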

<h2>9.0 Subquery flattening</h2>
When a subquery occurs in the FROM clause of a SELECT, the simplest
behavior is to evaluate the subquery into a transient table, then run
the outer SELECT against the transient table. But such a plan
can be suboptimal since the transient table will not have any indices
and the outer query (which is likely a join) will be forced to do a
full table scan on the transient table.

To overcome this problem, SQLite attempts to flatten subqueries in
the FROM clause of a SELECT.
This involves inserting the FROM clause of the subquery into the
FROM clause of the outer query and rewriting expressions in
the outer query that refer to the result set of the subquery.
For example:

<blockquote><pre>
SELECT a FROM (SELECT x+y AS a FROM t1 WHERE z<100) WHERE a>5
</pre></blockquote>

This query would be rewritten using query flattening as:

<blockquote><pre>
SELECT x+y AS a FROM t1 WHERE z<100 AND a>5
</pre></blockquote>
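That the two forms are equivalent can be checked on sample data. The sketch below uses Python's sqlite3 module; the rows inserted into t1 are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1(x, y, z)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)",
                 [(1, 2, 50), (3, 4, 50), (10, 20, 200), (2, 5, 99)])

# Original form: the subquery sits in the FROM clause.
nested = sorted(conn.execute(
    "SELECT a FROM (SELECT x+y AS a FROM t1 WHERE z<100) WHERE a>5"
).fetchall())

# Flattened form: one SELECT with both constraints combined.
flattened = sorted(conn.execute(
    "SELECT x+y AS a FROM t1 WHERE z<100 AND a>5"
).fetchall())

print(nested, flattened)
```

Both statements return the same rows; only the amount of work needed to produce them differs.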

There is a long list of conditions that must all be met in order for
query flattening to occur.

<ol>
<li> The subquery and the outer query do not both use aggregates.

<li> The subquery is not an aggregate or the outer query is not a join.

<li> The subquery is not the right operand of a left outer join.

<li> The subquery is not DISTINCT or the outer query is not a join.

<li> The subquery is not DISTINCT or the outer query does not use
aggregates.

<li> The subquery does not use aggregates or the outer query is not
DISTINCT.

<li> The subquery has a FROM clause.

<li> The subquery does not use LIMIT or the outer query is not a join.

<li> The subquery does not use LIMIT or the outer query does not use
aggregates.

<li> The subquery does not use aggregates or the outer query does not
use LIMIT.

<li> The subquery and the outer query do not both have ORDER BY clauses.

<li> The subquery and outer query do not both use LIMIT.

<li> The subquery does not use OFFSET.

<li> The outer query is not part of a compound select or the
subquery does not have both an ORDER BY and a LIMIT clause.

<li> The outer query is not an aggregate or the subquery does
not contain ORDER BY.

<li> The sub-query is not a compound select, or it is a UNION ALL
compound clause made up entirely of non-aggregate queries, and
the parent query:

<ul>
<li> is not itself part of a compound select,
<li> is not an aggregate or DISTINCT query, and
<li> has no other tables or sub-selects in the FROM clause.
</ul>

The parent and sub-query may contain WHERE clauses. Subject to
rules (11), (12) and (13), they may also contain ORDER BY,
LIMIT and OFFSET clauses.

<li> If the sub-query is a compound select, then all terms of the
ORDER BY clause of the parent must be simple references to
columns of the sub-query.

<li> The subquery does not use LIMIT or the outer query does not
have a WHERE clause.

<li> If the sub-query is a compound select, then it must not use
an ORDER BY clause.
</ol>

The casual reader is not expected to understand or remember any part of
the list above. The point of this list is to demonstrate
that the decision of whether or not to flatten a query is complex.

Query flattening is an important optimization when views are used, because
each use of a view is translated into a subquery.
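The benefit for views can be seen in a query plan. The sketch below assumes a hypothetical view v1 over the t1 table and a hypothetical index t1z; neither appears in this document. If the view is flattened, the constraint on z can use the index on the underlying table directly.

```python
import sqlite3

def plan_details(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1(x, y, z);
    CREATE INDEX t1z ON t1(z);
    CREATE VIEW v1 AS SELECT x+y AS a, z FROM t1;
""")

# The view reference is treated as a subquery in the FROM clause.
view_plan = plan_details(conn, "SELECT a FROM v1 WHERE z=5")
for line in view_plan:
    print(line)
```

A plan that searches t1 through t1z, with no separate step for building the view's result, suggests that the view was flattened into the outer query.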

<h2>10.0 The MIN/MAX optimization</h2>
Queries of the following forms will be optimized to run in logarithmic
time assuming appropriate indices exist:

<blockquote><pre>
SELECT MIN(x) FROM table;
SELECT MAX(x) FROM table;
</pre></blockquote>

In order for these optimizations to occur, the queries must appear in exactly
the form shown above, changing only the name of the table and column.
It is not permissible to add a WHERE clause or do any arithmetic on the
result. The result set must contain a single column.
The column in the MIN or MAX function must be an indexed column.
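A minimal sketch of the qualifying query forms follows, assuming a hypothetical table ex6 with an index on x. The table name, index name and sample values are invented, and the table cannot literally be named "table" because that is an SQL keyword.

```python
import sqlite3

def plan_details(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ex6(x);
    CREATE INDEX ex6i1 ON ex6(x);
""")
conn.executemany("INSERT INTO ex6 VALUES (?)", [(5,), (1,), (9,), (3,)])

# Exactly the forms shown above: a single MIN or MAX of an indexed column.
smallest = conn.execute("SELECT MIN(x) FROM ex6").fetchone()[0]
largest = conn.execute("SELECT MAX(x) FROM ex6").fetchone()[0]

# The plan is expected to go through the index instead of the table.
min_plan = plan_details(conn, "SELECT MIN(x) FROM ex6")
print(smallest, largest, min_plan)
```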

<h2>11.0 Automatic Indices</h2>
When no indices are available to aid the evaluation of a query, SQLite
might create an automatic index that lasts only for the duration
of a single SQL statement and use that index to help boost the query
performance. Since the cost of constructing the automatic index is
O(NlogN) (where N is the number of entries in the table) and the cost of
doing a full table scan is only O(N), an automatic index will
only be created if SQLite expects that the lookup will be run more than
logN times during the course of the SQL statement. Consider an example:

<blockquote><pre>
CREATE TABLE t1(a,b);
CREATE TABLE t2(c,d);
-- Insert many rows into both t1 and t2
SELECT * FROM t1, t2 WHERE a=c;
</pre></blockquote>

In the query above, if both t1 and t2 have approximately N rows, then
without any indices the query will require O(N*N) time. On the other
hand, creating an index on table t2 requires O(NlogN) time and then using
that index to evaluate the query requires an additional O(NlogN) time.
In the absence of <a href="lang_analyze.html">ANALYZE</a> information, SQLite guesses that N is one
million and hence it believes that constructing the automatic index will
be the cheaper approach.
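The planner's choice for this join can be inspected with EXPLAIN QUERY PLAN. The sketch below assumes a build that includes automatic indexing and leaves the tables empty, so the planner falls back on its default size guess. The helper name plan_details is hypothetical, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

def plan_details(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA automatic_index=ON")
conn.executescript("""
    CREATE TABLE t1(a, b);
    CREATE TABLE t2(c, d);
""")

# No ANALYZE data exists, so the planner assumes both tables are large.
join_plan = plan_details(conn, "SELECT * FROM t1, t2 WHERE a=c")
for line in join_plan:
    print(line)
```

A line mentioning an AUTOMATIC index indicates that SQLite plans to build a transient index for the inner loop of the join.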

An automatic index might also be used for a subquery:

<blockquote><pre>
CREATE TABLE t1(a,b);
CREATE TABLE t2(c,d);
-- Insert many rows into both t1 and t2
SELECT a, (SELECT d FROM t2 WHERE c=b) FROM t1;
</pre></blockquote>

In this example, the t2 table is used in a subquery to translate values
of the t1.b column. If each table contains N rows, SQLite expects that
the subquery will run N times, and hence it will believe it is faster
to construct an automatic, transient index on t2 first and then use
that index to satisfy the N instances of the subquery.

The automatic indexing capability can be disabled at run-time using
the <a href="pragma.html#pragma_automatic_index">automatic_index pragma</a> and can be omitted from the build
using the <a href="compile.html#omit_automatic_index">SQLITE_OMIT_AUTOMATIC_INDEX</a> compile-time option.
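Disabling the feature at run-time might look like the following sketch, which assumes a build where the automatic_index pragma is available and reuses the t1 and t2 tables from this section:

```python
import sqlite3

def plan_details(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1(a, b);
    CREATE TABLE t2(c, d);
""")

# Turn automatic indexing off for this connection, then read it back.
conn.execute("PRAGMA automatic_index=OFF")
setting = conn.execute("PRAGMA automatic_index").fetchone()[0]

# With the feature off, the join has to scan both tables.
off_plan = plan_details(conn, "SELECT * FROM t1, t2 WHERE a=c")
print(setting, off_plan)
```

The pragma applies to the current database connection only, so each connection that needs the feature disabled has to issue it.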
