1
<!-- doc/src/sgml/query.sgml -->
3
<chapter id="tutorial-sql">
4
<title>The <acronym>SQL</acronym> Language</title>
6
<sect1 id="tutorial-sql-intro">
7
<title>Introduction</title>
10
This chapter provides an overview of how to use
11
<acronym>SQL</acronym> to perform simple operations. This
12
tutorial is only intended to give you an introduction and is in no
13
way a complete tutorial on <acronym>SQL</acronym>. Numerous books
14
have been written on <acronym>SQL</acronym>, including <xref
15
linkend="MELT93"> and <xref linkend="DATE97">.
16
You should be aware that some <productname>PostgreSQL</productname>
17
language features are extensions to the standard.
21
In the examples that follow, we assume that you have created a
22
database named <literal>mydb</literal>, as described in the previous
23
chapter, and have been able to start <application>psql</application>.
27
Examples in this manual can also be found in the
28
<productname>PostgreSQL</productname> source distribution
29
in the directory <filename>src/tutorial/</filename>. (Binary
30
distributions of <productname>PostgreSQL</productname> might not
31
compile these files.) To use those
32
files, first change to that directory and run <application>make</>:
35
<prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/src/tutorial</userinput>
36
<prompt>$</prompt> <userinput>make</userinput>
39
This creates the scripts and compiles the C files containing user-defined
40
functions and types. Then, to start the tutorial, do the following:
43
<prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/tutorial</userinput>
44
<prompt>$</prompt> <userinput>psql -s mydb</userinput>
49
<prompt>mydb=></prompt> <userinput>\i basics.sql</userinput>
52
The <literal>\i</literal> command reads in commands from the
53
specified file. <command>psql</command>'s <literal>-s</> option puts you in
54
single step mode which pauses before sending each statement to the
55
server. The commands used in this section are in the file
56
<filename>basics.sql</filename>.
61
<sect1 id="tutorial-concepts">
62
<title>Concepts</title>
65
<indexterm><primary>relational database</primary></indexterm>
66
<indexterm><primary>hierarchical database</primary></indexterm>
67
<indexterm><primary>object-oriented database</primary></indexterm>
68
<indexterm><primary>relation</primary></indexterm>
69
<indexterm><primary>table</primary></indexterm>
71
<productname>PostgreSQL</productname> is a <firstterm>relational
72
database management system</firstterm> (<acronym>RDBMS</acronym>).
73
That means it is a system for managing data stored in
74
<firstterm>relations</firstterm>. Relation is essentially a
75
mathematical term for <firstterm>table</firstterm>. The notion of
76
storing data in tables is so commonplace today that it might
77
seem inherently obvious, but there are a number of other ways of
78
organizing databases. Files and directories on Unix-like
79
operating systems form an example of a hierarchical database. A
80
more modern development is the object-oriented database.
84
<indexterm><primary>row</primary></indexterm>
85
<indexterm><primary>column</primary></indexterm>
87
Each table is a named collection of <firstterm>rows</firstterm>.
88
Each row of a given table has the same set of named
89
<firstterm>columns</firstterm>,
90
and each column is of a specific data type. Whereas columns have
91
a fixed order in each row, it is important to remember that SQL
92
does not guarantee the order of the rows within the table in any
93
way (although they can be explicitly sorted for display).
97
<indexterm><primary>database cluster</primary></indexterm>
98
<indexterm><primary>cluster</primary><secondary>of databases</secondary><see>database cluster</see></indexterm>
100
Tables are grouped into databases, and a collection of databases
101
managed by a single <productname>PostgreSQL</productname> server
102
instance constitutes a database <firstterm>cluster</firstterm>.
107
<sect1 id="tutorial-table">
108
<title>Creating a New Table</title>
110
<indexterm zone="tutorial-table">
111
<primary>CREATE TABLE</primary>
115
You can create a new table by specifying the table
116
name, along with all column names and their types:
119
CREATE TABLE weather (
121
temp_lo int, -- low temperature
122
temp_hi int, -- high temperature
123
prcp real, -- precipitation
128
You can enter this into <command>psql</command> with the line
129
breaks. <command>psql</command> will recognize that the command
130
is not terminated until the semicolon.
134
White space (i.e., spaces, tabs, and newlines) can be used freely
135
in SQL commands. That means you can type the command aligned
136
differently than above, or even all on one line. Two dashes
137
(<quote><literal>--</literal></quote>) introduce comments.
138
Whatever follows them is ignored up to the end of the line. SQL
139
is case insensitive about key words and identifiers, except
140
when identifiers are double-quoted to preserve the case (not done
145
<type>varchar(80)</type> specifies a data type that can store
146
arbitrary character strings up to 80 characters in length.
147
<type>int</type> is the normal integer type. <type>real</type> is
148
a type for storing single precision floating-point numbers.
149
<type>date</type> should be self-explanatory. (Yes, the column of
150
type <type>date</type> is also named <structfield>date</structfield>.
151
This might be convenient or confusing — you choose.)
155
<productname>PostgreSQL</productname> supports the standard
156
<acronym>SQL</acronym> types <type>int</type>,
157
<type>smallint</type>, <type>real</type>, <type>double
158
precision</type>, <type>char(<replaceable>N</>)</type>,
159
<type>varchar(<replaceable>N</>)</type>, <type>date</type>,
160
<type>time</type>, <type>timestamp</type>, and
161
<type>interval</type>, as well as other types of general utility
162
and a rich set of geometric types.
163
<productname>PostgreSQL</productname> can be customized with an
164
arbitrary number of user-defined data types. Consequently, type
165
names are not key words in the syntax, except where required to
166
support special cases in the <acronym>SQL</acronym> standard.
170
The second example will store cities and their associated
171
geographical location:
173
CREATE TABLE cities (
178
The <type>point</type> type is an example of a
179
<productname>PostgreSQL</productname>-specific data type.
184
<primary>DROP TABLE</primary>
187
Finally, it should be mentioned that if you don't need a table any
188
longer or want to recreate it differently you can remove it using
189
the following command:
191
DROP TABLE <replaceable>tablename</replaceable>;
197
<sect1 id="tutorial-populate">
198
<title>Populating a Table With Rows</title>
200
<indexterm zone="tutorial-populate">
201
<primary>INSERT</primary>
205
The <command>INSERT</command> statement is used to populate a table with
209
INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
212
Note that all data types use rather obvious input formats.
213
Constants that are not simple numeric values usually must be
214
surrounded by single quotes (<literal>'</>), as in the example.
216
<type>date</type> type is actually quite flexible in what it
217
accepts, but for this tutorial we will stick to the unambiguous
222
The <type>point</type> type requires a coordinate pair as input,
225
INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');
230
The syntax used so far requires you to remember the order of the
231
columns. An alternative syntax allows you to list the columns
234
INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
235
VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');
237
You can list the columns in a different order if you wish or
238
even omit some columns, e.g., if the precipitation is unknown:
240
INSERT INTO weather (date, city, temp_hi, temp_lo)
241
VALUES ('1994-11-29', 'Hayward', 54, 37);
243
Many developers consider explicitly listing the columns better
244
style than relying on the order implicitly.
248
Please enter all the commands shown above so you have some data to
249
work with in the following sections.
254
<primary>COPY</primary>
257
You could also have used <command>COPY</command> to load large
258
amounts of data from flat-text files. This is usually faster
259
because the <command>COPY</command> command is optimized for this
260
application while allowing less flexibility than
261
<command>INSERT</command>. An example would be:
264
COPY weather FROM '/home/user/weather.txt';
267
where the file name for the source file must be available on the
268
machine running the backend process, not the client, since the backend process
269
reads the file directly. You can read more about the
270
<command>COPY</command> command in <xref linkend="sql-copy">.
275
<sect1 id="tutorial-select">
276
<title>Querying a Table</title>
279
<indexterm><primary>query</primary></indexterm>
280
<indexterm><primary>SELECT</primary></indexterm>
282
To retrieve data from a table, the table is
283
<firstterm>queried</firstterm>. An <acronym>SQL</acronym>
284
<command>SELECT</command> statement is used to do this. The
285
statement is divided into a select list (the part that lists the
286
columns to be returned), a table list (the part that lists the
287
tables from which to retrieve the data), and an optional
288
qualification (the part that specifies any restrictions). For
289
example, to retrieve all the rows of table
290
<structname>weather</structname>, type:
292
SELECT * FROM weather;
294
Here <literal>*</literal> is a shorthand for <quote>all columns</quote>.
297
While <literal>SELECT *</literal> is useful for off-the-cuff
298
queries, it is widely considered bad style in production code,
299
since adding a column to the table would change the results.
302
So the same result would be had with:
304
SELECT city, temp_lo, temp_hi, prcp, date FROM weather;
307
The output should be:
310
city | temp_lo | temp_hi | prcp | date
311
---------------+---------+---------+------+------------
312
San Francisco | 46 | 50 | 0.25 | 1994-11-27
313
San Francisco | 43 | 57 | 0 | 1994-11-29
314
Hayward | 37 | 54 | | 1994-11-29
320
You can write expressions, not just simple column references, in the
321
select list. For example, you can do:
323
SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;
327
city | temp_avg | date
328
---------------+----------+------------
329
San Francisco | 48 | 1994-11-27
330
San Francisco | 50 | 1994-11-29
331
Hayward | 45 | 1994-11-29
334
Notice how the <literal>AS</literal> clause is used to relabel the
335
output column. (The <literal>AS</literal> clause is optional.)
339
A query can be <quote>qualified</> by adding a <literal>WHERE</>
340
clause that specifies which rows are wanted. The <literal>WHERE</>
341
clause contains a Boolean (truth value) expression, and only rows for
342
which the Boolean expression is true are returned. The usual
343
Boolean operators (<literal>AND</literal>,
344
<literal>OR</literal>, and <literal>NOT</literal>) are allowed in
345
the qualification. For example, the following
346
retrieves the weather of San Francisco on rainy days:
349
SELECT * FROM weather
350
WHERE city = 'San Francisco' AND prcp > 0.0;
354
city | temp_lo | temp_hi | prcp | date
355
---------------+---------+---------+------+------------
356
San Francisco | 46 | 50 | 0.25 | 1994-11-27
362
<indexterm><primary>ORDER BY</primary></indexterm>
364
You can request that the results of a query
365
be returned in sorted order:
368
SELECT * FROM weather
373
city | temp_lo | temp_hi | prcp | date
374
---------------+---------+---------+------+------------
375
Hayward | 37 | 54 | | 1994-11-29
376
San Francisco | 43 | 57 | 0 | 1994-11-29
377
San Francisco | 46 | 50 | 0.25 | 1994-11-27
380
In this example, the sort order isn't fully specified, and so you
381
might get the San Francisco rows in either order. But you'd always
382
get the results shown above if you do:
385
SELECT * FROM weather
386
ORDER BY city, temp_lo;
391
<indexterm><primary>DISTINCT</primary></indexterm>
392
<indexterm><primary>duplicate</primary></indexterm>
394
You can request that duplicate rows be removed from the result of
410
Here again, the result row ordering might vary.
411
You can ensure consistent results by using <literal>DISTINCT</literal> and
412
<literal>ORDER BY</literal> together:
415
In some database systems, including older versions of
416
<productname>PostgreSQL</productname>, the implementation of
417
<literal>DISTINCT</literal> automatically orders the rows and
418
so <literal>ORDER BY</literal> is unnecessary. But this is not
419
required by the SQL standard, and current
420
<productname>PostgreSQL</productname> does not guarantee that
421
<literal>DISTINCT</literal> causes the rows to be ordered.
434
<sect1 id="tutorial-join">
435
<title>Joins Between Tables</title>
437
<indexterm zone="tutorial-join">
438
<primary>join</primary>
442
Thus far, our queries have only accessed one table at a time.
443
Queries can access multiple tables at once, or access the same
444
table in such a way that multiple rows of the table are being
445
processed at the same time. A query that accesses multiple rows
446
of the same or different tables at one time is called a
447
<firstterm>join</firstterm> query. As an example, say you wish to
448
list all the weather records together with the location of the
449
associated city. To do that, we need to compare the <structfield>city</>
450
column of each row of the <structname>weather</> table with the
451
<structfield>name</> column of all rows in the <structname>cities</>
452
table, and select the pairs of rows where these values match.
455
This is only a conceptual model. The join is usually performed
456
in a more efficient manner than actually comparing each possible
457
pair of rows, but this is invisible to the user.
460
This would be accomplished by the following query:
469
city | temp_lo | temp_hi | prcp | date | name | location
470
---------------+---------+---------+------+------------+---------------+-----------
471
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
472
San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
479
Observe two things about the result set:
483
There is no result row for the city of Hayward. This is
484
because there is no matching entry in the
485
<structname>cities</structname> table for Hayward, so the join
486
ignores the unmatched rows in the <structname>weather</> table. We will see
487
shortly how this can be fixed.
493
There are two columns containing the city name. This is
494
correct because the lists of columns from the
495
<structname>weather</structname> and
496
<structname>cities</structname> tables are concatenated. In
497
practice this is undesirable, though, so you will probably want
498
to list the output columns explicitly rather than using
499
<literal>*</literal>:
501
SELECT city, temp_lo, temp_hi, prcp, date, location
511
<title>Exercise:</title>
514
Attempt to determine the semantics of this query when the
515
<literal>WHERE</literal> clause is omitted.
520
Since the columns all had different names, the parser
521
automatically found which table they belong to. If there
522
were duplicate column names in the two tables you'd need to
523
<firstterm>qualify</> the column names to show which one you
527
SELECT weather.city, weather.temp_lo, weather.temp_hi,
528
weather.prcp, weather.date, cities.location
530
WHERE cities.name = weather.city;
533
It is widely considered good style to qualify all column names
534
in a join query, so that the query won't fail if a duplicate
535
column name is later added to one of the tables.
539
Join queries of the kind seen thus far can also be written in this
544
FROM weather INNER JOIN cities ON (weather.city = cities.name);
547
This syntax is not as commonly used as the one above, but we show
548
it here to help you understand the following topics.
552
<indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
554
Now we will figure out how we can get the Hayward records back in.
555
What we want the query to do is to scan the
556
<structname>weather</structname> table and for each row to find the
557
matching <structname>cities</structname> row(s). If no matching row is
558
found we want some <quote>empty values</quote> to be substituted
559
for the <structname>cities</structname> table's columns. This kind
560
of query is called an <firstterm>outer join</firstterm>. (The
561
joins we have seen so far are inner joins.) The command looks
566
FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
568
city | temp_lo | temp_hi | prcp | date | name | location
569
---------------+---------+---------+------+------------+---------------+-----------
570
Hayward | 37 | 54 | | 1994-11-29 | |
571
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
572
San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
576
This query is called a <firstterm>left outer
577
join</firstterm> because the table mentioned on the left of the
578
join operator will have each of its rows in the output at least
579
once, whereas the table on the right will only have those rows
580
output that match some row of the left table. When outputting a
581
left-table row for which there is no right-table match, empty (null)
582
values are substituted for the right-table columns.
586
<title>Exercise:</title>
589
There are also right outer joins and full outer joins. Try to
590
find out what those do.
595
<indexterm><primary>join</primary><secondary>self</secondary></indexterm>
596
<indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
598
We can also join a table against itself. This is called a
599
<firstterm>self join</firstterm>. As an example, suppose we wish
600
to find all the weather records that are in the temperature range
601
of other weather records. So we need to compare the
602
<structfield>temp_lo</> and <structfield>temp_hi</> columns of
603
each <structname>weather</structname> row to the
604
<structfield>temp_lo</structfield> and
605
<structfield>temp_hi</structfield> columns of all other
606
<structname>weather</structname> rows. We can do this with the
610
SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
611
W2.city, W2.temp_lo AS low, W2.temp_hi AS high
612
FROM weather W1, weather W2
613
WHERE W1.temp_lo < W2.temp_lo
614
AND W1.temp_hi > W2.temp_hi;
616
city | low | high | city | low | high
617
---------------+-----+------+---------------+-----+------
618
San Francisco | 43 | 57 | San Francisco | 46 | 50
619
Hayward | 37 | 54 | San Francisco | 46 | 50
623
Here we have relabeled the weather table as <literal>W1</> and
624
<literal>W2</> to be able to distinguish the left and right side
625
of the join. You can also use these kinds of aliases in other
626
queries to save some typing, e.g.:
629
FROM weather w, cities c
630
WHERE w.city = c.name;
632
You will encounter this style of abbreviating quite frequently.
637
<sect1 id="tutorial-agg">
638
<title>Aggregate Functions</title>
640
<indexterm zone="tutorial-agg">
641
<primary>aggregate function</primary>
645
Like most other relational database products,
646
<productname>PostgreSQL</productname> supports
647
<firstterm>aggregate functions</>.
648
An aggregate function computes a single result from multiple input rows.
649
For example, there are aggregates to compute the
650
<function>count</function>, <function>sum</function>,
651
<function>avg</function> (average), <function>max</function> (maximum) and
652
<function>min</function> (minimum) over a set of rows.
656
As an example, we can find the highest low-temperature reading anywhere
660
SELECT max(temp_lo) FROM weather;
672
<indexterm><primary>subquery</primary></indexterm>
674
If we wanted to know what city (or cities) that reading occurred in,
678
SELECT city FROM weather WHERE temp_lo = max(temp_lo); <lineannotation>WRONG</lineannotation>
681
but this will not work since the aggregate
682
<function>max</function> cannot be used in the
683
<literal>WHERE</literal> clause. (This restriction exists because
684
the <literal>WHERE</literal> clause determines which rows will be
685
included in the aggregate calculation; so obviously it has to be evaluated
686
before aggregate functions are computed.)
687
However, as is often the case
688
the query can be restated to accomplish the desired result, here
689
by using a <firstterm>subquery</firstterm>:
692
SELECT city FROM weather
693
WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
703
This is OK because the subquery is an independent computation
704
that computes its own aggregate separately from what is happening
709
<indexterm><primary>GROUP BY</primary></indexterm>
710
<indexterm><primary>HAVING</primary></indexterm>
712
Aggregates are also very useful in combination with <literal>GROUP
713
BY</literal> clauses. For example, we can get the maximum low
714
temperature observed in each city with:
717
SELECT city, max(temp_lo)
724
---------------+-----
730
which gives us one output row per city. Each aggregate result is
731
computed over the table rows matching that city.
732
We can filter these grouped
733
rows using <literal>HAVING</literal>:
736
SELECT city, max(temp_lo)
739
HAVING max(temp_lo) < 40;
749
which gives us the same results for only the cities that have all
750
<structfield>temp_lo</> values below 40. Finally, if we only care about
752
names begin with <quote><literal>S</literal></quote>, we might do:
755
SELECT city, max(temp_lo)
757
WHERE city LIKE 'S%'<co id="co.tutorial-agg-like">
759
HAVING max(temp_lo) < 40;
762
<callout arearefs="co.tutorial-agg-like">
764
The <literal>LIKE</literal> operator does pattern matching and
765
is explained in <xref linkend="functions-matching">.
772
It is important to understand the interaction between aggregates and
773
<acronym>SQL</acronym>'s <literal>WHERE</literal> and <literal>HAVING</literal> clauses.
774
The fundamental difference between <literal>WHERE</literal> and
775
<literal>HAVING</literal> is this: <literal>WHERE</literal> selects
776
input rows before groups and aggregates are computed (thus, it controls
777
which rows go into the aggregate computation), whereas
778
<literal>HAVING</literal> selects group rows after groups and
779
aggregates are computed. Thus, the
780
<literal>WHERE</literal> clause must not contain aggregate functions;
781
it makes no sense to try to use an aggregate to determine which rows
782
will be inputs to the aggregates. On the other hand, the
783
<literal>HAVING</literal> clause always contains aggregate functions.
784
(Strictly speaking, you are allowed to write a <literal>HAVING</literal>
785
clause that doesn't use aggregates, but it's seldom useful. The same
786
condition could be used more efficiently at the <literal>WHERE</literal>
791
In the previous example, we can apply the city name restriction in
792
<literal>WHERE</literal>, since it needs no aggregate. This is
793
more efficient than adding the restriction to <literal>HAVING</literal>,
794
because we avoid doing the grouping and aggregate calculations
795
for all rows that fail the <literal>WHERE</literal> check.
800
<sect1 id="tutorial-update">
801
<title>Updates</title>
803
<indexterm zone="tutorial-update">
804
<primary>UPDATE</primary>
808
You can update existing rows using the
809
<command>UPDATE</command> command.
810
Suppose you discover the temperature readings are
811
all off by 2 degrees after November 28. You can correct the
816
SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2
817
WHERE date > '1994-11-28';
822
Look at the new state of the data:
824
SELECT * FROM weather;
826
city | temp_lo | temp_hi | prcp | date
827
---------------+---------+---------+------+------------
828
San Francisco | 46 | 50 | 0.25 | 1994-11-27
829
San Francisco | 41 | 55 | 0 | 1994-11-29
830
Hayward | 35 | 52 | | 1994-11-29
836
<sect1 id="tutorial-delete">
837
<title>Deletions</title>
839
<indexterm zone="tutorial-delete">
840
<primary>DELETE</primary>
844
Rows can be removed from a table using the <command>DELETE</command>
846
Suppose you are no longer interested in the weather of Hayward.
847
Then you can do the following to delete those rows from the table:
849
DELETE FROM weather WHERE city = 'Hayward';
852
All weather records belonging to Hayward are removed.
855
SELECT * FROM weather;
859
city | temp_lo | temp_hi | prcp | date
860
---------------+---------+---------+------+------------
861
San Francisco | 46 | 50 | 0.25 | 1994-11-27
862
San Francisco | 41 | 55 | 0 | 1994-11-29
868
One should be wary of statements of the form
870
DELETE FROM <replaceable>tablename</replaceable>;
873
Without a qualification, <command>DELETE</command> will
874
remove <emphasis>all</> rows from the given table, leaving it
875
empty. The system will not request confirmation before