2
$PostgreSQL: pgsql/doc/src/sgml/query.sgml,v 1.42.4.1 2005-01-22 23:05:48 momjian Exp $
5
<chapter id="tutorial-sql">
6
<title>The <acronym>SQL</acronym> Language</title>
8
<sect1 id="tutorial-sql-intro">
9
<title>Introduction</title>
12
This chapter provides an overview of how to use
13
<acronym>SQL</acronym> to perform simple operations. This
14
tutorial is only intended to give you an introduction and is in no
15
way a complete tutorial on <acronym>SQL</acronym>. Numerous books
16
have been written on <acronym>SQL</acronym>, including <xref
17
linkend="MELT93"> and <xref linkend="DATE97">.
18
You should be aware that some <productname>PostgreSQL</productname>
19
language features are extensions to the standard.
23
In the examples that follow, we assume that you have created a
24
database named <literal>mydb</literal>, as described in the previous
25
chapter, and have started <application>psql</application>.
29
Examples in this manual can also be found in the
30
<productname>PostgreSQL</productname> source distribution
31
in the directory <filename>src/tutorial/</filename>. To use those
32
files, first change to that directory and run <application>make</>:
35
<prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/src/tutorial</userinput>
36
<prompt>$</prompt> <userinput>make</userinput>
39
This creates the scripts and compiles the C files containing user-defined
40
functions and types. (You must use GNU make for this — it may be named
41
something different on your system, often <application>gmake</>.)
42
Then, to start the tutorial, do the following:
45
<prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/src/tutorial</userinput>
46
<prompt>$</prompt> <userinput>psql -s mydb</userinput>
51
<prompt>mydb=></prompt> <userinput>\i basics.sql</userinput>
54
The <literal>\i</literal> command reads in commands from the
55
specified file. The <literal>-s</literal> option puts you in
56
single step mode which pauses before sending each statement to the
57
server. The commands used in this section are in the file
58
<filename>basics.sql</filename>.
63
<sect1 id="tutorial-concepts">
64
<title>Concepts</title>
67
<indexterm><primary>relational database</primary></indexterm>
68
<indexterm><primary>hierarchical database</primary></indexterm>
69
<indexterm><primary>object-oriented database</primary></indexterm>
70
<indexterm><primary>relation</primary></indexterm>
71
<indexterm><primary>table</primary></indexterm>
73
<productname>PostgreSQL</productname> is a <firstterm>relational
74
database management system</firstterm> (<acronym>RDBMS</acronym>).
75
That means it is a system for managing data stored in
76
<firstterm>relations</firstterm>. Relation is essentially a
77
mathematical term for <firstterm>table</firstterm>. The notion of
78
storing data in tables is so commonplace today that it might
79
seem inherently obvious, but there are a number of other ways of
80
organizing databases. Files and directories on Unix-like
81
operating systems form an example of a hierarchical database. A
82
more modern development is the object-oriented database.
86
<indexterm><primary>row</primary></indexterm>
87
<indexterm><primary>column</primary></indexterm>
89
Each table is a named collection of <firstterm>rows</firstterm>.
90
Each row of a given table has the same set of named
91
<firstterm>columns</firstterm>,
92
and each column is of a specific data type. Whereas columns have
93
a fixed order in each row, it is important to remember that SQL
94
does not guarantee the order of the rows within the table in any
95
way (although they can be explicitly sorted for display).
99
<indexterm><primary>database cluster</primary></indexterm>
100
<indexterm><primary>cluster</primary><secondary>of databases</secondary><see>database cluster</see></indexterm>
102
Tables are grouped into databases, and a collection of databases
103
managed by a single <productname>PostgreSQL</productname> server
104
instance constitutes a database <firstterm>cluster</firstterm>.
109
<sect1 id="tutorial-table">
110
<title>Creating a New Table</title>
112
<indexterm zone="tutorial-table">
113
<primary>CREATE TABLE</primary>
117
You can create a new table by specifying the table
118
name, along with all column names and their types:
121
CREATE TABLE weather (
123
temp_lo int, -- low temperature
124
temp_hi int, -- high temperature
125
prcp real, -- precipitation
130
You can enter this into <command>psql</command> with the line
131
breaks. <command>psql</command> will recognize that the command
132
is not terminated until the semicolon.
136
White space (i.e., spaces, tabs, and newlines) may be used freely
137
in SQL commands. That means you can type the command aligned
138
differently than above, or even all on one line. Two dashes
139
(<quote><literal>--</literal></quote>) introduce comments.
140
Whatever follows them is ignored up to the end of the line. SQL
141
is case insensitive about key words and identifiers, except
142
when identifiers are double-quoted to preserve the case (not done
147
<type>varchar(80)</type> specifies a data type that can store
148
arbitrary character strings up to 80 characters in length.
149
<type>int</type> is the normal integer type. <type>real</type> is
150
a type for storing single precision floating-point numbers.
151
<type>date</type> should be self-explanatory. (Yes, the column of
152
type <type>date</type> is also named <literal>date</literal>.
153
This may be convenient or confusing — you choose.)
157
<productname>PostgreSQL</productname> supports the standard
158
<acronym>SQL</acronym> types <type>int</type>,
159
<type>smallint</type>, <type>real</type>, <type>double
160
precision</type>, <type>char(<replaceable>N</>)</type>,
161
<type>varchar(<replaceable>N</>)</type>, <type>date</type>,
162
<type>time</type>, <type>timestamp</type>, and
163
<type>interval</type>, as well as other types of general utility
164
and a rich set of geometric types.
165
<productname>PostgreSQL</productname> can be customized with an
166
arbitrary number of user-defined data types. Consequently, type
167
names are not syntactical key words, except where required to
168
support special cases in the <acronym>SQL</acronym> standard.
172
The second example will store cities and their associated
173
geographical location:
175
CREATE TABLE cities (
180
The <type>point</type> type is an example of a
181
<productname>PostgreSQL</productname>-specific data type.
186
<primary>DROP TABLE</primary>
189
Finally, it should be mentioned that if you don't need a table any
190
longer or want to recreate it differently you can remove it using
191
the following command:
193
DROP TABLE <replaceable>tablename</replaceable>;
199
<sect1 id="tutorial-populate">
200
<title>Populating a Table With Rows</title>
202
<indexterm zone="tutorial-populate">
203
<primary>INSERT</primary>
207
The <command>INSERT</command> statement is used to populate a table with
211
INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
214
Note that all data types use rather obvious input formats.
215
Constants that are not simple numeric values usually must be
216
surrounded by single quotes (<literal>'</>), as in the example.
218
<type>date</type> type is actually quite flexible in what it
219
accepts, but for this tutorial we will stick to the unambiguous
224
The <type>point</type> type requires a coordinate pair as input,
227
INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');
232
The syntax used so far requires you to remember the order of the
233
columns. An alternative syntax allows you to list the columns
236
INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
237
VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');
239
You can list the columns in a different order if you wish or
240
even omit some columns, e.g., if the precipitation is unknown:
242
INSERT INTO weather (date, city, temp_hi, temp_lo)
243
VALUES ('1994-11-29', 'Hayward', 54, 37);
245
Many developers consider explicitly listing the columns better
246
style than relying on the order implicitly.
250
Please enter all the commands shown above so you have some data to
251
work with in the following sections.
256
<primary>COPY</primary>
259
You could also have used <command>COPY</command> to load large
260
amounts of data from flat-text files. This is usually faster
261
because the <command>COPY</command> command is optimized for this
262
application while allowing less flexibility than
263
<command>INSERT</command>. An example would be:
266
COPY weather FROM '/home/user/weather.txt';
269
where the file name for the source file must be available to the
270
backend server machine, not the client, since the backend server
271
reads the file directly. You can read more about the
272
<command>COPY</command> command in <xref linkend="sql-copy">.
277
<sect1 id="tutorial-select">
278
<title>Querying a Table</title>
281
<indexterm><primary>query</primary></indexterm>
282
<indexterm><primary>SELECT</primary></indexterm>
284
To retrieve data from a table, the table is
285
<firstterm>queried</firstterm>. An <acronym>SQL</acronym>
286
<command>SELECT</command> statement is used to do this. The
287
statement is divided into a select list (the part that lists the
288
columns to be returned), a table list (the part that lists the
289
tables from which to retrieve the data), and an optional
290
qualification (the part that specifies any restrictions). For
291
example, to retrieve all the rows of table
292
<classname>weather</classname>, type:
294
SELECT * FROM weather;
296
Here <literal>*</literal> is a shorthand for <quote>all columns</quote>.
299
While <literal>SELECT *</literal> is useful for off-the-cuff
300
queries, it is widely considered bad style in production code,
301
since adding a column to the table would change the results.
304
So the same result would be had with:
306
SELECT city, temp_lo, temp_hi, prcp, date FROM weather;
309
The output should be:
312
city | temp_lo | temp_hi | prcp | date
313
---------------+---------+---------+------+------------
314
San Francisco | 46 | 50 | 0.25 | 1994-11-27
315
San Francisco | 43 | 57 | 0 | 1994-11-29
316
Hayward | 37 | 54 | | 1994-11-29
322
You can write expressions, not just simple column references, in the
323
select list. For example, you can do:
325
SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;
329
city | temp_avg | date
330
---------------+----------+------------
331
San Francisco | 48 | 1994-11-27
332
San Francisco | 50 | 1994-11-29
333
Hayward | 45 | 1994-11-29
336
Notice how the <literal>AS</literal> clause is used to relabel the
337
output column. (The <literal>AS</literal> clause is optional.)
341
A query can be <quote>qualified</> by adding a <literal>WHERE</>
342
clause that specifies which rows are wanted. The <literal>WHERE</>
343
clause contains a Boolean (truth value) expression, and only rows for
344
which the Boolean expression is true are returned. The usual
345
Boolean operators (<literal>AND</literal>,
346
<literal>OR</literal>, and <literal>NOT</literal>) are allowed in
347
the qualification. For example, the following
348
retrieves the weather of San Francisco on rainy days:
351
SELECT * FROM weather
352
WHERE city = 'San Francisco' AND prcp > 0.0;
356
city | temp_lo | temp_hi | prcp | date
357
---------------+---------+---------+------+------------
358
San Francisco | 46 | 50 | 0.25 | 1994-11-27
364
<indexterm><primary>ORDER BY</primary></indexterm>
366
You can request that the results of a query
367
be returned in sorted order:
370
SELECT * FROM weather
375
city | temp_lo | temp_hi | prcp | date
376
---------------+---------+---------+------+------------
377
Hayward | 37 | 54 | | 1994-11-29
378
San Francisco | 43 | 57 | 0 | 1994-11-29
379
San Francisco | 46 | 50 | 0.25 | 1994-11-27
382
In this example, the sort order isn't fully specified, and so you
383
might get the San Francisco rows in either order. But you'd always
384
get the results shown above if you do
387
SELECT * FROM weather
388
ORDER BY city, temp_lo;
393
<indexterm><primary>DISTINCT</primary></indexterm>
394
<indexterm><primary>duplicate</primary></indexterm>
396
You can request that duplicate rows be removed from the result of
412
Here again, the result row ordering might vary.
413
You can ensure consistent results by using <literal>DISTINCT</literal> and
414
<literal>ORDER BY</literal> together:
417
In some database systems, including older versions of
418
<productname>PostgreSQL</productname>, the implementation of
419
<literal>DISTINCT</literal> automatically orders the rows and
420
so <literal>ORDER BY</literal> is redundant. But this is not
421
required by the SQL standard, and current
422
<productname>PostgreSQL</productname> doesn't guarantee that
423
<literal>DISTINCT</literal> causes the rows to be ordered.
436
<sect1 id="tutorial-join">
437
<title>Joins Between Tables</title>
439
<indexterm zone="tutorial-join">
440
<primary>join</primary>
444
Thus far, our queries have only accessed one table at a time.
445
Queries can access multiple tables at once, or access the same
446
table in such a way that multiple rows of the table are being
447
processed at the same time. A query that accesses multiple rows
448
of the same or different tables at one time is called a
449
<firstterm>join</firstterm> query. As an example, say you wish to
450
list all the weather records together with the location of the
451
associated city. To do that, we need to compare the city column of
452
each row of the weather table with the name column of all rows in
453
the cities table, and select the pairs of rows where these values match.
456
This is only a conceptual model. The join is usually performed
457
in a more efficient manner than actually comparing each possible
458
pair of rows, but this is invisible to the user.
461
This would be accomplished by the following query:
470
city | temp_lo | temp_hi | prcp | date | name | location
471
---------------+---------+---------+------+------------+---------------+-----------
472
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
473
San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
480
Observe two things about the result set:
484
There is no result row for the city of Hayward. This is
485
because there is no matching entry in the
486
<classname>cities</classname> table for Hayward, so the join
487
ignores the unmatched rows in the weather table. We will see
488
shortly how this can be fixed.
494
There are two columns containing the city name. This is
495
correct because the lists of columns of the
496
<classname>weather</classname> and the
497
<classname>cities</classname> table are concatenated. In
498
practice this is undesirable, though, so you will probably want
499
to list the output columns explicitly rather than using
500
<literal>*</literal>:
502
SELECT city, temp_lo, temp_hi, prcp, date, location
512
<title>Exercise:</title>
515
Attempt to find out the semantics of this query when the
516
<literal>WHERE</literal> clause is omitted.
521
Since the columns all had different names, the parser
522
automatically found out which table they belong to, but it is good
523
style to fully qualify column names in join queries:
526
SELECT weather.city, weather.temp_lo, weather.temp_hi,
527
weather.prcp, weather.date, cities.location
529
WHERE cities.name = weather.city;
534
Join queries of the kind seen thus far can also be written in this
539
FROM weather INNER JOIN cities ON (weather.city = cities.name);
542
This syntax is not as commonly used as the one above, but we show
543
it here to help you understand the following topics.
547
<indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
549
Now we will figure out how we can get the Hayward records back in.
550
What we want the query to do is to scan the
551
<classname>weather</classname> table and for each row to find the
552
matching <classname>cities</classname> row. If no matching row is
553
found we want some <quote>empty values</quote> to be substituted
554
for the <classname>cities</classname> table's columns. This kind
555
of query is called an <firstterm>outer join</firstterm>. (The
556
joins we have seen so far are inner joins.) The command looks
561
FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
563
city | temp_lo | temp_hi | prcp | date | name | location
564
---------------+---------+---------+------+------------+---------------+-----------
565
Hayward | 37 | 54 | | 1994-11-29 | |
566
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
567
San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
571
This query is called a <firstterm>left outer
572
join</firstterm> because the table mentioned on the left of the
573
join operator will have each of its rows in the output at least
574
once, whereas the table on the right will only have those rows
575
output that match some row of the left table. When outputting a
576
left-table row for which there is no right-table match, empty (null)
577
values are substituted for the right-table columns.
581
<title>Exercise:</title>
584
There are also right outer joins and full outer joins. Try to
585
find out what those do.
590
<indexterm><primary>join</primary><secondary>self</secondary></indexterm>
591
<indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
593
We can also join a table against itself. This is called a
594
<firstterm>self join</firstterm>. As an example, suppose we wish
595
to find all the weather records that are in the temperature range
596
of other weather records. So we need to compare the
597
<structfield>temp_lo</> and <structfield>temp_hi</> columns of
598
each <classname>weather</classname> row to the
599
<structfield>temp_lo</structfield> and
600
<structfield>temp_hi</structfield> columns of all other
601
<classname>weather</classname> rows. We can do this with the
605
SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
606
W2.city, W2.temp_lo AS low, W2.temp_hi AS high
607
FROM weather W1, weather W2
608
WHERE W1.temp_lo < W2.temp_lo
609
AND W1.temp_hi > W2.temp_hi;
611
city | low | high | city | low | high
612
---------------+-----+------+---------------+-----+------
613
San Francisco | 43 | 57 | San Francisco | 46 | 50
614
Hayward | 37 | 54 | San Francisco | 46 | 50
618
Here we have relabeled the weather table as <literal>W1</> and
619
<literal>W2</> to be able to distinguish the left and right side
620
of the join. You can also use these kinds of aliases in other
621
queries to save some typing, e.g.:
624
FROM weather w, cities c
625
WHERE w.city = c.name;
627
You will encounter this style of abbreviating quite frequently.
632
<sect1 id="tutorial-agg">
633
<title>Aggregate Functions</title>
635
<indexterm zone="tutorial-agg">
636
<primary>aggregate function</primary>
640
<indexterm><primary>average</primary></indexterm>
641
<indexterm><primary>count</primary></indexterm>
642
<indexterm><primary>max</primary></indexterm>
643
<indexterm><primary>min</primary></indexterm>
644
<indexterm><primary>sum</primary></indexterm>
646
Like most other relational database products,
647
<productname>PostgreSQL</productname> supports
649
An aggregate function computes a single result from multiple input rows.
650
For example, there are aggregates to compute the
651
<function>count</function>, <function>sum</function>,
652
<function>avg</function> (average), <function>max</function> (maximum) and
653
<function>min</function> (minimum) over a set of rows.
657
As an example, we can find the highest low-temperature reading anywhere
661
SELECT max(temp_lo) FROM weather;
673
<indexterm><primary>subquery</primary></indexterm>
675
If we wanted to know what city (or cities) that reading occurred in,
679
SELECT city FROM weather WHERE temp_lo = max(temp_lo); <lineannotation>WRONG</lineannotation>
682
but this will not work since the aggregate
683
<function>max</function> cannot be used in the
684
<literal>WHERE</literal> clause. (This restriction exists because
685
the <literal>WHERE</literal> clause determines the rows that will
686
go into the aggregation stage; so it has to be evaluated before
687
aggregate functions are computed.)
688
However, as is often the case
689
the query can be restated to accomplish the intended result, here
690
by using a <firstterm>subquery</firstterm>:
693
SELECT city FROM weather
694
WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
704
This is OK because the subquery is an independent computation
705
that computes its own aggregate separately from what is happening
710
<indexterm><primary>GROUP BY</primary></indexterm>
711
<indexterm><primary>HAVING</primary></indexterm>
713
Aggregates are also very useful in combination with <literal>GROUP
714
BY</literal> clauses. For example, we can get the maximum low
715
temperature observed in each city with
718
SELECT city, max(temp_lo)
725
---------------+-----
731
which gives us one output row per city. Each aggregate result is
732
computed over the table rows matching that city.
733
We can filter these grouped
734
rows using <literal>HAVING</literal>:
737
SELECT city, max(temp_lo)
740
HAVING max(temp_lo) < 40;
750
which gives us the same results for only the cities that have all
751
<literal>temp_lo</> values below 40. Finally, if we only care about
753
names begin with <quote><literal>S</literal></quote>, we might do
756
SELECT city, max(temp_lo)
758
WHERE city LIKE 'S%'<co id="co.tutorial-agg-like">
760
HAVING max(temp_lo) < 40;
763
<callout arearefs="co.tutorial-agg-like">
765
The <literal>LIKE</literal> operator does pattern matching and
766
is explained in <xref linkend="functions-matching">.
773
It is important to understand the interaction between aggregates and
774
<acronym>SQL</acronym>'s <literal>WHERE</literal> and <literal>HAVING</literal> clauses.
775
The fundamental difference between <literal>WHERE</literal> and
776
<literal>HAVING</literal> is this: <literal>WHERE</literal> selects
777
input rows before groups and aggregates are computed (thus, it controls
778
which rows go into the aggregate computation), whereas
779
<literal>HAVING</literal> selects group rows after groups and
780
aggregates are computed. Thus, the
781
<literal>WHERE</literal> clause must not contain aggregate functions;
782
it makes no sense to try to use an aggregate to determine which rows
783
will be inputs to the aggregates. On the other hand, the
784
<literal>HAVING</literal> clause always contains aggregate functions.
785
(Strictly speaking, you are allowed to write a <literal>HAVING</literal>
786
clause that doesn't use aggregates, but it's wasteful. The same condition
787
could be used more efficiently at the <literal>WHERE</literal> stage.)
791
In the previous example, we can apply the city name restriction in
792
<literal>WHERE</literal>, since it needs no aggregate. This is
793
more efficient than adding the restriction to <literal>HAVING</literal>,
794
because we avoid doing the grouping and aggregate calculations
795
for all rows that fail the <literal>WHERE</literal> check.
800
<sect1 id="tutorial-update">
801
<title>Updates</title>
803
<indexterm zone="tutorial-update">
804
<primary>UPDATE</primary>
808
You can update existing rows using the
809
<command>UPDATE</command> command.
810
Suppose you discover the temperature readings are
811
all off by 2 degrees as of November 28. You may update the
816
SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2
817
WHERE date > '1994-11-28';
822
Look at the new state of the data:
824
SELECT * FROM weather;
826
city | temp_lo | temp_hi | prcp | date
827
---------------+---------+---------+------+------------
828
San Francisco | 46 | 50 | 0.25 | 1994-11-27
829
San Francisco | 41 | 55 | 0 | 1994-11-29
830
Hayward | 35 | 52 | | 1994-11-29
836
<sect1 id="tutorial-delete">
837
<title>Deletions</title>
839
<indexterm zone="tutorial-delete">
840
<primary>DELETE</primary>
844
Rows can be removed from a table using the <command>DELETE</command>
846
Suppose you are no longer interested in the weather of Hayward.
847
Then you can do the following to delete those rows from the table:
849
DELETE FROM weather WHERE city = 'Hayward';
852
All weather records belonging to Hayward are removed.
855
SELECT * FROM weather;
859
city | temp_lo | temp_hi | prcp | date
860
---------------+---------+---------+------+------------
861
San Francisco | 46 | 50 | 0.25 | 1994-11-27
862
San Francisco | 41 | 55 | 0 | 1994-11-29
868
One should be wary of statements of the form
870
DELETE FROM <replaceable>tablename</replaceable>;
873
Without a qualification, <command>DELETE</command> will
874
remove <emphasis>all</> rows from the given table, leaving it
875
empty. The system will not request confirmation before
882
<!-- Keep this comment at the end of the file
887
sgml-minimize-attributes:nil
888
sgml-always-quote-attributes:t
891
sgml-parent-document:nil
892
sgml-default-dtd-file:"./reference.ced"
893
sgml-exposed-tags:nil
894
sgml-local-catalogs:("/usr/lib/sgml/catalog")
895
sgml-local-ecat-files:nil