1
This is pgin.tcl/INTERNALS, notes on internal implementation of pgin.tcl.
2
Last updated for pgin.tcl-3.0.1 on 2006-08-13
3
The project home page is: http://pgfoundry.org/projects/pgintcl/
4
-----------------------------------------------------------------------------
5
INTERNAL IMPLEMENTATION NOTES:
7
This information is provided for maintenance, test, and debugging.
9
A connection handle is just a Tcl socket channel. The application using
10
pgin.tcl must not read from or write to this channel.
12
Internal procedures, result structures, and other data are stored in a
13
namespace called "pgtcl". The following namespace variables apply to
16
pgtcl::debug A debug flag, default 0 (no debugging)
17
pgtcl::version pgin.tcl version string
18
pgtcl::rn Result number counter
19
pgtcl::fnoids Function OID cache; see FAST-PATH FUNCTION CALLS
20
pgtcl::errnames Constant array of error message field names
22
The following arrays are indexed by connection handle, and contain data
23
applying only to that connection:
25
pgtcl::notice() Command to execute when receiving a Notice
26
pgtcl::xstate() Transaction state
27
pgtcl::notify() Notifications; see NOTIFICATIONS
29
Additional namespace variables are described in the sections below.
30
Result structure variables are described next.
32
-----------------------------------------------------------------------------
35
A result structure is implemented as a variable result$N in the pgtcl
36
namespace, where N is an integer. (The value of N is stored in pgtcl::rn
37
and is incremented each time a new result structure is needed.) The result
38
handle is passed back to the caller as $N, just the integer. The result
39
structure is an array which stores all the meta-information about the
40
result as well as the result values.
42
The result structure array indexes in use are:
44
Variables describing the overall result:
45
result(conn) The connection handle (the socket channel)
46
result(nattr) Number of attributes (columns)
47
result(ntuple) Number of tuples (rows)
48
result(status) PostgreSQL status code, e.g. PGRES_TUPLES_OK
49
result(error) Error message if status is PGRES_FATAL_ERROR
50
result(complete) Command completion status, e.g. "INSERT 10101"
51
result(error,C) Error message field C if status is PGRES_FATAL_ERROR.
52
C is one of the codes for extended error message fields.
54
Variables describing the attributes (columns) in the result:
55
result(attrs) A list of the name of each attribute
56
result(types) A list of the type OID for each attribute
57
result(sizes) A list of attribute byte lengths or -1 if variable
58
result(modifs) A list of the size modifier for each attributes
59
result(formats) A list of the data format for each attributes
60
result(tbloids) A list of the table OIDs for each attribute
62
Variables storing the query result values:
63
result($irow,$icol) Data value for result
64
result(null,$irow,$icol) NULL flag for result
66
The pg_exec and pg_exec_prepared commands create and return a new result
67
structure. The pg_result command retrieves information from the result
68
structure and also frees the result structure with the -clear option.
69
(Other commands, notably pg_select and pg_execute, use pg_exec, so they
70
also make a result structure, but it stays internal to the command and the
71
caller never sees it.) The result structure innards are also directly
72
accessed by some other routines, such as pg_select and pg_execute. Result
73
structure arrays are unset (freed) by pg_result -clear, and any left-over
74
result structures associated with a connection handle are freed when the
75
connection handle is closed by pg_disconnect.
77
The query result values are stored in result($irow,$icol) where $irow is
78
the tuple (row) number, between 0 and $result(ntuples)-1 inclusive, and
79
$icol is the attribute (column) number, between 0 and $result(nattr)-1
80
inclusive. If the value returned by the database is NULL, then
81
$result($irow,$icol) is set to an empty string, and
82
$result(null,$irow,$icol) is also set to an empty string for this row and
83
column. For non-NULL values, $result(null,$irow,$icol) is not set at all.
84
The "null,*,*" indexes are used only by pg_result -getNull if it is
85
necessary for the application to distinguish NULL from empty string - both
86
of which are stored as empty strings in result($irow,$icol) and return an
87
empty string with any of the pg_result access methods. There is no way to
88
distinguish NULL from empty string with pg_select, pg_execute, or
91
The entire result of a query is stored before anything else happens (that
92
is, before pg_exec and pg_exec_prepared return, and before pg_execute and
93
pg_select process the first row). This is also true of libpq and libpgtcl
94
(in their synchronous mode), but Tcl can be slower.
96
Extended error message fields are new with PostgreSQL-7.4. Individual parts
97
of a received error message are stored in the result array indexed by
98
(error,$c) where $c is the one-letter code used in the protocol. See the
99
pgin.tcl documentation for "pg_result -errorField" for more information.
100
(As of 2.2.0, pg_result -errorField is the same as pg_result -error: both
101
take an optional field name or code argument to return an extended error
102
message field, rather than the full message.)
104
-----------------------------------------------------------------------------
107
PostgreSQL protocol version 3 (PostgreSQL-7.4) uses a message-based
108
protocol. To read messages from the backend, pgin.tcl implements a
109
per-connection buffer using several Tcl variables in the pgtcl namespace.
110
The name of the connection handle (the socket name) is part of the variable
111
name, represented by $c below.
113
pgtcl::buf_$c The buffer holding a message from the backend.
114
pgtcl::bufi_$c Index of the next byte to be processed from buf_$c
115
pgtcl::bufn_$c Total number of bytes in the buffer buf_$c.
117
For example, if the connection handle is "sock3", the variables are
118
pgtcl::buf_sock3, pgtcl::bufi_sock3, and pgtcl::bufn_sock3.
120
A few tests determined that the fastest way to fetch data from the buffers
121
in Tcl was to use [string index] and [string range], although this might
124
-----------------------------------------------------------------------------
127
The PostgreSQL backend can notify a front-end client about some parameters,
128
and pgin.tcl stores these in the following variable in the pgtcl namespace:
130
pgtcl::param_$c Array of parameter values, indexed by parameter name
132
where $c is the connection handle (socket name).
134
Access to these parameters is through the pg_parameter_status command,
135
a pgin.tcl extension.
137
-----------------------------------------------------------------------------
140
This version of pgin.tcl speaks only to a Protocol Version 3 PostgreSQL
141
backend (7.4 or later). There is one concession made to Version 2, and
142
that is reading an error message. If a Version 2 error message is read,
143
pgin.tcl will recognize it and pretend it got a Version 3 message. This
144
is for use during the connection stage, to allow it to fail with a
145
proper message if connecting to a Version 2-only backend.
147
-----------------------------------------------------------------------------
150
An array pgtcl::notify keeps track of notifications you want. The array is
151
indexed as pgtcl::notify(connection,name) where connection is the
152
connection handle (socket name) and name is the parameter used in
153
pg_listen. The value of an array element is the command to execute on
154
notification. Note that no data is passed - just the fact that the
155
notification occurred.
157
-----------------------------------------------------------------------------
160
Notice and warning message handling can be customized using the
161
pg_notice_handler command. By default, the notice handler is
162
puts -nonewline stderr
163
and this string will be returned the first time pg_notice_handler is
164
called. A notice handler should be defined as a proc with one or more
165
arguments. Leading arguments are supplied when the handler is set with
166
pg_notice_handler, and the final argument is the notice or warning message.
168
-----------------------------------------------------------------------------
171
The large object commands are implemented using the PostgreSQL "fast-path"
172
function call interface (same as libpq). See the next section for more
173
information on fast-path.
175
The pg_lo_creat command takes a mode argument. According to the PostgreSQL
176
libpq documentation, lo_creat should take "INV_READ", "INV_WRITE", or
177
"INV_READ|INV_WRITE". (pgin.tcl accepts "r", "w", and "rw" as equivalent
178
to those respectively, but this is not compatible with libpgtcl.) It isn't
179
clear why you would ever create a large object with other than
180
"INV_READ|INV_WRITE".
182
The pg_lo_open command also takes a mode argument. According to the
183
PostgreSQL libpq documentation, lo_open takes the same mode values as
184
lo_creat. But in libpgtcl the pg_lo_open command takes "r", "w", or "rw"
185
for the mode, for some reason. pgin.tcl accepts either form for mode,
186
but to be compatible with libpgtcl you should use "r", "w", or "rw"
187
with pg_lo_open instead of INV_READ, INV_WRITE, or INV_READ|INV_WRITE.
190
-----------------------------------------------------------------------------
191
FAST-PATH FUNCTION CALLS
193
Access to the PostgreSQL "Fast-path function call" interface is available
194
in pgin.tcl. This was written to implement the large object command, and
195
general use is discouraged. See the libpq documentation for more details on
196
what this interface is and how to use it.
198
It is expected that the Fast-path function call interface in PostgreSQL
199
will be deprecated in favor of using the Extended Protocol to do
200
separate Prepare, Bind, and Execute steps. See PREPARE/BIND/EXECUTE.
202
Internally, backend functions are called by their PostgreSQL OID, but
203
pgin.tcl handles the mapping of function name to OID for you. The
204
fast-path function interface in pgin.tcl uses an array pgtcl::fnoids to
205
cache object IDs of the PostgreSQL functions. One instance of this array
206
is shared among all connections, under the assumption that these OIDs are
207
common to all databases. (It is possible that if you have simultaneous
208
connections to multiple database servers running different versions of
209
PostgreSQL this could break.) The index to pgtcl::fnoids is the name
210
of the function, or the function plus argument type list, as supplied
211
to the pgin.tcl fast-path function call commands. The value of each
212
array index is the OID of the function.
214
PostgreSQL supports overloaded functions (same name, different number
215
and/or argument types). You can call overloaded functions with pgin.tcl by
216
specifying the argument type list after the function name. See examples
217
below. You must specify the argument list exactly like psql "\df" does - as
218
a list of correct type names, separated by a single comma and space. There
219
is currently no provision to distinguish functions by their return type. It
220
doesn't seem like there are any PostgreSQL functions which differ only by
223
Before PostgreSQL-7.4, certain errors in fast-path calls (such as supplying
224
the wrong number of arguments to the backend function) would cause the
225
back-end and front-end to lose synchronization, and the channel would be
226
closed. This was true about libpq as well. This has been fixed with the
227
new protocol in PostgreSQL-7.4.
232
pg_callfn $db "fname" result "arginfo" arg...
234
Call a PostgreSQL backend function and store the result.
235
Returns the size of the result in bytes.
239
$db is the connection handle.
241
"fname" is the PostgreSQL function name. This is either a simple
242
name, like "encode", or a name followed by a parenthesized
243
argument type list, like "like(text, text)". The second form
244
is needed to specify which of several overloaded functions you want
247
"result" is the name of a variable where the PostgreSQL backend
248
function returned value is to be stored. The number of bytes
249
stored in "result" is returned as the value of pg_callfn.
251
"arginfo" is a list of argument descriptors. Each list element is
252
one of the following:
253
I An integer32 argument is expected.
254
S A Tcl string argument is expected. The length of the
255
string is used (remember Tcl strings can contain null bytes).
257
A Tcl string argument is expected, and exactly this many
258
bytes of the string argument are passed (padding with null
261
arg... Zero or more arguments to the PostgreSQL function follow.
262
The number of arguments must match the number of elements
263
in the "arginfo" list. The values are passed to the backend
264
function according to the corresponding descriptor in
267
For PostgreSQL backend functions which return a single integer32 argument,
268
the following simplified interface is available:
270
pg_callfn_int $db "fname" "arginfo" arg...
272
The db, fname, arginfo, and other arguments are the same as
273
for pg_callfn. The return value from pg_callfn_int is the
274
integer32 value returned by the PostgreSQL backend function.
277
Note: These examples demonstrate the command, but in both of these
278
cases you would be better off using an SQL query instead.
280
set n [pg_callfn $db version result ""]
281
This calls the backend function version() and stores the return
282
value in $result and the result length in $n.
284
pg_callfn $db encode result {S S} $str base64
285
This calls the backend function encode($str, "base64") with 2
286
string arguments and stores the result in $result.
288
pg_callfn_int $db length(text) S "This is a test"
289
This calls the backend function length("This is a test"). Because
290
there are multiple functions called length(), the argument type
291
list "(text)" must be given after the function name. The length
292
of the string (14) is returned by the function.
294
-----------------------------------------------------------------------------
297
Starting with PostgreSQL-7.4, access to separate Parse, Bind, and Execute
298
steps are provided by the protocol. The Parse step can be replaced by an
299
SQL PREPARE command. pgin.tcl provides support for this extended query
300
protocol with pg_exec_prepared (introduced in pgin.tcl-2.0.0), and
301
pg_exec_params (introduced in pgin.tcl-2.1.0). There is also a variation of
302
pg_exec which provides a simplified interface to pg_exec_params.
304
The main advantage of the extended query protocol is separation of
305
parameters from the query text string. This avoids the need to quote and
306
escape parameters, and may prevent SQL injection attacks. pg_exec_prepared
307
also offers some performance advantages if a query can be prepared, parsed,
308
and stored once and then execute multiple times without re-parsing.
310
In addition to working with text parameters and results, the
311
pg_exec_prepared and pg_exec_params commands support sending unescaped
312
binary data to the server. (Fast-path function calls also support this.)
313
These commands also support returning binary data to the client. (This can
314
also be done with binary cursors.) Although the protocol definition and
315
pgin.tcl commands support mixed text and binary results, libpq requires all
316
result columns to be text, or all binary. Using mixed binary/text result
317
columns will make your application incompatible with libpq-based versions
320
pg_exec_prepared is for execution of pre-prepared SQL statements after
321
binding parameters. A named SQL statement must be prepared using the SQL
322
"PREPARE" command before using pg_exec_prepared. An advantage of
323
pg_exec_prepared is that the protocol-level Parse step requires the client
324
to translate parameter types to OIDs, but using PREPARE lets the server
325
determine the parameter argument types. pg_exec_prepared is modelled after
326
the Libpq call: PQexecPrepared().
328
pg_exec_params does all three steps of the extended query protocol: parse,
329
bind, and execute. Parameter types can be specified by type OID, or parameters
330
can be based as text to be interpreted by the server as it does for any
331
untyped literal string. To find the type OID of a PostgreSQL type '<T>',
332
you need to query the server like this:
333
SELECT oid FROM pg_type where typname='<T>'
334
pg_exec_params is modelled after the Libpq call: PQexecParams().
336
A limitation of both pg_exec_prepared and pg_exec_params is lack of support
337
for NULLs as parameter values. There is no way to pass a NULL parameter to
338
the prepared statement. This is not a protocol or database limitation, but
339
just lack of a good idea on how to implement the command interface to
340
support NULLs without needlessly complication the more common case without
344
-----------------------------------------------------------------------------
347
MD5 authentication was added at PostgreSQL-7.2. This is a
348
challenge/response protocol which avoids having clear-text passwords passed
349
over the network. To activate this, the PostgreSQL administrator puts "md5"
350
in the pg_hba.conf file instead of "password". Pgin.tcl supports this
351
transparently; that is, if the backend requests MD5 authentication during
352
the connection, pg_connect will use this protocol. The MD5 implementation
353
was coded by the original author of pgin.tcl. It does not use the tcllib
354
implementation, which is significantly faster but much more complex.
356
-----------------------------------------------------------------------------
359
Character set encoding was added to pgin.tcl-3.0.0. More information can be
360
found in the README and REFERENCE files.
362
The following are converted to Unicode before being sent to PostgreSQL:
364
+ Query strings (pg_exec, and all higher-level commands which use it)
365
+ TEXT-format query parameters in pg_exec_prepared/pg_exec_params
366
+ All parameter arguments in pg_exec when query parameters are used
367
+ Prepared statement name in pg_exec_prepared
368
+ COPY table FROM STDIN data sent using pg_copy_write
370
The following are converted from Unicode when received from PostgreSQL:
372
+ Query result column data when TEXT-format (not when BINARY-format)
373
+ All Error and Notice response strings
374
+ Parameter names and values
375
+ Notification messages
376
+ Command completion message
377
+ Query result field names (column names)
378
+ COPY table TO STDOUT data received using pg_copy_read
380
Conversion of data to Unicode for sending to PostgreSQL occurs in 5 places
381
in the code: pg_exec and pg_exec_params query strings, pg_exec_prepared
382
statement name, pg_exec_prepared text format parameters, and when writing
383
COPY FROM data in pg_copy_write.
385
Conversion of Unicode data from PostgreSQL occurs in 3 places in the code:
386
when receiving a protocol message "string" type (which covers various
387
messages, parameters, and field names), when reading TEXT mode tuple data,
388
and when reading COPY TO data in pg_copy_read.
390
There is no Unicode conversion for the connection parameters username,
391
database-name, or password. PostgreSQL seems to store these using the
392
encoding of the database cluster/template1 database, which may differ from
393
the encoding of the database to which the client is connected. It is
394
unclear how to recode these characters. At this time, it is wise to avoid
395
non-ASCII characters in database names, usernames, and passwords. This may
396
be fixed in the future.
398
The fast-path function call interface treats all its arguments as binary
399
data and does not encode or decode them. The fast-path function calls
400
were implemented primarily for large object support, and large object
401
support is not affected by Unicode encoding because it is all binary
402
data. It is unlikely that encoding support will be added to fast-path
403
function calls, since parameterized queries are the preferred replacement.
405
-----------------------------------------------------------------------------