~ubuntu-branches/ubuntu/oneiric/postgresql-9.1/oneiric-updates

Viewing changes to src/backend/utils/adt/selfuncs.c

  • Committer: Package Import Robot
  • Author(s): Jamie Strandboge
  • Date: 2012-08-16 17:06:20 UTC
  • mfrom: (1.1.9)
  • Revision ID: package-import@ubuntu.com-20120816170620-jo0bdsnqqc4u5iqf
Tags: 9.1.5-0ubuntu11.10
* New upstream bug fix/security release:
 - Prevent access to external files/URLs via XML entity references
   (Noah Misch, Tom Lane)
   xml_parse() would attempt to fetch external files or URLs as needed
   to resolve DTD and entity references in an XML value, thus allowing
   unprivileged database users to attempt to fetch data with the
   privileges of the database server. While the external data wouldn't
   get returned directly to the user, portions of it could be exposed
   in error messages if the data didn't parse as valid XML; and in any
   case the mere ability to check existence of a file might be useful
   to an attacker. (CVE-2012-3489)
 - Prevent access to external files/URLs via "contrib/xml2"'s
   xslt_process() (Peter Eisentraut)
   libxslt offers the ability to read and write both files and URLs
   through stylesheet commands, thus allowing unprivileged database
   users to both read and write data with the privileges of the
   database server. Disable that through proper use of libxslt's
   security options. (CVE-2012-3488)
   Also, remove xslt_process()'s ability to fetch documents and
   stylesheets from external files/URLs. While this was a documented
   "feature", it was long regarded as a bad idea. The fix for
   CVE-2012-3489 broke that capability, and rather than expend effort
   on trying to fix it, we're just going to summarily remove it.
 - Prevent too-early recycling of btree index pages (Noah Misch)
   When we allowed read-only transactions to skip assigning XIDs, we
   introduced the possibility that a deleted btree page could be
   recycled while a read-only transaction was still in flight to it.
   This would result in incorrect index search results. The
   probability of such an error occurring in the field seems very low
   because of the timing requirements, but nonetheless it should be
   fixed.
 - Fix crash-safety bug with newly-created-or-reset sequences (Tom
   Lane)
   If "ALTER SEQUENCE" was executed on a freshly created or reset
   sequence, and then precisely one nextval() call was made on it, and
   then the server crashed, WAL replay would restore the sequence to a
   state in which it appeared that no nextval() had been done, thus
   allowing the first sequence value to be returned again by the next
   nextval() call. In particular this could manifest for serial
   columns, since creation of a serial column's sequence includes an
   "ALTER SEQUENCE OWNED BY" step.
 - Fix race condition in enum-type value comparisons (Robert Haas, Tom
   Lane)
   Comparisons could fail when encountering an enum value added since
   the current query started.
 - Fix txid_current() to report the correct epoch when not in hot
   standby (Heikki Linnakangas)
   This fixes a regression introduced in the previous minor release.
 - Prevent selection of unsuitable replication connections as the
   synchronous standby (Fujii Masao)
   The master might improperly choose pseudo-servers such as
   pg_receivexlog or pg_basebackup as the synchronous standby, and
   then wait indefinitely for them.
 - Fix bug in startup of Hot Standby when a master transaction has
   many subtransactions (Andres Freund)
   This mistake led to failures reported as "out-of-order XID
   insertion in KnownAssignedXids".
 - Ensure the "backup_label" file is fsync'd after pg_start_backup()
   (Dave Kerr)
 - Fix timeout handling in walsender processes (Tom Lane)
   WAL sender background processes neglected to establish a SIGALRM
   handler, meaning they would wait forever in some corner cases where
   a timeout ought to happen.
 - Wake walsenders after each background flush by walwriter (Andres
   Freund, Simon Riggs)
   This greatly reduces replication delay when the workload contains
   only asynchronously-committed transactions.
 - Fix LISTEN/NOTIFY to cope better with I/O problems, such as out of
   disk space (Tom Lane)
   After a write failure, all subsequent attempts to send more NOTIFY
   messages would fail with messages like "Could not read from file
   "pg_notify/nnnn" at offset nnnnn: Success".
 - Only allow autovacuum to be auto-canceled by a directly blocked
   process (Tom Lane)
   The original coding could allow inconsistent behavior in some
   cases; in particular, an autovacuum could get canceled after less
   than deadlock_timeout grace period.
 - Improve logging of autovacuum cancels (Robert Haas)
 - Fix log collector so that log_truncate_on_rotation works during the
   very first log rotation after server start (Tom Lane)
 - Fix WITH attached to a nested set operation
   (UNION/INTERSECT/EXCEPT) (Tom Lane)
 - Ensure that a whole-row reference to a subquery doesn't include any
   extra GROUP BY or ORDER BY columns (Tom Lane)
 - Fix dependencies generated during ALTER TABLE ... ADD CONSTRAINT
   USING INDEX (Tom Lane)
   This command left behind a redundant pg_depend entry for the index,
   which could confuse later operations, notably ALTER TABLE ... ALTER
   COLUMN TYPE on one of the indexed columns.
 - Fix "REASSIGN OWNED" to work on extensions (Alvaro Herrera)
 - Disallow copying whole-row references in CHECK constraints and
   index definitions during "CREATE TABLE" (Tom Lane)
   This situation can arise in "CREATE TABLE" with LIKE or INHERITS.
   The copied whole-row variable was incorrectly labeled with the row
   type of the original table, not the new one. Rejecting the case
   seems reasonable for LIKE, since the row types might well diverge
   later. For INHERITS we should ideally allow it, with an implicit
   coercion to the parent table's row type; but that will require more
   work than seems safe to back-patch.
 - Fix memory leak in ARRAY(SELECT ...) subqueries (Heikki
   Linnakangas, Tom Lane)
 - Fix planner to pass correct collation to operator selectivity
   estimators (Tom Lane)
   This was not previously required by any core selectivity estimation
   function, but third-party code might need it.
 - Fix extraction of common prefixes from regular expressions (Tom
   Lane)
   The code could get confused by quantified parenthesized
   subexpressions, such as ^(foo)?bar. This would lead to incorrect
   index optimization of searches for such patterns.
 - Fix bugs with parsing signed "hh":"mm" and "hh":"mm":"ss" fields in
   interval constants (Amit Kapila, Tom Lane)
 - Fix pg_dump to better handle views containing partial GROUP BY
   lists (Tom Lane)
   A view that lists only a primary key column in GROUP BY, but uses
   other table columns as if they were grouped, gets marked as
   depending on the primary key. Improper handling of such primary key
   dependencies in pg_dump resulted in poorly-ordered dumps, which at
   best would be inefficient to restore and at worst could result in
   outright failure of a parallel pg_restore run.
 - In PL/Perl, avoid setting UTF8 flag when in SQL_ASCII encoding
   (Alex Hunsaker, Kyotaro Horiguchi, Alvaro Herrera)
 - Use Postgres' encoding conversion functions, not Python's, when
   converting a Python Unicode string to the server encoding in
   PL/Python (Jan Urbanski)
   This avoids some corner-case problems, notably that Python doesn't
   support all the encodings Postgres does. A notable functional
   change is that if the server encoding is SQL_ASCII, you will get
   the UTF-8 representation of the string; formerly, any non-ASCII
   characters in the string would result in an error.
 - Fix mapping of PostgreSQL encodings to Python encodings in
   PL/Python (Jan Urbanski)
 - Report errors properly in "contrib/xml2"'s xslt_process() (Tom
   Lane)
 - Update time zone data files to tzdata release 2012e for DST law
   changes in Morocco and Tokelau
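
The regular-expression item in the list above can be checked with a small standalone program. The sketch below uses the POSIX <regex.h> API rather than PostgreSQL's own regex engine, so it is only an analogy. It shows that ^(foo)?bar matches "bar", which means "foo" cannot be treated as a fixed prefix of every match. The helper names matches and has_prefix are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if `text` matches the POSIX extended
 * regular expression `pattern`, 0 if it does not, and -1 if the pattern
 * fails to compile.
 */
int
matches(const char *pattern, const char *text)
{
    regex_t re;
    int     rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Hypothetical helper: returns 1 if `text` starts with `prefix`. */
int
has_prefix(const char *text, const char *prefix)
{
    return strncmp(text, prefix, strlen(prefix)) == 0;
}
```

Calling matches("^(foo)?bar", "bar") alongside has_prefix("bar", "foo") shows a string that matches the pattern without starting with "foo". An index range scan restricted to values beginning with "foo" would therefore miss it.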

@@ -83,6 +83,15 @@
  * joins, however, the selectivity is defined as the fraction of the left-hand
  * side relation's rows that are expected to have a match (ie, at least one
  * row with a TRUE result) in the right-hand side.
+ *
+ * For both oprrest and oprjoin functions, the operator's input collation OID
+ * (if any) is passed using the standard fmgr mechanism, so that the estimator
+ * function can fetch it with PG_GET_COLLATION().  Note, however, that all
+ * statistics in pg_statistic are currently built using the database's default
+ * collation.  Thus, in most cases where we are looking at statistics, we
+ * should ignore the actual operator collation and use DEFAULT_COLLATION_OID.
+ * We expect that the error induced by doing this is usually not large enough
+ * to justify complicating matters.
  *----------
  */
 
@@ -177,7 +186,11 @@
 static Selectivity prefix_selectivity(PlannerInfo *root,
                                    VariableStatData *vardata,
                                    Oid vartype, Oid opfamily, Const *prefixcon);
-static Selectivity pattern_selectivity(Const *patt, Pattern_Type ptype);
+static Selectivity like_selectivity(const char *patt, int pattlen,
+                                                                        bool case_insensitive);
+static Selectivity regex_selectivity(const char *patt, int pattlen,
+                                                                         bool case_insensitive,
+                                                                         int fixed_prefix_len);
 static Datum string_to_datum(const char *str, Oid datatype);
 static Const *string_to_const(const char *str, Oid datatype);
 static Const *string_to_bytea_const(const char *str, size_t str_len);
@@ -1087,6 +1100,7 @@
         Oid                     operator = PG_GETARG_OID(1);
         List       *args = (List *) PG_GETARG_POINTER(2);
         int                     varRelid = PG_GETARG_INT32(3);
+        Oid                     collation = PG_GET_COLLATION();
         VariableStatData vardata;
         Node       *other;
         bool            varonleft;
@@ -1095,9 +1109,9 @@
         Oid                     vartype;
         Oid                     opfamily;
         Pattern_Prefix_Status pstatus;
-        Const      *patt = NULL;
+        Const      *patt;
         Const      *prefix = NULL;
-        Const      *rest = NULL;
+        Selectivity     rest_selec = 0;
         double          result;
 
         /*
@@ -1187,17 +1201,20 @@
         }
 
         /*
-         * Divide pattern into fixed prefix and remainder.  XXX we have to assume
-         * default collation here, because we don't have access to the actual
-         * input collation for the operator.  FIXME ...
+         * Pull out any fixed prefix implied by the pattern, and estimate the
+         * fractional selectivity of the remainder of the pattern.  Unlike many of
+         * the other functions in this file, we use the pattern operator's actual
+         * collation for this step.  This is not because we expect the collation
+         * to make a big difference in the selectivity estimate (it seldom would),
+         * but because we want to be sure we cache compiled regexps under the
+         * right cache key, so that they can be re-used at runtime.
          */
         patt = (Const *) other;
-        pstatus = pattern_fixed_prefix(patt, ptype, DEFAULT_COLLATION_OID,
-                                                                   &prefix, &rest);
+        pstatus = pattern_fixed_prefix(patt, ptype, collation,
+                                                                   &prefix, &rest_selec);
 
         /*
-         * If necessary, coerce the prefix constant to the right type. (The "rest"
-         * constant need not be changed.)
+         * If necessary, coerce the prefix constant to the right type.
          */
         if (prefix && prefix->consttype != vartype)
         {
@@ -1271,15 +1288,13 @@
                 {
                         Selectivity heursel;
                         Selectivity prefixsel;
-                        Selectivity restsel;
 
                         if (pstatus == Pattern_Prefix_Partial)
                                 prefixsel = prefix_selectivity(root, &vardata, vartype,
                                                                                            opfamily, prefix);
                         else
                                 prefixsel = 1.0;
-                        restsel = pattern_selectivity(rest, ptype);
-                        heursel = prefixsel * restsel;
+                        heursel = prefixsel * rest_selec;
 
                         if (selec < 0)          /* fewer than 10 histogram entries? */
                                 selec = heursel;
@@ -1776,18 +1791,20 @@
                                                                                 elem_nulls[i],
                                                                                 elmbyval));
                         if (is_join_clause)
-                                s2 = DatumGetFloat8(FunctionCall5(&oprselproc,
-                                                                                                  PointerGetDatum(root),
-                                                                                                  ObjectIdGetDatum(operator),
-                                                                                                  PointerGetDatum(args),
-                                                                                                  Int16GetDatum(jointype),
-                                                                                                  PointerGetDatum(sjinfo)));
+                                s2 = DatumGetFloat8(FunctionCall5Coll(&oprselproc,
+                                                                                                          clause->inputcollid,
+                                                                                                          PointerGetDatum(root),
+                                                                                                          ObjectIdGetDatum(operator),
+                                                                                                          PointerGetDatum(args),
+                                                                                                          Int16GetDatum(jointype),
+                                                                                                          PointerGetDatum(sjinfo)));
                         else
-                                s2 = DatumGetFloat8(FunctionCall4(&oprselproc,
-                                                                                                  PointerGetDatum(root),
-                                                                                                  ObjectIdGetDatum(operator),
-                                                                                                  PointerGetDatum(args),
-                                                                                                  Int32GetDatum(varRelid)));
+                                s2 = DatumGetFloat8(FunctionCall4Coll(&oprselproc,
+                                                                                                          clause->inputcollid,
+                                                                                                          PointerGetDatum(root),
+                                                                                                          ObjectIdGetDatum(operator),
+                                                                                                          PointerGetDatum(args),
+                                                                                                          Int32GetDatum(varRelid)));
                         if (useOr)
                                 s1 = s1 + s2 - s1 * s2;
                         else
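
The `s1 = s1 + s2 - s1 * s2` line in the hunk above folds one per-element estimate into a running total for an OR-style array comparison, treating the elements as independent. A minimal standalone sketch of that accumulation follows. The function name combine_selectivities is hypothetical, and the AND branch (a plain product) is an assumption, because the `else` arm is cut off in this hunk. The starting values mirror the `s1 = useOr ? 0.0 : 1.0` initialisation that appears elsewhere in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Combine n per-element selectivities into one estimate.
 * use_or = true  : probability that at least one element matches.
 * use_or = false : probability that all elements match (assumed product).
 */
double
combine_selectivities(const double *sel, size_t n, bool use_or)
{
    double s1 = use_or ? 0.0 : 1.0;
    size_t i;

    assert(sel != NULL || n == 0);
    for (i = 0; i < n; i++)
    {
        double s2 = sel[i];

        if (use_or)
            s1 = s1 + s2 - s1 * s2;     /* P(A or B), A and B independent */
        else
            s1 = s1 * s2;               /* assumption: P(A and B) */
    }
    return s1;
}
```

With two elements of selectivity 0.5 each, the OR form gives 0.5 + 0.5 - 0.25 = 0.75 and the AND form gives 0.25.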
@@ -1818,18 +1835,20 @@
                          */
                         args = list_make2(leftop, elem);
                         if (is_join_clause)
-                                s2 = DatumGetFloat8(FunctionCall5(&oprselproc,
-                                                                                                  PointerGetDatum(root),
-                                                                                                  ObjectIdGetDatum(operator),
-                                                                                                  PointerGetDatum(args),
-                                                                                                  Int16GetDatum(jointype),
-                                                                                                  PointerGetDatum(sjinfo)));
+                                s2 = DatumGetFloat8(FunctionCall5Coll(&oprselproc,
+                                                                                                          clause->inputcollid,
+                                                                                                          PointerGetDatum(root),
+                                                                                                          ObjectIdGetDatum(operator),
+                                                                                                          PointerGetDatum(args),
+                                                                                                          Int16GetDatum(jointype),
+                                                                                                          PointerGetDatum(sjinfo)));
                         else
-                                s2 = DatumGetFloat8(FunctionCall4(&oprselproc,
-                                                                                                  PointerGetDatum(root),
-                                                                                                  ObjectIdGetDatum(operator),
-                                                                                                  PointerGetDatum(args),
-                                                                                                  Int32GetDatum(varRelid)));
+                                s2 = DatumGetFloat8(FunctionCall4Coll(&oprselproc,
+                                                                                                          clause->inputcollid,
+                                                                                                          PointerGetDatum(root),
+                                                                                                          ObjectIdGetDatum(operator),
+                                                                                                          PointerGetDatum(args),
+                                                                                                          Int32GetDatum(varRelid)));
                         if (useOr)
                                 s1 = s1 + s2 - s1 * s2;
                         else
@@ -1854,18 +1873,20 @@
                 dummyexpr->collation = clause->inputcollid;
                 args = list_make2(leftop, dummyexpr);
                 if (is_join_clause)
-                        s2 = DatumGetFloat8(FunctionCall5(&oprselproc,
-                                                                                          PointerGetDatum(root),
-                                                                                          ObjectIdGetDatum(operator),
-                                                                                          PointerGetDatum(args),
-                                                                                          Int16GetDatum(jointype),
-                                                                                          PointerGetDatum(sjinfo)));
+                        s2 = DatumGetFloat8(FunctionCall5Coll(&oprselproc,
+                                                                                                  clause->inputcollid,
+                                                                                                  PointerGetDatum(root),
+                                                                                                  ObjectIdGetDatum(operator),
+                                                                                                  PointerGetDatum(args),
+                                                                                                  Int16GetDatum(jointype),
+                                                                                                  PointerGetDatum(sjinfo)));
                 else
-                        s2 = DatumGetFloat8(FunctionCall4(&oprselproc,
-                                                                                          PointerGetDatum(root),
-                                                                                          ObjectIdGetDatum(operator),
-                                                                                          PointerGetDatum(args),
-                                                                                          Int32GetDatum(varRelid)));
+                        s2 = DatumGetFloat8(FunctionCall4Coll(&oprselproc,
+                                                                                                  clause->inputcollid,
+                                                                                                  PointerGetDatum(root),
+                                                                                                  ObjectIdGetDatum(operator),
+                                                                                                  PointerGetDatum(args),
+                                                                                                  Int32GetDatum(varRelid)));
                 s1 = useOr ? 0.0 : 1.0;
 
                 /*
@@ -1937,6 +1958,7 @@
 {
         Selectivity s1;
         Oid                     opno = linitial_oid(clause->opnos);
+        Oid                     inputcollid = linitial_oid(clause->inputcollids);
         List       *opargs;
         bool            is_join_clause;
 
@@ -1977,6 +1999,7 @@
                 /* Estimate selectivity for a join clause. */
                 s1 = join_selectivity(root, opno,
                                                           opargs,
+                                                          inputcollid,
                                                           jointype,
                                                           sjinfo);
         }
@@ -1985,6 +2008,7 @@
                 /* Estimate selectivity for a restriction clause. */
                 s1 = restriction_selectivity(root, opno,
                                                                          opargs,
+                                                                         inputcollid,
                                                                          varRelid);
         }
 
@@ -4869,9 +4893,9 @@
  *
  * *prefix is set to a palloc'd prefix string (in the form of a Const node),
  *      or to NULL if no fixed prefix exists for the pattern.
- * *rest is set to a palloc'd Const representing the remainder of the pattern
- *      after the portion describing the fixed prefix.
- * Each of these has the same type (TEXT or BYTEA) as the given pattern Const.
+ * If rest_selec is not NULL, *rest_selec is set to an estimate of the
+ *      selectivity of the remainder of the pattern (without any fixed prefix).
+ * The prefix Const has the same type (TEXT or BYTEA) as the input pattern.
  *
  * The return value distinguishes no fixed prefix, a partial prefix,
  * or an exact-match-only pattern.
@@ -4879,12 +4903,11 @@
 
 static Pattern_Prefix_Status
 like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
-                                  Const **prefix_const, Const **rest_const)
+                                  Const **prefix_const, Selectivity *rest_selec)
 {
         char       *match;
         char       *patt;
         int                     pattlen;
-        char       *rest;
         Oid                     typeid = patt_const->consttype;
         int                     pos,
                                 match_pos;
@@ -4964,18 +4987,15 @@
         }
 
         match[match_pos] = '\0';
-        rest = &patt[pos];
 
         if (typeid != BYTEAOID)
-        {
                 *prefix_const = string_to_const(match, typeid);
-                *rest_const = string_to_const(rest, typeid);
-        }
         else
-        {
                 *prefix_const = string_to_bytea_const(match, match_pos);
-                *rest_const = string_to_bytea_const(rest, pattlen - pos);
-        }
+
+        if (rest_selec != NULL)
+                *rest_selec = like_selectivity(&patt[pos], pattlen - pos,
+                                                                           case_insensitive);
 
         pfree(patt);
         pfree(match);
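
The hunk above changes like_fixed_prefix() so that it reports a selectivity for the pattern remainder instead of handing the remainder back as a Const. A simplified standalone sketch of that shape follows. It is not the PostgreSQL implementation: it ignores case-insensitive matching, multibyte encodings and bytea, the names split_like_pattern and rest_selectivity are hypothetical, and the per-character constants are illustrative assumptions rather than values taken from this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative constants (assumptions, not taken from the diff). */
#define FIXED_CHAR_SEL     0.20     /* one ordinary character */
#define ANY_CHAR_SEL       0.90     /* the '_' wildcard */
#define FULL_WILDCARD_SEL  5.00     /* the '%' wildcard loosens the match */

/* Estimate the selectivity of the pattern text that follows the prefix. */
double
rest_selectivity(const char *patt, size_t len)
{
    double sel = 1.0;
    size_t pos = 0;

    /* Leading wildcards do not narrow the match, so skip them. */
    while (pos < len && (patt[pos] == '%' || patt[pos] == '_'))
        pos++;

    for (; pos < len; pos++)
    {
        if (patt[pos] == '%')
            sel *= FULL_WILDCARD_SEL;
        else if (patt[pos] == '_')
            sel *= ANY_CHAR_SEL;
        else
        {
            if (patt[pos] == '\\' && pos + 1 < len)
                pos++;              /* backslash quotes the next character */
            sel *= FIXED_CHAR_SEL;
        }
    }
    return sel > 1.0 ? 1.0 : sel;   /* a selectivity cannot exceed 1 */
}

/*
 * Copy the literal prefix of a LIKE pattern into `prefix` (capacity `cap`,
 * always NUL-terminated).  If rest_selec is not NULL, store the estimated
 * selectivity of the remainder there.  Returns the number of pattern bytes
 * consumed by the prefix.
 */
size_t
split_like_pattern(const char *patt, char *prefix, size_t cap,
                   double *rest_selec)
{
    size_t len = strlen(patt);
    size_t pos = 0;
    size_t out = 0;

    assert(cap > 0);
    while (pos < len && patt[pos] != '%' && patt[pos] != '_' && out + 1 < cap)
    {
        if (patt[pos] == '\\')
        {
            if (pos + 1 >= len)
                break;              /* dangling backslash: stop here */
            pos++;                  /* take the escaped character literally */
        }
        prefix[out++] = patt[pos++];
    }
    prefix[out] = '\0';

    if (rest_selec != NULL)
        *rest_selec = rest_selectivity(patt + pos, len - pos);
    return pos;
}
```

A caller could multiply the returned remainder estimate by a separately computed prefix selectivity, in the same way as the `heursel = prefixsel * rest_selec` line earlier in this diff. Passing NULL for rest_selec skips the estimate entirely, which matches the "If rest_selec is not NULL" wording in the updated header comment.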
@@ -4992,20 +5012,11 @@
 
 static Pattern_Prefix_Status
 regex_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
-                                   Const **prefix_const, Const **rest_const)
+                                   Const **prefix_const, Selectivity *rest_selec)
 {
-        char       *match;
-        int                     pos,
-                                match_pos,
-                                prev_pos,
-                                prev_match_pos;
-        bool            have_leading_paren;
-        char       *patt;
-        char       *rest;
         Oid                     typeid = patt_const->consttype;
-        bool            is_multibyte = (pg_database_encoding_max_length() > 1);
-        pg_locale_t locale = 0;
-        bool            locale_is_c = false;
+        char       *prefix;
+        bool            exact;
 
         /*
          * Should be unnecessary, there are no bytea regex operators defined. As
@@ -5017,… +5028,… @@
                                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                  errmsg("regular-expression matching not supported on type bytea")));
 
-        if (case_insensitive)
-        {
-                /* If case-insensitive, we need locale info */
-                if (lc_ctype_is_c(collation))
-                        locale_is_c = true;
-                else if (collation != DEFAULT_COLLATION_OID)
-                {
-                        if (!OidIsValid(collation))
-                        {
-                                /*
-                                 * This typically means that the parser could not resolve a
-                                 * conflict of implicit collations, so report it that way.
-                                 */
-                                ereport(ERROR,
-                                                (errcode(ERRCODE_INDETERMINATE_COLLATION),
-                                                 errmsg("could not determine which collation to use for regular expression"),
-                                                 errhint("Use the COLLATE clause to set the collation explicitly.")));
-                        }
-                        locale = pg_newlocale_from_collation(collation);
-                }
-        }
-
-        /* the right-hand const is type text for all of these */
-        patt = TextDatumGetCString(patt_const->constvalue);
-
-        /*
-         * Check for ARE director prefix.  It's worth our trouble to recognize
-         * this because similar_escape() used to use it, and some other code might
-         * still use it, to force ARE mode.
-         */
-        pos = 0;
-        if (strncmp(patt, "***:", 4) == 0)
-                pos = 4;
-
-        /* Pattern must be anchored left */
-        if (patt[pos] != '^')
-        {
-                rest = patt;
-
-                *prefix_const = NULL;
-                *rest_const = string_to_const(rest, typeid);
-
-                return Pattern_Prefix_None;
-        }
-        pos++;
-
-        /*
-         * If '|' is present in pattern, then there may be multiple alternatives
-         * for the start of the string.  (There are cases where this isn't so, for
-         * instance if the '|' is inside parens, but detecting that reliably is
-         * too hard.)
-         */
-        if (strchr(patt + pos, '|') != NULL)
-        {
-                rest = patt;
-
-                *prefix_const = NULL;
-                *rest_const = string_to_const(rest, typeid);
-
-                return Pattern_Prefix_None;
-        }
-
-        /* OK, allocate space for pattern */
-        match = palloc(strlen(patt) + 1);
-        prev_match_pos = match_pos = 0;
-
-        /*
-         * We special-case the syntax '^(...)$' because psql uses it.  But beware:
-         * sequences beginning "(?" are not what they seem, unless they're "(?:".
-         * (We must recognize that because of similar_escape().)
-         */
-        have_leading_paren = false;
-        if (patt[pos] == '(' &&
-                (patt[pos + 1] != '?' || patt[pos + 2] == ':'))
-        {
-                have_leading_paren = true;
-                pos += (patt[pos + 1] != '?' ? 1 : 3);
-        }
-
-        /* Scan remainder of pattern */
-        prev_pos = pos;
-        while (patt[pos])
-        {
-                int                     len;
-
-                /*
-                 * Check for characters that indicate multiple possible matches here.
-                 * Also, drop out at ')' or '$' so the termination test works right.
-                 */
-                if (patt[pos] == '.' ||
-                        patt[pos] == '(' ||
-                        patt[pos] == ')' ||
-                        patt[pos] == '[' ||
-                        patt[pos] == '^' ||
-                        patt[pos] == '$')
-                        break;
-
-                /* Stop if case-varying character (it's sort of a wildcard) */
-                if (case_insensitive &&
-                  pattern_char_isalpha(patt[pos], is_multibyte, locale, locale_is_c))
-                        break;
-
-                /*
-                 * Check for quantifiers.  Except for +, this means the preceding
-                 * character is optional, so we must remove it from the prefix too!
-                 */
-                if (patt[pos] == '*' ||
-                        patt[pos] == '?' ||
-                        patt[pos] == '{')
-                {
-                        match_pos = prev_match_pos;
-                        pos = prev_pos;
-                        break;
-                }
-                if (patt[pos] == '+')
-                {
-                        pos = prev_pos;
-                        break;
-                }
-
-                /*
-                 * Normally, backslash quotes the next character.  But in AREs,
-                 * backslash followed by alphanumeric is an escape, not a quoted
-                 * character.  Must treat it as having multiple possible matches.
-                 * Note: since only ASCII alphanumerics are escapes, we don't have to
5145
 
                 * be paranoid about multibyte or collations here.
5146
 
                 */
5147
 
                if (patt[pos] == '\\')
5148
 
                {
5149
 
                        if (isalnum((unsigned char) patt[pos + 1]))
5150
 
                                break;
5151
 
                        pos++;
5152
 
                        if (patt[pos] == '\0')
5153
 
                                break;
5154
 
                }
5155
 
                /* save position in case we need to back up on next loop cycle */
5156
 
                prev_match_pos = match_pos;
5157
 
                prev_pos = pos;
5158
 
                /* must use encoding-aware processing here */
5159
 
                len = pg_mblen(&patt[pos]);
5160
 
                memcpy(&match[match_pos], &patt[pos], len);
5161
 
                match_pos += len;
5162
 
                pos += len;
5163
 
        }
5164
 
 
5165
 
        match[match_pos] = '\0';
5166
 
        rest = &patt[pos];
5167
 
 
5168
 
        if (have_leading_paren && patt[pos] == ')')
5169
 
                pos++;
5170
 
 
5171
 
        if (patt[pos] == '$' && patt[pos + 1] == '\0')
5172
 
        {
5173
 
                rest = &patt[pos + 1];
5174
 
 
5175
 
                *prefix_const = string_to_const(match, typeid);
5176
 
                *rest_const = string_to_const(rest, typeid);
5177
 
 
5178
 
                pfree(patt);
5179
 
                pfree(match);
5180
 
 
 
+        /* Use the regexp machinery to extract the prefix, if any */
+        prefix = regexp_fixed_prefix(DatumGetTextPP(patt_const->constvalue),
+                                                                 case_insensitive, collation,
+                                                                 &exact);
+
+        if (prefix == NULL)
+        {
+                *prefix_const = NULL;
+
+                if (rest_selec != NULL)
+                {
+                        char   *patt = TextDatumGetCString(patt_const->constvalue);
+
+                        *rest_selec = regex_selectivity(patt, strlen(patt),
+                                                                                        case_insensitive,
+                                                                                        0);
+                        pfree(patt);
+                }
+
+                return Pattern_Prefix_None;
+        }
+
+        *prefix_const = string_to_const(prefix, typeid);
+
+        if (rest_selec != NULL)
+        {
+                if (exact)
+                {
+                        /* Exact match, so there's no additional selectivity */
+                        *rest_selec = 1.0;
+                }
+                else
+                {
+                        char   *patt = TextDatumGetCString(patt_const->constvalue);
+
+                        *rest_selec = regex_selectivity(patt, strlen(patt),
+                                                                                        case_insensitive,
+                                                                                        strlen(prefix));
+                        pfree(patt);
+                }
+        }
+
+        pfree(prefix);
+
+        if (exact)
                 return Pattern_Prefix_Exact;    /* pattern specifies exact match */
-        }
-
-        *prefix_const = string_to_const(match, typeid);
-        *rest_const = string_to_const(rest, typeid);
-
-        pfree(patt);
-        pfree(match);
-
-        if (match_pos > 0)
+        else
                 return Pattern_Prefix_Partial;
-
-        return Pattern_Prefix_None;
 }
 
 Pattern_Prefix_Status
 pattern_fixed_prefix(Const *patt, Pattern_Type ptype, Oid collation,
-                                         Const **prefix, Const **rest)
+                                         Const **prefix, Selectivity *rest_selec)
 {
         Pattern_Prefix_Status result;
 
         switch (ptype)
         {
                 case Pattern_Type_Like:
-                        result = like_fixed_prefix(patt, false, collation, prefix, rest);
+                        result = like_fixed_prefix(patt, false, collation,
+                                                                           prefix, rest_selec);
                         break;
                 case Pattern_Type_Like_IC:
-                        result = like_fixed_prefix(patt, true, collation, prefix, rest);
+                        result = like_fixed_prefix(patt, true, collation,
+                                                                           prefix, rest_selec);
                         break;
                 case Pattern_Type_Regex:
-                        result = regex_fixed_prefix(patt, false, collation, prefix, rest);
+                        result = regex_fixed_prefix(patt, false, collation,
+                                                                                prefix, rest_selec);
                         break;
                 case Pattern_Type_Regex_IC:
-                        result = regex_fixed_prefix(patt, true, collation, prefix, rest);
+                        result = regex_fixed_prefix(patt, true, collation,
+                                                                                prefix, rest_selec);
                         break;
                 default:
                         elog(ERROR, "unrecognized ptype: %d", (int) ptype);
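
The hand-written scanning loop that this change removes from regex_fixed_prefix() is easier to follow in isolation. What follows is a minimal sketch, not the PostgreSQL code: it assumes single-byte ASCII patterns and case-sensitive matching, and it leaves out the "***:" director and the '^(...)$' special case. The function name sketch_fixed_prefix and its buffer parameters are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for the removed scanning loop.
 * Assumptions: ASCII-only pattern, case-sensitive matching, no "***:"
 * director, no '^(...)$' handling.
 *
 * Copies the literal prefix of a left-anchored regex into out (capacity
 * outsz, including the terminator) and returns the prefix length.
 * Returns 0 when no fixed prefix can be extracted.
 */
size_t
sketch_fixed_prefix(const char *patt, char *out, size_t outsz)
{
    size_t pos;
    size_t match_pos = 0;
    size_t prev_match_pos = 0;

    assert(patt != NULL && out != NULL && outsz > 0);
    out[0] = '\0';

    /* Pattern must be anchored left; any '|' may mean alternatives */
    if (patt[0] != '^' || strchr(patt, '|') != NULL)
        return 0;
    pos = 1;

    while (patt[pos] != '\0' && match_pos + 1 < outsz)
    {
        char c = patt[pos];

        /* Characters that allow multiple possible matches end the prefix */
        if (c == '.' || c == '(' || c == ')' ||
            c == '[' || c == '^' || c == '$')
            break;

        /* '*', '?' and '{' make the preceding character optional: drop it */
        if (c == '*' || c == '?' || c == '{')
        {
            match_pos = prev_match_pos;
            break;
        }

        /* '+' keeps the preceding character but still ends the prefix */
        if (c == '+')
            break;

        if (c == '\\')
        {
            /* backslash + ASCII alphanumeric is an escape, not a quoted char */
            if (isalnum((unsigned char) patt[pos + 1]))
                break;
            pos++;
            if (patt[pos] == '\0')
                break;
            c = patt[pos];
        }

        /* remember where we were in case the next char is a quantifier */
        prev_match_pos = match_pos;
        out[match_pos++] = c;
        pos++;
    }

    out[match_pos] = '\0';
    return match_pos;
}
```

Under these assumptions, the pattern ^abc*d yields the prefix "ab" (the "c" is optional, so it is backed out), while ^ab+c also yields "ab" but keeps the "b" because "+" requires at least one occurrence.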
@@ -5326,7 +5215,8 @@
 
 /*
  * Estimate the selectivity of a pattern of the specified type.
- * Note that any fixed prefix of the pattern will have been removed already.
+ * Note that any fixed prefix of the pattern will have been removed already,
+ * so actually we may be looking at just a fragment of the pattern.
  *
  * For now, we use a very simplistic approach: fixed characters reduce the
  * selectivity a good deal, character ranges reduce it a little,
@@ -5340,37 +5230,10 @@
 #define PARTIAL_WILDCARD_SEL 2.0
 
 static Selectivity
-like_selectivity(Const *patt_const, bool case_insensitive)
+like_selectivity(const char *patt, int pattlen, bool case_insensitive)
 {
         Selectivity sel = 1.0;
         int                     pos;
-        Oid                     typeid = patt_const->consttype;
-        char       *patt;
-        int                     pattlen;
-
-        /* the right-hand const is type text or bytea */
-        Assert(typeid == BYTEAOID || typeid == TEXTOID);
-
-        if (typeid == BYTEAOID && case_insensitive)
-                ereport(ERROR,
-                                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                   errmsg("case insensitive matching not supported on type bytea")));
-
-        if (typeid != BYTEAOID)
-        {
-                patt = TextDatumGetCString(patt_const->constvalue);
-                pattlen = strlen(patt);
-        }
-        else
-        {
-                bytea      *bstr = DatumGetByteaP(patt_const->constvalue);
-
-                pattlen = VARSIZE(bstr) - VARHDRSZ;
-                patt = (char *) palloc(pattlen);
-                memcpy(patt, VARDATA(bstr), pattlen);
-                if ((Pointer) bstr != DatumGetPointer(patt_const->constvalue))
-                        pfree(bstr);
-        }
 
         /* Skip any leading wildcard; it's already factored into initial sel */
         for (pos = 0; pos < pattlen; pos++)
@@ -5400,13 +5263,11 @@
         /* Could get sel > 1 if multiple wildcards */
         if (sel > 1.0)
                 sel = 1.0;
-
-        pfree(patt);
         return sel;
 }
 
 static Selectivity
-regex_selectivity_sub(char *patt, int pattlen, bool case_insensitive)
+regex_selectivity_sub(const char *patt, int pattlen, bool case_insensitive)
 {
         Selectivity sel = 1.0;
         int                     paren_depth = 0;
@@ -5499,26 +5360,10 @@
 }
 
 static Selectivity
-regex_selectivity(Const *patt_const, bool case_insensitive)
+regex_selectivity(const char *patt, int pattlen, bool case_insensitive,
+                                  int fixed_prefix_len)
 {
         Selectivity sel;
-        char       *patt;
-        int                     pattlen;
-        Oid                     typeid = patt_const->consttype;
-
-        /*
-         * Should be unnecessary, there are no bytea regex operators defined. As
-         * such, it should be noted that the rest of this function has *not* been
-         * made safe for binary (possibly NULL containing) strings.
-         */
-        if (typeid == BYTEAOID)
-                ereport(ERROR,
-                                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                 errmsg("regular-expression matching not supported on type bytea")));
-
-        /* the right-hand const is type text for all of these */
-        patt = TextDatumGetCString(patt_const->constvalue);
-        pattlen = strlen(patt);
 
         /* If patt doesn't end with $, consider it to have a trailing wildcard */
         if (pattlen > 0 && patt[pattlen - 1] == '$' &&
@@ -5532,39 +5377,17 @@
                 /* no trailing $ */
                 sel = regex_selectivity_sub(patt, pattlen, case_insensitive);
                 sel *= FULL_WILDCARD_SEL;
-                if (sel > 1.0)
-                        sel = 1.0;
         }
+
+        /* If there's a fixed prefix, discount its selectivity */
+        if (fixed_prefix_len > 0)
+                sel /= pow(FIXED_CHAR_SEL, fixed_prefix_len);
+
+        /* Make sure result stays in range */
+        CLAMP_PROBABILITY(sel);
         return sel;
 }
 
-static Selectivity
-pattern_selectivity(Const *patt, Pattern_Type ptype)
-{
-        Selectivity result;
-
-        switch (ptype)
-        {
-                case Pattern_Type_Like:
-                        result = like_selectivity(patt, false);
-                        break;
-                case Pattern_Type_Like_IC:
-                        result = like_selectivity(patt, true);
-                        break;
-                case Pattern_Type_Regex:
-                        result = regex_selectivity(patt, false);
-                        break;
-                case Pattern_Type_Regex_IC:
-                        result = regex_selectivity(patt, true);
-                        break;
-                default:
-                        elog(ERROR, "unrecognized ptype: %d", (int) ptype);
-                        result = 1.0;           /* keep compiler quiet */
-                        break;
-        }
-        return result;
-}
-
 
 /*
  * Try to generate a string greater than the given string or any
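
The prefix discount added to regex_selectivity() in the hunk above can be checked with a little arithmetic. Since the function is now handed the whole pattern, the fixed-prefix characters appear to be counted by regex_selectivity_sub() as well, so the new code divides their contribution back out and then clamps the result. The sketch below is an illustration only: sketch_discount_prefix is a hypothetical name, the constant 0.20 is assumed to match upstream's FIXED_CHAR_SEL (its definition is not part of this diff), a loop stands in for pow(), and an explicit range check stands in for CLAMP_PROBABILITY.

```c
#include <assert.h>

/* Assumed value; the FIXED_CHAR_SEL definition is not shown in this diff */
#define SKETCH_FIXED_CHAR_SEL 0.20

/*
 * Hypothetical helper mirroring the tail of the new regex_selectivity():
 * undo the per-character factor for each fixed-prefix character, then
 * keep the estimate inside [0, 1].
 */
double
sketch_discount_prefix(double sel, int fixed_prefix_len)
{
    int i;

    assert(fixed_prefix_len >= 0);

    /* equivalent to sel /= pow(FIXED_CHAR_SEL, fixed_prefix_len) */
    for (i = 0; i < fixed_prefix_len; i++)
        sel /= SKETCH_FIXED_CHAR_SEL;

    /* stand-in for CLAMP_PROBABILITY(sel) */
    if (sel < 0.0)
        sel = 0.0;
    else if (sel > 1.0)
        sel = 1.0;

    return sel;
}
```

With these assumed numbers, an estimate of 0.0016 that includes two prefix characters is discounted to about 0.04, and an estimate of 0.5 with a one-character prefix would become 2.5 and is clamped to 1.0.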