PostgreSQL 11.13 (upcoming) commit log

Further refinement of stuck_on_old_timeline recovery test

commit   : c0a7587807dd9e0db5dbffb6d6b813a3e87d2519    
  
author   : Andrew Dunstan <andrew@dunslane.net>    
date     : Tue, 15 Jun 2021 15:30:11 -0400    
  
committer: Andrew Dunstan <andrew@dunslane.net>    
date     : Tue, 15 Jun 2021 15:30:11 -0400    

TestLib::perl2host can take a file argument as well as a directory  
argument, so that code becomes substantially simpler. Also add comments  
on why we're using forward slashes, and why we're setting  
PERL_BADLANG=0.  
  
Discussion: https://postgr.es/m/e9947bcd-20ee-027c-f0fe-01f736b7e345@dunslane.net  

M src/test/recovery/t/025_stuck_on_old_timeline.pl

Fix decoding of speculative aborts.

commit   : 1f8a934e0a867cdf3223ea7520c69cd16eedd936    
  
author   : Amit Kapila <akapila@postgresql.org>    
date     : Tue, 15 Jun 2021 09:02:32 +0530    
  
committer: Amit Kapila <akapila@postgresql.org>    
date     : Tue, 15 Jun 2021 09:02:32 +0530    

During decoding of speculative inserts, we relied on confirmation
records or the next change record to clean up the toast hash. But that
could lead to multiple problems: (a) a memory leak if there is neither a
confirmation record nor any other record after the toast insertion for a
speculative insert in the transaction, and (b) errors and assertion
failures if the next operation is not an insert/update on the same table.

The fix is to start queuing the spec abort change and to clean up the
toast hash and the change record while processing it. Currently, we queue
the spec aborts for both the toast and the main table even though we
perform the cleanup while processing the main table's spec abort record.
Later, if we find a way to distinguish the spec abort record of the toast
table from that of the main table, we can avoid queuing the change for
spec aborts of toast tables.
  
Reported-by: Ashutosh Bapat  
Author: Dilip Kumar  
Reviewed-by: Amit Kapila  
Backpatch-through: 9.6, where it was introduced  
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com  

M src/backend/replication/logical/decode.c
M src/backend/replication/logical/reorderbuffer.c
M src/include/replication/reorderbuffer.h

Work around portability issue with newer versions of mktime().

commit   : 73fa762417de19ec57fb1500cd415d89c28db91c    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Sun, 13 Jun 2021 14:32:42 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Sun, 13 Jun 2021 14:32:42 -0400    

Recent glibc versions have made mktime() fail if tm_isdst is  
inconsistent with the prevailing timezone; in particular it fails for  
tm_isdst = 1 when the zone is UTC.  (This seems wildly inconsistent  
with the POSIX-mandated treatment of "incorrect" values for the other  
fields of struct tm, so if you ask me it's a bug, but I bet they'll  
say it's intentional.)  This has been observed to cause cosmetic  
problems when pg_restore'ing an archive created in a different  
timezone.  
  
To fix, do mktime() using the field values from the archive, and if  
that fails try again with tm_isdst = -1.  This will give a result  
that's off by the UTC-offset difference from the original zone, but  
that was true before, too.  It's not terribly critical since we don't  
do anything with the result except possibly print it.  (Someday we  
should flush this entire bit of logic and record a standard-format  
timestamp in the archive instead.  That's not okay for a back-patched  
bug fix, though.)  
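
As an illustration only (not the committed pg_backup_archiver.c code), the
retry pattern described above amounts to something like this, with a
hypothetical helper name:

    #include <time.h>

    /*
     * Sketch of the fallback: try mktime() with the tm_isdst value recorded
     * in the archive; if the C library rejects that combination, retry and
     * let it work out DST for itself.
     */
    static time_t
    mktime_with_dst_fallback(struct tm *tm)
    {
        time_t result = mktime(tm);

        if (result == (time_t) -1)
        {
            tm->tm_isdst = -1;      /* let mktime() infer DST */
            result = mktime(tm);
        }
        return result;
    }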
  
Also, guard our only other use of mktime() by having initdb's  
build_time_t() set tm_isdst = -1 not 0.  This case could only have  
an issue in zones that are DST year-round; but I think some do exist,  
or could in future.  
  
Per report from Wells Oliver.  Back-patch to all supported  
versions, since any of them might need to run with a newer glibc.  
  
Discussion: https://postgr.es/m/CAOC+FBWDhDHO7G-i1_n_hjRzCnUeFO+H-Czi1y10mFhRWpBrew@mail.gmail.com  

M src/bin/initdb/findtimezone.c
M src/bin/pg_dump/pg_backup_archiver.c

Further tweaks to stuck_on_old_timeline recovery test

commit   : 8cb3d95c209cabe2127478c1f80002326defa81b    
  
author   : Andrew Dunstan <andrew@dunslane.net>    
date     : Sun, 13 Jun 2021 07:10:41 -0400    
  
committer: Andrew Dunstan <andrew@dunslane.net>    
date     : Sun, 13 Jun 2021 07:10:41 -0400    

Translate path slashes on target directory path. This was confusing old  
branches, but is applied to all branches for the sake of uniformity.  
Perl is perfectly able to understand paths with forward slashes.  
  
Along the way, restore the previous archive_wait query, for the sake of  
uniformity with other tests, per gripe from Tom Lane.  

M src/test/recovery/t/025_stuck_on_old_timeline.pl

Ignore more environment variables in pg_regress.c

commit   : 15cb0bf4714566721185fdea32d4dd5f8f706aa2    
  
author   : Michael Paquier <michael@paquier.xyz>    
date     : Sun, 13 Jun 2021 20:07:54 +0900    
  
committer: Michael Paquier <michael@paquier.xyz>    
date     : Sun, 13 Jun 2021 20:07:54 +0900    

This is similar to the work done in 8279f68 for TestLib.pm: environment
variables that are already set may cause unwanted failures when using a
temporary installation with pg_regress.  The list of variables reset is
adjusted in each stable branch depending on what is supported.  
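
For illustration, the approach boils down to unsetting a fixed list of
variables early in pg_regress.c; the helper name and the variable names
below are examples only, not the exact per-branch list:

    #include <stdlib.h>

    /*
     * Sketch: scrub environment variables that could leak settings into a
     * temporary installation.  The real list differs by branch.
     */
    static void
    scrub_client_environment(void)
    {
        static const char *const vars[] = {
            "PGCLIENTENCODING",
            "PGCONNECT_TIMEOUT",
            "PGOPTIONS",
        };

        for (size_t i = 0; i < sizeof(vars) / sizeof(vars[0]); i++)
            unsetenv(vars[i]);
    }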
  
Comments are added to remember that the lists in TestLib.pm and  
pg_regress.c had better be kept in sync.  
  
Reviewed-by: Álvaro Herrera  
Discussion: https://postgr.es/m/YMNR9GYDn+fHlMta@paquier.xyz  
Backpatch-through: 9.6  

M src/test/perl/TestLib.pm
M src/test/regress/pg_regress.c

Restore robustness of TAP tests that wait for postmaster restart.

commit   : b170ca021c6ae09e65fb2fd9d8ff8a39b382d427    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Sat, 12 Jun 2021 15:12:10 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Sat, 12 Jun 2021 15:12:10 -0400    

Several TAP tests use poll_query_until() to wait for the postmaster  
to restart.  They were checking to see if a trivial query  
(e.g. "SELECT 1") succeeds.  However, that's problematic in the wake  
of commit 11e9caff8, because now that we feed said query to psql  
via stdin, we risk IPC::Run whining about a SIGPIPE failure if psql  
quits before reading the query.  Hence, we can't use a nonempty  
query in cases where we need to wait for connection failures to  
stop happening.  
  
Per the precedent of commits c757a3da0 and 6d41dd045, we can pass  
"undef" as the query in such cases to ensure that IPC::Run has  
nothing to write.  However, then we have to say that the expected  
output is empty, and this exposes a deficiency in poll_query_until:  
if psql fails altogether and returns empty stdout, poll_query_until  
will treat that as a success!  That's because, contrary to its  
documentation, it makes no actual check for psql failure, looking  
neither at the exit status nor at stderr.  
  
To fix that, adjust poll_query_until to insist on empty stderr as  
well as a stdout match.  (I experimented with checking exit status  
instead, but it seems that psql often does exit(1) in cases that we  
need to consider successes.  That might be something to fix someday,  
but it would be a non-back-patchable behavior change.)  
  
Back-patch to v10.  The test cases needing this exist only as far  
back as v11, but it seems wise to keep poll_query_until's behavior  
the same in v10, in case we back-patch another such test case in  
future.  (9.6 does not currently need this change, because in that  
branch poll_query_until can't be told to accept empty stdout as  
a success case.)  
  
Per assorted buildfarm failures, mostly on hoverfly.  
  
Discussion: https://postgr.es/m/CAA4eK1+zM6L4QSA1XMvXY_qqWwdUmqkOS1+hWvL8QcYEBGA1Uw@mail.gmail.com  

M src/test/perl/PostgresNode.pm
M src/test/recovery/t/013_crash_restart.pl

Ensure pg_filenode_relation(0, 0) returns NULL.

commit   : 25d1ef1aaf6d104fa0b315e2f5f9840f0b164334    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Sat, 12 Jun 2021 13:29:24 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Sat, 12 Jun 2021 13:29:24 -0400    

Previously, a zero value for the relfilenode resulted in  
a confusing error message about "unexpected duplicate".  
This function returns NULL for other invalid relfilenode  
values, so zero should be treated likewise.  
  
It's been like this all along, so back-patch to all supported  
branches.  
  
Justin Pryzby  
  
Discussion: https://postgr.es/m/20210612023324.GT16435@telsasoft.com  

M src/backend/utils/adt/dbsize.c

Don't use Asserts to check for violations of replication protocol.

commit   : 9eecea7f373ab6bdff61d31666d6d3b0c435763f    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Sat, 12 Jun 2021 12:59:15 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Sat, 12 Jun 2021 12:59:15 -0400    

Using an Assert to check the validity of incoming messages is an  
extremely poor decision.  In a debug build, it should not be that easy  
for a broken or malicious remote client to crash the logrep worker.  
The consequences could be even worse in non-debug builds, which will  
fail to make such checks at all, leading to who-knows-what misbehavior.  
Hence, promote every Assert that could possibly be triggered by wrong  
or out-of-order replication messages to a full test-and-ereport.  
  
To avoid bloating the set of messages the translation team has to cope  
with, establish a policy that replication protocol violation error  
reports don't need to be translated.  Hence, all the new messages here  
use errmsg_internal().  A couple of old messages are changed likewise  
for consistency.  
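
A schematic before/after of the kind of conversion involved (an
illustrative check with made-up variable names, not a specific hunk from
worker.c):

    /* before: only enforced in assert-enabled builds */
    Assert(rel != NULL);

    /* after: a real run-time check, untranslated per the policy above */
    if (rel == NULL)
        ereport(ERROR,
                (errcode(ERRCODE_PROTOCOL_VIOLATION),
                 errmsg_internal("no relation map entry for remote relation ID %u",
                                 remoteid)));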
  
Along the way, fix some non-idiomatic or outright wrong uses of  
hash_search().  
  
Most of these mistakes are new with the "streaming replication"  
patch (commit 464824323), but a couple go back a long way.  
Back-patch as appropriate.  
  
Discussion: https://postgr.es/m/1719083.1623351052@sss.pgh.pa.us  

M src/backend/replication/logical/reorderbuffer.c
M src/backend/replication/logical/worker.c

Fix new recovery test for use under msys

commit   : 8b9e1275c5e38f2661da8c09bc00e5d4ba9fbad5    
  
author   : Andrew Dunstan <andrew@dunslane.net>    
date     : Sat, 12 Jun 2021 08:37:16 -0400    
  
committer: Andrew Dunstan <andrew@dunslane.net>    
date     : Sat, 12 Jun 2021 08:37:16 -0400    

Commit caba8f0d43 wasn't quite right for msys, as demonstrated by  
several buildfarm animals, including jacana and fairywren. We need to  
use the msys perl in the archive command, but call it in such a way that  
Windows will understand the path. Furthermore, inside the copy script we  
need to convert a Windows path to an msys path.  

M src/test/recovery/t/025_stuck_on_old_timeline.pl
M src/test/recovery/t/cp_history_files

Remove PGSSLCRLDIR from the list of variables ignored in TAP tests

commit   : a04eb75dec8abb0ce2eb2d072e7f27cfde85d857    
  
author   : Michael Paquier <michael@paquier.xyz>    
date     : Sat, 12 Jun 2021 10:39:33 +0900    
  
committer: Michael Paquier <michael@paquier.xyz>    
date     : Sat, 12 Jun 2021 10:39:33 +0900    

This variable was present in the list added by 9d660670, but it is not  
supported by this branch.  Issue noticed while diving into a similar  
change for pg_regress.c.  
  
Backpatch-through: 9.6  

M src/test/perl/TestLib.pm

Rearrange logrep worker's snapshot handling some more.

commit   : eea081ad013d657a2a7857bd591cfcb559c9a94d    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Thu, 10 Jun 2021 12:27:27 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Thu, 10 Jun 2021 12:27:27 -0400    

It turns out that worker.c's code path for TRUNCATE was also  
careless about establishing a snapshot while executing user-defined  
code, allowing the checks added by commit 84f5c2908 to fail when  
a trigger is fired in that context.  
  
We could just wrap Push/PopActiveSnapshot around the truncate call,  
but it seems better to establish a policy of holding a snapshot  
throughout execution of a replication step.  To help with that and  
possible future requirements, replace the previous ensure_transaction  
calls with pairs of begin/end_replication_step calls.  
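
Schematically, each apply operation is now bracketed like this
(illustrative fragment, not the literal worker.c code):

    /* ensure we are in a transaction and hold an active snapshot */
    begin_replication_step();

    /* ... apply the replicated change; user-defined triggers may run ... */

    /* pop the snapshot and do end-of-step bookkeeping */
    end_replication_step();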
  
Per report from Mark Dilger.  Back-patch to v11, like the previous  
changes.  
  
Discussion: https://postgr.es/m/B4A3AF82-79ED-4F4C-A4E5-CD2622098972@enterprisedb.com  

M src/backend/replication/logical/worker.c

Adjust new test case to set wal_keep_segments.

commit   : 534b9be805b4eac83d059c577021ec57df896c08    
  
author   : Robert Haas <rhaas@postgresql.org>    
date     : Thu, 10 Jun 2021 09:43:35 -0400    
  
committer: Robert Haas <rhaas@postgresql.org>    
date     : Thu, 10 Jun 2021 09:43:35 -0400    

Per buildfarm member conchuela and Kyotaro Horiguchi, it's possible  
for the WAL segment that the cascading standby needs to be removed  
too quickly. Hopefully this will prevent that.  
  
Kyotaro Horiguchi  
  
Discussion: http://postgr.es/m/20210610.101240.1270925505780628275.horikyota.ntt@gmail.com  

M src/test/recovery/t/025_stuck_on_old_timeline.pl

Fix corner case failure of new standby to follow new primary.

commit   : ca158c168ea381ac54aab3e7a3c1239747ee4a7f    
  
author   : Robert Haas <rhaas@postgresql.org>    
date     : Wed, 9 Jun 2021 16:20:10 -0400    
  
committer: Robert Haas <rhaas@postgresql.org>    
date     : Wed, 9 Jun 2021 16:20:10 -0400    

This only happens if (1) the new standby has no WAL available locally,  
(2) the new standby is starting from the old timeline, (3) the promotion  
happened in the WAL segment from which the new standby is starting,  
(4) the timeline history file for the new timeline is available from  
the archive but the WAL files for it are not (i.e. this is a race),
(5) the WAL files for the new timeline are available via streaming,  
and (6) recovery_target_timeline='latest'.  
  
Commit ee994272ca50f70b53074f0febaec97e28f83c4e introduced this  
logic and was an improvement over the previous code, but it mishandled  
this case. If recovery_target_timeline='latest' and restore_command is  
set, validateRecoveryParameters() can change recoveryTargetTLI to be  
different from receiveTLI. If streaming is then tried afterward,  
expectedTLEs gets initialized with the history of the wrong timeline.  
It's supposed to be a list of entries explaining how to get to the  
target timeline, but in this case it ends up with a list of entries  
explaining how to get to the new standby's original timeline, which  
isn't right.  
  
Dilip Kumar and Robert Haas, reviewed by Kyotaro Horiguchi.  
  
Discussion: http://postgr.es/m/CAFiTN-sE-jr=LB8jQuxeqikd-Ux+jHiXyh4YDiZMPedgQKup0g@mail.gmail.com  

M src/backend/access/transam/xlog.c
A src/test/recovery/t/025_stuck_on_old_timeline.pl
A src/test/recovery/t/cp_history_files

Allow PostgresNode.pm's backup method to accept backup_options.

commit   : 38982b8b7bd9372ce247780021c93868cb5cb7b7    
  
author   : Robert Haas <rhaas@postgresql.org>    
date     : Wed, 9 Jun 2021 12:30:28 -0400    
  
committer: Robert Haas <rhaas@postgresql.org>    
date     : Wed, 9 Jun 2021 12:30:28 -0400    

Partial back-port of commit 081876d75ea15c3bd2ee5ba64a794fd8ea46d794.  
A test case for a pending bug fix needs this capability, but the code  
on 9.6 is significantly different, so I'm only back-patching this  
change as far as v10. We'll have to work around the problem another  
way in v9.6.  
  
Discussion: http://postgr.es/m/CAFiTN-tcivNvL0Rg6rD7_CErNfE75H7+gh9WbMxjbgsattja1Q@mail.gmail.com  

M src/test/perl/PostgresNode.pm

Fix inconsistencies in psql --help=commands

commit   : f0f879e106d5294ed7f6cc2ff14f9985d8ecdaa1    
  
author   : Michael Paquier <michael@paquier.xyz>    
date     : Wed, 9 Jun 2021 16:26:02 +0900    
  
committer: Michael Paquier <michael@paquier.xyz>    
date     : Wed, 9 Jun 2021 16:26:02 +0900    

The set of subcommands supported by \dAp, \do and \dy was described  
incorrectly in psql's --help.  The documentation was already consistent  
with the code.  
  
Reported-by: inoas, from IRC  
Author: Matthijs van der Vleuten  
Reviewed-by: Neil Chen  
Discussion: https://postgr.es/m/6a984e24-2171-4039-9050-92d55e7b23fe@www.fastmail.com  
Backpatch-through: 9.6  

M src/bin/psql/help.c

Force NO SCROLL for plpgsql's implicit cursors.

commit   : 5b7bf9f72ab5dc77c081d3af3e6c3b50a6dbe872    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Tue, 8 Jun 2021 18:40:06 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Tue, 8 Jun 2021 18:40:06 -0400    

Further thought about bug #17050 suggests that it's a good idea  
to use CURSOR_OPT_NO_SCROLL for the implicit cursor opened by  
a plpgsql FOR-over-query loop.  This ensures that, if somebody  
commits inside the loop, PersistHoldablePortal won't try to  
rewind and re-read the cursor.  While we'd have selected NO_SCROLL  
anyway if FOR UPDATE/SHARE appears in the query, there are other  
hazards with volatile functions; and in any case, it's silly to  
expend effort storing rows that we know for certain won't be needed.  
  
(While here, improve the comment in exec_run_select, which was a bit  
confused about the rationale for when we can use parallel mode.  
Cursor operations aren't a hazard for nameless portals.)  
  
This wasn't an issue until v11, which introduced the possibility  
of persisting such cursors.  Hence, back-patch to v11.  
  
Per bug #17050 from Алексей Булгаков.  
  
Discussion: https://postgr.es/m/17050-f77aa827dc85247c@postgresql.org  

M src/pl/plpgsql/src/pl_exec.c

Avoid misbehavior when persisting a non-stable cursor.

commit   : 2757865fa7fb213eb83088927d5c8d5a9425e555    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Tue, 8 Jun 2021 17:50:15 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Tue, 8 Jun 2021 17:50:15 -0400    

PersistHoldablePortal has long assumed that it should store the  
entire output of the query-to-be-persisted, which requires rewinding  
and re-reading the output.  This is problematic if the query is not  
stable: we might get different row contents, or even a different  
number of rows, which'd confuse the cursor state mightily.  
  
In the case where the cursor is NO SCROLL, this is very easy to  
solve: just store the remaining query output, without any rewinding,  
and tweak the portal's cursor state to match.  Aside from removing  
the semantic problem, this could be significantly more efficient  
than storing the whole output.  
  
If the cursor is scrollable, there's not much we can do, but it  
was already the case that scrolling a volatile query's result was  
pretty unsafe.  We can just document more clearly that getting  
correct results from that is not guaranteed.  
  
There are already prohibitions in place on using SCROLL with  
FOR UPDATE/SHARE, which is one way for a SELECT query to have  
non-stable results.  We could imagine prohibiting SCROLL when  
the query contains volatile functions, but that would be  
expensive to enforce.  Moreover, it could break applications  
that work just fine, if they have functions that are in fact  
stable but the user neglected to mark them so.  So settle for  
documenting the hazard.  
  
While this problem has existed in some guise for a long time,  
it got a lot worse in v11, which introduced the possibility  
of persisting plpgsql cursors (perhaps implicit ones) even  
when they violate the rules for what can be marked WITH HOLD.  
Hence, I've chosen to back-patch to v11 but not further.  
  
Per bug #17050 from Алексей Булгаков.  
  
Discussion: https://postgr.es/m/17050-f77aa827dc85247c@postgresql.org  

M doc/src/sgml/plpgsql.sgml
M doc/src/sgml/ref/declare.sgml
M src/backend/commands/portalcmds.c
M src/pl/plpgsql/src/expected/plpgsql_transaction.out
M src/pl/plpgsql/src/sql/plpgsql_transaction.sql

Fix contrib/seg regression test in v11.

commit   : 9c3844bbcad96179b2f0ffb34e87e762f116d303    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Tue, 8 Jun 2021 10:57:39 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Tue, 8 Jun 2021 10:57:39 -0400    

In 20e36c6d2, I neglected to update seg_1.out (which doesn't  
exist in newer branches).  
  
Per buildfarm, via Andrew Dunstan.  
  
Discussion: https://postgr.es/m/7611b36f-7d57-5234-ad68-7d25a81e0f13@dunslane.net  

M contrib/seg/expected/seg_1.out

Stabilize contrib/seg regression test.

commit   : 20e36c6d203463af5bd678d10459f61ad9047b95    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Mon, 7 Jun 2021 14:52:42 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Mon, 7 Jun 2021 14:52:42 -0400    

If autovacuum comes along just after we fill table test_seg with  
some data, it will update the stats to the point where we prefer  
a plain indexscan over a bitmap scan, breaking the expected  
output (as well as the point of the test case).  To fix, just  
force a bitmap scan to be chosen here.  
  
This has evidently been wrong since commit de1d042f5.  It's not  
clear why we just recently saw any buildfarm failures due to it;  
but prairiedog has failed twice on this test in the past week.  
Hence, backpatch to v11 where this test case came in.  

M contrib/seg/expected/seg.out
M contrib/seg/sql/seg.sql

Fix incautious handling of possibly-miscoded strings in client code.

commit   : 89a5499ef9e477beb97d742d4df6fc8f601d87d5    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Mon, 7 Jun 2021 14:15:25 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Mon, 7 Jun 2021 14:15:25 -0400    

An incorrectly-encoded multibyte character near the end of a string  
could cause various processing loops to run past the string's  
terminating NUL, with results ranging from no detectable issue to  
a program crash, depending on what happens to be in the following  
memory.  
  
This isn't an issue in the server, because we take care to verify  
the encoding of strings before doing any interesting processing  
on them.  However, that lack of care leaked into client-side code  
which shouldn't assume that anyone has validated the encoding of  
its input.  
  
Although this is certainly a bug worth fixing, the PG security team  
elected not to regard it as a security issue, primarily because  
any untrusted text should be sanitized by PQescapeLiteral or  
the like before being incorporated into a SQL or psql command.  
(If an app fails to do so, the same technique can be used to  
cause SQL injection, with probably much more dire consequences  
than a mere client-program crash.)  Those functions were already  
made proof against this class of problem, cf CVE-2006-2313.  
  
To fix, invent PQmblenBounded() which is like PQmblen() except it  
won't return more than the number of bytes remaining in the string.  
In HEAD we can make this a new libpq function, as PQmblen() is.  
It seems imprudent to change libpq's API in stable branches though,  
so in the back branches define PQmblenBounded as a macro in the files  
that need it.  (Note that just changing PQmblen's behavior would not  
be a good idea; notably, it would completely break the escaping  
functions' defense against this exact problem.  So we just want a  
version for those callers that don't have any better way of handling  
this issue.)  
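
The back-branch macro is not reproduced here, but the idea can be sketched
as a clamped variant of PQmblen() (the function name below is hypothetical):

    #include <string.h>
    #include <libpq-fe.h>

    /*
     * Sketch: like PQmblen(), but never report more bytes than actually
     * remain before the terminating NUL, so a truncated multibyte character
     * at the end of the string cannot make callers run past it.
     */
    static int
    PQmblenBounded_sketch(const char *s, int encoding)
    {
        int     charlen = PQmblen(s, encoding);
        size_t  remaining = strlen(s);

        return charlen <= (int) remaining ? charlen : (int) remaining;
    }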
  
Per private report from houjingyi.  Back-patch to all supported branches.  

M src/bin/psql/common.c
M src/bin/psql/psqlscanslash.l
M src/bin/psql/stringutils.c
M src/bin/psql/tab-complete.c
M src/bin/scripts/common.c
M src/fe_utils/print.c
M src/interfaces/libpq/fe-print.c
M src/interfaces/libpq/fe-protocol3.c

In PostgresNode.pm, don't pass SQL to psql on the command line

commit   : b5bd1351fd7b967259cd505199422c28ed2ccd98    
  
author   : Andrew Dunstan <andrew@dunslane.net>    
date     : Thu, 3 Jun 2021 16:08:33 -0400    
  
committer: Andrew Dunstan <andrew@dunslane.net>    
date     : Thu, 3 Jun 2021 16:08:33 -0400    

The Msys shell mangles certain patterns in its command line, so avoid  
handing arbitrary SQL to psql on the command line and instead use  
IPC::Run's redirection facility for stdin. This pattern is already  
mostly what's used, but query_poll_until() was not doing the right thing.
  
Problem discovered on the buildfarm when a new TAP test failed on msys.  

M src/test/perl/PostgresNode.pm

Reduce risks of conflicts in internal queries of REFRESH MATVIEW CONCURRENTLY

commit   : dbc9dbba56529a915effa6078f9303343aedf3f1    
  
author   : Michael Paquier <michael@paquier.xyz>    
date     : Thu, 3 Jun 2021 15:28:45 +0900    
  
committer: Michael Paquier <michael@paquier.xyz>    
date     : Thu, 3 Jun 2021 15:28:45 +0900    

The internal SQL queries used by REFRESH MATERIALIZED VIEW CONCURRENTLY  
include some aliases for its diff and temporary relations with  
rather-generic names: diff, newdata, newdata2 and mv.  Depending on the  
queries used for the materialized view, using CONCURRENTLY could lead to  
some internal failures if the query and those internal aliases conflict.  
  
Those names were chosen in 841c29c8.  This commit switches instead to a
naming pattern that is less likely to cause conflicts, based on an idea
from Thomas Munro, by appending _$ to those aliases.  This is not perfect,
as the new names could still conflict, but it has the advantage of keeping
the code readable and simple while reducing the likelihood of conflicts
to nearly zero.
  
Reported-by: Mathis Rudolf  
Author: Bharath Rupireddy  
Reviewed-by: Bernd Helmle, Thomas Munro, Michael Paquier  
Discussion: https://postgr.es/m/109c267a-10d2-3c53-b60e-720fcf44d9e8@credativ.de  
Backpatch-through: 9.6  

M src/backend/commands/matview.c

Ignore more environment variables in TAP tests

commit   : 36ac9f6af23039b334729b44a1751f1207f83f0e    
  
author   : Michael Paquier <michael@paquier.xyz>    
date     : Thu, 3 Jun 2021 11:51:51 +0900    
  
committer: Michael Paquier <michael@paquier.xyz>    
date     : Thu, 3 Jun 2021 11:51:51 +0900    

Various environment variables were not getting reset in the TAP tests,  
which would cause failures depending on the tests or the environment  
variables involved.  For example, PGSSL{MAX,MIN}PROTOCOLVERSION could  
cause failures in the SSL tests.  Even worse, a junk value of  
PGCLIENTENCODING makes a server startup fail.  The list of variables  
reset is adjusted in each stable branch depending on what is supported.  
  
While at it, simplify the code a bit, per a suggestion from Andrew
Dunstan, by using a list of variables instead of doing single deletions.
  
Reviewed-by: Andrew Dunstan, Daniel Gustafsson  
Discussion: https://postgr.es/m/YLbjjRpucIeZ78VQ@paquier.xyz  
Backpatch-through: 9.6  

M src/test/perl/TestLib.pm

Reject SELECT ... GROUP BY GROUPING SETS (()) FOR UPDATE.

commit   : dc272157a8aa1764aa92c013a2813d300f6ca32a    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Tue, 1 Jun 2021 11:12:56 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Tue, 1 Jun 2021 11:12:56 -0400    

This case should be disallowed, just as FOR UPDATE with a plain  
GROUP BY is disallowed; FOR UPDATE only makes sense when each row  
of the query result can be identified with a single table row.  
However, we missed teaching CheckSelectLocking() to check  
groupingSets as well as groupClause, so that it would allow  
degenerate grouping sets.  That resulted in a bad plan and  
a null-pointer dereference in the executor.  
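
The shape of the fix in CheckSelectLocking() is roughly as follows (a
sketch with a simplified error message; the real code builds the message
from the lock clause strength):

    /* treat grouping sets the same as a plain GROUP BY */
    if (qry->groupClause != NIL || qry->groupingSets != NIL)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("SELECT FOR UPDATE/SHARE is not allowed with GROUP BY clause")));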
  
Looking around for other instances of the same bug, the only one  
I found was in examine_simple_variable().  That'd just lead to  
silly estimates, but it should be fixed too.  
  
Per private report from Yaoguang Chen.  
Back-patch to all supported branches.  

M src/backend/parser/analyze.c
M src/backend/utils/adt/selfuncs.c
M src/test/regress/expected/errors.out
M src/test/regress/sql/errors.sql

Fix mis-planning of repeated application of a projection.

commit   : fe194f731fbf6ea0d8c08bdd41edd0a2db52a643    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Mon, 31 May 2021 12:03:00 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Mon, 31 May 2021 12:03:00 -0400    

create_projection_plan contains a hidden assumption (here made  
explicit by an Assert) that a projection-capable Path will yield a  
projection-capable Plan.  Unfortunately, that assumption is violated  
only a few lines away, by create_projection_plan itself.  This means  
that two stacked ProjectionPaths can yield an outcome where we try to  
jam the upper path's tlist into a non-projection-capable child node,  
resulting in an invalid plan.  
  
There isn't any good reason to have stacked ProjectionPaths; indeed the  
whole concept is faulty, since the set of Vars/Aggs/etc needed by the  
upper one wouldn't necessarily be available in the output of the lower  
one, nor could the lower one create such values if they weren't  
available from its input.  Hence, we can fix this by adjusting  
create_projection_path to strip any top-level ProjectionPath from the  
subpath it's given.  (This amounts to saying "oh, we changed our  
minds about what we need to project here".)  
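
The adjustment to create_projection_path() boils down to something like
this illustrative fragment:

    /*
     * If the subpath is itself a ProjectionPath, its tlist is about to be
     * overridden anyway, so project directly from its input rather than
     * stacking a second projection on top of it.
     */
    if (IsA(subpath, ProjectionPath))
        subpath = ((ProjectionPath *) subpath)->subpath;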
  
The test case added here only fails in v13 and HEAD; before that, we  
don't attempt to shove the Sort into the parallel part of the plan,  
for reasons that aren't entirely clear to me.  However, all the  
directly-related code looks generally the same as far back as v11,  
where the hazard was introduced (by d7c19e62a).  So I've got no faith  
that the same type of bug doesn't exist in v11 and v12, given the  
right test case.  Hence, back-patch the code changes, but not the  
irrelevant test case, into those branches.  
  
Per report from Bas Poot.  
  
Discussion: https://postgr.es/m/534fca83789c4a378c7de379e9067d4f@politie.nl  

M src/backend/optimizer/plan/createplan.c
M src/backend/optimizer/util/pathnode.c

Raise a timeout to 180s, in test 010_logical_decoding_timelines.pl.

commit   : b6ca46e12d5962f316cd937efb490289362f0bad    
  
author   : Noah Misch <noah@leadboat.com>    
date     : Mon, 31 May 2021 00:29:58 -0700    
  
committer: Noah Misch <noah@leadboat.com>    
date     : Mon, 31 May 2021 00:29:58 -0700    

Per buildfarm member hornet.  Also, update Pod documentation showing the  
lower value.  Back-patch to v10, where the test first appeared.  

M src/test/perl/PostgresNode.pm
M src/test/recovery/t/010_logical_decoding_timelines.pl

Fix race condition when sharing tuple descriptors.

commit   : a15d84470db8f42d3850a007dfa166d3fce5bc66    
  
author   : Thomas Munro <tmunro@postgresql.org>    
date     : Sat, 29 May 2021 14:48:15 +1200    
  
committer: Thomas Munro <tmunro@postgresql.org>    
date     : Sat, 29 May 2021 14:48:15 +1200    

Parallel query processes that called BlessTupleDesc() for identical  
tuple descriptors at the same moment could crash.  There was code to  
handle that rare case, but it dereferenced a bogus DSA pointer.  Repair.  
  
Back-patch to 11, where commit cc5f8136 added support for sharing tuple  
descriptors in parallel queries.  
  
Reported-by: Eric Thinnes <e.thinnes@gmx.de>  
Discussion: https://postgr.es/m/99aaa2eb-e194-bf07-c29a-1a76b4f2bcf9%40gmx.de  

M src/backend/utils/cache/typcache.c

fix syntax error

commit   : a44a2b9acfdcff3ba12eea7b6c10b8c1b6182e3f    
  
author   : Andrew Dunstan <andrew@dunslane.net>    
date     : Fri, 28 May 2021 09:35:11 -0400    
  
committer: Andrew Dunstan <andrew@dunslane.net>    
date     : Fri, 28 May 2021 09:35:11 -0400    

M src/tools/msvc/Solution.pm

Report configured port in MSVC built pg_config

commit   : 3f70d7e441b36f49a5a0a0c67e14c2db24820191    
  
author   : Andrew Dunstan <andrew@dunslane.net>    
date     : Fri, 28 May 2021 09:26:30 -0400    
  
committer: Andrew Dunstan <andrew@dunslane.net>    
date     : Fri, 28 May 2021 09:26:30 -0400    

This is a long-standing omission, discovered when trying to write code
that relied on it.  
  
Backpatch to all live branches.  

M src/tools/msvc/Solution.pm

Fix MSVC scripts when building with GSSAPI/Kerberos

commit   : cf2e09544d92ed479aa0b1d078b3f8665dcfdcea    
  
author   : Michael Paquier <michael@paquier.xyz>    
date     : Thu, 27 May 2021 20:11:29 +0900    
  
committer: Michael Paquier <michael@paquier.xyz>    
date     : Thu, 27 May 2021 20:11:29 +0900    

The deliverables of upstream Kerberos on Windows are installed with  
paths that do not match our MSVC scripts.  First, the include folder was  
named "inc/" in our scripts, but the upstream MSIs use "include/".  
Second, the build would fail with 64-bit environments as the libraries  
are named differently.  
  
This commit adjusts the MSVC scripts to be compatible with the latest
upstream installations, and I have checked that compilation works with
both the 32-bit and 64-bit installations.
  
Special thanks to Kondo Yuta for the help in investigating the situation  
in hamerkop, which had an incorrect configuration for the GSS  
compilation.  
  
Reported-by: Brian Ye  
Discussion: https://postgr.es/m/162128202219.27274.12616756784952017465@wrigleys.postgresql.org  
Backpatch-through: 9.6  

M src/tools/msvc/Solution.pm

doc: Fix description of some GUCs in docs and postgresql.conf.sample

commit   : b1a1f775bc90e53ff9970c2087c239c342c2e7e7    
  
author   : Michael Paquier <michael@paquier.xyz>    
date     : Thu, 27 May 2021 14:58:15 +0900    
  
committer: Michael Paquier <michael@paquier.xyz>    
date     : Thu, 27 May 2021 14:58:15 +0900    

The descriptions of the following parameters were imprecise or incorrect
about when the setting can be changed (PGC_POSTMASTER or PGC_SIGHUP):
- autovacuum_work_mem (docs, as of 9.6~)  
- huge_page_size (docs, as of 14~)  
- max_logical_replication_workers (docs, as of 10~)  
- max_sync_workers_per_subscription (docs, as of 10~)  
- min_dynamic_shared_memory (docs, as of 14~)  
- recovery_init_sync_method (postgresql.conf.sample, as of 14~)  
- remove_temp_files_after_crash (docs, as of 14~)  
- restart_after_crash (docs, as of 9.6~)  
- ssl_min_protocol_version (docs, as of 12~)  
- ssl_max_protocol_version (docs, as of 12~)  
  
This commit adjusts the description of all these parameters to be more  
consistent with the practice used for the others.  
  
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/YK2ltuLpe+FbRXzA@paquier.xyz  
Backpatch-through: 9.6  

M doc/src/sgml/config.sgml

Disallow SSL renegotiation

commit   : 96918b76fb059126f55f3099f4bcc50de1ed6150    
  
author   : Michael Paquier <michael@paquier.xyz>    
date     : Tue, 25 May 2021 10:11:21 +0900    
  
committer: Michael Paquier <michael@paquier.xyz>    
date     : Tue, 25 May 2021 10:11:21 +0900    

SSL renegotiation is already disabled as of 48d23c72; however, this does
not prevent the server from complying with a client that wants to use
renegotiation.  In the last couple of years, renegotiation had its set
of security issues and flaws (like the recent CVE-2021-3449), and it  
could be possible to crash the backend with a client attempting  
renegotiation.  
  
This commit takes one extra step by disabling renegotiation in the  
backend in the same way as SSL compression (f9264d15) or tickets  
(97d3a0b0).  OpenSSL 1.1.0h has added an option named  
SSL_OP_NO_RENEGOTIATION able to achieve that.  In older versions  
there is an option called SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS that  
was undocumented, and could be set within the SSL object created when  
the TLS connection opens, but I have decided not to use it, as it feels  
trickier to rely on, and it is not official.  Note that this option is  
not usable in OpenSSL < 1.1.0h as the internal contents of the *SSL  
object are hidden to applications.  
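
In OpenSSL terms, the change amounts to setting one extra context option
when it is available (an illustrative sketch with a hypothetical helper
name, not the exact be-secure-openssl.c code):

    #include <openssl/ssl.h>

    /*
     * Sketch: when the OpenSSL build provides the option (1.1.0h and
     * newer), tell the server-side context to refuse client-initiated
     * renegotiation.
     */
    static void
    disable_ssl_renegotiation(SSL_CTX *context)
    {
    #ifdef SSL_OP_NO_RENEGOTIATION
        SSL_CTX_set_options(context, SSL_OP_NO_RENEGOTIATION);
    #else
        (void) context;             /* nothing to do on older OpenSSL */
    #endif
    }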
  
SSL renegotiation concerns protocols up to TLSv1.2.  
  
Per original report from Robert Haas, with a patch based on a suggestion  
by Andres Freund.  
  
Author: Michael Paquier  
Reviewed-by: Daniel Gustafsson  
Discussion: https://postgr.es/m/YKZBXx7RhU74FlTE@paquier.xyz  
Backpatch-through: 9.6  

M src/backend/libpq/be-secure-openssl.c

Restore the portal-level snapshot after procedure COMMIT/ROLLBACK.

commit   : ef94805096229ee3573624465a76ca11d2bd8529    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Fri, 21 May 2021 14:03:53 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Fri, 21 May 2021 14:03:53 -0400    

COMMIT/ROLLBACK necessarily destroys all snapshots within the session.  
The original implementation of intra-procedure transactions just  
cavalierly did that, ignoring the fact that this left us executing in  
a rather different environment than normal.  In particular, it turns  
out that handling of toasted datums depends rather critically on there  
being an outer ActiveSnapshot: otherwise, when SPI or the core  
executor pop whatever snapshot they used and return, it's unsafe to  
dereference any toasted datums that may appear in the query result.  
It's possible to demonstrate "no known snapshots" and "missing chunk  
number N for toast value" errors as a result of this oversight.  
  
Historically this outer snapshot has been held by the Portal code,  
and that seems like a good plan to preserve.  So add infrastructure  
to pquery.c to allow re-establishing the Portal-owned snapshot if it's  
not there anymore, and add enough bookkeeping support that we can tell  
whether it is or not.  
  
We can't, however, just re-establish the Portal snapshot as part of  
COMMIT/ROLLBACK.  As in normal transaction start, acquiring the first  
snapshot should wait until after SET and LOCK commands.  Hence, teach  
spi.c about doing this at the right time.  (Note that this patch  
doesn't fix the problem for any PLs that try to run intra-procedure  
transactions without using SPI to execute SQL commands.)  
  
This makes SPI's no_snapshots parameter rather a misnomer, so in HEAD,  
rename that to allow_nonatomic.  
  
replication/logical/worker.c also needs some fixes, because it wasn't  
careful to hold a snapshot open around AFTER trigger execution.  
That code doesn't use a Portal, which I suspect someday we're gonna  
have to fix.  But for now, just rearrange the order of operations.  
This includes back-patching the recent addition of finish_estate()  
to centralize the cleanup logic there.  
  
This also back-patches commit 2ecfeda3e into v13, to improve the  
test coverage for worker.c (it was that test that exposed that  
worker.c's snapshot management is wrong).  
  
Per bug #15990 from Andreas Wicht.  Back-patch to v11 where  
intra-procedure COMMIT was added.  
  
Discussion: https://postgr.es/m/15990-eee2ac466b11293d@postgresql.org  

M src/backend/commands/functioncmds.c
M src/backend/executor/spi.c
M src/backend/replication/logical/worker.c
M src/backend/tcop/pquery.c
M src/backend/utils/mmgr/portalmem.c
M src/include/executor/spi_priv.h
M src/include/tcop/pquery.h
M src/include/utils/portal.h
M src/pl/plpgsql/src/pl_exec.c
M src/test/isolation/expected/plpgsql-toast.out
M src/test/isolation/specs/plpgsql-toast.spec

Fix deadlock for multiple replicating truncates of the same table.

commit   : 71787b23e3dc5510c5bb167abc8a086756811841    
  
author   : Amit Kapila <akapila@postgresql.org>    
date     : Fri, 21 May 2021 08:31:10 +0530    
  
committer: Amit Kapila <akapila@postgresql.org>    
date     : Fri, 21 May 2021 08:31:10 +0530    

While applying a truncate change, the logical apply worker acquired only
RowExclusiveLock on the relation being truncated. This allowed two apply
workers to truncate the same relation at the same time, which led to a
deadlock: after updating the pg_class tuple, one worker tried to acquire
a SHARE lock on the relation and began waiting for the second worker,
which held RowExclusiveLock on the relation; when the second worker then
tried to update the pg_class tuple, it started waiting for the first
worker, producing the deadlock. Fix it by acquiring AccessExclusiveLock
on the relation before applying the truncate change, as
we do for normal truncate operation.  
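
Assuming the apply code opens the target relation through
logicalrep_rel_open(), as other apply paths do, the fix is essentially a
lock-level change (illustrative only):

    /*
     * Take the same lock a user-issued TRUNCATE would, instead of
     * RowExclusiveLock, so two apply workers cannot deadlock against each
     * other on the same relation.
     */
    rel = logicalrep_rel_open(relid, AccessExclusiveLock);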
  
Author: Peter Smith, test case by Haiying Tang  
Reviewed-by: Dilip Kumar, Amit Kapila  
Backpatch-through: 11  
Discussion: https://postgr.es/m/CAHut+PsNm43p0jM+idTvWwiGZPcP0hGrHMPK9TOAkc+a4UpUqw@mail.gmail.com  

M src/backend/replication/logical/worker.c
M src/test/subscription/t/010_truncate.pl

Avoid detoasting failure after COMMIT inside a plpgsql FOR loop.

commit   : 0c1b2cb17c2558e4fad897ded1818e465f370973    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Thu, 20 May 2021 18:32:37 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Thu, 20 May 2021 18:32:37 -0400    

exec_for_query() normally tries to prefetch a few rows at a time  
from the query being iterated over, so as to reduce executor  
entry/exit overhead.  Unfortunately this is unsafe if we have  
COMMIT or ROLLBACK within the loop, because there might be  
TOAST references in the data that we prefetched but haven't  
yet examined.  Immediately after the COMMIT/ROLLBACK, we have  
no snapshots in the session, meaning that VACUUM is at liberty  
to remove recently-deleted TOAST rows.  
  
This was originally reported as a case triggering the "no known  
snapshots" error in init_toast_snapshot(), but even if you miss  
hitting that, you can get "missing toast chunk", as illustrated  
by the added isolation test case.  
  
To fix, just disable prefetching in non-atomic contexts.  Maybe  
there will be performance complaints prompting us to work harder  
later, but it's not clear at the moment that this really costs  
much, and I doubt we'd want to back-patch any complicated fix.  
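
In code terms the fix is just a matter of fetching one row at a time when
the context is non-atomic; a sketch using plpgsql's estate->atomic flag,
with the surrounding variable names being illustrative:

    /*
     * Prefetch a batch of rows only when no intervening COMMIT/ROLLBACK is
     * possible; otherwise fetch a single row, so any toasted values it
     * contains are examined before the snapshot can go away.
     */
    bool    prefetch_ok = estate->atomic;

    SPI_cursor_fetch(portal, true, prefetch_ok ? 10 : 1);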
  
In passing, adjust that error message in init_toast_snapshot()  
to be a little clearer about the likely cause of the problem.  
  
Patch by me, based on earlier investigation by Konstantin Knizhnik.  
  
Per bug #15990 from Andreas Wicht.  Back-patch to v11 where  
intra-procedure COMMIT was added.  
  
Discussion: https://postgr.es/m/15990-eee2ac466b11293d@postgresql.org  

M src/backend/access/heap/tuptoaster.c
M src/pl/plpgsql/src/pl_exec.c
M src/test/isolation/expected/plpgsql-toast.out
M src/test/isolation/specs/plpgsql-toast.spec

Clean up cpluspluscheck violation.

commit   : 04959a648b597e91cf2865d2bb1e543f5eb9d9a5    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Thu, 20 May 2021 13:03:09 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Thu, 20 May 2021 13:03:09 -0400    

"typename" is a C++ keyword, so pg_upgrade.h fails to compile in C++.  
Fortunately, there seems no likely reason for somebody to need to  
do that.  Nonetheless, it's project policy that all .h files should  
pass cpluspluscheck, so rename the argument to fix that.  
  
Oversight in 57c081de0; back-patch as that was.  (The policy requiring  
pg_upgrade.h to pass cpluspluscheck only goes back to v12, but it  
seems best to keep this code looking the same in all branches.)  

M src/bin/pg_upgrade/pg_upgrade.h
M src/bin/pg_upgrade/version.c

Fix typo and outdated information in README.barrier

commit   : a399755ba2fc52e81b0a1346674b65d497406ffd    
  
author   : David Rowley <drowley@postgresql.org>    
date     : Tue, 18 May 2021 09:57:05 +1200    
  
committer: David Rowley <drowley@postgresql.org>    
date     : Tue, 18 May 2021 09:57:05 +1200    

README.barrier didn't seem to get the memo when atomics were added. Fix  
that.  
  
Author: Tatsuo Ishii, David Rowley  
Discussion: https://postgr.es/m/20210516.211133.2159010194908437625.t-ishii%40sraoss.co.jp  
Backpatch-through: 9.6, oldest supported release  

M src/backend/storage/lmgr/README.barrier

Be more careful about barriers when releasing BackgroundWorkerSlots.

commit   : 6fcbaea7af4323e7e2847fb75fde313c832f4947    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Sat, 15 May 2021 12:21:06 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Sat, 15 May 2021 12:21:06 -0400    

ForgetBackgroundWorker lacked any memory barrier at all, while  
BackgroundWorkerStateChange had one but unaccountably did  
additional manipulation of the slot after the barrier.  AFAICS,  
the rule must be that the barrier is immediately before setting  
or clearing slot->in_use.  
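
The rule stated above looks like this in code (a schematic, not the exact
bgworker.c hunk):

    /* finish all other updates to the slot first ... */
    slot->pid = 0;
    pg_memory_barrier();
    /* ... and only then flip the flag that lets others reuse the slot */
    slot->in_use = false;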
  
It looks like back in 9.6 when ForgetBackgroundWorker was first  
written, there might have been some case for not needing a  
barrier there, but I'm not very convinced of that --- the fact  
that the load of bgw_notify_pid is in the caller doesn't seem  
to guarantee no memory ordering problem.  So patch 9.6 too.  
  
It's likely that this doesn't fix any observable bug on Intel  
hardware, but machines with weaker memory ordering rules could  
have problems here.  
  
Discussion: https://postgr.es/m/4046084.1620244003@sss.pgh.pa.us  

M src/backend/postmaster/bgworker.c

Doc: correct erroneous entry in this week's minor release notes.

commit   : 037cc13f413de8d1d7e332e441d5d71451f2d336    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Fri, 14 May 2021 17:36:20 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Fri, 14 May 2021 17:36:20 -0400    

The patch to disallow a NULL specification in combination with  
GENERATED ... AS IDENTITY applied to both ALWAYS and BY DEFAULT  
variants of that clause, not only the former.  
  
Noted by Shay Rojansky.  
  
Discussion: https://postgr.es/m/CADT4RqAwD3A=RvGiQU9AiTK-6VeuXcycwPHmJPv_OBCJFYOEww@mail.gmail.com  

M doc/src/sgml/release-11.sgml

Prevent infinite insertion loops in spgdoinsert().

commit   : d776045eb12a6738dfa42c82518dacbfa38ae853    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Fri, 14 May 2021 15:07:34 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Fri, 14 May 2021 15:07:34 -0400    

Formerly we just relied on operator classes that assert longValuesOK  
to eventually shorten the leaf value enough to fit on an index page.  
That fails since the introduction of INCLUDE-column support (commit  
09c1c6ab4), because the INCLUDE columns might alone take up more  
than a page, meaning no amount of leaf-datum compaction will get  
the job done.  At least with spgtextproc.c, that leads to an infinite  
loop, since spgtextproc.c won't throw an error for not being able  
to shorten the leaf datum anymore.  
  
To fix without breaking cases that would otherwise work, add logic  
to spgdoinsert() to verify that the leaf tuple size is decreasing  
after each "choose" step.  Some opclasses might not decrease the  
size on every single cycle, and in any case, alignment roundoff  
of the tuple size could obscure small gains.  Therefore, allow  
up to 10 cycles without additional savings before throwing an  
error.  (Perhaps this number will need adjustment, but it seems  
quite generous right now.)  
  
As long as we've developed this logic, let's back-patch it.  
The back branches don't have INCLUDE columns to worry about, but  
this seems like a good defense against possible bugs in operator  
classes.  We already know that an infinite loop here is pretty  
unpleasant, so having a defense seems to outweigh the risk of  
breaking things.  (Note that spgtextproc.c is actually the only  
known opclass with longValuesOK support, so that this is all moot  
for known non-core opclasses anyway.)  
  
Per report from Dilip Kumar.  
  
Discussion: https://postgr.es/m/CAFiTN-uxP_soPhVG840tRMQTBmtA_f_Y8N51G7DKYYqDh7XN-A@mail.gmail.com  

M doc/src/sgml/spgist.sgml
M src/backend/access/spgist/spgdoinsert.c

Fix query-cancel handling in spgdoinsert().

commit   : f4ee4082f50bfcaa55452762da34541b164c0ec6    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Fri, 14 May 2021 13:26:55 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Fri, 14 May 2021 13:26:55 -0400    

Knowing that a buggy opclass could cause an infinite insertion loop,  
spgdoinsert() intended to allow its loop to be interrupted by query  
cancel.  However, that never actually worked, because in iterations  
after the first, we'd be holding buffer lock(s) which would cause  
InterruptHoldoffCount to be positive, preventing servicing of the  
interrupt.  
  
To fix, check if an interrupt is pending, and if so fall out of  
the insertion loop and service the interrupt after we've released  
the buffers.  If it was indeed a query cancel, that's the end of  
the matter.  If it was a non-canceling interrupt reason, make use  
of the existing provision to retry the whole insertion.  (This isn't  
as wasteful as it might seem, since any upper-level index tuples we  
already created should be usable in the next attempt.)  
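
Using the INTERRUPTS_PENDING_CONDITION() macro introduced by the companion
refactoring commit (listed next), the new control flow is roughly this
schematic:

    for (;;)
    {
        if (INTERRUPTS_PENDING_CONDITION())
            break;                  /* abandon this insertion attempt */

        /* ... perform one "choose" step, taking buffer locks ... */
    }

    /* buffer locks have been released here */
    CHECK_FOR_INTERRUPTS();         /* a query cancel ends the command */
    /* for any other interrupt reason, retry the whole insertion */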
  
While there's no known instance of such a bug in existing release  
branches, it still seems like a good idea to back-patch this to  
all supported branches, since the behavior is fairly nasty if a  
loop does happen --- not only is it uncancelable, but it will  
quickly consume memory to the point of an OOM failure.  In any  
case, this code is certainly not working as intended.  
  
Per report from Dilip Kumar.  
  
Discussion: https://postgr.es/m/CAFiTN-uxP_soPhVG840tRMQTBmtA_f_Y8N51G7DKYYqDh7XN-A@mail.gmail.com  

M src/backend/access/spgist/spgdoinsert.c

Refactor CHECK_FOR_INTERRUPTS() to add flexibility.

commit   : 8274f4517602112a08f6d06a26ba46ed0c520467    
  
author   : Tom Lane <tgl@sss.pgh.pa.us>    
date     : Fri, 14 May 2021 12:54:26 -0400    
  
committer: Tom Lane <tgl@sss.pgh.pa.us>    
date     : Fri, 14 May 2021 12:54:26 -0400    

Split up CHECK_FOR_INTERRUPTS() to provide an additional macro  
INTERRUPTS_PENDING_CONDITION(), which just tests whether an  
interrupt is pending without attempting to service it.  This is  
useful in situations where the caller knows that interrupts are  
blocked, and would like to find out if it's worth the trouble  
to unblock them.  
  
Also add INTERRUPTS_CAN_BE_PROCESSED(), which indicates whether  
CHECK_FOR_INTERRUPTS() can be relied on to clear the pending interrupt.  
  
This commit doesn't actually add any uses of the new macros,  
but a follow-on bug fix will do so.  Back-patch to all supported  
branches to provide infrastructure for that fix.  
  
Alvaro Herrera and Tom Lane  
  
Discussion: https://postgr.es/m/20210513155351.GA7848@alvherre.pgsql  

M src/backend/tcop/postgres.c
M src/include/miscadmin.h

Rename the logical replication global "wrconn"

commit   : b0e6e08b1981ebcdb2f3e7576de7ed0d004cfc16    
  
author   : Alvaro Herrera <alvherre@alvh.no-ip.org>    
date     : Wed, 12 May 2021 19:13:54 -0400    
  
committer: Alvaro Herrera <alvherre@alvh.no-ip.org>    
date     : Wed, 12 May 2021 19:13:54 -0400    

The worker.c global wrconn is only meant to be used by logical apply/  
tablesync workers, but there are other variables with the same name. To  
reduce future confusion rename the global from "wrconn" to  
"LogRepWorkerWalRcvConn".  
  
While this is just cosmetic, it seems better to backpatch it all the way  
back to 10 where this code appeared, to avoid future backpatching  
issues.  
  
Author: Peter Smith <smithpb2250@gmail.com>  
Discussion: https://postgr.es/m/CAHut+Pu7Jv9L2BOEx_Z0UtJxfDevQSAUW2mJqWU+CtmDrEZVAg@mail.gmail.com  

M src/backend/replication/logical/launcher.c
M src/backend/replication/logical/tablesync.c
M src/backend/replication/logical/worker.c
M src/include/replication/worker_internal.h