当前位置: 动力学知识库 > 问答 > 编程问答 >

erlang - PostgreSQL backend behavior upon receiving "Terminate" ('X') after "COMMIT"

问题描述:

We run a postgres server v9.2.8, and use epgsql (erlang) as a client library. And in some cases, which we had on production but weren't able to reproduce in dev environment, we're loosing data.

A function in our application (it should be killed) allows an operator to change session parameters on a running connection. Since connection is usually always busy on production, a "SET SESSION bla-bla" query always crashes pgsql_connection process.

Before crashing, pgsql_connection sends a "Terminate" ('X') signal via pgsql_sock (a wrapper around tcp socket) to a backend. At the same time another erlang process (let's call it "worker") is waiting for a response from postgres backend using the same socket.

Now the question: is it possible that upon receiving a "Terminate" signal from a client, backend can cancel last transaction even if it has sent an "OK" on "COMMIT" statement already?

Because if it is possible, a worker will have a chance to report to the main application process about successfully written transaction while indeed the transaction has been cancelled.

Or, where can I read more details about this? Documentation says (http://www.postgresql.org/docs/9.2/static/protocol-flow.html):

For either normal or abnormal termination, any open transaction is

rolled back, not committed. One should note however that if a frontend

disconnects while a non-SELECT query is being processed, the backend

will probably finish the query before noticing the disconnection. If

the query is outside any transaction block (BEGIN ... COMMIT sequence)

then its results might be committed before the disconnection is

recognized.

– not a crystal clear statement.

网友答案:

Now the question: is it possible that upon receiving a "Terminate" signal from a client, backend can cancel last transaction even if it has sent an "OK" on "COMMIT" statement already?

No. that is fundamentally impossible. If it's committed, it's committed, and there's no going back. That's what "commit" means.

The only time Pg might return success before the commit hits disk and is persistent is if you told it to by setting synchronous_commit = off.

If you're seeing anything different happening then most likely it's a result of attempting to share a single connection between multiple processes (as you establish the connection before fork()) without proper locking or other mutual exclusion to ensure that the connection is locked while a command is in-flight.

Note that the reverse isn't true, which might be what you're thinking of with the quoted documentation passage. A transaction can get committed without returning a successful OK to the client if the client goes away (crashes, loses connection, etc) after issuing the commit command.


What the application is doing, where it sends out-of-sync messages on the wire protocol, is totally broken. It's guaranteed to cause unpredictable problems. The protocol is somewhat robust, so you're not likely to get things like an unintended commit, but you're very likely to get transactions aborted or whole sessions disconnected suddenly.


If you need to be able to roll back/abort committed transactions, then your application design has problems. You're not really ready to commit when you say COMMIT. You would have the same problem if the app process crashed or the whole server crashed between Pg committing the transaction and you doing whatever you need to do.

If you cannot fix the app design to avoid this then you will have to use two-phase transactions, either directly using PREPARE TRANSACTION then COMMIT PREPARED, or indirectly via the XA API. This has significant costs in performance and management overhead, but it's the only option if you need to do special work after database commit but before you're really "done".

The docs you quote are talking about the case where the app has sent a COMMIT but then disconnects before receiving the backend's acknowledgement of the commit. Because TCP/IP is buffered there's no guarantee the COMMIT got flushed to Pg, and if it did there's no guarantee it doesn't accompany the RST that terminates the connection. So in this specific case it's somewhat uncertain whether the transaction will commit or not. An application for which this is a problem would need to have a way of checking whether the last unit of work committed or not when it resumes work, or if it can't do that use two-phase transactions. The docs you quote say nothing about being able to cancel a commit after it's completed, because you can't. Ever.

Assuming that the app has to do some kind of extra work after commit, like moving a file or sending an email or doing work on another data store, then you're probably going to need two-phase transactions. Even then you're vulnerable to issues unless all parties in the distributed transaction support two phase commit, because your "other bit" could get done then your worker or server could crash before the confirmation of its completion is sent to the database to finish phase II of the commit.

You can keep your own two phase commit log of sorts in the DB instead of using true 2PC:

  1. Do the main database work and write a record to the work log table that says "I've done the work in the database and I'm about to do the next part".

  2. Do the next part; and

  3. Update the work log to say the next part is done.

... but this has the same problem, where a crash between parts 2 and 3 causes the app to forget that it did part 2 and repeat it on startup. If you can't live with that, you need to find a way to make part 2 commit completion verifiable, so you can tell if it's done or not, or find a way to make it capable of doing 2-phase commit.

To learn more about this topic, read about XA, distributed transactions, two-phase commit, etc.

分享给朋友:
您可能感兴趣的文章:
随机阅读: