python - pymongo - Serialize/pickle Connection or Database object

Question:

I want to write a custom OutputWriter for Google App Engine's (GAE) MapReduce framework. This OutputWriter should open a direct TCP connection to an open MongoDB port and write the results of the reduce step directly to that database.

I'm using pymongo to interact with MongoDB. The existing MapReduce library requires output writers to be JSON-serializable. Once the output writer has established a connection to the MongoDB instance like so:

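```python
# Legacy pymongo API used in the question (Connection was removed in pymongo 3.0).
# MONGODB_HOST, MONGODB_PORT, MONGODB_USERNAME and MONGODB_PASSWD are settings defined elsewhere.
from pymongo import Connection

conn = Connection(host=MONGODB_HOST, port=MONGODB_PORT)
db = conn.test_db
db.authenticate(MONGODB_USERNAME, MONGODB_PASSWD)
```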

I'd like to serialize either the Connection (of type pymongo.connection.Connection) or db itself (a pymongo.database.Database). Naturally, those objects aren't JSON-serializable, so I thought I could build a JSON dict with a pickled database inside, but pymongo doesn't seem to support pickling these objects natively: neither class defines a __getstate__ method.

I assume I could simply store the connection and authentication parameters and reopen a connection when the OutputWriter is deserialized, but that seems overly hacky, as well as time- and resource-intensive.
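A minimal sketch of that parameter-based approach, assuming the writer only has to survive a round trip through a JSON dict: the JSON state holds the connection parameters, and the live connection is reopened lazily on first use, so a deserialized writer that never writes never reconnects. `MongoParamsWriter`, its `to_json`/`from_json` methods and the injected `connect` factory are hypothetical names, not the MapReduce library's or pymongo's API; in real code the factory would wrap a call like the `Connection(...)` one above. Credentials are deliberately left out of the state, and how they would be supplied on reconnect is left open.

```python
import json


class MongoParamsWriter:
    """Hypothetical output writer that serializes only its connection parameters."""

    def __init__(self, host, port, db_name, connect):
        self.host = host
        self.port = port
        self.db_name = db_name
        # Assumed factory: connect(host, port, db_name) -> live database handle.
        self._connect = connect
        self._db = None  # live handle; deliberately never serialized

    @property
    def db(self):
        # Open the connection lazily, the first time something needs it.
        if self._db is None:
            self._db = self._connect(self.host, self.port, self.db_name)
        return self._db

    def to_json(self):
        # Only plain, JSON-safe values go into the serialized state.
        return {"host": self.host, "port": self.port, "db_name": self.db_name}

    @classmethod
    def from_json(cls, state, connect):
        # Rebuild from the saved parameters; no connection is opened yet.
        return cls(state["host"], state["port"], state["db_name"], connect)


# Usage with a stand-in factory in place of the real pymongo call.
opened = []


def fake_connect(host, port, db_name):
    opened.append((host, port, db_name))
    return {"db_name": db_name}  # placeholder for a Database object


writer = MongoParamsWriter("db.example.com", 27017, "test_db", fake_connect)
payload = json.dumps(writer.to_json())
restored = MongoParamsWriter.from_json(json.loads(payload), fake_connect)
handle = restored.db  # first access opens the (fake) connection
```

The lazy property is the design choice that addresses the cost concern: deserializing is cheap, and the connection is paid for only by a writer that actually writes.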

Can someone point me to a workaround, or perhaps a different kind of serialization I haven't thought of?

Answer:

> I assume I could simply store the connection and authentication parameters, and reopen a connection when the OutputWriter is deserialized, but that seems overly hacky and time and resource intensive.

What else would you expect to be able to do? A database connection is, in general, a wrapper around objects that live outside of Python (sockets, file handles, opaque objects created by a C library, etc.), so there's no way to simply store one and restore it in a later run of the process, pass it to a different process, and so on. Any general-purpose serialization for a class like this would therefore have to work by storing the connection parameters and reconnecting.
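To make that concrete, a short sketch using two of the resource types just mentioned: the standard library refuses to pickle both a socket and an open file object, raising `TypeError`. Nothing here is pymongo-specific, and the helper `try_pickle` is a hypothetical name used only for illustration.

```python
import os
import pickle
import socket


def try_pickle(obj):
    """Return the TypeError raised when pickling obj, or None if it pickled fine."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return exc
    return None


# Both objects wrap OS-level resources (a socket and a file descriptor)
# that would mean nothing in a different process.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
handle = open(os.devnull, "rb")
try:
    socket_error = try_pickle(sock)
    file_error = try_pickle(handle)
finally:
    sock.close()
    handle.close()

print(type(socket_error).__name__, socket_error)
print(type(file_error).__name__, file_error)
```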

But there are many cases where you wouldn't want that. (Also remember that making something pickleable also makes it copyable, because the copy module relies on the same protocol, and it's far from clear that copying a database connection should always mean opening a new, distinct but equivalent connection.) This is why most database connection objects and similar resources are not pickleable.
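A small sketch of the copy point, using a hypothetical `ReconnectingConnection` class with a counter standing in for real sockets: `copy.copy` falls back on the same `__reduce_ex__`/`__reduce__` hook that `pickle` uses, so a class that pickles by reconnecting also reconnects every time it is copied.

```python
import copy

connections_opened = 0


class ReconnectingConnection:
    """Hypothetical connection class that serializes by reconnecting."""

    def __init__(self, host, port):
        global connections_opened
        self.host = host
        self.port = port
        connections_opened += 1  # stands in for opening a real socket

    def __reduce__(self):
        # Rebuild by calling the constructor again with the same parameters.
        return (self.__class__, (self.host, self.port))


original = ReconnectingConnection("localhost", 27017)
duplicate = copy.copy(original)  # goes through __reduce_ex__ -> __reduce__
print(connections_opened)  # 2
```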

Meanwhile, if you're only passing these objects around within a single process while the connection is still alive, you shouldn't be pickling them in the first place; just pass references to the connection around.

So I'd suggest doing exactly what you proposed but didn't want to do: store the parameters and reconnect. Wrap it up by subclassing (or monkeypatching) the two classes so they can be pickled directly, instead of passing a bunch of separate parameters around and making every other piece of code know how to deal with them.
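A minimal sketch of the subclassing approach. A hypothetical `LegacyConnection` stands in for a class you can't change (such as pymongo's `Connection`), and a `threading.Lock` stands in for its live socket, since neither can be pickled. The subclass's `__reduce__` saves only the constructor arguments, so unpickling calls the constructor again and opens a fresh resource. Whether the real pymongo classes can be rebuilt from just these arguments depends on the pymongo version and is an assumption here.

```python
import pickle
import threading


class LegacyConnection:
    """Hypothetical third-party class that we cannot modify.

    The lock stands in for a live socket: neither can be pickled.
    """

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self._resource = threading.Lock()


class PicklableConnection(LegacyConnection):
    """Subclass that pickles its parameters and reconnects on load."""

    def __reduce__(self):
        # Unpickling calls PicklableConnection(host, port), i.e. reconnects.
        return (self.__class__, (self.host, self.port))


# The unmodified base class still refuses to pickle.
try:
    pickle.dumps(LegacyConnection("localhost", 27017))
    base_pickles = True
except TypeError:
    base_pickles = False

conn = PicklableConnection("localhost", 27017)
restored = pickle.loads(pickle.dumps(conn))
```

Callers only ever see a normal connection object; the reconnect logic lives in one place instead of in every piece of code that handles the parameters.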

I don't think __getstate__ will work here. It implies that you can create a database connection by default-constructing the instance and then setting attributes or calling methods after the fact, but most database connection classes require you to pass arguments into the constructor call, to be used at __new__ or __init__ time. You could probably do this with just __getnewargs__ (which is even simpler than __getstate__), but only if the class does its setup in __new__: on unpickling, the saved arguments are passed to __new__, and __init__ is not called. If the connection is opened in __init__, you'll need the more complex __reduce__ mechanism instead.
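A small sketch of that difference, using two hypothetical classes in place of real connection classes and a `calls` list in place of real connecting: with `__getnewargs__`, unpickling passes the saved arguments to `__new__` and then restores the instance `__dict__`, but never runs `__init__`.

```python
import pickle

calls = []


class NewTimeConnection:
    """Hypothetical class that does its setup in __new__."""

    def __new__(cls, host, port):
        calls.append(("__new__", host, port))  # stands in for connecting
        self = super().__new__(cls)
        self.host, self.port = host, port
        return self

    def __getnewargs__(self):
        # With pickle protocol 2+, these are passed to __new__ on unpickling.
        return (self.host, self.port)


class InitTimeConnection:
    """Hypothetical class that does its setup in __init__."""

    def __init__(self, host, port):
        calls.append(("__init__", host, port))  # stands in for connecting
        self.host, self.port = host, port

    def __getnewargs__(self):
        return (self.host, self.port)


a = NewTimeConnection("localhost", 27017)
a_copy = pickle.loads(pickle.dumps(a))  # __new__ runs again
b = InitTimeConnection("localhost", 27017)
b_copy = pickle.loads(pickle.dumps(b))  # __init__ does NOT run again

new_calls = [c for c in calls if c[0] == "__new__"]
init_calls = [c for c in calls if c[0] == "__init__"]
print(len(new_calls), len(init_calls))  # 2 1
```

`b_copy` still gets `host` and `port` back from the pickled `__dict__`, but whatever `__init__` would have opened is never reopened, which is why a class that connects in `__init__` needs `__reduce__` instead.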
