├── README.md
├── client.py
├── exec_client.py
├── requirements.txt
├── server.py
└── util.py

/README.md:
--------------------------------------------------------------------------------
# pyzdb - a lightweight database with Python syntax queries, using ZeroMQ

**Please note this project's name change from pydb to pyzdb.**

pyzdb ("pies db") is a database for storing nested `list` and `dict` and allows Python syntax queries instead of some variation of SQL. A deliberate choice is made to make no optimization on the queries so you know exactly what paths queries take.

## Installation

pyzdb depends on

- [pyzmq](https://github.com/zeromq/pyzmq)
- [undoable](https://github.com/asrp/undoable)
- [portalocker](https://pypi.python.org/pypi/portalocker) (not needed yet, under consideration)

Install with

    pip install -r requirements.txt

Note that undoable is not yet on PyPI and is installed using the `-e` flag. Alternatively, it can be downloaded manually and put in the same directory as `server.py` and `client.py`.

## Running

In one terminal, run

    python server.py

In a different terminal, run

    python client.py

to get a prompt to access the database

    > db

### Sample session

    > db
    {}
    > db['x'] = 3
    None
    > db['x']
    3
    > db['l'] = range(10)
    None
    > db
    {u'x': 3, u'l': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}
    > db['l'][4] = ['a', 'b', 'c']
    None
    > db.undo()
    None
    > db
    {u'x': 3, u'l': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}
    > db.undo()
    None
    > db.redo()
    None
    > db.redo()
    None
    > db
    {u'x': 3, u'l': [0, 1, 2, 3, [u'a', u'b', u'c'], 5, 6, 7, 8, 9]}
    > [v for v in db['l'][4]]
    [u'a', u'b', u'c']
    > [k for k in db]
    [u'x', u'l']
    > db['l'][4].append('d')
    > db.save()

The server can be stopped and restarted to continue the session (without needing to restart the client). In an application, the most common use is to import the client and use it directly, while the server is run the same way as above.

    from pyzdb.client import client
    db, socket = client()
    db['x'] = 3
    x = db['x']._run()
    db['l'] = range(10)
    odd_squares = [v*v for v in db['l'] if v%2]
    db.save()

### `lock` example

    from pyzdb.client import client
    db, socket = client()
    db.lock()
    x = db['x']._run()
    db['x'] = x + 1
    db.unlock()

(or simply `db['x'] = db['x']._run() + 1`, but this still runs *two* queries rather than one, and without the lock another client's write can land between them.)

### Multiple read-only servers example

In one terminal

    python server.py

In a second terminal

    python server.py -ro

In a third terminal

    python client.py tcp://localhost:5559 tcp://localhost:5561

And proceed as in the previous examples.

`tcp://localhost:5559` is the read-write URI and `tcp://localhost:5561` is the read-only URI.
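
As a rough sketch of how the two URIs are used: the client looks at the name of the method being called and sends known read-only calls to the read-only connection, everything else to the read-write one, and all traffic to the lock connection while a lock is held. The helper name `pick_endpoint` is hypothetical (in `client.py` this choice is made inline in `Caller._run` on the sockets themselves), and the sketch uses plain URI strings instead of ZeroMQ sockets.

```python
# Method names treated as reads; copied from the `read_only` list in util.py.
READ_ONLY = [None, "__getitem__", "__iter__", "__len__", "__contains__",
             "keys", "items", "values", "get", "__eq__", "__ne__"]


def pick_endpoint(func, endpoints):
    """Return which connection ("lock", "read" or "write") a request uses.

    `endpoints` maps each of those three names to a URI, or to None when
    that connection does not exist. Hypothetical helper for illustration.
    """
    if endpoints["lock"] is not None:
        return "lock"   # while locked, every request goes to the lock socket
    if func in READ_ONLY and endpoints["read"] is not None:
        return "read"   # reads go to the read-only server when one is set
    return "write"      # everything else goes to the read-write server


two_servers = {"write": "tcp://localhost:5559",
               "read": "tcp://localhost:5561",
               "lock": None}
one_server = {"write": "tcp://localhost:5559", "read": None, "lock": None}

print(pick_endpoint(None, two_servers))            # a plain `._run()` read
print(pick_endpoint("__setitem__", two_servers))   # an assignment
print(pick_endpoint("__len__", one_server))        # a read with no read URI
```

With a read URI configured, only the assignment reaches the read-write server; with a single URI, every request does.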

Note that all reads then get data from some consistent version of the database that's not necessarily the latest version, so something like this from the client is possible.

    > db['x'] = 3
    None
    > db.save()
    None
    > db['x'] = 4
    None
    > db.save()
    None
    > db['x']
    3

If no parameters are passed to `client.py` or only a single parameter (the read-write URI), the client will only connect to the read-write server. This is the intended method for using only one database server.

## Intended use

One intended use is a single instance of the database server with any number of clients (for example a web server) running on the same machine. The total database size isn't too large and large chunks are stored externally in files and represented by `File` objects in the database.

Another possibility is to have a single read-write server with multiple read-only servers. The read-only servers reload the database from the filesystem, so to update them, update the database file.

## Architecture

![Architecture](https://asrp.github.io/blog/multi-write-architecture.svg)

`server.py` (with no arguments) starts

- one queued router-dealer for read-only servers
- one queued router-dealer for a read-write server
- a read-write reply (`zmq.REP`) server.

The router-dealers serialize (and queue) incoming requests and send one request at a time to the reply server of the right type. The reply server handles the request and answers the client (of that request).

All messages are encoded in JSON.

[This post discusses choices when multiple read-only servers were added](https://asrp.github.io/blog/pyzdb_multiple_read.html).

## Debugging

`exec_client.py` is provided to help debugging. All requests sent from the client are executed (`exec in globals()`) on the server.
This feature should probably be disabled for any public-facing program (safest would be to delete it from `server.py`).

## Features and non-features

### Data type

All data is JSON encoded and decoded so only JSON-encodable data can be stored, although it's possible to write your own encoder to support more types of data.

### Connection type

In theory, this database could allow any type of connection that ZeroMQ allows but most tests were done using TCP.

### Query syntax

As seen in the above examples, the first value returned by `client` is treated as the root of nested `list` and `dict` and regular Python syntax is used to describe the entries we want to read or modify from that root (such as `db['l'][4]`).

To read an entry that is not an iterator, an extra `._run()` function needs to be called (such as `db['l'][4]._run()`). This is because of the lazy evaluation implemented: `db['l'][4]` by itself never actually sends any requests.

### Serializability

`lock` and `unlock` allow a client to get exclusive access to the database. No writes *or reads* are possible from other clients in the meantime. The client program has to be written so the order in which other accesses are treated is unimportant.

No deadlock (dead client) checks are implemented.

No support for locking only part of the database is implemented although this could be implemented using `lock` and `unlock`.

    db.lock()
    if not db['locks'].get(('l', 4), False) and not db['locks'].get(('l',), False):
        db['locks'][('l', 4)] = True
    else:
        db.unlock()
        # return and do something else meanwhile
    db.unlock()
    # Do stuff on db['l'][4]
    db.lock()
    assert(db['locks'].get(('l', 4), False))
    db['locks'][('l', 4)] = False
    db.unlock()

### Disk storage

The database itself is stored as a single pickled file but large blobs can be stored on the filesystem, with only their path kept in the database. If the client is on the same

Large files are "transferred" using a `File("/path/to/file")` object. The file needs to already be on the server's filesystem before a `File` object is stored.

### Rollback

`Database.undo` and `Database.redo` are available (as `db.undo()` and `db.redo()`) but need to be called manually from a client.

### Multiple database servers

It's now possible to have multiple read-only servers and one read-write server.

The read-only servers should reload the database from disk periodically (with the database file made available to them through some other means like a network filesystem or simply copying). No system for sending just the changes is available out of the box.

### Authentication

There is no authentication mechanism. The intended usage is to have appropriate firewall rules outside the database.
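
The periodic reload described under "Multiple database servers" comes down to comparing the database file's modification time with the time of the copy held in memory, which is roughly what `Server.reload_db` in `server.py` does. A minimal sketch of that check follows; `needs_reload` is a hypothetical helper name and the temporary file only stands in for the real pickled database.

```python
import os
import tempfile


def needs_reload(filename, loaded_timestamp):
    """True when `filename` is missing or newer than the in-memory copy."""
    if not os.path.isfile(filename):
        return True
    return loaded_timestamp < os.stat(filename).st_mtime


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "db.pkl")
    missing = needs_reload(path, 0)            # no database file yet

    with open(path, "wb") as f:                # stand-in for Database.save()
        f.write(b"placeholder")
    mtime = os.stat(path).st_mtime

    stale = needs_reload(path, mtime - 1)      # in-memory copy is older
    fresh = needs_reload(path, mtime)          # in-memory copy is current

print(missing, stale, fresh)
```

A read-only server polling this check only picks up changes after the read-write server has saved and the file has reached the read-only server's filesystem.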

## Discussion

[Here is a discussion about adding multiple read-only servers](https://asrp.github.io/blog/pyzdb_multiple_read.html)

## Other similar projects

- [Tinydb](https://github.com/msiemens/tinydb)
- [Python's shelve](https://docs.python.org/2/library/shelve.html)
- [Blitzdb](https://github.com/adewes/blitzdb)

--------------------------------------------------------------------------------
/client.py:
--------------------------------------------------------------------------------
import zmq
import json
import traceback
from util import File, Encoder, read_only
import logging

class Caller(object):
    # Lazy handle on one entry of the database: `prefix` is the list of keys
    # leading to the entry, and nothing is sent until `_run` is called.
    def __init__(self, prefix, sockets):
        self.prefix = prefix
        self._sockets = sockets

    def __getattr__(self, attrib):
        # Any other attribute is treated as a remote method call.
        def dummyfunc(*args, **kwargs):
            # Return the server's answer so calls like .get() and .keys()
            # give back a value.
            return self._run(attrib, args, kwargs)
        return dummyfunc

    def __iter__(self, *args, **kwargs):
        return iter(self._run("__iter__", args, kwargs))

    def __getitem__(self, *args, **kwargs):
        # Only extends the path; no request is sent here.
        return Caller(self.prefix + [args[0]], self._sockets)

    def __setitem__(self, *args, **kwargs):
        self._run("__setitem__", args, kwargs)

    def __delitem__(self, *args, **kwargs):
        self._run("__delitem__", args, kwargs)

    def __len__(self, *args, **kwargs):
        return int(self._run("__len__", args, kwargs))

    def __contains__(self, *args, **kwargs):
        # The JSON-decoded answer is the boolean True, not the string "True".
        return self._run("__contains__", args, kwargs) is True

    def _run(self, func=None, args=(), kwargs=None):
        kwargs = kwargs if kwargs is not None else {}
        logging.debug("Running %s %s %s on %s", func,
                      args, kwargs, self.prefix)
        message = json.dumps({"mode": "read" if func in read_only else "write",
                              "index": self.prefix,
                              "func": func,
                              "args": args,
                              "kwargs": kwargs},
                             cls=Encoder)
        # While locked, everything goes through the lock socket. Otherwise
        # reads go to the read-only server if one was given.
        if self._sockets['lock'] is not None:
            socket = self._sockets['lock']
        elif func in read_only and self._sockets['read'] is not None:
            socket = self._sockets['read']
        else:
            socket = self._sockets['write']
        socket.send(message)
        answer = socket.recv()
        answer = json.loads(answer)
        return answer

    def lock(self):
        assert(self._sockets['lock'] is None)
        self._sockets['write'].send(json.dumps({"mode": "lock", "action": "lock"}))
        answer = json.loads(self._sockets['write'].recv())
        assert(answer["locked"] == True)
        self._uri = answer["uri"]
        logging.debug("Reconnecting on %s" % self._uri)
        self._sockets['lock'] = zmq.Context().socket(zmq.REQ)
        self._sockets['lock'].connect(answer["uri"])

    def unlock(self):
        socket = self._sockets['lock']
        socket.send(json.dumps({"mode": "unlock", "action": "unlock"}))
        answer = json.loads(socket.recv())
        assert(answer["locked"] == False)
        socket.close()
        self._sockets['lock'] = None
        del self._uri

def client(write_uri="tcp://localhost:5559", read_uri=None):
    context = zmq.Context()
    logging.debug("Connecting to server on %s %s", write_uri, read_uri)
    sockets = {"write": context.socket(zmq.REQ),
               "read": context.socket(zmq.REQ) if read_uri is not None else None,
               "lock": None}
    sockets["write"].connect(write_uri)
    if sockets["read"] is not None:
        sockets["read"].connect(read_uri)
    return Caller([], sockets), sockets

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("write_uri", default="tcp://localhost:5559", nargs='?')
    parser.add_argument("read_uri", default=None, nargs='?') #5561
    parser.add_argument("-v", "--verbosity", action="count", default=0)
    args = parser.parse_args()
    if args.verbosity >= 1:
        logging.basicConfig(level=logging.DEBUG)

    db, sockets = client(write_uri=args.write_uri, read_uri=args.read_uri,)
    while True:
        command = raw_input("> ")
        try:
            try:
                co = compile(command, "", "eval")
            except SyntaxError:
                co = compile(command, "", "exec")

            ret = eval(co, globals())
            # getitem heuristic guess
            if type(ret) == Caller:
                ret = ret._run()
            print ret
        except:
            traceback.print_exc()

--------------------------------------------------------------------------------
/exec_client.py:
--------------------------------------------------------------------------------
import zmq
import json
import sys

context = zmq.Context()
print "Connecting to server..."
socket = context.socket(zmq.REQ)
socket.connect(sys.argv[1] if len(sys.argv)>1 else "tcp://localhost:5559")
while True:
    request = raw_input(">>> ")
    socket.send(json.dumps({"mode": "exec", "command": request}))
    message = socket.recv()
    print "Received reply", message

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
pyzmq
-e git://github.com/asrp/undoable#egg=undoable

--------------------------------------------------------------------------------
/server.py:
--------------------------------------------------------------------------------
import zmq
from zmq.devices import ProcessDevice
import random
from undoable import observed_dict, UndoLog, deepwrap
from util import File, Encoder
import json
import traceback
import sys, os, errno
from cStringIO import StringIO
import time
#from lock import acquire_lock, release_lock
import cPickle as pickle
import copy_reg
import types
import shutil
import logging

def reduce_method(m):
    return (getattr, (m.__self__, m.__func__.__name__))

copy_reg.pickle(types.MethodType, reduce_method)

class Database(observed_dict):
    @staticmethod
    def load(filename, *args, **kwargs):
        if os.path.isfile(filename):
            # Hack! Need to debug.
            with open(filename) as f:
                output = pickle.load(f)
            output.undolog.undoroot = output.undolog.root
            output.timestamp = os.stat(filename).st_mtime
            return output
        else:
            return Database(filename, *args, **kwargs)

    # os.path.join takes its components as separate arguments, not a list.
    def __init__(self, filename, bigfiledir=os.path.join(".", "bigfiles"),
                 *args, **kwargs):
        observed_dict.__init__(self, *args, **kwargs)
        self.undolog = UndoLog()
        self.undolog.add(self)
        self.filename = filename
        self.bigfiledir = bigfiledir
        self.timestamp = 0

    def save(self):
        #acquire_lock(self.filename + ".lock", "exclusive")
        # Write to a temporary name, then rename over the old file.
        with open(self.filename + ".new", "w") as f:
            pickle.dump(self, f)
        os.rename(self.filename + ".new", self.filename)
        #release_lock(self.filename + ".lock")

    def newfile(self, filename):
        self["_filenum"] = self.get("_filenum", 0) + 1
        location = os.path.join(self.bigfiledir, str(self["_filenum"]))
        self["_files"][location] = filename
        return os.path.join(self.bigfiledir, str(self["_filenum"]))

    def wrapfile(self, elem):
        # Turn a JSON-decoded {"_customtype": "file", ...} dict into a File.
        if type(elem) == dict and "_customtype" in elem:
            if elem["_customtype"] == "file":
                newname = self.newfile(elem["filename"])
                if "content" not in elem:
                    shutil.move(os.path.join(elem["location"]), newname)
                    return File(newname, elem["filename"])
                else:
                    open(newname, "w").write(elem["content"])
                    return File(newname, newname)
        else:
            return None

    def undo(self):
        self.undolog.undo()

    def redo(self):
        self.undolog.redo()

def router_dealer(client_uri, server_uri, prefix='write'):
    pd = ProcessDevice(zmq.QUEUE, zmq.ROUTER, zmq.DEALER)
    pd.bind_in(client_uri)
    pd.bind_out(server_uri)
    pd.setsockopt_in(zmq.IDENTITY, '%s-router' % prefix)
    pd.setsockopt_out(zmq.IDENTITY, '%s-dealer' % prefix)
    return pd

class Server(object):
    def __init__(self, db, server_uri, lock_uri=None, read_only=False):
        self.server_uri = server_uri
        self.rep_uri = server_uri.replace("*", "localhost")
        self.lock_uri = lock_uri
        # Read-only servers poll without blocking so they can reload the
        # database file between requests.
        self.auto_reload = zmq.NOBLOCK if read_only else 0
        self.read_only = read_only
        self.db = db
        self.running = False

    def start(self):
        self.context = zmq.Context()
        self.socket = self.context.socket(zmq.REP)
        self.socket.connect(self.rep_uri)

    def reload_db(self, filename=None):
        filename = filename if filename is not None else self.db.filename
        if not os.path.isfile(filename) or self.db.timestamp < os.stat(filename).st_mtime:
            if os.path.isfile(filename):
                logging.debug("Reloading database from %s", filename)
                logging.debug("Timestamps old=%s new=%s", self.db.timestamp, os.stat(filename).st_mtime)
            self.db = Database.load(filename)

    def run(self):
        self.running = True
        while self.running:
            try:
                message = self.socket.recv(self.auto_reload)
            except zmq.ZMQError as e:
                # No request waiting (non-blocking mode): check the file.
                self.reload_db()
                time.sleep(0.1)
                continue
            logging.debug("Received request: %s" % message)
            try:
                message = json.loads(message)
                if message["mode"] == "exec":
                    # Make extra checks here
                    old_stdout = sys.stdout
                    stdout = sys.stdout = StringIO()
                    try:
                        co = compile(message["command"], "", "single")
                        exec co in globals()
                    except:
                        output = sys.exc_info()
                    else:
                        output = stdout.getvalue()
                    sys.stdout = old_stdout
                elif message["mode"] == "readall":
                    # Use this server's database, not the module-level `db`.
                    output = self.db
                elif message["mode"] == "lock":
                    assert(not self.read_only)
                    output = {"locked": True, "uri": self.lock_uri}
                elif message["mode"] == "unlock":
                    assert(not self.read_only)
                    output = {"locked": False}
                else:
                    # Walk down the path of keys sent by the client.
                    entry = self.db
                    for key in message["index"]:
                        entry = entry[key]
                    if not message.get("func"):
                        output = entry
                    else:
                        func = getattr(entry, message["func"])
                        message["args"] = deepwrap(message["args"], entry.callbacks, entry.undocallbacks, self.db.wrapfile, skiproot=True)
                        message["kwargs"] = deepwrap(message["kwargs"], entry.callbacks, entry.undocallbacks, self.db.wrapfile, skiproot=True)
                        output = func(*message["args"], **message["kwargs"])
            except:
                output = traceback.format_exc()
                # traceback.print_exc() returns None; log the formatted text.
                logging.error(output)
            if type(output).__name__ in ['listiterator', 'dictionary-keyiterator']:
                output = list(output)
            try:
                output = json.dumps(output, cls=Encoder)
            except:
                output = str(output)
            self.socket.send(output)
            if message["mode"] == "lock":
                # Serve only the locking client on a separate socket until
                # it unlocks.
                self.normal_socket = self.socket
                self.socket = zmq.Context().socket(zmq.REP)
                self.socket.bind(self.lock_uri.replace("localhost", "*"))
                logging.debug("Locked and listening on %s" % self.lock_uri)
            elif message["mode"] == "unlock":
                self.socket.close()
                self.socket = self.normal_socket
                logging.debug("Unlocked")

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("db_filename", default="/tmp/db.pkl", nargs='?')
    parser.add_argument("write_client_uri", default="tcp://*:5559", nargs='?')
    parser.add_argument("write_server_uri", default="tcp://*:5560", nargs='?')
    parser.add_argument("read_client_uri", default="tcp://*:5561", nargs='?')
    parser.add_argument("read_server_uri", default="tcp://*:5562", nargs='?')
    parser.add_argument("lock_uri", default="tcp://localhost:5558", nargs='?')
    parser.add_argument("-ro", "--read-only", action="store_true")
    parser.add_argument("-v", "--verbosity", action="count", default=0)
    args = parser.parse_args()
    if args.verbosity == 1:
        logging.basicConfig(level=logging.INFO)
    elif args.verbosity >= 2:
        logging.basicConfig(level=logging.DEBUG)

    db = Database.load(args.db_filename)
    logging.info("Starting server: %s", args)
    if args.read_only:
        server = Server(db, args.read_server_uri, read_only=True)
    else:
        write_router = router_dealer(args.write_client_uri, args.write_server_uri)
        read_router = router_dealer(args.read_client_uri, args.read_server_uri)
        server = Server(db, args.write_server_uri, args.lock_uri)
        write_router.start()
        read_router.start()
    server.start()
    server.run()

--------------------------------------------------------------------------------
/util.py:
--------------------------------------------------------------------------------
import json
import os  # needed by File for os.path.basename

class Encoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, File):
            return {"_customtype":"file", "filename": obj.filename, "location": obj.location}
        else:
            return json.JSONEncoder.default(self, obj)

class File(object):
    def __init__(self, location, filename = None):
        self.location = location
        if filename is None:
            filename = os.path.basename(location)
        self.filename = filename

# Method names the client may send to a read-only server.
read_only = [None, "__getitem__", "__iter__", "__len__", "__contains__", "keys", "items", "values", "get", "__eq__", "__ne__"]

--------------------------------------------------------------------------------
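
The README notes that you can write your own encoder to support more types of data. `Encoder` in `util.py` does this for `File` through the standard `json.JSONEncoder.default` hook, tagging the result with `"_customtype"`. Below is a minimal sketch of the same pattern applied to sets. It assumes a `"set"` tag, a `SetEncoder` class and a `decode_custom` hook, all hypothetical: pyzdb defines none of them, and the server would need matching decoding logic (it currently only recognises `"file"` in `Database.wrapfile`).

```python
import json


class SetEncoder(json.JSONEncoder):
    """Hypothetical encoder: tags sets the way `Encoder` tags `File` objects."""

    def default(self, obj):
        if isinstance(obj, (set, frozenset)):
            # Sort so the encoded form is deterministic.
            return {"_customtype": "set", "items": sorted(obj)}
        # Fall back to the base class, which raises TypeError for unknown types.
        return json.JSONEncoder.default(self, obj)


def decode_custom(d):
    """object_hook that turns tagged dicts back into sets."""
    if d.get("_customtype") == "set":
        return set(d["items"])
    return d


encoded = json.dumps({"tags": {"b", "a"}}, cls=SetEncoder)
decoded = json.loads(encoded, object_hook=decode_custom)

print(encoded)
print(decoded == {"tags": {"a", "b"}})
```

The tag-in-a-dict approach keeps the wire format plain JSON, at the cost of reserving the `"_customtype"` key for this purpose.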