├── .gitignore
├── README
├── docs
│   ├── broker.txt
│   ├── client.txt
│   ├── configs.txt
│   ├── dictionary.txt
│   ├── protocol.txt
│   ├── scratchpad.txt
│   └── worker.txt
└── scale0
    ├── scale0.py
    ├── test_client.py
    ├── test_worker.py
    └── tnetstrings.py

/.gitignore:
--------------------------------------------------------------------------------
1 | nbproject
2 | *.pyc
3 | *.swp
4 | # I generally keep tornado installed with my projects so I can
5 | # switch versions per project. Ignored to keep it out of the repo
6 | # for this project.
7 | tornado/
8 | 
9 | 
--------------------------------------------------------------------------------
/README:
--------------------------------------------------------------------------------
1 | Scale0 - A dynamic, service-oriented, smart load balancer.
2 | 
3 | What does that mean?
4 | 
5 | Dynamic - Scale0 is configured by the backend workers that connect to it.
6 | There are no virtual servers, pool members or other configuration necessary.
7 | 
8 | Service Oriented - Scale0 routes frontend requests to backends by the services
9 | the backends announce they support.
10 | 
11 | Smart Load Balancer - Scale0 uses a least-recently-used (LRU) queue
12 | methodology to route requests.
13 | 
14 | Huh? I still don't get it.
15 | 
16 | Say you have a dynamic website and you want to have multiple backend servers
17 | processing requests received by a frontend webserver. A typical scenario might
18 | be several Tornado or node.js servers being proxied by an Nginx server.
19 | 
20 | A traffic spike happens, and you need to get more backends up to handle the
21 | load. You hopefully have your Nginx servers load balanced so you can
22 | reconfigure them to support more backends. Your architecture might look like this:
23 | 
24 |                                      ------------------
25 |                                      | Backend Server |
26 |                                      ------------------
27 | 
28 |                   ----------------   ------------------
29 |                   | Proxy Server |   | Backend Server |
30 |                   ----------------   ------------------
31 | 
32 | -----------------                    ------------------
33 | | Load Balancer |                    | Backend Server |
34 | -----------------                    ------------------
35 | 
36 |                   ----------------   ------------------
37 |                   | Proxy Server |   | Backend Server |
38 |                   ----------------   ------------------
39 |                                      ------------------
40 |                                      | Backend Server |
41 |                                      ------------------
42 | 
43 | 
44 | 
45 | Adding more servers will require reconfiguring the proxy servers. Generally you
46 | can do this by turning off one in the pool, adding the new members, bringing it
47 | back up and doing the same on the other. If you're running in active/active
48 | mode on your load balancer you're cutting your input capability in half while
49 | trying to scale up.
50 | 
51 | Alternatively you could of course stand up another proxy server with new backends
52 | and add it to the pool on the load balancer. Let's not pretend that the problem
53 | is impossible to solve without Scale0; Scale0 just provides an alternative
54 | solution that allows you to scale all the way at the backend, instead of messing
55 | around with the front door.
56 | 
57 | Using Scale0 and something that supports ZeroMQ, like Mongrel2 (or a module for
58 | Nginx would be nice, hint hint), you could do something like this.
59 | 
60 |                                      ------------------
61 |                                      | Backend Server |
62 |                                      ------------------
63 | 
64 |                   ----------------   ------------------
65 |                   | Proxy Server |   | Backend Server |
66 |                   ----------------   ------------------
67 | 
68 | -----------------   ----------       ------------------
69 | | Load Balancer |   | Scale0 |       | Backend Server |
70 | -----------------   ----------       ------------------
71 | 
72 |                   ----------------   ------------------
73 |                   | Proxy Server |   | Backend Server |
74 |                   ----------------   ------------------
75 | 
76 |                                      ------------------
77 |                                      | Backend Server |
78 |                                      ------------------
79 | 
80 | Here's where the magic starts. Say you need to scale up. Just turn on more
81 | backend servers and connect them to Scale0. That's it: you can increase your
82 | processing capacity, or decrease it, at will. Scale0 is the elasticity that
83 | allows you to scale your system in real time with load.
84 | 
85 | Scale0 is also service agnostic. At this time it's not being designed with
86 | streams in mind, but for protocols like http or smtp it should work great.
87 | It will require frontend services to be built with ZeroMQ support;
88 | Scale0 is the first piece towards a scalable architecture.
89 | 
90 | Note: Scale0 is in an early state of development. Some of the current ideas
91 | include using PUSH/PULL sockets for frontend communication, allowing you to
92 | stand up more Scale0 servers, and PUB/SUB for the backend, using ZMQ Identity
93 | to manage subscriptions. Internally Scale0 uses a Least Recently Used Queue
94 | methodology requiring backend servers to announce their readiness to accept
95 | a request. More information will be in the docs once the project is finished
96 | being fleshed out.
97 | 
98 | Note2: Hey! Doesn't that make Scale0 a single point of failure in an environment
99 | built for avoiding that? Part of the plans for Scale0 include the ability to set
100 | up failover servers. Backend and proxy servers will need to know about the failover
101 | server, and the plan is to allow Scale0 to provide that information and also build
102 | in a protocol and management interface to allow you to fail over manually as well,
103 | say when you need to do OS upgrades on the Scale0 server.
104 | 
--------------------------------------------------------------------------------
/docs/broker.txt:
--------------------------------------------------------------------------------
1 | * Been working in the scratchpad; this is out of date relative to the actual
2 | * implementation now. Needs updating.
3 | 
4 | The broker is the core of the Scale0 project. It provides the application
5 | logic and standards for peer and client connections.
6 | 
7 | Every attempt is made to stay in accord with any standards that exist
8 | on the zeromq website, [ url goes here, working offline right now ]. The
9 | other issues Scale0 offers solutions for are reliability, scalability and
10 | management flexibility.
11 | 
12 | ARCHITECTURE
13 | *** This design is going to make me need to go back over everything I've
14 | *** already typed up. Thought of this today while at work, and I think it
15 | *** should simplify the broker implementation.
16 | 
17 | *** The language of choice to develop this initially is Python. As such
18 | *** I'll be using the multiprocessing library to get around the Global
19 | *** Interpreter Lock. In other languages it may be possible to use threads,
20 | *** and take advantage of inproc://, though from what I've read this may not
21 | *** be the best idea. In Python's case it wasn't really an option, so there
22 | *** will be the ipc:// overhead.
23 | *** 
24 | *** ipc:// might be the overall better answer; the guide even has a comment
25 | *** about premature optimization. Anyway, I've put too much time into this for
26 | *** now without writing any code. At some point I have to come up with an idea
27 | *** and either prove or disprove it with code.
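In that spirit, the least-recently-used routing idea can be proven out in a few lines of plain Python before any sockets are involved. This is a toy sketch with made-up names (LRUQueue, ping, next_worker), not the real broker API:

```python
from collections import deque

class LRUQueue:
    """Toy model of the broker's least-recently-used Worker queue.

    Workers announce readiness with a PING; the next Task goes to the
    Worker that has been ready (and idle) the longest.
    """

    def __init__(self):
        self.ready = deque()  # worker ids, oldest PING first

    def ping(self, worker_id):
        # Re-announcing readiness moves a worker to the back of the queue.
        if worker_id in self.ready:
            self.ready.remove(worker_id)
        self.ready.append(worker_id)

    def next_worker(self):
        # The least recently used worker gets the next Task; it must PING
        # again before it is handed another one.
        return self.ready.popleft() if self.ready else None
```

A Worker that never PINGs again simply stops receiving Tasks, which is how the backend can shed load without any broker-side configuration.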
28 | 
29 | Brokers will have two parts, Dispatchers and Routers. (I'm not sold on the
30 | name Dispatcher yet, because it will do more.) At a minimum each Broker will
31 | consume 2 processes. While it could be written to only consume 1, forcing a 2
32 | process minimum will be easier, and consuming only a single core would really
33 | only be useful for dev/test implementations.
34 | 
35 | Dispatcher - The Dispatcher is what will provide the ports Clients and Workers
36 | connect to, and maintain communication with other peers. The Dispatcher will
37 | manage all configuration related tasks and will push all router/worker
38 | requests down to Router processes.
39 | 
40 | Router - Router processes send Tasks to Workers, and send the Task Reply back
41 | to the Dispatcher, which sends it to the Client who made the request.
42 | 
43 | The startup process for a broker will look like this.
44 | - Dispatcher ioloop started.
45 | - Dispatcher ioloop connects to any Peers and validates its config.
46 | - Dispatcher binds an ipc PUSH socket.
47 | - Dispatcher kicks off Router processes with their own ioloops.
48 | - Each router ioloop establishes a PULL socket to the Dispatcher PUSH.
49 | - Dispatcher validates Routers are connected.
50 | - Dispatcher creates a zmq_context and binds the Worker XREP socket.
51 | * somehow that context is passed to Workers? Need to mess around with this part
52 | - Dispatcher waits for Workers to connect, outputs to console as they do.
53 | - Once the config option for minimum workers is reached, Dispatcher creates a
54 | zmq_context for the Clients and binds an XREP socket to it
55 | * somehow that context is passed to Clients? Need to mess around with this part
56 | - At this point the Broker is operational and passing messages.
57 | 
58 | 
59 | SERVICES
60 | 
61 | Scale0 brokers will provide the following services:
62 | 
63 | Service Discovery: In order to meet the requirements of being service
64 | agnostic, service discovery will need to be built in. Worker clients will
65 | be able to state the service(s) they provide on connection to the network
66 | and change that configuration in real time. Other core services will be built
67 | on this framework. This is handled by the Dispatcher.
68 | 
69 | Configuration: A peer or client on startup should only need to be informed
70 | of at least one broker to connect to. It will then request the configuration
71 | it needs on startup. Not all brokers will be required to provide
72 | configuration services. However, at least one server per network will need
73 | to provide the service. This is handled by the Dispatcher.
74 | 
75 | Heartbeat: Heartbeat will be the way that clients and peers are monitored
76 | for availability. This will also be used to keep a configuration up to
77 | date on configuration service brokers. Heartbeat service servers can be
78 | configured to share their heartbeat table with other peers using push or
79 | pull. For example, in an active/standby peer topology the master can receive
80 | all heartbeat information and share it with the standby. Pull can be used
81 | by peers requesting information about available Worker clients. These
82 | requests will be received by the Dispatcher and passed to Routers like any
83 | other Task.
84 | 
85 | CONFIGURATION
--------------------------------------------------------------------------------
/docs/client.txt:
--------------------------------------------------------------------------------
1 | [ CLIENT ]
2 | Clients are the applications that generate Tasks and send them to the Broker
3 | for processing. These are the classic frontend applications. The requirements
4 | for the frontend are to be able to support ZeroMQ REQ or XREQ connections and
5 | to follow the protocol requirements for a Client.
6 | 
7 | An example connection would look like this.
8 | 
9 | C ------ TASK -----> B
10 | C <-- TASKREPLY --- B
11 | 
12 | Brokers will also accept PING requests and reply with PONG in order for the
13 | Client to determine if the Broker is available. NEEDBROKER requests can be
14 | sent to the Broker in order to get a new Broker, if the Broker is available to
15 | service that request.
16 | 
17 | For more information about other responses please see the protocol document.
18 | It's expected that all Clients will be able to handle all error codes, for
19 | example.
--------------------------------------------------------------------------------
/docs/configs.txt:
--------------------------------------------------------------------------------
1 | Keeping the configuration within the environment up to date: the premise
2 | that this system is built on is that for the most part the config will be
3 | stable. That is, new brokers and Workers will not be added and removed
4 | a lot, and config changes by admins will be rare.
5 | 
6 | When a config change is made, it will be sent on the PUB/SUB config
7 | network. All brokers with the config service can PUB. All brokers should
8 | have a SUB connection to all brokers with the config service.
9 | 
10 | On receipt of a config, a broker will check the version against its running
11 | version. If they are different it will update its config. If it's the same it
12 | will check the hash of its config against the published one.
13 | 
14 | *** Not sure what to do if the hashes don't match yet. Conflict resolution
15 | could happen here and then it could push. Conflict resolution measures could
16 | be a bit complicated though. ***
17 | 
18 | [ WORKER CONFIG ]
19 | This is the config that each worker stores. Both identity and services should
20 | be sent with a CHECKCONFIG request.
21 | 
22 | identity:
23 | unique identifier for this Worker
24 | 
25 | services:
26 | List of services processed by the Worker.
27 | 
28 | family_config:
29 | config (including fallback family config) for the family
30 | 
31 | prefered_broker:
32 | This is the broker that the Worker should attempt to connect to.
33 | If it is unable to connect to this broker it should send a NEEDBROKER
34 | request to another broker in the family_config, if any exist.
35 | 
36 | 
37 | [ BROKER CONFIG ]
38 | This details all the information a broker requires for operation. Not all
39 | options need to be pushed to Workers. Which options are required for Workers
40 | is detailed in the [ WORKER CONFIG ] section above.
41 | 
42 | identity:
43 | unique identifier for this broker
44 | 
45 | frontend_port:
46 | The port on which frontends connect.
47 | 
48 | backend_port:
49 | The port on which backends connect.
50 | 
51 | config_pub_port:
52 | The port on which this broker publishes new configurations.
53 | 
54 | config_sub_ports:
55 | The port(s) on which the broker listens for new configurations.
56 | 
57 | *** The idea here is that everything, when possible, should listen for
58 | configuration changes from more than one source in order to provide
59 | reliability and redundancy. ***
60 | 
61 | state: joining, active, disabled
62 | joining is the default state when adding a broker to the network. Once
63 | it has received its config it will go to an active state. Brokers can
64 | also be disabled. In a disabled state they will provide no response to
65 | requests except for a management request to re-enable.
66 | 
67 | peers: List of peer sockets to communicate with. This is the port used for
68 | heartbeat and work status communication.
69 | 
70 | default_heartbeat_interval:
71 | Workers can negotiate a longer heartbeat interval, but not a shorter one.
72 | 
73 | services:
74 | list of services managed by the broker, or all
75 | 
76 | LRU:
77 | The least recently used list, which is a list of backends serviced by
78 | the broker. Each item on the list will have the following properties:
79 | services: list of services the backend supports
80 | last_heartbeat: used to determine if the backend is alive or not.
81 | 
82 | error_Workers:
83 | If, due to a configuration issue or some other problem, a Worker
84 | repeatedly fails to negotiate correctly, it can be ignored as much
85 | as is possible with ZeroMQ.
86 | 
87 | family:
88 | This option may be blank if families aren't used in the network. Otherwise
89 | this is a single item for the family a broker is a member of.
90 | 
91 | fallback_families:
92 | While a broker may be a member of only one family, it can have several
93 | fallback families. Fallback families are used to route Workers to in the event
94 | that the primary family is unable to process their requests.
95 | 
96 | family_config:
97 | This is a list of family configs. The Broker should have its own family
98 | config plus any fallback family configs as well. See [ FAMILY CONFIG ]
99 | below for family config options.
100 | 
101 | [ FAMILY CONFIG ]
102 | 
103 | name:
104 | Unique name for the family.
105 | 
106 | brokers:
107 | List of brokers in the family. Includes:
108 | identity: broker identity
109 | services: services served by broker
110 | config_sub_port: Port to subscribe to for config updates
111 | 
112 | admins:
113 | List of usernames and one-way-hashed passwords used for administration of the
114 | family.
--------------------------------------------------------------------------------
/docs/dictionary.txt:
--------------------------------------------------------------------------------
1 | This is a list of terms used and their meanings. A consistent dictionary will
2 | make it a lot easier for people reading the documentation to understand it.
3 | 
4 | Broker: The primary middleware for the network. This term is used
5 | when referring to the middleware itself and client communications.
6 | 
7 | Fallback Broker: Fallback Brokers are an optional class of brokers. They can
8 | step in for a Broker when it goes down. These are generally hard wired
9 | together, preferably with a crossover cable. See the binary star pattern
10 | discussed in the ZeroMQ guide for more information.
11 | 
12 | Dispatcher: The Broker application piece that handles the Client and Worker
13 | ports as well as handles configuration management for the Broker.
14 | 
15 | Router: This is the Broker application piece that the Dispatcher passes
16 | Tasks to. This handles the LRU and Wait Queues.
17 | 
18 | Peer: Special term for referring to Brokers when there is Broker to Broker
19 | communication.
20 | 
21 | Family: This refers to a cluster of Brokers within a single location. This can
22 | mean Brokers running on the same server using ipc://, or servers on a
23 | network using tcp://
24 | 
25 | Community: This refers to a group of Families. When it becomes necessary to
26 | route outside of a Family, that's when the Community comes in. This isn't
27 | well thought out yet; I'll come back to it when I get to this layer.
28 | Community may even be better as an added Worker server that can route
29 | requests between 2 families?
30 | 
31 | Worker: This is a backend server, used for data processing. It maintains a
32 | connection with a Broker and handles work requests that are passed to it.
33 | 
34 | Client: This is a frontend server. This passes work requests to a Broker and
35 | waits for a reply. The only time a Broker needs to be aware of a Client is
36 | when it is accepting a request from one.
37 | 
38 | LRU Queue: This is the queue a Broker keeps of Workers ready to accept a
39 | request. If a Fallback Broker is configured this queue must be kept in
40 | sync on both the Broker and the Fallback Broker.
41 | 
42 | Wait Queue: This is the list of Task IDs and the Workers they've been
43 | assigned to. It's expected that all Workers can accept multiple requests
44 | at a time; this is primarily used to validate work replies. If a work
45 | reply is sent and its id doesn't exist in this queue, it's dropped quietly.
46 | 
47 | Task: This is a unit of work that gets routed to a Worker.
48 | 
49 | Task Reply: This is the response from a Worker that gets routed to a Client.
--------------------------------------------------------------------------------
/docs/protocol.txt:
--------------------------------------------------------------------------------
1 | [ PROTOCOL ]
2 | CONNECT: When initiating a new connection to a broker a worker must validate
3 | its config.
4 | Parameters:
5 | config_type:
6 | The format the worker accepts the config in. Can be JSON, XML
7 | or ZFL.
8 | 
9 | If the application already has a config, it will also pass a config_version
10 | option.
11 | 
12 | CONFIG - Broker reply with configuration. This will be sent after CONNECT
13 | requests and when the Broker needs to inform the Worker of a new CONFIG.
14 | Parameters:
15 | config_type:
16 | The format the config is in.
17 | config:
18 | The config.
19 | 
20 | NEEDBROKER - In the event that a worker is unable to connect to its preferred
21 | broker, it may try to connect to another one with a NEEDBROKER request. This
22 | will include the worker identity and services offered.
23 | 
24 | CHANGEBROKER - This is a reply back to a Worker or Client REQ/XREQ socket
25 | telling it to connect to a different Broker. This can be a response to a
26 | NEEDBROKER request, or used when a Broker needs to push the Worker to
27 | another Broker, say if it's coming down for maintenance or other reasons.
28 | 
29 | PING: This is the standard heartbeat request. When a Broker receives a PING
30 | from a Worker it will add it to the LRU queue if it's not already in it.
31 | Parameters:
32 | heartbeat_interval:
33 | The interval at which the broker can expect heartbeat requests
34 | from the worker. Worker only.
35 | 
36 | time:
37 | Time according to the worker or client when the heartbeat request was sent.
38 | If this is way off from the time on the server, then there
39 | is a problem; an error will be returned and the worker
40 | will be disconnected.
41 | 
42 | config_version:
43 | Worker only, configuration version.
44 | 
45 | PONG: This is the response to PING.
46 | Parameters:
47 | time:
48 | Time on the Broker when the request was sent. This can be used
49 | to help keep servers in sync. This can also be used by the Worker
50 | to determine if it's slow in processing requests, so it can stop
51 | sending PINGs if it's overloaded.
52 | 
53 | config_version:
54 | Configuration version.
55 | 
56 | TASK: This is the request asking a worker to do some work. This originates
57 | from the Client. The Router will add the following configuration items and
58 | will remove the worker from the LRU queue and add it to the wait queue.
59 | Parameters:
60 | time:
61 | Time according to the worker when the request was sent.
62 | If this is way off from the time on the server, then there
63 | is a problem; an error will be returned and the worker
64 | will be disconnected.
65 | 
66 | config_version:
67 | Configuration version.
68 | 
69 | id: This will be the identity of the frontend
70 | that made the request and the message id, used for routing.
71 | 
72 | body: The body of the request.
73 | 
74 | TASKREPLY: This is the reply to a work request. If the id isn't in the wait
75 | queue for the Broker this message is discarded.
76 | Parameters:
77 | time:
78 | Time according to the worker when the reply was sent.
79 | If this is way off from the time on the server, then there
80 | is a problem; an error will be returned and the worker
81 | will be disconnected.
82 | 
83 | config_version:
84 | Configuration version.
85 | 
86 | id: This will be the identity of the frontend
87 | that made the request and the message id, used for routing.
88 | 
89 | body: The body of the reply. If the body contains
90 | errors about the work request, it will be up to the frontend to manage that.
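The Router-side bookkeeping for TASK and TASKREPLY described above can be sketched in a few lines of Python. The header fields follow the examples in the scratchpad, but the function names and wait-queue shape here are illustrative only, not the real implementation:

```python
import json
import uuid

def make_task(service, body, client_identity):
    """Build a hypothetical TASK header/body pair as the Router might.

    The header carries the routing information; the body is passed
    through untouched.
    """
    header = {
        "request": "TASK",
        "message_id": str(uuid.uuid4()),
        "service": service,
        "id": client_identity,  # used to route the TASKREPLY back
    }
    return json.dumps(header), body

wait_queue = {}  # message_id -> worker the Task was assigned to

def assign(header_json, worker_id):
    # The Router removes the worker from the LRU queue (not shown) and
    # records the outstanding Task in the wait queue.
    header = json.loads(header_json)
    wait_queue[header["message_id"]] = worker_id

def accept_reply(header_json):
    """A TASKREPLY whose id is not in the wait queue is dropped quietly."""
    header = json.loads(header_json)
    return wait_queue.pop(header["message_id"], None) is not None
```

Popping the id on the first reply also means a duplicate TASKREPLY for the same Task is silently discarded, which matches the wait-queue rule above.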
91 | 
92 | ERROR:
93 | These are the Error codes a server may return to a client. These are based
94 | on HTTP status codes.
95 | 301: Moved Permanently. Sent to Workers and Clients when the Broker
96 | needs them to move to another Broker. The Worker can expect a
97 | CHANGEBROKER reply from the Broker telling it where to connect.
98 | 426: Upgrade Required. Sent when the Worker is sending a config version
99 | different from the Broker's. The Worker should issue a CONNECT to
100 | get the new config.
101 | 444: No Response Message. Used when the Broker believes the client
102 | is malware.
103 | 501: Not Implemented. Sent when the Worker or Client sends a request
104 | the server doesn't expect or isn't aware of.
105 | 503: Service Unavailable. Sent when the Broker is not accepting
106 | connections. A Worker may attempt a NEEDBROKER request to see
107 | if it can be redirected. If this is supplied in response to a
108 | NEEDBROKER request the Worker should not retry.
109 | 
110 | FORCENEGOTIATE:
111 | Sent by a broker to a worker in the event the broker isn't sure if the worker
112 | is in an acceptable state. Upon getting this message from a broker a worker
113 | should initiate CONNECT.
114 | 
115 | 
--------------------------------------------------------------------------------
/docs/scratchpad.txt:
--------------------------------------------------------------------------------
1 | *** TODO: I am being very inconsistent about frontend, backend, worker,
2 | *** client... If I am to expect anyone to be able to follow along I'm going
3 | *** to have to come up with a consistent dictionary.
4 | *** 
5 | *** Dictionary started, will remove this note when I'm done converting
6 | *** all existing documentation to the terminology.
7 | 
8 | While writing documentation to build against I'll have ideas for things that
9 | should be documented better later. That's what this file is for.
10 | 
11 | [ CONNECTIONS ]
12 | 
13 | Examples of connections, to wrap my head around how they work.
14 | C = worker
15 | B = broker
16 | 
17 | - worker startup, broker down:
18 | * worker must be informed of a broker to connect to. If it's saving its config
19 | * locally it may remember it, and also have the group information to fall back
20 | * on to find a new broker. This will be encouraged for workers, but not required.
21 | 
22 | C --- CONNECT ---> B
23 | No response from B; worker fails to start with an error.
24 | 
25 | - worker startup, broker up
26 | * worker must be informed of a broker to connect to. If it's saving its config
27 | * locally it may remember it, and also have the group information to fall back
28 | * on to find a new broker. This will be encouraged for workers, but not required.
29 | C ------ CONNECT ----> B
30 | C <----- CONFIG ------ B
31 | C ------ PING -------> B
32 | C <----- PONG -------- B
33 | 
34 | * It's at this point that the broker registers the worker in the LRU queue.
35 | 
36 | [ DIAGRAMS ]
37 | * Simple text diagrams of what I'm thinking about for implementation. I've been
38 | * tossing around a lot of ideas in my head and just want to get them down where
39 | * I can look at them.
40 | 
41 | 
42 | [ FE XREP ]      [ BE XREP ]
43 |      --------------
44 |      | Dispatcher |
45 |      --------------
46 |     [ ROUTER XREP ]
47 | 
48 | 
49 | 
50 | [ ROUTER XREQ ]  [ ROUTER XREQ ]
51 |    ----------      ----------
52 |    | Router |      | Router |
53 |    ----------      ----------
54 | 
55 | 
56 | Dispatcher: The purpose of this process is to accept and get rid of tasks
57 | as quickly as possible. Tasks could be the core Task processing objects
58 | coming in from a frontend, or configuration management requests from a
59 | router. LRU and Wait Queues are stored in this object. Communication
60 | outside of the broker happens via the XREP listeners, which are usually
61 | tcp. Internal requests to the router happen via the PUSH and XREP sockets.
62 | These would usually be ipc.
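A minimal sketch of that hand-off, using plain in-process queues as stand-ins for the PUSH/PULL ipc sockets (the names are hypothetical, and the real Dispatcher and Routers would be separate processes on zmq sockets):

```python
import json
from queue import Queue

# Stand-ins for the ipc sockets between the Dispatcher and a Router.
to_router = Queue()
to_dispatcher = Queue()

def dispatcher_receive(raw_message):
    """The Dispatcher accepts a task and gets rid of it as fast as
    possible: no parsing, just hand the raw frames to a Router."""
    to_router.put(raw_message)

def router_step():
    """A Router does the 'thinking': parse the header, act on it, and
    pass the result back up to the Dispatcher."""
    header_json, body = to_router.get()
    header = json.loads(header_json)  # only the Router parses JSON
    header["type"] = "RESPONSE"
    to_dispatcher.put((json.dumps(header), body))
```

The point of the split is that the Dispatcher stays cheap and single-purpose, while any number of Routers can be added to absorb the parsing work.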
63 | 
64 | The Dispatcher acts as a global state object for the entire application,
65 | which helps avoid conflicts in configuration and queue management within
66 | a single broker.
67 | 
68 | * The Dispatcher should not need to process any messages; message processing
69 | * will be delegated to a Router for everything.
70 | 
71 | Router: Routers are single-process applications that handle whatever processing
72 | of tasks is required. Any data parsing or other "thinking" should be handled
73 | here, as these can be load balanced across cores. The idea is to run one
74 | Router per available core. For example, on a 6 core system you could expect
75 | 1 core for the OS, 1 core for the Dispatcher and then 4 cores left for
76 | Router objects. That's a very simple factoring and can likely be adjusted
77 | up or down. Likely you can get more Routers on a 6 core box and see a
78 | benefit, depending on what the traffic actually looks like.
79 | 
80 | 
81 | [ MESSAGE FORMAT ]
82 | 
83 | Figuring out the Message Format early makes sense. With pyzmq, Pickle and JSON
84 | support is bundled in; going to use JSON just to make it more portable, allowing
85 | for integration with other languages to act as Routers (or the Dispatcher, but
86 | that's mainly irrelevant since the Dispatcher doesn't parse anything). Since
87 | all message passing happens on libzmq sockets, it should be possible to write
88 | or embed Dispatchers and Routers in other applications.
89 | 
90 | * This has a strong chance of changing once the application starts being
91 | * written.
92 | 
93 | Messages are multipart messages. Since the idea is for the Dispatcher to not
94 | parse anything at all, this can simply be left at 2 parts: Header and Body.
95 | The Header will be information about the message and relevant to all processing
96 | done by the Router; the body will be the actual message that gets passed to the
97 | application (Worker or Client).
98 | 
99 | Header: Simple JSON-formatted object with basic routing information.
100 | Some examples:
101 | 
102 | Incoming request, passed from Client through Dispatch to Router:
103 | {
104 |     "request": "TASK",
105 |     "message_id": "",
106 |     "service": "www.mydomain.com",
107 |     "type": "REQUEST"
108 | }
109 | 
110 | The Router can then take this request, add a message_id and send it on
111 | its way to the Worker. It will also pass the message_id up to the
112 | Dispatcher, which can then plug it into the Wait Queue.
113 | 
114 | The Worker can then do what it needs to do, then send a message back with
115 | the header:
116 | {
117 |     "request": "TASK",
118 |     "message_id": "C4529276-BF5C-4502-BDF3-DF32B122D055",
119 |     "service": "www.mydomain.com",
120 |     "type": "RESPONSE"
121 | }
122 | 
123 | The Dispatcher would send this to the Router. The Router would send a Wait
124 | Queue delete request to the Dispatcher. The Dispatcher will determine if the
125 | message_id exists in the Queue, deleting it if it does, and return a true/
126 | false response to the Router. If the message existed and was therefore
127 | deleted then the Router will pass the message up to the Dispatcher on
128 | its PUSH socket, and the Dispatcher will then send it up to the Client.
129 | 
130 | * ok, how does the Dispatcher know which Client to send it to? Perhaps the
131 | * RESPONSE Tasks going to a Client need to have some sort of quick
132 | * addressing the Dispatcher can use without having to parse a JSON string.
133 | * Need to think about this more and also have to stop working on this now.
134 | * 
135 | * Reading chapter 3 of the guide again. I think router/dealer sockets will
136 | * handle this for the most part. Might be time to write a test program.
--------------------------------------------------------------------------------
/docs/worker.txt:
--------------------------------------------------------------------------------
1 | [ WORKER ]
2 | Workers are applications that accept Tasks from Brokers and send Task replies
3 | when they are done processing. The requirements for a Worker application are:
4 | 
5 | - Able to make ZeroMQ REQ sockets.
6 | This is the core way that a Worker accepts tasks. REQ or XREQ depends on the
7 | ability of the worker to accept multiple tasks.
8 | 
9 | - Follow the handshake and heartbeat requirements as follows.
10 | 
11 | Initial Connection:
12 | Worker connects to the broker it's been told to connect to. It's up to
13 | the application developer to determine how an application is told which
14 | broker to connect to. This may be a command line switch, or the application
15 | can save its config from a previous run and get the Broker connection
16 | information from there. Choice of REQ or XREQ is up to the client.
17 | 
18 | An example connection looks like this.
19 | 
20 | W ------ CONNECT ----> B
21 | W <----- CONFIG ------ B
22 | W ------ PING -------> B
23 | W <----- PONG -------- B
24 | 
25 | It's important to note that the PING is necessary to start receiving
26 | requests. This informs the Broker that the application is ready to take
27 | another Task.
28 | 
29 | If the Worker is asynchronous it should send a new PING immediately on
30 | receipt of a Task. This will allow the Broker to keep feeding the Worker
31 | Tasks as fast as it can. This also allows the developer of the Worker
32 | application to manage load internally. If the Worker is self-aware enough
33 | to know when it should not be accepting more Tasks, it can simply not send
34 | a PING until it is ready.
35 | 
36 | For more information about other responses please see the protocol document.
37 | It's expected that all Workers will be able to handle all error codes, for
38 | example.
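The handshake and readiness rules above can be sketched as message builders. The frame layout and field names here are a guess based on the protocol document, not a fixed wire format:

```python
import json
import time

def connect_frames(identity, config_type="JSON", config_version=None):
    """Frames for the CONNECT handshake. An existing config adds a
    config_version so the Broker can decide whether to push a new one."""
    params = {"config_type": config_type}
    if config_version is not None:
        params["config_version"] = config_version
    return [identity, "CONNECT", json.dumps(params)]

def ping_frames(identity, heartbeat_interval=1):
    """A PING both heartbeats and announces readiness for the next Task."""
    params = {"heartbeat_interval": heartbeat_interval,
              "time": int(time.time())}
    return [identity, "PING", json.dumps(params)]

def on_task(identity, task, asynchronous=True):
    """An asynchronous Worker PINGs again immediately on receipt of a
    Task so the Broker keeps feeding it work; a synchronous (or
    overloaded) Worker just stays quiet until it's ready."""
    replies = []
    if asynchronous:
        replies.append(ping_frames(identity))
    # ... process the task, then send a TASKREPLY (not shown) ...
    return replies
```

Withholding the PING is the whole load-management interface: a Worker that stops announcing readiness simply stops being scheduled.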
39 | 40 | -------------------------------------------------------------------------------- /scale0/scale0.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # 3 | # -*- coding: utf-8 -*- 4 | # 5 | # Copyright 2011 Joseph Bowman 6 | # 7 | # Licensed under the Apache License, Version 2.0 (the "License"); you may 8 | # not use this file except in compliance with the License. You may obtain 9 | # a copy of the License at 10 | # 11 | # http://www.apache.org/licenses/LICENSE-2.0 12 | # 13 | # Unless required by applicable law or agreed to in writing, software 14 | # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT 15 | # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the 16 | # License for the specific language governing permissions and limitations 17 | # under the License. 18 | 19 | import sys 20 | import calendar 21 | import time 22 | import zmq 23 | import uuid 24 | import tnetstrings 25 | from zmq.eventloop import ioloop, zmqstream 26 | 27 | class Dispatcher(): 28 | def __init__(self, 29 | client_socket_uri="tcp://127.0.0.1:8080", 30 | worker_xrep_socket_uri="tcp://127.0.0.1:8081", 31 | pub_socket_uri="tcp://127.0.0.1:8082", 32 | my_id=str(uuid.uuid4()), 33 | routers=2, heartbeat=1, liveness=3): 34 | 35 | self.my_id = my_id 36 | self.heartbeat_interval = heartbeat * 1000 37 | self.heartbeat_liveness = liveness 38 | 39 | """ Workers info would look something like 40 | { 41 | "worker1": { "services": ["web"], "last_ping": int(time.time())} 42 | "worker2": { "services": ["web"], "last_ping": int(time.time())} 43 | "worker3": { "services": ["news", "mail"], "last_ping": int(time.time())} 44 | } 45 | Eventually I'll move it to an object with getter and setters which 46 | can use something like gaeutilities event to notify the main 47 | application when a worker is added. That way requests don't 48 | get dropped. 
49 | 50 | *id is usually a uuid, but really, as long as ids are unique Scale0 should not care. 51 | """ 52 | 53 | self.workers = {} 54 | self.LRU = [] 55 | 56 | self.context = zmq.Context.instance() 57 | self.loop = ioloop.IOLoop.instance() 58 | 59 | self.worker_xrep_socket = self.context.socket(zmq.XREP) 60 | self.worker_xrep_socket.setsockopt(zmq.IDENTITY, "%s-worker" % self.my_id) 61 | self.worker_xrep_socket.bind(worker_xrep_socket_uri) 62 | 63 | self.worker_xrep_stream = zmqstream.ZMQStream(self.worker_xrep_socket, self.loop) 64 | self.worker_xrep_stream.on_recv(self.worker_handler) 65 | 66 | self.pub_socket = self.context.socket(zmq.PUB) 67 | self.pub_socket.setsockopt(zmq.IDENTITY, "%s_broker_pub" % self.my_id) 68 | self.pub_socket.bind(pub_socket_uri) 69 | 70 | self.pub_stream = zmqstream.ZMQStream(self.pub_socket, self.loop) 71 | ioloop.PeriodicCallback(self.send_pings, self.heartbeat_interval, self.loop).start() 72 | 73 | self.loop.start() 74 | 75 | def worker_handler(self, message): 76 | """ worker_handler handles messages from worker sockets. Messages 77 | are 3+ part ZeroMQ multipart messages: (worker_id, command, request). 78 | 79 | worker_id is supplied as part of the ROUTER socket requirements and is 80 | used to send replies back. 81 | 82 | command is mapped to methods. This allows an AttributeError to be 83 | raised if the command isn't an acceptable method. It is also just 84 | easier to maintain the code if each command is its own method. 85 | 86 | request is the rest of the message; it can be multiple parts and Scale0 87 | will generally ignore it except to pass it on. 88 | """ 89 | sock = self.worker_xrep_stream 90 | 91 | getattr(self, message[1].lower())(sock, message) 92 | 93 | def send_pings(self): 94 | """ pings are the heartbeat check to determine if the workers listed 95 | in the LRU queue are still available. The ping is broadcast on the 96 | PUB socket with the "PING" topic; each worker that hears it replies 97 | with a ping back on the worker XREP socket. """ 98 | 99 | ping_time = str(time.time()) 100 | self.pub_socket.send_multipart(["PING", ping_time]) 101 | 102 | def ping(self, sock, message): 103 | """ ping message received is a reply to a ping for a worker in the LRU queue. """ 104 | (worker_id, command, request) = message 105 | if worker_id in self.workers: 106 | self.workers[worker_id]["last_ping"] = float(request) 107 | print 'got ping from %s' % worker_id 108 | 109 | 110 | def heartbeat(self, sock, message): 111 | """ For heartbeat we just shoot the request right back at the sender, 112 | on the PUB socket with the worker id as the topic, without parsing. 113 | """ 114 | print message 115 | self.pub_socket.send_multipart(message) 116 | 117 | def ready(self, sock, message): 118 | """ ready is the worker informing Scale0 it can accept more jobs. 119 | """ 120 | 121 | (worker_id, command, services) = message 122 | self.workers[worker_id] = {"services": services.split(","), 123 | "last_ping": time.time()} 124 | self.LRU.append(worker_id) 125 | self.pub_socket.send_multipart([worker_id, "OK"]) 126 | print "Worker %s READY" % worker_id 127 | 128 | if __name__ == "__main__": 129 | Dispatcher() 130 | -------------------------------------------------------------------------------- /scale0/test_client.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joerussbowman/Scale0/663cca669d7f968a6f0237e0b4af0a327e831aee/scale0/test_client.py -------------------------------------------------------------------------------- /scale0/test_worker.py: -------------------------------------------------------------------------------- 1 | import calendar 2 | import time 3 | import zmq 4 | import uuid 5 | import tnetstrings 6 | from zmq.eventloop import ioloop, zmqstream 7 | 8 | class Worker(): 9 | def __init__(self, connect_to, listen_on="tcp://127.0.0.1:9080"): 10 | """ The worker connects to a socket to 
communicate with the Dispatcher 11 | in the Broker. This allows the Dispatcher to manage its LRU queue using 12 | the worker. A listener socket is instantiated. This is the socket that the 13 | Router in the Broker will make requests to. 14 | """ 15 | self.my_id = str(uuid.uuid4()) 16 | self.context = zmq.Context.instance() 17 | self.loop = ioloop.IOLoop.instance() 18 | self.listen_on = listen_on 19 | 20 | self.xreq_socket = self.context.socket(zmq.XREQ) 21 | self.xreq_socket.setsockopt(zmq.IDENTITY, "%s" % self.my_id) 22 | self.xreq_socket.connect(connect_to) 23 | self.xreq_stream = zmqstream.ZMQStream(self.xreq_socket, self.loop) 24 | 25 | self.sub_socket = self.context.socket(zmq.SUB) 26 | self.sub_socket.connect("tcp://127.0.0.1:8082") 27 | self.sub_socket.setsockopt(zmq.SUBSCRIBE, self.my_id) 28 | print "subscribed to %s" % self.my_id 29 | self.sub_socket.setsockopt(zmq.SUBSCRIBE, "PING") 30 | self.sub_stream = zmqstream.ZMQStream(self.sub_socket, self.loop) 31 | 32 | self.heartbeat_stamp = None 33 | self.heartbeats = [] 34 | 35 | 36 | """ self.connection_state can be one of 3 ints 37 | 0: not connected (not in LRU queue on broker) 38 | 1: connection pending (READY sent) 39 | 2: connected (OK received, in LRU queue) 40 | """ 41 | self.connection_state = 0 42 | 43 | self.xreq_stream.on_recv(self.xreq_handler) 44 | self.sub_stream.on_recv(self.sub_handler) 45 | 46 | ioloop.DelayedCallback(self.connect, 1000, self.loop).start() 47 | ioloop.PeriodicCallback(self.send_heartbeat, 1000, self.loop).start() 48 | 49 | self.loop.start() 50 | 51 | def send_heartbeat(self): 52 | if self.connection_state == 2: 53 | self.heartbeat_stamp = str(time.time()) 54 | print 'sending heartbeat %s' % self.heartbeat_stamp 55 | self.heartbeats.append(self.heartbeat_stamp) 56 | self.xreq_socket.send_multipart(["HEARTBEAT", self.heartbeat_stamp]) 57 | 58 | def xreq_handler(self, msg): 59 | (command, request) = msg 60 | if command == "HEARTBEAT": 61 | if request == self.heartbeat_stamp: 62 | 
print 'Got valid heartbeat %s' % request 63 | else: 64 | print "Heartbeat timestamp mismatch %s" % request 65 | self.heartbeats.remove(request) 66 | print self.heartbeats 67 | 68 | def sub_handler(self, msg): 69 | """ Trying to move to pub/sub for getting messages to workers. """ 70 | if msg[0] == "PING": 71 | self.xreq_socket.send_multipart(msg) 72 | if msg[0] == self.my_id: 73 | (id, command) = msg[:2] 74 | if command == "OK": 75 | self.connection_state = 2 76 | print 'In LRU Queue' 77 | if command == "HEARTBEAT": 78 | data = msg[2] 79 | print "Got heartbeat timestamp %s" % data 80 | 81 | def connect(self): 82 | if self.connection_state < 1: 83 | print 'connecting to broker' 84 | self.xreq_socket.send_multipart(["READY", 85 | "test"]) 86 | self.connection_state = 1 87 | 88 | if __name__ == "__main__": 89 | Worker("tcp://127.0.0.1:8081") 90 | -------------------------------------------------------------------------------- /scale0/tnetstrings.py: -------------------------------------------------------------------------------- 1 | # Note this implementation is more strict than necessary to demonstrate 2 | # minimum restrictions on types allowed in dictionaries. 3 | 4 | def dump(data): 5 | if type(data) is long or type(data) is int: 6 | out = str(data) 7 | return '%d:%s#' % (len(out), out) 8 | elif type(data) is str: 9 | return '%d:' % len(data) + data + ',' 10 | elif type(data) is dict: 11 | return dump_dict(data) 12 | elif type(data) is list: 13 | return dump_list(data) 14 | elif data == None: 15 | return '0:~' 16 | elif type(data) is bool: 17 | out = repr(data).lower() 18 | return '%d:%s!' % (len(out), out) 19 | else: 20 | assert False, "Can't serialize stuff that's %s." 
% type(data) 21 | 22 | 23 | def parse(data): 24 | payload, payload_type, remain = parse_payload(data) 25 | 26 | if payload_type == '#': 27 | value = int(payload) 28 | elif payload_type == '}': 29 | value = parse_dict(payload) 30 | elif payload_type == ']': 31 | value = parse_list(payload) 32 | elif payload_type == '!': 33 | value = payload == 'true' 34 | elif payload_type == '~': 35 | assert len(payload) == 0, "Payload must be 0 length for null." 36 | value = None 37 | elif payload_type == ',': 38 | value = payload 39 | else: 40 | assert False, "Invalid payload type: %r" % payload_type 41 | 42 | return value, remain 43 | 44 | def parse_payload(data): 45 | assert data, "Invalid data to parse, it's empty." 46 | length, extra = data.split(':', 1) 47 | length = int(length) 48 | 49 | payload, extra = extra[:length], extra[length:] 50 | assert extra, "No payload type: %r, %r" % (payload, extra) 51 | payload_type, remain = extra[0], extra[1:] 52 | 53 | assert len(payload) == length, "Data is wrong length %d vs %d" % (length, len(payload)) 54 | return payload, payload_type, remain 55 | 56 | def parse_list(data): 57 | if len(data) == 0: return [] 58 | 59 | result = [] 60 | value, extra = parse(data) 61 | result.append(value) 62 | 63 | while extra: 64 | value, extra = parse(extra) 65 | result.append(value) 66 | 67 | return result 68 | 69 | def parse_pair(data): 70 | key, extra = parse(data) 71 | assert extra, "Unbalanced dictionary store." 72 | value, extra = parse(extra) 73 | 74 | return key, value, extra 75 | 76 | def parse_dict(data): 77 | if len(data) == 0: return {} 78 | 79 | key, value, extra = parse_pair(data) 80 | assert type(key) is str, "Keys can only be strings." 
81 | 82 | result = {key: value} 83 | 84 | while extra: 85 | key, value, extra = parse_pair(extra) 86 | result[key] = value 87 | 88 | return result 89 | 90 | 91 | 92 | def dump_dict(data): 93 | result = [] 94 | for k,v in data.items(): 95 | result.append(dump(str(k))) 96 | result.append(dump(v)) 97 | 98 | payload = ''.join(result) 99 | return '%d:' % len(payload) + payload + '}' 100 | 101 | 102 | def dump_list(data): 103 | result = [] 104 | for i in data: 105 | result.append(dump(i)) 106 | 107 | payload = ''.join(result) 108 | return '%d:' % len(payload) + payload + ']' 109 | 110 | 111 | --------------------------------------------------------------------------------
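To make the wire format above concrete, here is a condensed, standalone restatement of the dump() rules for ints, strings and dicts (the full module above also handles lists, longs, booleans and None). The helper name tns is made up for the sketch; it is not part of the module.

```python
# Condensed sketch of the tnetstring encoding rules implemented above,
# restated standalone for illustration; handles only int, str and dict.

def tns(data):
    if isinstance(data, bool):
        # isinstance(True, int) is True, so bools must be rejected first;
        # the full module encodes them with a '!' suffix instead.
        raise TypeError("bools not handled in this sketch")
    if isinstance(data, int):
        out = str(data)
        return '%d:%s#' % (len(out), out)    # ints end with '#'
    if isinstance(data, str):
        return '%d:%s,' % (len(data), data)  # strings end with ','
    if isinstance(data, dict):
        # keys and values are themselves tnetstrings, concatenated,
        # and the whole payload is length-prefixed and ends with '}'
        payload = ''.join(tns(k) + tns(v) for k, v in data.items())
        return '%d:%s}' % (len(payload), payload)
    raise TypeError("Can't serialize %s" % type(data))

# tns("hello")            -> '5:hello,'
# tns(42)                 -> '2:42#'
# tns({"service": "www"}) -> '16:7:service,3:www,}'
```

The length prefix is what makes parse_payload above cheap: the parser reads up to the ':', slices exactly that many bytes, and finds the type tag in the very next byte.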