├── .gitignore
├── README
├── docs
│   ├── broker.txt
│   ├── client.txt
│   ├── configs.txt
│   ├── dictionary.txt
│   ├── protocol.txt
│   ├── scratchpad.txt
│   └── worker.txt
└── scale0
    ├── scale0.py
    ├── test_client.py
    ├── test_worker.py
    └── tnetstrings.py

/.gitignore:
--------------------------------------------------------------------------------
1 | nbproject
2 | *.pyc
3 | *.swp
4 | # I generally keep tornado installed with my projects so I can
5 | # switch versions per project. Ignored to keep it out of the repo
6 | # for this project.
7 | tornado/
8 | 
9 | 
--------------------------------------------------------------------------------
/README:
--------------------------------------------------------------------------------
1 | Scale0 - A dynamic, service-oriented, smart load balancer.
2 | 
3 | What does that mean?
4 | 
5 | Dynamic - Scale0 is configured by the backend workers that connect to it.
6 | There are no virtual servers, pool members or other configuration necessary.
7 | 
8 | Service Oriented - Scale0 routes frontend requests to backends by the services
9 | the backends announce they support.
10 | 
11 | Smart Load Balancer - Scale0 uses a least-recently-used (LRU) queue
12 | methodology to route requests.
13 | 
14 | Huh? I still don't get it.
15 | 
16 | Say you have a dynamic website and you want to have multiple backend servers
17 | processing requests received by a frontend webserver. A typical scenario might
18 | be several Tornado or node.js servers being proxied by an Nginx server.
19 | 
20 | A traffic spike happens, and you need to get more backends up to handle the
21 | load. You hopefully have your Nginx servers load balanced so you can
22 | reconfigure them to support more backends. Your architecture might look like this:
23 | 
24 |                                      ------------------
25 |                                      | Backend Server |
26 |                                      ------------------
27 | 
28 |                   ----------------   ------------------
29 |                   | Proxy Server |   | Backend Server |
30 |                   ----------------   ------------------
31 | 
32 | -----------------                    ------------------
33 | | Load Balancer |                    | Backend Server |
34 | -----------------                    ------------------
35 | 
36 |                   ----------------   ------------------
37 |                   | Proxy Server |   | Backend Server |
38 |                   ----------------   ------------------
39 |                                      ------------------
40 |                                      | Backend Server |
41 |                                      ------------------
42 | 
43 | 
44 | 
45 | Adding more servers will require reconfiguring the proxy servers. Generally you
46 | can do this by turning off one in the pool, adding the new members, bringing it
47 | back up and doing the same on the other. If you're running in active/active
48 | mode on your load balancer you're cutting your input capability in half while
49 | trying to scale up.
50 | 
51 | Alternatively you could of course stand up another proxy server with new backends
52 | and add it to the pool on the load balancer. Let's not pretend that the problem
53 | is impossible to solve without Scale0; Scale0 just provides an alternative
54 | solution that allows you to scale all the way at the backend, instead of messing
55 | around with the front door.
56 | 
57 | Using Scale0 and something that supports ZeroMQ, like Mongrel2 (or a module for
58 | Nginx would be nice, hint hint), you could do something like this.
59 | 
60 |                                      ------------------
61 |                                      | Backend Server |
62 |                                      ------------------
63 | 
64 |                   ----------------   ------------------
65 |                   | Proxy Server |   | Backend Server |
66 |                   ----------------   ------------------
67 | 
68 | -----------------   ----------       ------------------
69 | | Load Balancer |   | Scale0 |       | Backend Server |
70 | -----------------   ----------       ------------------
71 | 
72 |                   ----------------   ------------------
73 |                   | Proxy Server |   | Backend Server |
74 |                   ----------------   ------------------
75 | 
76 |                                      ------------------
77 |                                      | Backend Server |
78 |                                      ------------------
79 | 
80 | Here's where the magic starts. Say you need to scale up. Just turn on more
81 | backend servers and connect them to Scale0. That's it: you can increase your
82 | processing capacity, or decrease it, at will. Scale0 is the elasticity that
83 | allows you to scale your system in real time with load.
84 | 
85 | Scale0 is also service agnostic. At this time it's not being designed with
86 | streams in mind, but for protocols like http or smtp it should work great.
87 | It will require frontend services to be built with ZeroMQ support;
88 | Scale0 is the first piece towards a scalable architecture.
89 | 
90 | Note: Scale0 is in an early state of development. Some of the current ideas
91 | include using PUSH/PULL sockets for frontend communication, allowing you to
92 | stand up more Scale0 servers, and PUB/SUB for the backend, using ZMQ Identity
93 | to manage subscriptions. Internally Scale0 uses a Least Recently Used Queue
94 | methodology requiring backend servers to announce their readiness to accept
95 | a request. More information will be in the docs once the project is finished
96 | being fleshed out.
97 | 
98 | Note2: Hey! Doesn't that make Scale0 a single point of failure in an environment
99 | built for avoiding that? Part of the plans for Scale0 include the ability to set
100 | up failover servers. Backend and proxy servers will need to know about the failover
101 | server, and the plan is to allow Scale0 to provide that information and also build
102 | in a protocol and management interface to allow you to fail over manually as well,
103 | say when you need to do OS upgrades on the Scale0 server.
104 | 
--------------------------------------------------------------------------------
/docs/broker.txt:
--------------------------------------------------------------------------------
1 | * Been working in the scratchpad; this is out of date relative to the actual
2 | * implementation now. Needs updating.
3 | 
4 | The broker is the core of the Scale0 project. It provides the application
5 | logic and standards for peer and client connections.
6 | 
7 | Every attempt is made to stay in accord with any standards that exist
8 | on the zeromq website, [ url goes here, working offline right now ]. The
9 | other issues Scale0 offers solutions for are reliability, scalability and
10 | management flexibility.
11 | 
12 | ARCHITECTURE
13 | *** This design is going to make me need to go back over everything I've
14 | *** already typed up. Thought of this today while at work, and I think it
15 | *** should simplify the broker implementation.
16 | 
17 | *** The language of choice to develop this initially is Python. As such
18 | *** I'll be using the multiprocessing library to get around the Global
19 | *** Interpreter Lock. In other languages it may be possible to use threads,
20 | *** and take advantage of inproc://, though from what I've read this may not
21 | *** be the best idea. In Python's case it wasn't really an option, so there
22 | *** will be the ipc:// overhead.
23 | *** 
24 | *** ipc:// might be the overall better answer; the guide even has a comment
25 | *** about premature optimization. Anyway, I've put too much time into this for
26 | *** now without writing any code. At some point I have to come up with an idea
27 | *** and either prove or disprove it with code.
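In that spirit, the least-recently-used routing idea can be proven out in a few lines of plain Python before any sockets are involved. This is a toy sketch with made-up names (LRUQueue, ping, next_worker), not the real broker API:

```python
from collections import deque

class LRUQueue:
    """Toy model of the broker's least-recently-used Worker queue.

    Workers announce readiness with a PING; the next Task goes to the
    Worker that has been ready (and idle) the longest.
    """

    def __init__(self):
        self.ready = deque()  # worker ids, oldest PING first

    def ping(self, worker_id):
        # Re-announcing readiness moves a worker to the back of the queue.
        if worker_id in self.ready:
            self.ready.remove(worker_id)
        self.ready.append(worker_id)

    def next_worker(self):
        # The least recently used worker gets the next Task; it must PING
        # again before it is handed another one.
        return self.ready.popleft() if self.ready else None
```

A Worker that never PINGs again simply stops receiving Tasks, which is how the backend can shed load without any broker-side configuration.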
28 | 
29 | Brokers will have two parts, Dispatchers and Routers. (I'm not sold on the
30 | name Dispatcher yet, because it will do more.) At a minimum each Broker will
31 | consume 2 processes. While it could be written to only consume 1, forcing a 2
32 | process minimum will be easier, and consuming only a single core would really
33 | only be useful for dev/test implementations.
34 | 
35 | Dispatcher - The Dispatcher is what will provide the ports Clients and Workers
36 | connect to, and maintain communication with other peers. The Dispatcher will
37 | manage all configuration related tasks and will push all router/worker
38 | requests down to Router processes.
39 | 
40 | Router - Router processes send Tasks to Workers, and send the Task Reply back
41 | to the Dispatcher, which sends it to the Client who made the request.
42 | 
43 | The startup process for a broker will look like this.
44 | - Dispatcher ioloop started.
45 | - Dispatcher ioloop connects to any Peers and validates its config.
46 | - Dispatcher binds an ipc PUSH socket.
47 | - Dispatcher kicks off Router processes with their own ioloops.
48 | - Each router ioloop establishes a PULL socket to the Dispatcher PUSH.
49 | - Dispatcher validates Routers are connected.
50 | - Dispatcher creates a zmq_context and binds the Worker XREP socket.
51 | * somehow that context is passed to Workers? Need to mess around with this part
52 | - Dispatcher waits for Workers to connect, outputs to console as they do.
53 | - Once the config option for minimum workers is reached, Dispatcher creates a
54 | zmq_context for the Clients and binds an XREP socket to it
55 | * somehow that context is passed to Clients? Need to mess around with this part
56 | - At this point the Broker is operational and passing messages.
57 | 
58 | 
59 | SERVICES
60 | 
61 | Scale0 brokers will provide the following services:
62 | 
63 | Service Discovery: In order to meet the requirements of being service
64 | agnostic, service discovery will need to be built in. Worker clients will
65 | be able to state the service(s) they provide on connection to the network
66 | and change that configuration in real time. Other core services will be built
67 | on this framework. This is handled by the Dispatcher.
68 | 
69 | Configuration: A peer or client on startup should only need to be informed
70 | of at least one broker to connect to. It will then request the configuration
71 | it needs on startup. Not all brokers will be required to provide
72 | configuration services. However, at least one server per network will need
73 | to provide the service. This is handled by the Dispatcher.
74 | 
75 | Heartbeat: Heartbeat will be the way that clients and peers are monitored
76 | for availability. This will also be used to keep a configuration up to
77 | date on configuration service brokers. Heartbeat service servers can be
78 | configured to share their heartbeat table with other peers using push or
79 | pull. For example, in an active/standby peer topology the master can receive
80 | all heartbeat information and share it with the standby. Pull can be used
81 | by peers requesting information about available Worker clients. These
82 | requests will be received by the Dispatcher and passed to Routers like any
83 | other Task.
84 | 
85 | CONFIGURATION
--------------------------------------------------------------------------------
/docs/client.txt:
--------------------------------------------------------------------------------
1 | [ CLIENT ]
2 | Clients are the applications that generate Tasks and send them to the Broker
3 | for processing. These are the classic frontend applications. The requirements
4 | for the frontend are to be able to support ZeroMQ REQ or XREQ connections and
5 | to follow the protocol requirements for a Client.
6 | 
7 | An example connection would look like this.
8 | 
9 | C ------ TASK -----> B
10 | C <-- TASKREPLY --- B
11 | 
12 | Brokers will also accept PING requests and reply with PONG in order for the
13 | Client to determine if the Broker is available. NEEDBROKER requests can be
14 | sent to the Broker in order to get a new Broker, if the Broker is available to
15 | service that request.
16 | 
17 | For more information about other responses please see the protocol document.
18 | It's expected that all Clients will be able to handle all error codes, for
19 | example.
--------------------------------------------------------------------------------
/docs/configs.txt:
--------------------------------------------------------------------------------
1 | Keeping the configuration within the environment up to date: the premise
2 | that this system is built on is that for the most part the config will be
3 | stable. That is, new brokers and Workers will not be added and removed
4 | a lot, and config changes by admins will be rare.
5 | 
6 | When a config change is made, it will be sent on the PUB/SUB config
7 | network. All brokers with the config service can PUB. All brokers should
8 | have a SUB connection to all brokers with the config service.
9 | 
10 | On receipt of a config, a broker will check the version against its running
11 | version. If they are different it will update its config. If it's the same it
12 | will check the hash of its config against the published one.
13 | 
14 | *** Not sure what to do if the hashes don't match yet. Conflict resolution
15 | could happen here and then it could push. Conflict resolution measures could
16 | be a bit complicated though. ***
17 | 
18 | [ WORKER CONFIG ]
19 | This is the config that each worker stores. Both identity and services should
20 | be sent with a CHECKCONFIG request.
21 | 
22 | identity:
23 | unique identifier for this Worker
24 | 
25 | services:
26 | List of services processed by the Worker.
27 | 
28 | family_config:
29 | config (including fallback family config) for the family
30 | 
31 | prefered_broker:
32 | This is the broker that the Worker should attempt to connect to.
33 | If it is unable to connect to this broker it should send a NEEDBROKER
34 | request to another broker in the family_config, if any exist.
35 | 
36 | 
37 | [ BROKER CONFIG ]
38 | This details all the information a broker requires for operation. Not all
39 | options need to be pushed to Workers. Which options are required for Workers
40 | is detailed in the [ WORKER CONFIG ] section above.
41 | 
42 | identity:
43 | unique identifier for this broker
44 | 
45 | frontend_port:
46 | The port on which frontends connect.
47 | 
48 | backend_port:
49 | The port on which backends connect.
50 | 
51 | config_pub_port:
52 | The port on which this broker publishes new configurations.
53 | 
54 | config_sub_ports:
55 | The port(s) on which the broker listens for new configurations.
56 | 
57 | *** The idea here is that everything, when possible, should listen for
58 | configuration changes from more than one source in order to provide
59 | reliability and redundancy. ***
60 | 
61 | state: joining, active, disabled
62 | joining is the default state when adding a broker to the network. Once
63 | it has received its config it will go to an active state. Brokers can
64 | also be disabled. In a disabled state they will provide no response to
65 | requests except for a management request to re-enable.
66 | 
67 | peers: List of peer sockets to communicate with. This is the port used for
68 | heartbeat and work status communication.
69 | 
70 | default_heartbeat_interval:
71 | Workers can negotiate a longer heartbeat interval, but not a shorter one.
72 | 
73 | services:
74 | list of services managed by the broker, or all
75 | 
76 | LRU:
77 | The least recently used list, which is a list of backends serviced by
78 | the broker. Each item on the list will have the following properties:
79 | services: list of services the backend supports
80 | last_heartbeat: used to determine if the backend is alive or not.
81 | 
82 | error_Workers:
83 | If, due to a configuration issue or some other problem, a Worker
84 | repeatedly fails to negotiate correctly, it can be ignored as much
85 | as is possible with ZeroMQ.
86 | 
87 | family:
88 | This option may be blank if families aren't used in the network. Otherwise
89 | this is a single item for the family a broker is a member of.
90 | 
91 | fallback_families:
92 | While a broker may be a member of only one family, it can have several
93 | fallback families. Fallback families are used to route Workers to in the event
94 | that the primary family is unable to process their requests.
95 | 
96 | family_config:
97 | This is a list of family configs. The Broker should have its own family
98 | config plus any fallback family configs as well. See [ FAMILY CONFIG ]
99 | below for family config options.
100 | 
101 | [ FAMILY CONFIG ]
102 | 
103 | name:
104 | Unique name for the family.
105 | 
106 | brokers:
107 | List of brokers in the family. Includes:
108 | identity: broker identity
109 | services: services served by broker
110 | config_sub_port: Port to subscribe to for config updates
111 | 
112 | admins:
113 | List of usernames and one-way-hashed passwords used for administration of the
114 | family.
--------------------------------------------------------------------------------
/docs/dictionary.txt:
--------------------------------------------------------------------------------
1 | This is a list of terms used and their meanings. A consistent dictionary will
2 | make it a lot easier for people reading the documentation to understand it.
3 | 
4 | Broker: The primary middleware for the network. This term is used
5 | when referring to the middleware itself and client communications.
6 | 
7 | Fallback Broker: Fallback Brokers are an optional class of brokers. They can
8 | step in for a Broker when it goes down. These are generally hard wired
9 | together, preferably with a crossover cable. See the binary star pattern
10 | discussed in the ZeroMQ guide for more information.
11 | 
12 | Dispatcher: The Broker application piece that handles the Client and Worker
13 | ports as well as handles configuration management for the Broker.
14 | 
15 | Router: This is the Broker application piece that the Dispatcher passes
16 | Tasks to. This handles the LRU and Wait Queues.
17 | 
18 | Peer: Special term for referring to Brokers when there is Broker to Broker
19 | communication.
20 | 
21 | Family: This refers to a cluster of Brokers within a single location. This can
22 | mean Brokers running on the same server using ipc://, or servers on a
23 | network using tcp://
24 | 
25 | Community: This refers to a group of Families. When it becomes necessary to
26 | route outside of a Family, that's when the Community comes in. This isn't
27 | well thought out yet; I'll come back to it when I get to this layer.
28 | Community may even be better as an added Worker server that can route
29 | requests between 2 families?
30 | 
31 | Worker: This is a backend server, used for data processing. It maintains a
32 | connection with a Broker and handles work requests that are passed to it.
33 | 
34 | Client: This is a frontend server. This passes work requests to a Broker and
35 | waits for a reply. The only time a Broker needs to be aware of a Client is
36 | when it is accepting a request from one.
37 | 
38 | LRU Queue: This is the queue a Broker keeps of Workers ready to accept a
39 | request. If a Fallback Broker is configured this queue must be kept in
40 | sync on both the Broker and the Fallback Broker.
41 | 
42 | Wait Queue: This is the list of Task IDs and the Workers they've been
43 | assigned to. It's expected that all Workers can accept multiple requests
44 | at a time; this is primarily used to validate work replies. If a work
45 | reply is sent and its id doesn't exist in this queue, it's dropped quietly.
46 | 
47 | Task: This is a unit of work that gets routed to a Worker.
48 | 
49 | Task Reply: This is the response from a Worker that gets routed to a Client.
--------------------------------------------------------------------------------
/docs/protocol.txt:
--------------------------------------------------------------------------------
1 | [ PROTOCOL ]
2 | CONNECT: When initiating a new connection to a broker a worker must validate
3 | its config.
4 | Parameters:
5 | config_type:
6 | The format the worker accepts the config in. Can be JSON, XML
7 | or ZFL.
8 | 
9 | If the application already has a config, it will also pass a config_version
10 | option.
11 | 
12 | CONFIG - Broker reply with configuration. This will be sent after CONNECT
13 | requests and when the Broker needs to inform the Worker of a new CONFIG.
14 | Parameters:
15 | config_type:
16 | The format the config is in.
17 | config:
18 | The config.
19 | 
20 | NEEDBROKER - In the event that a worker is unable to connect to its preferred
21 | broker, it may try to connect to another one with a NEEDBROKER request. This
22 | will include the worker identity and services offered.
23 | 
24 | CHANGEBROKER - This is a reply back to a Worker or Client REQ/XREQ socket
25 | telling it to connect to a different Broker. This can be a response to a
26 | NEEDBROKER request, or used when a Broker needs to push the Worker to
27 | another Broker, say if it's coming down for maintenance or other reasons.
28 | 
29 | PING: This is the standard heartbeat request. When a Broker receives a PING
30 | from a Worker it will add it to the LRU queue if it's not already in it.
31 | Parameters:
32 | heartbeat_interval:
33 | The interval at which the broker can expect heartbeat requests
34 | from the worker. Worker only.
35 | 
36 | time:
37 | Time according to the worker or client when the heartbeat request was sent.
38 | If this is way off from the time on the server, then there
39 | is a problem; an error will be returned and the worker
40 | will be disconnected.
41 | 
42 | config_version:
43 | Worker only, configuration version.
44 | 
45 | PONG: This is the response to PING.
46 | Parameters:
47 | time:
48 | Time on the Broker when the request was sent. This can be used
49 | to help keep servers in sync. This can also be used by the Worker
50 | to determine if it's slow in processing requests, so it can stop
51 | sending PINGs if it's overloaded.
52 | 
53 | config_version:
54 | Configuration version.
55 | 
56 | TASK: This is the request asking a worker to do some work. This originates
57 | from the Client. The Router will add the following configuration items and
58 | will remove the worker from the LRU queue and add it to the wait queue.
59 | Parameters:
60 | time:
61 | Time according to the worker when the request was sent.
62 | If this is way off from the time on the server, then there
63 | is a problem; an error will be returned and the worker
64 | will be disconnected.
65 | 
66 | config_version:
67 | Configuration version.
68 | 
69 | id: This will be the identity of the frontend
70 | that made the request and the message id, used for routing.
71 | 
72 | body: The body of the request.
73 | 
74 | TASKREPLY: This is the reply to a work request. If the id isn't in the wait
75 | queue for the Broker this message is discarded.
76 | Parameters:
77 | time:
78 | Time according to the worker when the reply was sent.
79 | If this is way off from the time on the server, then there
80 | is a problem; an error will be returned and the worker
81 | will be disconnected.
82 | 
83 | config_version:
84 | Configuration version.
85 | 
86 | id: This will be the identity of the frontend
87 | that made the request and the message id, used for routing.
88 | 
89 | body: The body of the reply. If the body contains
90 | errors about the work request, it will be up to the frontend to manage that.
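The Router-side bookkeeping for TASK and TASKREPLY described above can be sketched in a few lines of Python. The header fields follow the examples in the scratchpad, but the function names and wait-queue shape here are illustrative only, not the real implementation:

```python
import json
import uuid

def make_task(service, body, client_identity):
    """Build a hypothetical TASK header/body pair as the Router might.

    The header carries the routing information; the body is passed
    through untouched.
    """
    header = {
        "request": "TASK",
        "message_id": str(uuid.uuid4()),
        "service": service,
        "id": client_identity,  # used to route the TASKREPLY back
    }
    return json.dumps(header), body

wait_queue = {}  # message_id -> worker the Task was assigned to

def assign(header_json, worker_id):
    # The Router removes the worker from the LRU queue (not shown) and
    # records the outstanding Task in the wait queue.
    header = json.loads(header_json)
    wait_queue[header["message_id"]] = worker_id

def accept_reply(header_json):
    """A TASKREPLY whose id is not in the wait queue is dropped quietly."""
    header = json.loads(header_json)
    return wait_queue.pop(header["message_id"], None) is not None
```

Popping the id on the first reply also means a duplicate TASKREPLY for the same Task is silently discarded, which matches the wait-queue rule above.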
91 | 
92 | ERROR:
93 | These are the Error codes a server may return to a client. These are based
94 | on HTTP status codes.
95 | 301: Moved Permanently. Sent to Workers and Clients when the Broker
96 | needs them to move to another Broker. The Worker can expect a
97 | CHANGEBROKER reply from the Broker telling it where to connect.
98 | 426: Upgrade Required. Sent when the Worker is sending a config version
99 | different from the Broker's. The Worker should issue a CONNECT to
100 | get the new config.
101 | 444: No Response Message. Used when the Broker believes the client
102 | is malware.
103 | 501: Not Implemented. Sent when the Worker or Client sends a request
104 | the server doesn't expect or isn't aware of.
105 | 503: Service Unavailable. Sent when the Broker is not accepting
106 | connections. A Worker may attempt a NEEDBROKER request to see
107 | if it can be redirected. If this is supplied in response to a
108 | NEEDBROKER request the Worker should not retry.
109 | 
110 | FORCENEGOTIATE:
111 | Sent by a broker to a worker in the event the broker isn't sure if the worker
112 | is in an acceptable state. Upon getting this message from a broker a worker
113 | should initiate CONNECT.
114 | 
115 | 
--------------------------------------------------------------------------------
/docs/scratchpad.txt:
--------------------------------------------------------------------------------
1 | *** TODO: I am being very inconsistent about frontend, backend, worker,
2 | *** client... If I am to expect anyone to be able to follow along I'm going
3 | *** to have to come up with a consistent dictionary.
4 | *** 
5 | *** Dictionary started, will remove this note when I'm done converting
6 | *** all existing documentation to the terminology.
7 | 
8 | While writing documentation to build against I'll have ideas for things that
9 | should be documented better later. That's what this file is for.
10 | 
11 | [ CONNECTIONS ]
12 | 
13 | Examples of connections, to wrap my head around how they work.
14 | C = worker
15 | B = broker
16 | 
17 | - worker startup, broker down:
18 | * worker must be informed of a broker to connect to. If it's saving its config
19 | * locally it may remember it, and also have the group information to fall back
20 | * on to find a new broker. This will be encouraged for workers, but not required.
21 | 
22 | C --- CONNECT ---> B
23 | No response from B; worker fails to start with an error.
24 | 
25 | - worker startup, broker up
26 | * worker must be informed of a broker to connect to. If it's saving its config
27 | * locally it may remember it, and also have the group information to fall back
28 | * on to find a new broker. This will be encouraged for workers, but not required.
29 | C ------ CONNECT ----> B
30 | C <----- CONFIG ------ B
31 | C ------ PING -------> B
32 | C <----- PONG -------- B
33 | 
34 | * It's at this point that the broker registers the worker in the LRU queue.
35 | 
36 | [ DIAGRAMS ]
37 | * Simple text diagrams of what I'm thinking about for implementation. I've been
38 | * tossing around a lot of ideas in my head and just want to get them down where
39 | * I can look at them.
40 | 
41 | 
42 | [ FE XREP ]      [ BE XREP ]
43 |      --------------
44 |      | Dispatcher |
45 |      --------------
46 |     [ ROUTER XREP ]
47 | 
48 | 
49 | 
50 | [ ROUTER XREQ ]  [ ROUTER XREQ ]
51 |    ----------      ----------
52 |    | Router |      | Router |
53 |    ----------      ----------
54 | 
55 | 
56 | Dispatcher: The purpose of this process is to accept and get rid of tasks
57 | as quickly as possible. Tasks could be the core Task processing objects
58 | coming in from a frontend, or configuration management requests from a
59 | router. LRU and Wait Queues are stored in this object. Communication
60 | outside of the broker happens via the XREP listeners, which are usually
61 | tcp. Internal requests to the router happen via the PUSH and XREP sockets.
62 | These would usually be ipc.
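A minimal sketch of that hand-off, using plain in-process queues as stand-ins for the PUSH/PULL ipc sockets (the names are hypothetical, and the real Dispatcher and Routers would be separate processes on zmq sockets):

```python
import json
from queue import Queue

# Stand-ins for the ipc sockets between the Dispatcher and a Router.
to_router = Queue()
to_dispatcher = Queue()

def dispatcher_receive(raw_message):
    """The Dispatcher accepts a task and gets rid of it as fast as
    possible: no parsing, just hand the raw frames to a Router."""
    to_router.put(raw_message)

def router_step():
    """A Router does the 'thinking': parse the header, act on it, and
    pass the result back up to the Dispatcher."""
    header_json, body = to_router.get()
    header = json.loads(header_json)  # only the Router parses JSON
    header["type"] = "RESPONSE"
    to_dispatcher.put((json.dumps(header), body))
```

The point of the split is that the Dispatcher stays cheap and single-purpose, while any number of Routers can be added to absorb the parsing work.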
63 | 
64 | The Dispatcher acts as a global state object for the entire application,
65 | which helps avoid conflicts in configuration and queue management within
66 | a single broker.
67 | 
68 | * The Dispatcher should not need to process any messages; message processing
69 | * will be delegated to a Router for everything.
70 | 
71 | Router: Routers are single-process applications that handle whatever processing
72 | of tasks is required. Any data parsing or other "thinking" should be handled
73 | here, as these can be load balanced across cores. The idea is to run one
74 | Router per available core. For example, on a 6 core system you could expect
75 | 1 core for the OS, 1 core for the Dispatcher and then 4 cores left for
76 | Router objects. That's a very simple factoring and can likely be adjusted
77 | up or down. Likely you can get more Routers on a 6 core box and see a
78 | benefit, depending on what the traffic actually looks like.
79 | 
80 | 
81 | [ MESSAGE FORMAT ]
82 | 
83 | Figuring out the Message Format early makes sense. With pyzmq, Pickle and JSON
84 | support is bundled in; going to use JSON just to make it more portable, allowing
85 | for integration with other languages to act as Routers (or the Dispatcher, but
86 | that's mainly irrelevant since the Dispatcher doesn't parse anything). Since
87 | all message passing happens on libzmq sockets, it should be possible to write
88 | or embed Dispatchers and Routers in other applications.
89 | 
90 | * This has a strong chance of changing once the application starts being
91 | * written.
92 | 
93 | Messages are multipart messages. Since the idea is for the Dispatcher to not
94 | parse anything at all, this can simply be left at 2 parts: Header and Body.
95 | The Header will be information about the message and relevant to all processing
96 | done by the Router; the body will be the actual message that gets passed to the
97 | application (Worker or Client).
98 | 
99 | Header: Simple JSON-formatted object with basic routing information.
100 | Some examples:
101 | 
102 | Incoming request, passed from Client through Dispatch to Router:
103 | {
104 |     "request": "TASK",
105 |     "message_id": "",
106 |     "service": "www.mydomain.com",
107 |     "type": "REQUEST"
108 | }
109 | 
110 | The Router can then take this request, add a message_id and send it on
111 | its way to the Worker. It will also pass the message_id up to the
112 | Dispatcher, which can then plug it into the Wait Queue.
113 | 
114 | The Worker can then do what it needs to do, then send a message back with
115 | the header:
116 | {
117 |     "request": "TASK",
118 |     "message_id": "C4529276-BF5C-4502-BDF3-DF32B122D055",
119 |     "service": "www.mydomain.com",
120 |     "type": "RESPONSE"
121 | }
122 | 
123 | The Dispatcher would send this to the Router. The Router would send a Wait
124 | Queue delete request to the Dispatcher. The Dispatcher will determine if the
125 | message_id exists in the Queue, deleting it if it does, and return a true/
126 | false response to the Router. If the message existed and was therefore
127 | deleted then the Router will pass the message up to the Dispatcher on
128 | its PUSH socket, and the Dispatcher will then send it up to the Client.
129 | 
130 | * ok, how does the Dispatcher know which Client to send it to? Perhaps the
131 | * RESPONSE Tasks going to a Client need to have some sort of quick
132 | * addressing the Dispatcher can use without having to parse a JSON string.
133 | * Need to think about this more and also have to stop working on this now.
134 | * 
135 | * Reading chapter 3 of the guide again. I think router/dealer sockets will
136 | * handle this for the most part. Might be time to write a test program.
--------------------------------------------------------------------------------
/docs/worker.txt:
--------------------------------------------------------------------------------
1 | [ WORKER ]
2 | Workers are applications that accept Tasks from Brokers and send Task replies
3 | when they are done processing. The requirements for a Worker application are:
4 | 
5 | - Able to make ZeroMQ REQ sockets.
6 | This is the core way that a Worker accepts tasks. REQ or XREQ depends on the
7 | ability of the worker to accept multiple tasks.
8 | 
9 | - Follow the handshake and heartbeat requirements as follows.
10 | 
11 | Initial Connection:
12 | Worker connects to the broker it's been told to connect to. It's up to
13 | the application developer to determine how an application is told which
14 | broker to connect to. This may be a command line switch, or the application
15 | can save its config from a previous run and get the Broker connection
16 | information from there. Choice of REQ or XREQ is up to the client.
17 | 
18 | An example connection looks like this.
19 | 
20 | W ------ CONNECT ----> B
21 | W <----- CONFIG ------ B
22 | W ------ PING -------> B
23 | W <----- PONG -------- B
24 | 
25 | It's important to note that the PING is necessary to start receiving
26 | requests. This informs the Broker that the application is ready to take
27 | another Task.
28 | 
29 | If the Worker is asynchronous it should send a new PING immediately on
30 | receipt of a Task. This will allow the Broker to keep feeding the Worker
31 | Tasks as fast as it can. This also allows the developer of the Worker
32 | application to manage load internally. If the Worker is self-aware enough
33 | to know when it should not be accepting more Tasks, it can simply not send
34 | a PING until it is ready.
35 | 
36 | For more information about other responses please see the protocol document.
37 | It's expected that all Workers will be able to handle all error codes, for
38 | example.
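The handshake and readiness rules above can be sketched as message builders. The frame layout and field names here are a guess based on the protocol document, not a fixed wire format:

```python
import json
import time

def connect_frames(identity, config_type="JSON", config_version=None):
    """Frames for the CONNECT handshake. An existing config adds a
    config_version so the Broker can decide whether to push a new one."""
    params = {"config_type": config_type}
    if config_version is not None:
        params["config_version"] = config_version
    return [identity, "CONNECT", json.dumps(params)]

def ping_frames(identity, heartbeat_interval=1):
    """A PING both heartbeats and announces readiness for the next Task."""
    params = {"heartbeat_interval": heartbeat_interval,
              "time": int(time.time())}
    return [identity, "PING", json.dumps(params)]

def on_task(identity, task, asynchronous=True):
    """An asynchronous Worker PINGs again immediately on receipt of a
    Task so the Broker keeps feeding it work; a synchronous (or
    overloaded) Worker just stays quiet until it's ready."""
    replies = []
    if asynchronous:
        replies.append(ping_frames(identity))
    # ... process the task, then send a TASKREPLY (not shown) ...
    return replies
```

Withholding the PING is the whole load-management interface: a Worker that stops announcing readiness simply stops being scheduled.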
39 | 40 | -------------------------------------------------------------------------------- /scale0/scale0.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # 3 | # -*- coding: utf-8 -*- 4 | # 5 | # Copyright 2011 Joseph Bowman 6 | # 7 | # Licensed under the Apache License, Version 2.0 (the "License"); you may 8 | # not use this file except in compliance with the License. You may obtain 9 | # a copy of the License at 10 | # 11 | # http://www.apache.org/licenses/LICENSE-2.0 12 | # 13 | # Unless required by applicable law or agreed to in writing, software 14 | # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT 15 | # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the 16 | # License for the specific language governing permissions and limitations 17 | # under the License. 18 | 19 | import sys 20 | import calendar 21 | import time 22 | import zmq 23 | import uuid 24 | import tnetstrings 25 | from zmq.eventloop import ioloop, zmqstream 26 | 27 | class Dispatcher(): 28 | def __init__(self, 29 | client_socket_uri="tcp://127.0.0.1:8080", 30 | worker_xrep_socket_uri="tcp://127.0.0.1:8081", 31 | pub_socket_uri="tcp://127.0.0.1:8082", 32 | my_id=str(uuid.uuid4()), 33 | routers=2, heartbeat=1, liveness=3): 34 | 35 | self.my_id = my_id 36 | self.heartbeat_interval = heartbeat * 1000 37 | self.heartbeat_liveness = liveness 38 | 39 | """ Workers info would look something like 40 | { 41 | "worker1": { "services": ["web"], "last_ping": int(time.time())} 42 | "worker2": { "services": ["web"], "last_ping": int(time.time())} 43 | "worker3": { "services": ["news", "mail"], "last_ping": int(time.time())} 44 | } 45 | Eventually I'll move it to an object with getter and setters which 46 | can use something like gaeutilities event to notify the main 47 | application when a worker is added. That way requests don't 48 | get dropped. 
49 | 50 | *id is usually a uuid, but really, as long as ids are unique Scale0 should not care. 51 | """ 52 | 53 | self.workers = {} 54 | self.LRU = [] 55 | 56 | self.context = zmq.Context.instance() 57 | self.loop = ioloop.IOLoop.instance() 58 | 59 | self.worker_xrep_socket = self.context.socket(zmq.XREP) 60 | self.worker_xrep_socket.setsockopt(zmq.IDENTITY, "%s-worker" % self.my_id) 61 | self.worker_xrep_socket.bind(worker_xrep_socket_uri) 62 | 63 | self.worker_xrep_stream = zmqstream.ZMQStream(self.worker_xrep_socket, self.loop) 64 | self.worker_xrep_stream.on_recv(self.worker_handler) 65 | 66 | self.pub_socket = self.context.socket(zmq.PUB) 67 | self.pub_socket.setsockopt(zmq.IDENTITY, "%s_broker_pub" % self.my_id) 68 | self.pub_socket.bind(pub_socket_uri) 69 | 70 | self.pub_stream = zmqstream.ZMQStream(self.pub_socket, self.loop) 71 | ioloop.PeriodicCallback(self.send_pings, self.heartbeat_interval, self.loop).start() 72 | 73 | self.loop.start() 74 | 75 | def worker_handler(self, message): 76 | """ worker_handler handles messages from worker sockets. Messages 77 | are 3+ part ZeroMQ multipart messages: (worker_id, command, request). 78 | 79 | worker_id is supplied as part of the ROUTER socket requirements and is 80 | used to send replies back. 81 | 82 | command is mapped to methods. This allows an AttributeError to be 83 | raised if the command isn't an acceptable method. It is also just 84 | easier to maintain the code if each command is its own method. 85 | 86 | request is the rest of the message; it can be multiple parts and Scale0 87 | will generally ignore it except to pass it on. 88 | """ 89 | sock = self.worker_xrep_stream 90 | 91 | getattr(self, message[1].lower())(sock, message) 92 | 93 | def send_pings(self): 94 | """ pings are the heartbeat check to determine if the workers listed 95 | in the LRU queue are still available. The ping is broadcast on the 96 | PUB socket with the "PING" topic; each worker that hears it replies 97 | with a ping back on the worker XREP socket. """ 98 | 99 | ping_time = str(time.time()) 100 | self.pub_socket.send_multipart(["PING", ping_time]) 101 | 102 | def ping(self, sock, message): 103 | """ ping message received is a reply to a ping for a worker in the LRU queue. """ 104 | (worker_id, command, request) = message 105 | if worker_id in self.workers: 106 | self.workers[worker_id]["last_ping"] = float(request) 107 | print 'got ping from %s' % worker_id 108 | 109 | 110 | def heartbeat(self, sock, message): 111 | """ For heartbeat we just shoot the request right back at the sender, 112 | on the PUB socket with the worker id as the topic, without parsing. 113 | """ 114 | print message 115 | self.pub_socket.send_multipart(message) 116 | 117 | def ready(self, sock, message): 118 | """ ready is the worker informing Scale0 it can accept more jobs. 119 | """ 120 | 121 | (worker_id, command, services) = message 122 | self.workers[worker_id] = {"services": services.split(","), 123 | "last_ping": time.time()} 124 | self.LRU.append(worker_id) 125 | self.pub_socket.send_multipart([worker_id, "OK"]) 126 | print "Worker %s READY" % worker_id 127 | 128 | if __name__ == "__main__": 129 | Dispatcher() 130 | -------------------------------------------------------------------------------- /scale0/test_client.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joerussbowman/Scale0/663cca669d7f968a6f0237e0b4af0a327e831aee/scale0/test_client.py -------------------------------------------------------------------------------- /scale0/test_worker.py: -------------------------------------------------------------------------------- 1 | import calendar 2 | import time 3 | import zmq 4 | import uuid 5 | import tnetstrings 6 | from zmq.eventloop import ioloop, zmqstream 7 | 8 | class Worker(): 9 | def __init__(self, connect_to, listen_on="tcp://127.0.0.1:9080"): 10 | """ The worker connects to a socket to 
communicate with the Dispatcher 11 | in the Broker. This allows the Dispatcher to manage its LRU queue using 12 | the worker. A listener socket is instantiated. This is the socket that the 13 | Router in the Broker will make requests to. 14 | """ 15 | self.my_id = str(uuid.uuid4()) 16 | self.context = zmq.Context.instance() 17 | self.loop = ioloop.IOLoop.instance() 18 | self.listen_on = listen_on 19 | 20 | self.xreq_socket = self.context.socket(zmq.XREQ) 21 | self.xreq_socket.setsockopt(zmq.IDENTITY, "%s" % self.my_id) 22 | self.xreq_socket.connect(connect_to) 23 | self.xreq_stream = zmqstream.ZMQStream(self.xreq_socket, self.loop) 24 | 25 | self.sub_socket = self.context.socket(zmq.SUB) 26 | self.sub_socket.connect("tcp://127.0.0.1:8082") 27 | self.sub_socket.setsockopt(zmq.SUBSCRIBE, self.my_id) 28 | print "subscribed to %s" % self.my_id 29 | self.sub_socket.setsockopt(zmq.SUBSCRIBE, "PING") 30 | self.sub_stream = zmqstream.ZMQStream(self.sub_socket, self.loop) 31 | 32 | self.heartbeat_stamp = None 33 | self.heartbeats = [] 34 | 35 | 36 | """ self.connection_state can be one of 3 ints 37 | 0: not connected (not in LRU queue on broker) 38 | 1: connection pending (READY sent) 39 | 2: connected (OK received, in LRU queue) 40 | """ 41 | self.connection_state = 0 42 | 43 | self.xreq_stream.on_recv(self.xreq_handler) 44 | self.sub_stream.on_recv(self.sub_handler) 45 | 46 | ioloop.DelayedCallback(self.connect, 1000, self.loop).start() 47 | ioloop.PeriodicCallback(self.send_heartbeat, 1000, self.loop).start() 48 | 49 | self.loop.start() 50 | 51 | def send_heartbeat(self): 52 | if self.connection_state == 2: 53 | self.heartbeat_stamp = str(time.time()) 54 | print 'sending heartbeat %s' % self.heartbeat_stamp 55 | self.heartbeats.append(self.heartbeat_stamp) 56 | self.xreq_socket.send_multipart(["HEARTBEAT", self.heartbeat_stamp]) 57 | 58 | def xreq_handler(self, msg): 59 | (command, request) = msg 60 | if command == "HEARTBEAT": 61 | if request == self.heartbeat_stamp: 62 | 
print 'Got valid heartbeat %s' % request 63 | else: 64 | print "Heartbeat timestamp mismatch %s" % request 65 | self.heartbeats.remove(request) 66 | print self.heartbeats 67 | 68 | def sub_handler(self, msg): 69 | """ Trying to move to pub/sub for getting messages to workers. """ 70 | if msg[0] == "PING": 71 | self.xreq_socket.send_multipart(msg) 72 | if msg[0] == self.my_id: 73 | (id, command) = msg[:2] 74 | if command == "OK": 75 | self.connection_state = 2 76 | print 'In LRU Queue' 77 | if command == "HEARTBEAT": 78 | data = msg[2] 79 | print "Got heartbeat timestamp %s" % data 80 | 81 | def connect(self): 82 | if self.connection_state < 1: 83 | print 'connecting to broker' 84 | self.xreq_socket.send_multipart(["READY", 85 | "test"]) 86 | self.connection_state = 1 87 | 88 | if __name__ == "__main__": 89 | Worker("tcp://127.0.0.1:8081") 90 | -------------------------------------------------------------------------------- /scale0/tnetstrings.py: -------------------------------------------------------------------------------- 1 | # Note this implementation is more strict than necessary to demonstrate 2 | # minimum restrictions on types allowed in dictionaries. 3 | 4 | def dump(data): 5 | if type(data) is long or type(data) is int: 6 | out = str(data) 7 | return '%d:%s#' % (len(out), out) 8 | elif type(data) is str: 9 | return '%d:' % len(data) + data + ',' 10 | elif type(data) is dict: 11 | return dump_dict(data) 12 | elif type(data) is list: 13 | return dump_list(data) 14 | elif data == None: 15 | return '0:~' 16 | elif type(data) is bool: 17 | out = repr(data).lower() 18 | return '%d:%s!' % (len(out), out) 19 | else: 20 | assert False, "Can't serialize stuff that's %s." 
% type(data) 21 | 22 | 23 | def parse(data): 24 | payload, payload_type, remain = parse_payload(data) 25 | 26 | if payload_type == '#': 27 | value = int(payload) 28 | elif payload_type == '}': 29 | value = parse_dict(payload) 30 | elif payload_type == ']': 31 | value = parse_list(payload) 32 | elif payload_type == '!': 33 | value = payload == 'true' 34 | elif payload_type == '~': 35 | assert len(payload) == 0, "Payload must be 0 length for null." 36 | value = None 37 | elif payload_type == ',': 38 | value = payload 39 | else: 40 | assert False, "Invalid payload type: %r" % payload_type 41 | 42 | return value, remain 43 | 44 | def parse_payload(data): 45 | assert data, "Invalid data to parse, it's empty." 46 | length, extra = data.split(':', 1) 47 | length = int(length) 48 | 49 | payload, extra = extra[:length], extra[length:] 50 | assert extra, "No payload type: %r, %r" % (payload, extra) 51 | payload_type, remain = extra[0], extra[1:] 52 | 53 | assert len(payload) == length, "Data is wrong length %d vs %d" % (length, len(payload)) 54 | return payload, payload_type, remain 55 | 56 | def parse_list(data): 57 | if len(data) == 0: return [] 58 | 59 | result = [] 60 | value, extra = parse(data) 61 | result.append(value) 62 | 63 | while extra: 64 | value, extra = parse(extra) 65 | result.append(value) 66 | 67 | return result 68 | 69 | def parse_pair(data): 70 | key, extra = parse(data) 71 | assert extra, "Unbalanced dictionary store." 72 | value, extra = parse(extra) 73 | 74 | return key, value, extra 75 | 76 | def parse_dict(data): 77 | if len(data) == 0: return {} 78 | 79 | key, value, extra = parse_pair(data) 80 | assert type(key) is str, "Keys can only be strings." 
81 | 82 | result = {key: value} 83 | 84 | while extra: 85 | key, value, extra = parse_pair(extra) 86 | result[key] = value 87 | 88 | return result 89 | 90 | 91 | 92 | def dump_dict(data): 93 | result = [] 94 | for k,v in data.items(): 95 | result.append(dump(str(k))) 96 | result.append(dump(v)) 97 | 98 | payload = ''.join(result) 99 | return '%d:' % len(payload) + payload + '}' 100 | 101 | 102 | def dump_list(data): 103 | result = [] 104 | for i in data: 105 | result.append(dump(i)) 106 | 107 | payload = ''.join(result) 108 | return '%d:' % len(payload) + payload + ']' 109 | 110 | 111 | --------------------------------------------------------------------------------
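To make the wire format above concrete, here is a condensed, standalone restatement of the dump() rules for ints, strings and dicts (the full module above also handles lists, longs, booleans and None). The helper name tns is made up for the sketch; it is not part of the module.

```python
# Condensed sketch of the tnetstring encoding rules implemented above,
# restated standalone for illustration; handles only int, str and dict.

def tns(data):
    if isinstance(data, bool):
        # isinstance(True, int) is True, so bools must be rejected first;
        # the full module encodes them with a '!' suffix instead.
        raise TypeError("bools not handled in this sketch")
    if isinstance(data, int):
        out = str(data)
        return '%d:%s#' % (len(out), out)    # ints end with '#'
    if isinstance(data, str):
        return '%d:%s,' % (len(data), data)  # strings end with ','
    if isinstance(data, dict):
        # keys and values are themselves tnetstrings, concatenated,
        # and the whole payload is length-prefixed and ends with '}'
        payload = ''.join(tns(k) + tns(v) for k, v in data.items())
        return '%d:%s}' % (len(payload), payload)
    raise TypeError("Can't serialize %s" % type(data))

# tns("hello")            -> '5:hello,'
# tns(42)                 -> '2:42#'
# tns({"service": "www"}) -> '16:7:service,3:www,}'
```

The length prefix is what makes parse_payload above cheap: the parser reads up to the ':', slices exactly that many bytes, and finds the type tag in the very next byte.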