├── .github └── workflows │ └── erlang.yml ├── .gitignore ├── .travis.yml ├── Makefile ├── README.md ├── eqc ├── sidejob_eqc.erl ├── supervisor_eqc.erl └── worker.erl ├── rebar.config ├── rebar.config.script ├── rebar3 ├── src ├── sidejob.app.src ├── sidejob.erl ├── sidejob_app.erl ├── sidejob_config.erl ├── sidejob_resource_stats.erl ├── sidejob_resource_sup.erl ├── sidejob_stat.erl ├── sidejob_sup.erl ├── sidejob_supervisor.erl ├── sidejob_worker.erl └── sidejob_worker_sup.erl └── test ├── full_par_ce.eqc ├── overload_children_ce.eqc ├── pool_full_par_ce.eqc ├── sidejob_eqc_prop_par_ce.eqc ├── sidejob_eqc_prop_seq_ce.eqc ├── sidejob_eqc_prop_seq_ce_+s12:12.eqc ├── supervisor_race_overrun.eqc ├── which_children_ce.eqc └── which_children_pulse_ce.eqc /.github/workflows/erlang.yml: -------------------------------------------------------------------------------- 1 | name: Erlang CI 2 | 3 | on: 4 | push: 5 | branches: [ develop ] 6 | pull_request: 7 | branches: [ develop ] 8 | 9 | 10 | jobs: 11 | 12 | build: 13 | 14 | runs-on: ubuntu-latest 15 | 16 | strategy: 17 | fail-fast: false 18 | matrix: 19 | otp: 20 | - "25.1" 21 | - "24.3" 22 | - "22.3" 23 | 24 | container: 25 | image: erlang:${{ matrix.otp }} 26 | 27 | steps: 28 | - uses: lukka/get-cmake@latest 29 | - uses: actions/checkout@v2 30 | - name: Compile 31 | run: ./rebar3 compile 32 | - name: Run xref and dialyzer 33 | run: ./rebar3 do xref, dialyzer 34 | - name: Run eunit 35 | run: ./rebar3 as gha do eunit 36 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.beam 2 | .*.swp 3 | ebin/sidejob.app 4 | .eunit 5 | _build 6 | .eqc-info 7 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: erlang 2 | otp_release: 3 | - 20.3.8 4 | - 21.3 5 | - 22.3 6 | 
script: 7 | - chmod u+x rebar3 8 | - ./rebar3 do upgrade, compile, xref, dialyzer, eunit 9 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | .PHONY: compile rel cover test dialyzer 2 | REBAR=./rebar3 3 | 4 | compile: 5 | $(REBAR) compile 6 | 7 | clean: 8 | $(REBAR) clean 9 | 10 | cover: test 11 | $(REBAR) cover 12 | 13 | test: compile 14 | $(REBAR) as test do eunit 15 | 16 | dialyzer: 17 | $(REBAR) dialyzer 18 | 19 | xref: 20 | $(REBAR) xref 21 | 22 | check: test dialyzer xref 23 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | (TODO: Write a better README. Current text is copied from [sidejob#1](https://github.com/basho/sidejob/pull/1)) 2 | 3 | [![Erlang CI Actions Status](https://github.com/basho/sidejob/workflows/Erlang%20CI/badge.svg)](https://github.com/basho/sidejob/actions) 4 | 5 | Note: this library was originally written to support process bounding in Riak using the sidejob_supervisor behavior. In Riak, this is used to limit the number of concurrent get/put FSMs that can be active, failing client requests with {error, overload} if the limit is ever hit. The purpose is to provide a fail-safe mechanism during extreme overload scenarios. 6 | 7 | sidejob is an Erlang library that implements a parallel, capacity-limited request pool. In sidejob, these pools are called resources. A resource is managed by multiple gen_server-like processes, which can be sent calls and casts using sidejob:call and sidejob:cast respectively. 8 | 9 | A resource has a fixed capacity. This capacity is split across all the workers, with each worker having a worker capacity of resource capacity/num_workers. 
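As a rough illustration of the capacity split described above, here is a minimal sketch. It is not sidejob's actual source: the module name `capacity_sketch` and the function `worker_capacity/2` are hypothetical, and the use of integer division is an assumption based on the `Limit div Width` check in the eqc model shipped with this repository.

```erlang
-module(capacity_sketch).
-export([worker_capacity/2, demo/0]).

%% Per-worker capacity: the resource limit split evenly across workers.
%% ASSUMPTION: integer division, mirroring `Limit div Width` in the eqc
%% model. Any remainder is simply left unused in this sketch.
worker_capacity(ResourceLimit, NumWorkers)
  when is_integer(ResourceLimit), ResourceLimit >= 0,
       is_integer(NumWorkers), NumWorkers > 0 ->
    ResourceLimit div NumWorkers.

%% Small self-check: each match fails with badmatch if the value differs.
demo() ->
    25 = worker_capacity(100, 4),  % capacity divides evenly
    2  = worker_capacity(10, 4),   % remainder of 2 is unused under this assumption
    ok.
```

The eqc model in this repository creates resources with a limit of `K * Width` and `Width` workers (for example `sidejob:new_resource(resource, worker, K * Width, Width)`), so the limit is always an exact multiple of the worker count and the remainder case never arises there.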
10 | 11 | When sending a call/cast, sidejob dispatches the request to an available worker, where available means the worker has not reached its designated limit. Each worker maintains a usage count in a per-resource public ETS table. The process that tries to send a request will read/update slots in this table to determine available workers. 12 | 13 | This entire approach is implemented in a scalable manner. When a process tries to send a sidejob request, sidejob determines the Erlang scheduler id the process is running on, and uses that to pick a certain worker in the worker pool to try and send a request to. If that worker is at its limit, the next worker in order is selected until all workers have been tried. Since multiple concurrent processes in Erlang will be running on different schedulers, and therefore start at different offsets in the worker list, multiple concurrent requests can be attempted with little lock contention. Specifically, different processes will be touching different slots in the ETS table and hitting different ETS segment locks. 14 | 15 | For a normal sidejob worker, the limit corresponds to the size of a worker's mailbox. Before sending a request to a worker, the sender increments the relevant usage value; after receiving the message, the worker decrements it. Thus, the total number of messages that can be sent to a set of sidejob workers is limited; in other words, a bounded process mailbox. 16 | 17 | However, sidejob workers can also implement custom usage strategies. For example, sidejob comes with the sidejob_supervisor worker that implements a parallel, capacity-limited supervisor for dynamic, transient children. In this case, the capacity being managed is the number of spawned children. Trying to spawn additional children results in the standard overload response from sidejob. 
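The scheduler-based worker selection described earlier in this README can be sketched as a pure function. This is only an illustration under stated assumptions: the module `dispatch_sketch` and its functions are hypothetical, a plain tuple of usage counts stands in for the per-resource ETS table, and the probe order follows the `schedule/2` helper of the eqc model rather than sidejob's own source. The real library also updates the usage slot when it claims a worker, which this read-only sketch leaves out.

```erlang
-module(dispatch_sketch).
-export([probe_order/2, pick_worker/3]).

%% Order in which workers are tried for a caller running on SchedulerId:
%% start at the worker matching the scheduler, then wrap around.
probe_order(SchedulerId, Width)
  when is_integer(SchedulerId), SchedulerId >= 1,
       is_integer(Width), Width >= 1 ->
    Start = (SchedulerId - 1) rem Width + 1,
    lists:seq(Start, Width) ++ lists:seq(1, Start - 1).

%% Usage is a tuple of per-worker usage counts (stand-in for the ETS table).
%% Returns {ok, WorkerIndex} for the first worker below WorkerLimit,
%% or overload when every worker is at its limit.
pick_worker(SchedulerId, Usage, WorkerLimit) when is_tuple(Usage) ->
    Width = tuple_size(Usage),
    Available = [W || W <- probe_order(SchedulerId, Width),
                      element(W, Usage) < WorkerLimit],
    case Available of
        []      -> overload;
        [W | _] -> {ok, W}
    end.
```

For example, with four workers, a per-worker limit of 2, and usage `{0, 0, 2, 1}`, a caller on scheduler 3 finds worker 3 full and moves on to worker 4. Callers on other schedulers begin probing at other offsets, which is what keeps concurrent senders on different table slots.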
18 | 19 | In addition to providing a capacity limit, the sidejob_supervisor behavior is more scalable than a single OTP supervisor when there are multiple processes constantly attempting to start new children via said supervisor. This is because there are multiple parallel workers rather than a single gen_server process. For example, Riak moved away from using supervisors to manage its get and put FSMs because the supervisor ended up being a bottleneck. Unfortunately, not using a supervisor made it hard to track the number of spawned children, return a list of child pids, etc. By moving to sidejob_supervisor for get/put FSM management, Riak can now easily track FSM pids without the scalability problems -- in addition to having the ability to bound process growth. 20 | -------------------------------------------------------------------------------- /eqc/sidejob_eqc.erl: -------------------------------------------------------------------------------- 1 | %%% File : sidejob_eqc.erl 2 | %%% Author : Ulf Norell 3 | %%% Description : 4 | %%% Created : 13 May 2013 by Ulf Norell 5 | -module(sidejob_eqc). 6 | 7 | %% Sidejob is intended to run jobs (of the form call or cast), running 8 | %% at most W jobs in parallel, and returning 'overload' if more than 9 | %% K*W jobs are waiting to be completed. Here W is the number of 10 | %% sidejob workers, and K-1 is the maximum number of jobs that can be 11 | %% waiting for any particular worker. When new jobs are submitted, 12 | %% sidejob looks for an available worker (with fewer than K-1 jobs in 13 | %% its queue), starting with one corresponding to the scheduler number 14 | %% that the caller is running on; it returns overload only if every 15 | %% worker has K-1 waiting jobs at the time sidejob checks. 16 | 17 | %% If a job crashes, then the worker that was running it is restarted, 18 | %% but any jobs waiting for that worker are lost (and, in the event of 19 | %% a call job, cause their caller to crash too). 
20 | 21 | %% Sidejob is inherently non-deterministic. For example, if K*W jobs 22 | %% are running, one is about to finish, and another is about to be 23 | %% submitted, then there is a race between these two events that can 24 | %% lead the new job to be rejected as overload, or not. Even in a 25 | %% non-overload situation, a worker with a full queue which is about 26 | %% to finish a job may be assigned a new job if the finish happens 27 | %% first, or it may be assigned to the next worker if the finish 28 | %% happens second. Thus it is impossible to predict reliably which 29 | %% worker a job will be assigned to, and thus which jobs will be 30 | %% discarded when a job crashes. 31 | 32 | %% Nevertheless, this model tries to predict such outcomes 33 | %% precisely. As a result, the tests suffer from race conditions, and 34 | %% (even the sequential) tests have been failing. To address this, the 35 | %% model sleeps after every action, to allow sidejob to complete all 36 | %% the resulting actions. This sleep was originally 1ms, which was not 37 | %% always long enough, leading tests to fail. Now 38 | %% * we sleep for 2ms, 39 | %% * we check to see if the VM is "quiescent" before continuing, and 40 | %% if not, we sleep again, 41 | %% * we retry calls that return results that could be transient 42 | %% ('overload' from call and cast, 'blocked' from get_status) 43 | %% * after a restart of the task supervisor, we wait 10ms (!) because 44 | %% weird stuff happens if we don't 45 | %% This makes tests much more deterministic, at least. Fewer than one 46 | %% test in 300,000 should fail--if they fail more often than that, 47 | %% there is something wrong. 48 | 49 | %% The disadvantages of this approach are: 50 | %% * It is still possible, if much less likely, that a test fail when 51 | %% nothing is wrong. 
52 | %% * This model cannot test rapid sequences of events, and so risks 53 | %% missing some bugs, because it must wait for quiescence after 54 | %% every operation. 55 | %% * It does not make sense to run parallel tests with this model. 56 | 57 | %% Three ways in which testing could be improved are: 58 | %% 1. Use PULSE, not for parallel testing, but to run sequential 59 | %% tests, because PULSE can guarantee quiescence before proceeding 60 | %% to the next operation, without sleeping in reality. This could 61 | %% make tests very much faster to run. 62 | %% 2. Create a different model that tolerates non-determinism, 63 | %% instead checking global properties such as that no more than 64 | %% W jobs are in progress simultaneously, that jobs are rejected 65 | %% as overload iff K*W jobs are currently in the system, that *at 66 | %% least* ceiling (N/K) jobs are actually running when N jobs are 67 | %% in the system. Such a model could be used to test sidejob with 68 | %% rapidly arriving events, and so might find race conditions that 69 | %% this model misses. It could also potentially run much faster, 70 | %% and thus find bugs that are simply too rare to find in a 71 | %% realistic time with a model that sleeps frequently. 72 | %% 3. Make the 'intensity' parameter of the sidejob supervisor 73 | %% configurable--at present it is always 10, which means that a 74 | %% supervisor restart only happens after ten jobs crash. This 75 | %% makes tests that fail in this situation long, and as a result 76 | %% they shrink very slowly. 77 | 78 | -include_lib("eqc/include/eqc_statem.hrl"). 79 | -include_lib("eqc/include/eqc.hrl"). 80 | -ifdef(PULSE). 81 | -export([prop_pulse/0, pulse_instrument/0, pulse_instrument/1]). 82 | -include_lib("pulse/include/pulse.hrl"). 83 | -endif. 84 | 85 | -export([initial_state/0]). 86 | -export([prop_seq/0]). 87 | -export([work/2, finish_work/1, crash/1, get_element/2, get_status/1]). 
88 | -export([new_resource_command/1, 89 | new_resource_pre/1, new_resource_next/3, new_resource_post/3]). 90 | 91 | -export([worker/0, kill_all_pids/1]). 92 | 93 | -record(state, {limit, width, restarts = 0, workers = []}). 94 | -record(worker, {pid, scheduler, queue, status = ready, cmd}). 95 | 96 | -import(eqc_statem, [tag/2]). 97 | 98 | -compile(nowarn_unused_function). 99 | 100 | -define(RESOURCE, resource). 101 | -define(TIMEOUT, 5000). 102 | -define(RESTART_LIMIT, 10). 103 | 104 | -define(QC_OUT(P), 105 | eqc:on_output(fun(Str, Args) -> io:format(user, Str, Args) end, P)). 106 | 107 | initial_state() -> 108 | #state{}. 109 | 110 | %% -- Commands --------------------------------------------------------------- 111 | 112 | %% -- new_resource 113 | new_resource_command(_S) -> 114 | ?LET({K, W}, {choose(1, 5), oneof([?SHRINK(default, [1]), choose(1, 8)])}, 115 | case W of 116 | default -> 117 | Width = erlang:system_info(schedulers), 118 | {call, sidejob, new_resource, [?RESOURCE, worker, K * Width]}; 119 | Width -> 120 | {call, sidejob, new_resource, [?RESOURCE, worker, K * Width, Width]} 121 | end). 122 | 123 | new_resource_pre(S) -> S#state.limit == undefined. 124 | 125 | new_resource_next(S, V, Args=[_, _, _]) -> 126 | new_resource_next(S, V, Args ++ [erlang:system_info(schedulers)]); 127 | new_resource_next(S, _, [_, _, Limit, Width]) -> 128 | S#state{ limit = Limit, width = Width }. 129 | 130 | new_resource_post(_, _, V) -> 131 | case V of 132 | {ok, Pid} when is_pid(Pid) -> true; 133 | _ -> {not_ok, V} 134 | end. 135 | 136 | %% -- work 137 | work(Cmd, Scheduler) -> 138 | wait_until_quiescent(), 139 | {Worker, Status} = work0 (Cmd, Scheduler), 140 | case Status of 141 | %%overload -> 142 | %% Temporary overload is not necessarily a problem--there 143 | %% may be workers stopping/dying/being replaced. 144 | %% wait_until_quiescent(), 145 | %%work0(Cmd, Scheduler); 146 | _ -> 147 | {Worker, Status} 148 | end. 
149 | 150 | work0(Cmd, Scheduler) -> 151 | status_keeper ! {start_worker, self(), Cmd, Scheduler}, 152 | Worker = receive {start_worker, Worker0} -> Worker0 end, 153 | {Worker, get_status(Worker)}. 154 | 155 | -ifdef(PULSE). 156 | gen_scheduler() -> 1. 157 | -else. 158 | gen_scheduler() -> choose(1, erlang:system_info(schedulers)). 159 | -endif. 160 | 161 | work_args(_) -> 162 | [elements([call, cast]), gen_scheduler()]. 163 | 164 | work_pre(S) -> 165 | S#state.limit /= undefined. 166 | 167 | work_next(S, V, [Cmd, Sched]) -> 168 | Pid = {call, ?MODULE, get_element, [1, V]}, 169 | Status = {call, ?MODULE, get_element, [2, V]}, 170 | S1 = do_work(S, Pid, [Cmd, Sched]), 171 | get_status_next(S1, Status, [Pid]). 172 | 173 | do_work(S, Pid, [Cmd, Sched]) -> 174 | {Queue, Status} = 175 | case schedule(S, Sched) of 176 | full -> {keep, {finished, overload}}; 177 | {blocked, Q} -> {Q, blocked}; 178 | {ready, Q} -> {Q, working} 179 | end, 180 | W = #worker{ pid = Pid, 181 | scheduler = Sched, 182 | queue = Queue, 183 | cmd = Cmd, 184 | status = Status }, 185 | S#state{ workers = S#state.workers ++ [W] }. 186 | 187 | work_post(S, [Cmd, Sched], {Pid, Status}) -> 188 | get_status_post(do_work(S, Pid, [Cmd, Sched]), [Pid], Status). 189 | 190 | %% -- get_status 191 | get_status(Worker) -> 192 | case get_status0(Worker) of 193 | blocked -> 194 | %% May just not have started yet 195 | wait_until_quiescent(), 196 | get_status0 (Worker); 197 | R -> R 198 | end. 199 | 200 | get_status0(Worker) -> 201 | status_keeper ! {get_status, self(), Worker}, 202 | receive {Worker, R} -> R 203 | end. 204 | 205 | get_status_args(S) -> 206 | [busy_worker(S)]. 207 | 208 | get_status_pre(S) -> 209 | busy_workers(S) /= []. 210 | 211 | get_status_pre(S, [Pid]) -> 212 | case get_worker(S, Pid) of 213 | #worker{ status = {working, _} } -> false; 214 | #worker{} -> true; 215 | _ -> false 216 | end. 
217 | 218 | get_status_next(S, V, [WPid]) -> 219 | NewStatus = 220 | case (get_worker(S, WPid))#worker.status of 221 | {finished, _} -> stopped; 222 | blocked -> blocked; 223 | zombie -> zombie; 224 | crashed -> crashed; 225 | working -> {working, {call, ?MODULE, get_element, [2, V]}} 226 | end, 227 | set_worker_status(S, WPid, NewStatus). 228 | 229 | get_status_post(S, [WPid], R) -> 230 | case (get_worker(S, WPid))#worker.status of 231 | {finished, Res} -> eq(R, Res); 232 | blocked -> eq(R, blocked); 233 | zombie -> eq(R, blocked); 234 | crashed -> eq(R, crashed); 235 | working -> 236 | case R of 237 | {working, Pid} when is_pid(Pid) -> true; 238 | _ -> {R, '/=', {working, 'Pid'}} 239 | end 240 | end. 241 | 242 | %% -- finish 243 | finish_work(bad_element) -> ok; 244 | finish_work(Pid) -> 245 | Pid ! finish, 246 | wait_until_quiescent(). 247 | 248 | finish_work_args(S) -> 249 | [elements(working_workers(S))]. 250 | 251 | finish_work_pre(S) -> 252 | working_workers(S) /= []. 253 | 254 | finish_work_pre(S, [Pid]) -> 255 | lists:member(Pid, working_workers(S)). 256 | 257 | finish_work_next(S, _, [Pid]) -> 258 | W = #worker{} = lists:keyfind({working, Pid}, #worker.status, S#state.workers), 259 | Status = 260 | case W#worker.cmd of 261 | cast -> stopped; 262 | call -> {finished, done} 263 | end, 264 | wakeup_worker(set_worker_status(S, W#worker.pid, Status), W#worker.queue). 265 | 266 | %. -- crash 267 | crash(bad_element) -> ok; 268 | crash(Pid) -> 269 | Pid ! crash, 270 | wait_until_quiescent(). 271 | 272 | crash_args(S) -> 273 | [elements(working_workers(S))]. 274 | 275 | crash_pre(S) -> 276 | working_workers(S) /= []. 277 | 278 | crash_pre(S, [Pid]) -> 279 | lists:member(Pid, working_workers(S)). 
280 | 281 | crash_next(S, _, [Pid]) -> 282 | W = #worker{} = lists:keyfind({working, Pid}, #worker.status, S#state.workers), 283 | S1 = S#state{ restarts = S#state.restarts + 1 }, 284 | S2 = set_worker_status(S1, W#worker.pid, stopped), 285 | case S2#state.restarts > ?RESTART_LIMIT of 286 | true -> kill_all_queues(S2#state{ restarts = 0 }); 287 | false -> kill_queue(S2, W#worker.queue) 288 | end. 289 | 290 | crash_post(#state{ restarts=Restarts }, [_Pid], _) -> 291 | %% This is a truly horrible hack! 292 | %% At the restart limit, the sidejob supervisor is restarted, 293 | %% which takes longer, and we see non-deterministic effects. In 294 | %% *sequential* tests, the post-condition is called directly after 295 | %% the call to crash, and we can tell from the dynamic state 296 | %% whether or not the restart limit was reached. If so, we give 297 | %% sidejob a bit more time, to avoid concomitant errors. 298 | [begin status_keeper ! supervisor_restart, 299 | timer:sleep(10) 300 | end || Restarts==?RESTART_LIMIT], 301 | true. 302 | 303 | kill_queue(S, Q) -> 304 | Kill = 305 | fun(W=#worker{ queue = Q1, status = blocked, cmd = Cmd }) when Q1 == Q -> 306 | W#worker{ queue = zombie, status = case Cmd of call->crashed; cast->zombie end }; 307 | (W) -> W end, 308 | S#state{ workers = lists:map(Kill, S#state.workers) }. 309 | 310 | kill_all_queues(S) -> 311 | Kill = fun(W=#worker{ status = Status, cmd = Cmd }) when Status /= {finished,done} -> 312 | W#worker{ queue = zombie, 313 | status = case Cmd of call->crashed; cast->zombie end }; 314 | (W) -> W end, 315 | S#state{ workers = lists:map(Kill, S#state.workers) }. 
316 | 317 | %% -- Helpers ---------------------------------------------------------------- 318 | 319 | schedule(S, Scheduler) -> 320 | Limit = S#state.limit, 321 | Width = S#state.width, 322 | N = (Scheduler - 1) rem Width + 1, 323 | Queues = lists:sublist(lists:seq(N, Width) ++ lists:seq(1, Width), Width), 324 | NotReady = fun(ready) -> false; 325 | ({finished, _}) -> false; 326 | (_) -> true end, 327 | IsWorking = fun({working, _}) -> true; 328 | (working) -> true; 329 | (_) -> false end, 330 | QueueLen = 331 | fun(Q) -> 332 | Stats = [ St || #worker{ queue = AlsoQ, status = St } <- S#state.workers, 333 | AlsoQ == Q ], 334 | {length(lists:filter(NotReady, Stats)), lists:any(IsWorking, Stats)} 335 | end, 336 | Ss = [ {Q, Worker} 337 | || Q <- Queues, 338 | {Len, Worker} <- [QueueLen(Q)], 339 | Len < Limit div Width ], 340 | case Ss of 341 | [] -> full; 342 | [{Sc, false}|_] -> {ready, Sc}; 343 | [{Sc, _}|_] -> {blocked, Sc} 344 | end. 345 | 346 | get_worker(S, Pid) -> 347 | lists:keyfind(Pid, #worker.pid, S#state.workers). 348 | 349 | wakeup_worker(S, Q) -> 350 | Blocked = [ W || W=#worker{status = blocked, queue = Q1} <- S#state.workers, Q == Q1 ], 351 | case Blocked of 352 | [] -> S; 353 | [#worker{pid = Pid}|_] -> 354 | set_worker_status(S, Pid, working) 355 | end. 356 | 357 | set_worker_status(S, Pid, Status) -> 358 | set_worker_status(S, Pid, keep, Status). 359 | 360 | set_worker_status(S, Pid, _, stopped) -> 361 | S#state{ workers = lists:keydelete(Pid, #worker.pid, S#state.workers) }; 362 | set_worker_status(S, Pid, Q0, Status) -> 363 | W = get_worker(S, Pid), 364 | Q = if Q0 == keep -> W#worker.queue; 365 | true -> Q0 end, 366 | S#state{ workers = 367 | lists:keystore(Pid, #worker.pid, 368 | S#state.workers, 369 | W#worker{ queue = Q, status = Status }) }. 370 | 371 | busy_worker(S) -> 372 | ?LET(W, elements(busy_workers(S)), 373 | W#worker.pid). 374 | 375 | busy_workers(S) -> 376 | S#state.workers. 
377 | 378 | working_workers(S) -> 379 | [ Pid || #worker{ status = {working, Pid}, queue = Q } <- S#state.workers, Q /= zombie ]. 380 | 381 | get_element(N, T) when is_tuple(T) -> element(N, T); 382 | get_element(_, _) -> bad_element. 383 | 384 | %% -- Worker loop ------------------------------------------------------------ 385 | 386 | worker() -> 387 | receive 388 | {call, From} -> 389 | Res = sidejob:call(?RESOURCE, {start, self(), From}), 390 | From ! {self(), Res}; 391 | {cast, From} -> 392 | Ref = make_ref(), 393 | Res = 394 | case sidejob:cast(?RESOURCE, {start, Ref, self()}) of 395 | overload -> overload; 396 | ok -> 397 | receive 398 | {started, Ref, Pid} -> 399 | {working, Pid} 400 | end 401 | end, 402 | From ! {self(), Res} 403 | end. 404 | 405 | %% -- Status keeper ---------------------------------------------------------- 406 | %% When running with parallel_commands we need a proxy process that holds the 407 | %% statuses of the workers. 408 | start_status_keeper() -> 409 | case whereis(status_keeper) of 410 | undefined -> ok; 411 | Pid -> unregister(status_keeper), exit(Pid,kill) 412 | end, 413 | register(status_keeper, spawn(fun() -> status_keeper([]) end)). 414 | 415 | status_keeper(State) -> 416 | receive 417 | {start_worker, From, Cmd, Scheduler} -> 418 | Worker = spawn_opt(fun worker/0, [{scheduler, Scheduler}]), 419 | monitor(process,Worker), 420 | Worker ! {Cmd, self()}, 421 | From ! {start_worker, Worker}, 422 | status_keeper([{worker, Worker, [], Cmd} | State]); 423 | {Worker, Status} when is_pid(Worker) -> 424 | {worker, Worker, OldStatus, Cmd} = lists:keyfind(Worker, 2, State), 425 | status_keeper(lists:keystore(Worker, 2, State, 426 | {worker, Worker, OldStatus ++ [Status], Cmd})); 427 | {'DOWN',_,process,Worker,Reason} -> 428 | [self() ! 
{Worker,crashed} || Reason/=normal], 429 | status_keeper(State); 430 | {get_status, From, Worker} -> 431 | case lists:keyfind(Worker, 2, State) of 432 | {worker, Worker, [Status | NewStatus0], Cmd} -> 433 | NewStatus = case Status of crashed -> [crashed]; _ -> NewStatus0 end, 434 | From ! {Worker, Status}, 435 | status_keeper(lists:keystore(Worker, 2, State, 436 | {worker, Worker, NewStatus, Cmd})); 437 | _ -> 438 | From ! {Worker, blocked}, 439 | status_keeper(State) 440 | end; 441 | supervisor_restart -> 442 | %% all workers crash; pending status messages must be discarded 443 | flush_all_messages(), 444 | status_keeper([{worker,Worker, 445 | [case Msg of 446 | {working,_} when Cmd==call -> crashed; 447 | {working,_} when Cmd==cast -> blocked; 448 | _ -> Msg 449 | end || Msg <- Msgs], 450 | Cmd} 451 | || {worker,Worker,Msgs,Cmd} <- State]) 452 | end. 453 | 454 | flush_all_messages() -> 455 | receive _ -> flush_all_messages() after 0 -> ok end. 456 | 457 | %% -- Property --------------------------------------------------------------- 458 | 459 | prop_seq() -> 460 | ?FORALL(Repetitions,?SHRINK(1,[100]), 461 | ?FORALL(Cmds, commands(?MODULE), 462 | ?ALWAYS(Repetitions, 463 | ?TIMEOUT(?TIMEOUT, 464 | ?SOMETIMES(1,%10, 465 | begin 466 | cleanup(), 467 | HSR={_, S, R} = run_commands(?MODULE, Cmds), 468 | [ exit(Pid, kill) || #worker{ pid = Pid } <- S#state.workers, is_pid(Pid) ], 469 | aggregate(command_names(Cmds), 470 | pretty_commands(?MODULE, Cmds, HSR, 471 | R == ok)) 472 | end))))). 473 | 474 | %% Because these tests try to wait for quiescence after each 475 | %% operation, it is not really meaningful to run parallel tests. 
476 | 477 | %% prop_par() -> 478 | %% ?FORALL(Cmds, parallel_commands(?MODULE), 479 | %% ?TIMEOUT(?TIMEOUT, 480 | %% % ?SOMETIMES(4, 481 | %% begin 482 | %% cleanup(), 483 | %% HSR={SeqH, ParH, R} = run_parallel_commands(?MODULE, Cmds), 484 | %% kill_all_pids({SeqH, ParH}), 485 | %% aggregate(command_names(Cmds), 486 | %% pretty_commands(?MODULE, Cmds, HSR, 487 | %% R == ok)) 488 | %% end)). 489 | %% 490 | %% -ifdef(PULSE). 491 | %% prop_pulse() -> 492 | %% ?SETUP(fun() -> N = erlang:system_flag(schedulers_online, 1), 493 | %% fun() -> erlang:system_flag(schedulers_online, N) end end, 494 | %% ?FORALL(Cmds, parallel_commands(?MODULE), 495 | %% ?PULSE(HSR={_, _, R}, 496 | %% begin 497 | %% cleanup(), 498 | %% run_parallel_commands(?MODULE, Cmds) 499 | %% end, 500 | %% aggregate(command_names(Cmds), 501 | %% pretty_commands(?MODULE, Cmds, HSR, 502 | %% R == ok))))). 503 | %% -endif. 504 | 505 | kill_all_pids(Pid) when is_pid(Pid) -> exit(Pid, kill); 506 | kill_all_pids([H|T]) -> kill_all_pids(H), kill_all_pids(T); 507 | kill_all_pids(T) when is_tuple(T) -> kill_all_pids(tuple_to_list(T)); 508 | kill_all_pids(_) -> ok. 509 | 510 | cleanup() -> 511 | start_status_keeper(), 512 | error_logger:tty(false), 513 | (catch application:stop(sidejob)), 514 | % error_logger:tty(true), 515 | application:start(sidejob). 516 | 517 | -ifdef(PULSE). 518 | pulse_instrument() -> 519 | [ pulse_instrument(File) || File <- filelib:wildcard("../src/*.erl") ++ 520 | filelib:wildcard("../test/*.erl") ]. 
521 | 522 | pulse_instrument(File) -> 523 | Modules = [ application, application_controller, application_master, 524 | application_starter, gen, gen_event, gen_fsm, gen_server, 525 | proc_lib, supervisor ], 526 | ReplaceModules = 527 | [{Mod, list_to_atom(lists:concat([pulse_, Mod]))} 528 | || Mod <- Modules], 529 | io:format("compiling ~p~n", [File]), 530 | {ok, Mod} = compile:file(File, [{d, 'PULSE', true}, {d, 'EQC', true}, 531 | {parse_transform, pulse_instrument}, 532 | {pulse_side_effect, [{ets, '_', '_'}]}, 533 | {pulse_replace_module, ReplaceModules}]), 534 | code:purge(Mod), 535 | code:load_file(Mod), 536 | Mod. 537 | -endif. 538 | 539 | %% Wait for quiescence: to get deterministic testing, we need to let 540 | %% sidejob finish what it is doing. 541 | 542 | busy_processes() -> 543 | [Pid || Pid <- processes(), 544 | {status,Status} <- [erlang:process_info(Pid,status)], 545 | Status /= waiting, 546 | Status /= suspended]. 547 | 548 | quiescent() -> 549 | busy_processes() == [self()]. 550 | 551 | wait_until_quiescent() -> 552 | timer:sleep(2), 553 | case quiescent() of 554 | true -> 555 | ok; 556 | false -> 557 | %% This happens regularly 558 | wait_until_quiescent() 559 | end. 560 | -------------------------------------------------------------------------------- /eqc/supervisor_eqc.erl: -------------------------------------------------------------------------------- 1 | %%% File : supervisor_eqc.erl 2 | %%% Author : Ulf Norell 3 | %%% Description : 4 | %%% Created : 15 May 2013 by Ulf Norell 5 | -module(supervisor_eqc). 6 | 7 | -export([ 8 | prop_seq/0 %, 9 | % prop_par/0 10 | ]). 11 | 12 | -export([initial_state/0, start_worker/0]). 13 | -export([new_resource/1, new_resource/2, new_resource_args/1, 14 | new_resource_pre/1, new_resource_next/3, new_resource_post/3]). 15 | 16 | -export([work/2, work_args/1, 17 | work_pre/1, work_next/3, work_post/3]). 18 | 19 | -export([terminate/2, terminate_args/1, 20 | terminate_pre/1, terminate_next/3]). 
21 | 22 | -export([worker/0, kill_all_pids/1]). 23 | 24 | -include_lib("eqc/include/eqc_statem.hrl"). 25 | -include_lib("eqc/include/eqc.hrl"). 26 | -ifdef(PULSE). 27 | -export([prop_pulse/0]). 28 | -include_lib("pulse/include/pulse.hrl"). 29 | -endif. 30 | 31 | -record(state, {limit, width, children = [], 32 | %% there is a bug in the supervisor that actually 33 | %% means it can start (limit*(limit+1)) / 2 processes. 34 | fuzz_limit}). 35 | -record(child, {pid}). 36 | 37 | -import(eqc_statem, [tag/2]). 38 | 39 | -define(RESOURCE, resource). 40 | -define(SLEEP, 1). 41 | -define(TIMEOUT, 5000). 42 | -define(RESTART_LIMIT, 10). 43 | 44 | initial_state() -> 45 | #state{}. 46 | 47 | %% -- Commands --------------------------------------------------------------- 48 | 49 | %% -- new_resource 50 | new_resource(Limit) -> 51 | R = sidejob:new_resource(?RESOURCE, sidejob_supervisor, Limit), 52 | timer:sleep(?SLEEP), 53 | R. 54 | 55 | new_resource(Limit, Width) -> 56 | R = sidejob:new_resource(?RESOURCE, sidejob_supervisor, Limit, Width), 57 | timer:sleep(?SLEEP), 58 | R. 59 | 60 | new_resource_args(_S) -> 61 | ?LET({K, W}, {choose(1, 5), oneof([?SHRINK(default, [1]), choose(1, 8)])}, 62 | case W of 63 | default -> [K * erlang:system_info(schedulers)]; 64 | Width -> [K * Width, Width] 65 | end). 66 | 67 | new_resource_pre(S) -> S#state.limit == undefined. 68 | 69 | new_resource_next(S, V, Args=[_]) -> 70 | new_resource_next(S, V, Args ++ [erlang:system_info(schedulers)]); 71 | new_resource_next(S, _, [Limit, Width]) -> 72 | S#state{ limit = Limit, width = Width, fuzz_limit= ((Limit * (Limit +1)) div 2) }. 73 | 74 | new_resource_post(_, _, V) -> 75 | case V of 76 | {ok, Pid} when is_pid(Pid) -> true; 77 | _ -> {not_ok, V} 78 | end. 79 | 80 | %% -- work 81 | work(Cmd, Scheduler) -> 82 | Worker = spawn_opt(fun proxy/0, [{scheduler, Scheduler}]), 83 | Worker ! 
{Cmd, self()}, 84 | receive 85 | {Worker, Reply} -> 86 | case Reply of 87 | {ok, Pid} when is_pid(Pid) -> Pid; 88 | Other -> Other 89 | end 90 | after 100 -> timeout 91 | end. 92 | 93 | -ifdef(PULSE). 94 | gen_scheduler() -> 1. 95 | -else. 96 | gen_scheduler() -> choose(1, erlang:system_info(schedulers)). 97 | -endif. 98 | 99 | work_args(_) -> 100 | [elements([start_child, spawn_mfa, spawn_fun]), gen_scheduler()]. 101 | 102 | work_pre(S) -> 103 | S#state.limit /= undefined. 104 | 105 | work_next(S, V, [_Cmd, _Sched]) -> 106 | case length(S#state.children) =< S#state.fuzz_limit of 107 | false -> S; 108 | true -> S#state{ children = S#state.children ++ [#child{pid = V}] } 109 | end. 110 | 111 | work_post(S, [_Cmd, _Sched], V) -> 112 | Children = filter_children(S#state.children), 113 | case {V, length(Children), S#state.limit, S#state.fuzz_limit} of 114 | {{error, overload}, LChildren, Limit, _FuzzLimit} when LChildren >= Limit -> 115 | true; 116 | {{error, overload}, LChildren, Limit, _FuzzLimit} -> 117 | {false, not_overloaded, LChildren, Limit}; 118 | {Pid, LChildren, _Limit, FuzzLimit} when is_pid(Pid), LChildren =< FuzzLimit -> 119 | true; 120 | {Pid, _LChildren, _Limit, _FuzzLimit} when not is_pid(Pid) -> 121 | {invalid_return, expected_pid, Pid}; 122 | {_Pid, LChildren, _Limit, FuzzLimit} -> 123 | {false, fuzz_limit_broken, LChildren, FuzzLimit} 124 | end. 125 | 126 | %% -- Finish work ------------------------------------------------------------ 127 | 128 | terminate(Pid, Reason) when is_pid(Pid) -> 129 | Pid ! Reason, 130 | timer:sleep(?SLEEP); 131 | terminate({error, overload}, _Reason) -> 132 | timer:sleep(?SLEEP). 133 | 134 | terminate_args(S) -> 135 | [elements([ C#child.pid || C <- S#state.children ]), 136 | elements([normal, crash])]. 137 | 138 | terminate_pre(S) -> S#state.children /= []. 139 | terminate_pre(S, [Pid, _]) -> lists:keymember(Pid, #child.pid, S#state.children). 
140 | terminate_next(S, _, [Pid, _]) -> 141 | S#state{ children = lists:keydelete(Pid, #child.pid, S#state.children) }. 142 | 143 | %% -- which_children --------------------------------------------------------- 144 | 145 | which_children_command(_S) -> 146 | {call, sidejob_supervisor, which_children, [?RESOURCE]}. 147 | 148 | which_children_pre(S) -> S#state.limit /= undefined. 149 | 150 | which_children_post(S, [_], V) when is_list(V) -> 151 | %% NOTE: This is a hack to pass the test 152 | 153 | %% XXX: there is an undiagnosed bug that leads to the 154 | %% counter-example in test/which_children_pulse_ce.eqc and still 155 | %% needs fixing. As this software has been released and running 156 | %% for a long time with no reported bugs this temporary hack is 157 | %% accepted for now. 158 | case ordsets:is_subset(lists:sort(V), lists:sort(filter_children(S#state.children))) of 159 | true -> 160 | true; 161 | false -> 162 | {lists:sort(V), not_subset, lists:sort(filter_children(S#state.children))} 163 | end; 164 | which_children_post(_, [_], V) -> 165 | {not_a_list, V}. 166 | 167 | %% since we allow more than Limit processes (sidejob race bug, see 168 | %% fuzz_limit above) the children list may sometimes contain 169 | %% `overload` tuples. This function filters those out. 170 | filter_children(Children) -> 171 | [Pid || #child{pid=Pid} <- Children, 172 | is_pid(Pid)]. 173 | 174 | %% -- Weights ---------------------------------------------------------------- 175 | 176 | weight(_, work) -> 5; 177 | weight(_, terminate) -> 4; 178 | weight(_, which_children) -> 1; 179 | weight(_, _) -> 1. 180 | 181 | %% -- Workers and proxies ---------------------------------------------------- 182 | 183 | worker() -> 184 | receive normal -> ok; 185 | crash -> exit(crash) end. 186 | 187 | start_worker() -> 188 | {ok, spawn_link(fun worker/0)}. 
189 | 190 | proxy() -> 191 | receive 192 | {Cmd, From} -> 193 | Res = 194 | case Cmd of 195 | start_child -> sidejob_supervisor:start_child(?RESOURCE, ?MODULE, start_worker, []); 196 | spawn_mfa -> sidejob_supervisor:spawn(?RESOURCE, ?MODULE, worker, []); 197 | spawn_fun -> sidejob_supervisor:spawn(?RESOURCE, fun() -> worker() end) 198 | end, 199 | From ! {self(), Res} 200 | end. 201 | 202 | %% -- Property --------------------------------------------------------------- 203 | 204 | prop_seq() -> 205 | ?FORALL(Cmds, commands(?MODULE), 206 | ?TIMEOUT(?TIMEOUT, 207 | ?SOMETIMES(4, 208 | begin 209 | cleanup(), 210 | HSR={H, S, R} = run_commands(?MODULE, Cmds), 211 | kill_all_pids({H, S}), 212 | aggregate(command_names(Cmds), 213 | pretty_commands(?MODULE, Cmds, HSR, 214 | R == ok)) 215 | end))). 216 | 217 | % prop_par() -> 218 | % ?FORALL(Cmds, parallel_commands(?MODULE), 219 | % ?TIMEOUT(?TIMEOUT, 220 | % % ?SOMETIMES(4, 221 | % begin 222 | % cleanup(), 223 | % HSR={SeqH, ParH, R} = run_parallel_commands(?MODULE, Cmds), 224 | % kill_all_pids({SeqH, ParH}), 225 | % aggregate(command_names(Cmds), 226 | % pretty_commands(?MODULE, Cmds, HSR, 227 | % R == ok)) 228 | % end)). 229 | 230 | -ifdef(PULSE). 231 | prop_pulse() -> 232 | ?SETUP(fun() -> N = erlang:system_flag(schedulers_online, 1), 233 | fun() -> erlang:system_flag(schedulers_online, N) end end, 234 | ?FORALL(Cmds, parallel_commands(?MODULE), 235 | ?PULSE(HSR={_, _, R}, 236 | begin 237 | cleanup(), 238 | run_parallel_commands(?MODULE, Cmds) 239 | end, 240 | aggregate(command_names(Cmds), 241 | pretty_commands(?MODULE, Cmds, HSR, 242 | R == ok))))). 243 | -endif. 244 | 245 | kill_all_pids(Pid) when is_pid(Pid) -> exit(Pid, kill); 246 | kill_all_pids([H|T]) -> kill_all_pids(H), kill_all_pids(T); 247 | kill_all_pids(T) when is_tuple(T) -> kill_all_pids(tuple_to_list(T)); 248 | kill_all_pids(_) -> ok. 
249 | 250 | cleanup() -> 251 | error_logger:tty(false), 252 | (catch application:stop(sidejob)), 253 | % error_logger:tty(true), 254 | application:start(sidejob). 255 | -------------------------------------------------------------------------------- /eqc/worker.erl: -------------------------------------------------------------------------------- 1 | %%% File : worker.erl 2 | %%% Author : Ulf Norell 3 | %%% Description : 4 | %%% Created : 13 May 2013 by Ulf Norell 5 | -module(worker). 6 | 7 | -behaviour(gen_server). 8 | 9 | %% API 10 | -export([start_link/0]). 11 | 12 | %% gen_server callbacks 13 | -export([init/1, handle_call/3, handle_cast/2, handle_info/2, 14 | terminate/2, code_change/3]). 15 | 16 | -record(state, {}). 17 | 18 | -define(SERVER, ?MODULE). 19 | 20 | %%==================================================================== 21 | %% API 22 | %%==================================================================== 23 | %%-------------------------------------------------------------------- 24 | %% Function: start_link() -> {ok,Pid} | ignore | {error,Error} 25 | %% Description: Starts the server 26 | %%-------------------------------------------------------------------- 27 | start_link() -> 28 | gen_server:start_link({local, ?SERVER}, ?MODULE, [], []). 29 | 30 | %%==================================================================== 31 | %% gen_server callbacks 32 | %%==================================================================== 33 | 34 | %%-------------------------------------------------------------------- 35 | %% Function: init(Args) -> {ok, State} | 36 | %% {ok, State, Timeout} | 37 | %% ignore | 38 | %% {stop, Reason} 39 | %% Description: Initiates the server 40 | %%-------------------------------------------------------------------- 41 | init([_Arg]) -> 42 | {ok, #state{}}. 
43 | 44 | %%-------------------------------------------------------------------- 45 | %% Function: handle_call(Request, From, State) -> 46 | %% {reply, Reply, State} | 47 | %% {reply, Reply, State, Timeout} | 48 | %% {noreply, State} | 49 | %% {noreply, State, Timeout} | 50 | %% {stop, Reason, Reply, State} | 51 | %% {stop, Reason, State} 52 | %% Description: Handling call messages 53 | %%-------------------------------------------------------------------- 54 | handle_call({start, Worker, Parent}, _From, State) -> 55 | Parent ! {Worker, {working, self()}}, 56 | receive finish -> ok; 57 | crash -> exit(crashed) end, 58 | {reply, done, State}; 59 | handle_call(_Request, _From, State) -> 60 | Reply = ok, 61 | {reply, Reply, State}. 62 | 63 | %%-------------------------------------------------------------------- 64 | %% Function: handle_cast(Msg, State) -> {noreply, State} | 65 | %% {noreply, State, Timeout} | 66 | %% {stop, Reason, State} 67 | %% Description: Handling cast messages 68 | %%-------------------------------------------------------------------- 69 | handle_cast({start, Ref, Pid}, State) -> 70 | Pid ! {started, Ref, self()}, 71 | receive finish -> ok; 72 | crash -> exit(crashed) end, 73 | {noreply, State}. 74 | 75 | %%-------------------------------------------------------------------- 76 | %% Function: handle_info(Info, State) -> {noreply, State} | 77 | %% {noreply, State, Timeout} | 78 | %% {stop, Reason, State} 79 | %% Description: Handling all non call/cast messages 80 | %%-------------------------------------------------------------------- 81 | handle_info(_Info, State) -> 82 | {noreply, State}. 83 | 84 | %%-------------------------------------------------------------------- 85 | %% Function: terminate(Reason, State) -> void() 86 | %% Description: This function is called by a gen_server when it is about to 87 | %% terminate. It should be the opposite of Module:init/1 and do any necessary 88 | %% cleaning up.
When it returns, the gen_server terminates with Reason. 89 | %% The return value is ignored. 90 | %%-------------------------------------------------------------------- 91 | terminate(_Reason, _State) -> 92 | ok. 93 | 94 | %%-------------------------------------------------------------------- 95 | %% Func: code_change(OldVsn, State, Extra) -> {ok, NewState} 96 | %% Description: Convert process state when code is changed 97 | %%-------------------------------------------------------------------- 98 | code_change(_OldVsn, State, _Extra) -> 99 | {ok, State}. 100 | 101 | %%-------------------------------------------------------------------- 102 | %%% Internal functions 103 | %%-------------------------------------------------------------------- 104 | 105 | 106 | -------------------------------------------------------------------------------- /rebar.config: -------------------------------------------------------------------------------- 1 | {minimum_otp_vsn, "22.0"}. 2 | {erl_opts, [debug_info, warnings_as_errors, warn_untyped_records]}. 3 | {cover_enabled, true}. 4 | {edoc_opts, [{preprocess, true}]}. 5 | {xref_checks,[undefined_function_calls,undefined_functions,locals_not_used, 6 | deprecated_function_calls, deprecated_functions]}. 7 | {deps, []}. 8 | {profiles, [ 9 | {gha, [{erl_opts, [{d, 'GITHUBEXCLUDE'}]}]} 10 | ]}. 11 | {plugins, [{eqc_rebar, {git, "https://github.com/Quviq/eqc-rebar", {branch, "master"}}}]}. 
12 | -------------------------------------------------------------------------------- /rebar.config.script: -------------------------------------------------------------------------------- 1 | case os:getenv("PULSE") of 2 | false -> 3 | CONFIG; 4 | _ -> 5 | ErlOpts = proplists:get_value(erl_opts, CONFIG, []), 6 | NewErlOpts = {erl_opts, [{d, 'PULSE', true}, 7 | {parse_transform, pulse_instrument}, 8 | {pulse_side_effect, [{ets, '_', '_'}]}, 9 | {pulse_replace_module, [{application, pulse_application}, 10 | {application_controller, pulse_application_controller}, 11 | {application_master, pulse_application_master}, 12 | {application_starter, pulse_application_starter}, 13 | {gen, pulse_gen}, 14 | {gen_event, pulse_gen_event}, 15 | {gen_fsm, pulse_gen_fsm}, 16 | {gen_server, pulse_gen_server}, 17 | {proc_lib, pulse_proc_lib}, 18 | {supervisor, pulse_supervisor}]}|ErlOpts]}, 19 | lists:keystore(erl_opts, 1, CONFIG, NewErlOpts) 20 | end. 21 | -------------------------------------------------------------------------------- /rebar3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/basho/sidejob/10abce4fc76054c8aad230943b5c1a31b67efc6f/rebar3 -------------------------------------------------------------------------------- /src/sidejob.app.src: -------------------------------------------------------------------------------- 1 | {application, sidejob, 2 | [ 3 | {description, "Parallel worker and capacity limiting library"}, 4 | {vsn, git}, 5 | {registered, []}, 6 | {applications, [ 7 | kernel, 8 | stdlib 9 | ]}, 10 | {registered, []}, 11 | {mod, {sidejob_app, []}}, 12 | {env, []} 13 | ]}. 14 | -------------------------------------------------------------------------------- /src/sidejob.erl: -------------------------------------------------------------------------------- 1 | %% ------------------------------------------------------------------- 2 | %% 3 | %% Copyright (c) 2013 Basho Technologies, Inc. 
All Rights Reserved. 4 | %% 5 | %% This file is provided to you under the Apache License, 6 | %% Version 2.0 (the "License"); you may not use this file 7 | %% except in compliance with the License. You may obtain 8 | %% a copy of the License at 9 | %% 10 | %% http://www.apache.org/licenses/LICENSE-2.0 11 | %% 12 | %% Unless required by applicable law or agreed to in writing, 13 | %% software distributed under the License is distributed on an 14 | %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 15 | %% KIND, either express or implied. See the License for the 16 | %% specific language governing permissions and limitations 17 | %% under the License. 18 | %% 19 | %% ------------------------------------------------------------------- 20 | -module(sidejob). 21 | -export([new_resource/3, new_resource/4, call/2, call/3, cast/2, 22 | unbounded_cast/2, resource_exists/1]). 23 | 24 | %%%=================================================================== 25 | %%% API 26 | %%%=================================================================== 27 | 28 | %% @doc 29 | %% Create a new sidejob resource that uses the provided worker module, 30 | %% enforces the requested usage limit, and is managed by the specified 31 | %% number of worker processes. 32 | %% 33 | %% This call will generate and load a new module, via {@link sidejob_config}, 34 | %% that provides information about the new resource. It will also start up the 35 | %% supervision hierarchy that manages this resource: ensuring that the workers 36 | %% and stats aggregation server for this resource remain running. 
37 | new_resource(Name, Mod, Limit, Workers) -> 38 | StatsETS = sidejob_resource_sup:stats_ets(Name), 39 | WorkerNames = sidejob_worker:workers(Name, Workers), 40 | StatsName = sidejob_resource_stats:reg_name(Name), 41 | WorkerLimit = Limit div Workers, 42 | sidejob_config:load_config(Name, [{width, Workers}, 43 | {limit, Limit}, 44 | {worker_limit, WorkerLimit}, 45 | {stats_ets, StatsETS}, 46 | {workers, list_to_tuple(WorkerNames)}, 47 | {worker_ets, list_to_tuple(WorkerNames)}, 48 | {stats, StatsName}]), 49 | sidejob_sup:add_resource(Name, Mod). 50 | 51 | %% @doc 52 | %% Same as {@link new_resource/4} except that the number of workers defaults 53 | %% to the number of scheduler threads. 54 | new_resource(Name, Mod, Limit) -> 55 | Workers = erlang:system_info(schedulers), 56 | new_resource(Name, Mod, Limit, Workers). 57 | 58 | 59 | %% @doc 60 | %% Same as {@link call/3} with a default timeout of 5 seconds. 61 | call(Name, Msg) -> 62 | call(Name, Msg, 5000). 63 | 64 | %% @doc 65 | %% Perform a synchronous call to the specified resource, failing if the 66 | %% resource has reached its usage limit. 67 | call(Name, Msg, Timeout) -> 68 | case available(Name) of 69 | none -> 70 | overload; 71 | Worker -> 72 | gen_server:call(Worker, Msg, Timeout) 73 | end. 74 | 75 | %% @doc 76 | %% Perform an asynchronous cast to the specified resource, failing if the 77 | %% resource has reached its usage limit. 78 | cast(Name, Msg) -> 79 | case available(Name) of 80 | none -> 81 | overload; 82 | Worker -> 83 | gen_server:cast(Worker, Msg) 84 | end. 85 | 86 | %% @doc 87 | %% Perform an asynchronous cast to the specified resource, ignoring 88 | %% usage limits 89 | unbounded_cast(Name, Msg) -> 90 | Worker = preferred_worker(Name), 91 | gen_server:cast(Worker, Msg). 92 | 93 | %%% @doc 94 | %% Check if the specified resource exists. 
The Erlang docs advise against 95 | %% calling erlang:module_loaded/1 outside the code server, so instead try to call 96 | %% a function on the module in question and, if it succeeds, return 97 | %% true. Otherwise, the module hasn't been created, so return false. 98 | -spec resource_exists(Mod::atom()) -> boolean(). 99 | resource_exists(Mod) -> 100 | try 101 | _ = Mod:width(), 102 | true 103 | catch _:_ -> 104 | false 105 | end. 106 | 107 | %%%=================================================================== 108 | %%% Internal functions 109 | %%%=================================================================== 110 | 111 | %% Return the preferred worker for the current scheduler 112 | preferred_worker(Name) -> 113 | Width = Name:width(), 114 | Scheduler = erlang:system_info(scheduler_id), 115 | Worker = Scheduler rem Width, 116 | worker_reg_name(Name, Worker). 117 | 118 | %% Find an available worker, or return none if all workers are at their limit 119 | available(Name) -> 120 | WorkerETS = Name:worker_ets(), 121 | Width = Name:width(), 122 | Limit = Name:worker_limit(), 123 | Scheduler = erlang:system_info(scheduler_id), 124 | Worker = Scheduler rem Width, 125 | case is_available(WorkerETS, Limit, Worker) of 126 | true -> 127 | worker_reg_name(Name, Worker); 128 | false -> 129 | available(Name, WorkerETS, Width, Limit, Worker+1, Worker) 130 | end. 131 | 132 | available(Name, _WorkerETS, _Width, _Limit, End, End) -> 133 | ets:update_counter(Name:stats_ets(), rejected, 1), 134 | none; 135 | available(Name, WorkerETS, Width, Limit, X, End) -> 136 | Worker = X rem Width, 137 | case is_available(WorkerETS, Limit, Worker) of 138 | false -> 139 | available(Name, WorkerETS, Width, Limit, (Worker+1) rem Width, End); 140 | true -> 141 | worker_reg_name(Name, Worker) 142 | end.
143 | 144 | is_available(WorkerETS, Limit, Worker) -> 145 | ETS = element(Worker+1, WorkerETS), 146 | case ets:lookup_element(ETS, full, 2) of 147 | 1 -> 148 | false; 149 | 0 -> 150 | Value = ets:update_counter(ETS, usage, 1), 151 | if Value >= Limit -> 152 | ets:insert(ETS, {full, 1}); 153 | true -> 154 | ok 155 | end, 156 | true 157 | end. 158 | 159 | worker_reg_name(Name, Id) -> 160 | element(Id+1, Name:workers()). 161 | -------------------------------------------------------------------------------- /src/sidejob_app.erl: -------------------------------------------------------------------------------- 1 | %% ------------------------------------------------------------------- 2 | %% 3 | %% Copyright (c) 2013 Basho Technologies, Inc. All Rights Reserved. 4 | %% 5 | %% This file is provided to you under the Apache License, 6 | %% Version 2.0 (the "License"); you may not use this file 7 | %% except in compliance with the License. You may obtain 8 | %% a copy of the License at 9 | %% 10 | %% http://www.apache.org/licenses/LICENSE-2.0 11 | %% 12 | %% Unless required by applicable law or agreed to in writing, 13 | %% software distributed under the License is distributed on an 14 | %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 15 | %% KIND, either express or implied. See the License for the 16 | %% specific language governing permissions and limitations 17 | %% under the License. 18 | %% 19 | %% ------------------------------------------------------------------- 20 | -module(sidejob_app). 21 | -behaviour(application). 22 | 23 | %% Application callbacks 24 | -export([start/2, start_phase/3, prep_stop/1, stop/1, config_change/3]). 25 | 26 | %%%=================================================================== 27 | %%% Application callbacks 28 | %%%=================================================================== 29 | 30 | start(_Type, _StartArgs) -> 31 | case sidejob_sup:start_link() of 32 | {ok, Pid} -> 33 | {ok, Pid}; 34 | Error -> 35 | Error 36 | end. 
37 | 38 | stop(_State) -> 39 | ok. 40 | 41 | start_phase(_Phase, _Type, _PhaseArgs) -> 42 | ok. 43 | 44 | prep_stop(State) -> 45 | State. 46 | 47 | config_change(_Changed, _New, _Removed) -> 48 | ok. 49 | -------------------------------------------------------------------------------- /src/sidejob_config.erl: -------------------------------------------------------------------------------- 1 | %% ------------------------------------------------------------------- 2 | %% 3 | %% Copyright (c) 2013 Basho Technologies, Inc. All Rights Reserved. 4 | %% 5 | %% This file is provided to you under the Apache License, 6 | %% Version 2.0 (the "License"); you may not use this file 7 | %% except in compliance with the License. You may obtain 8 | %% a copy of the License at 9 | %% 10 | %% http://www.apache.org/licenses/LICENSE-2.0 11 | %% 12 | %% Unless required by applicable law or agreed to in writing, 13 | %% software distributed under the License is distributed on an 14 | %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 15 | %% KIND, either express or implied. See the License for the 16 | %% specific language governing permissions and limitations 17 | %% under the License. 18 | %% 19 | %% ------------------------------------------------------------------- 20 | 21 | %% @doc 22 | %% Utility that converts a given property list into a module that provides 23 | %% constant time access to the various key/value pairs. 24 | %% 25 | %% Example: 26 | %% load_config(test, [{limit, 1000}, 27 | %% {num_workers, 4}, 28 | %% {workers, [{test_1, test_2, test_3, test_4}]}]). 29 | %% 30 | %% creates the module `test' such that: 31 | %% test:limit(). => 1000 32 | %% test:num_workers(). => 4 33 | %% test:workers(). => [{test_1, test_2, test_3, test_4}] 34 | %% 35 | -module(sidejob_config). 36 | -export([load_config/2]).
37 | 38 | load_config(Resource, Config) -> 39 | Module = make_module(Resource), 40 | Exports = [make_export(Key) || {Key, _} <- Config], 41 | Functions = [make_function(Key, Value) || {Key, Value} <- Config], 42 | ExportAttr = make_export_attribute(Exports), 43 | Abstract = [Module, ExportAttr | Functions], 44 | Forms = erl_syntax:revert_forms(Abstract), 45 | {ok, Resource, Bin} = compile:forms(Forms, [verbose, report_errors]), 46 | code:purge(Resource), 47 | {module, Resource} = code:load_binary(Resource, 48 | atom_to_list(Resource) ++ ".erl", 49 | Bin), 50 | ok. 51 | 52 | make_module(Module) -> 53 | erl_syntax:attribute(erl_syntax:atom(module), 54 | [erl_syntax:atom(Module)]). 55 | 56 | make_export(Key) -> 57 | erl_syntax:arity_qualifier(erl_syntax:atom(Key), 58 | erl_syntax:integer(0)). 59 | 60 | make_export_attribute(Exports) -> 61 | erl_syntax:attribute(erl_syntax:atom(export), 62 | [erl_syntax:list(Exports)]). 63 | 64 | make_function(Key, Value) -> 65 | Constant = erl_syntax:clause([], none, [erl_syntax:abstract(Value)]), 66 | erl_syntax:function(erl_syntax:atom(Key), [Constant]). 67 | -------------------------------------------------------------------------------- /src/sidejob_resource_stats.erl: -------------------------------------------------------------------------------- 1 | %% ------------------------------------------------------------------- 2 | %% 3 | %% Copyright (c) 2013 Basho Technologies, Inc. All Rights Reserved. 4 | %% 5 | %% This file is provided to you under the Apache License, 6 | %% Version 2.0 (the "License"); you may not use this file 7 | %% except in compliance with the License. You may obtain 8 | %% a copy of the License at 9 | %% 10 | %% http://www.apache.org/licenses/LICENSE-2.0 11 | %% 12 | %% Unless required by applicable law or agreed to in writing, 13 | %% software distributed under the License is distributed on an 14 | %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 15 | %% KIND, either express or implied. 
See the License for the 16 | %% specific language governing permissions and limitations 17 | %% under the License. 18 | %% 19 | %% ------------------------------------------------------------------- 20 | -module(sidejob_resource_stats). 21 | -behaviour(gen_server). 22 | 23 | %% API 24 | -export([reg_name/1, start_link/2, report/5, init_stats/1, stats/1, usage/1]). 25 | 26 | %% gen_server callbacks 27 | -export([init/1, handle_call/3, handle_cast/2, handle_info/2, 28 | terminate/2, code_change/3]). 29 | 30 | -record(state, {worker_reports = dict:new(), 31 | stats_ets = undefined, 32 | usage = 0, 33 | rejected = 0, 34 | in = 0, 35 | out = 0, 36 | stats_60s = sidejob_stat:new(), 37 | next_stats_60s = sidejob_stat:new(), 38 | left_60s = 60, 39 | stats_total = sidejob_stat:new()}). 40 | 41 | %%%=================================================================== 42 | %%% API 43 | %%%=================================================================== 44 | 45 | reg_name(Name) when is_atom(Name) -> 46 | reg_name(atom_to_binary(Name, latin1)); 47 | reg_name(Name) -> 48 | binary_to_atom(<<Name/binary, "_stats">>, latin1). 49 | 50 | start_link(RegName, StatsETS) -> 51 | gen_server:start_link({local, RegName}, ?MODULE, [StatsETS], []). 52 | 53 | %% @doc 54 | %% Used by {@link sidejob_worker} processes to report per-worker statistics 55 | report(Name, Id, Usage, In, Out) -> 56 | gen_server:cast(Name, {report, Id, Usage, In, Out}). 57 | 58 | %% @doc 59 | %% Used by {@link sidejob_resource_sup} to initialize a newly created 60 | %% stats ETS table to ensure the table is non-empty before bringing a 61 | %% resource online 62 | init_stats(StatsETS) -> 63 | EmptyStats = compute(#state{}), 64 | ets:insert(StatsETS, [{rejected, 0}, 65 | {usage, 0}, 66 | {stats, EmptyStats}]). 67 | 68 | %% @doc 69 | %% Return the computed stats for the given sidejob resource 70 | stats(Name) -> 71 | StatsETS = Name:stats_ets(), 72 | ets:lookup_element(StatsETS, stats, 2).
73 | 74 | %% @doc 75 | %% Return the current usage for the given sidejob resource 76 | usage(Name) -> 77 | StatsETS = Name:stats_ets(), 78 | ets:lookup_element(StatsETS, usage, 2). 79 | 80 | %%%=================================================================== 81 | %%% gen_server callbacks 82 | %%%=================================================================== 83 | 84 | init([StatsETS]) -> 85 | schedule_tick(), 86 | {ok, #state{stats_ets=StatsETS}}. 87 | 88 | handle_call(get_stats, _From, State) -> 89 | {reply, compute(State), State}; 90 | 91 | handle_call(usage, _From, State=#state{usage=Usage}) -> 92 | {reply, Usage, State}; 93 | 94 | handle_call(_Request, _From, State) -> 95 | {reply, ok, State}. 96 | 97 | handle_cast({report, Id, UsageVal, InVal, OutVal}, 98 | State=#state{worker_reports=Reports}) -> 99 | Reports2 = dict:store(Id, {UsageVal, InVal, OutVal}, Reports), 100 | State2 = State#state{worker_reports=Reports2}, 101 | {noreply, State2}; 102 | 103 | handle_cast(_Msg, State) -> 104 | {noreply, State}. 105 | 106 | handle_info(tick, State) -> 107 | schedule_tick(), 108 | State2 = tick(State), 109 | {noreply, State2}; 110 | 111 | handle_info(_Info, State) -> 112 | {noreply, State}. 113 | 114 | terminate(_Reason, _State) -> 115 | ok. 116 | 117 | code_change(_OldVsn, State, _Extra) -> 118 | {ok, State}. 119 | 120 | %%%=================================================================== 121 | %%% Internal functions 122 | %%%=================================================================== 123 | 124 | schedule_tick() -> 125 | erlang:send_after(1000, self(), tick). 
126 | 127 | %% Aggregate all reported worker stats into unified stat report for 128 | %% this resource 129 | tick(State=#state{stats_ets=StatsETS, 130 | left_60s=Left60, 131 | next_stats_60s=Next60, 132 | stats_total=Total}) -> 133 | {Usage, In, Out} = combine_reports(State), 134 | 135 | Rejected = ets:update_counter(StatsETS, rejected, 0), 136 | ets:update_counter(StatsETS, rejected, {2,-Rejected,0,0}), 137 | 138 | NewNext60 = sidejob_stat:add(Rejected, In, Out, Next60), 139 | NewTotal = sidejob_stat:add(Rejected, In, Out, Total), 140 | State2 = State#state{usage=Usage, 141 | rejected=Rejected, 142 | in=In, 143 | out=Out, 144 | next_stats_60s=NewNext60, 145 | stats_total=NewTotal}, 146 | 147 | State3 = case Left60 of 148 | 0 -> 149 | State2#state{left_60s=59, 150 | stats_60s=NewNext60, 151 | next_stats_60s=sidejob_stat:new()}; 152 | _ -> 153 | State2#state{left_60s=Left60-1} 154 | end, 155 | 156 | ets:insert(StatsETS, [{usage, Usage}, 157 | {stats, compute(State3)}]), 158 | State3. 159 | 160 | %% Total all reported worker stats into a single sum for each metric 161 | combine_reports(#state{worker_reports=Reports}) -> 162 | dict:fold(fun(_, {Usage, In, Out}, {UsageAcc, InAcc, OutAcc}) -> 163 | {UsageAcc + Usage, InAcc + In, OutAcc + Out} 164 | end, {0,0,0}, Reports). 
165 | 166 | compute(#state{usage=Usage, rejected=Rejected, in=In, out=Out, 167 | stats_60s=Stats60s, stats_total=StatsTotal}) -> 168 | {Usage60, Rejected60, InAvg60, InMax60, OutAvg60, OutMax60} = 169 | sidejob_stat:compute(Stats60s), 170 | 171 | {UsageTot, RejectedTot, InAvgTot, InMaxTot, OutAvgTot, OutMaxTot} = 172 | sidejob_stat:compute(StatsTotal), 173 | 174 | [{usage, Usage}, 175 | {rejected, Rejected}, 176 | {in_rate, In}, 177 | {out_rate, Out}, 178 | {usage_60s, Usage60}, 179 | {rejected_60s, Rejected60}, 180 | {avg_in_rate_60s, InAvg60}, 181 | {max_in_rate_60s, InMax60}, 182 | {avg_out_rate_60s, OutAvg60}, 183 | {max_out_rate_60s, OutMax60}, 184 | {usage_total, UsageTot}, 185 | {rejected_total, RejectedTot}, 186 | {avg_in_rate_total, InAvgTot}, 187 | {max_in_rate_total, InMaxTot}, 188 | {avg_out_rate_total, OutAvgTot}, 189 | {max_out_rate_total, OutMaxTot}]. 190 | -------------------------------------------------------------------------------- /src/sidejob_resource_sup.erl: -------------------------------------------------------------------------------- 1 | %% ------------------------------------------------------------------- 2 | %% 3 | %% Copyright (c) 2013 Basho Technologies, Inc. All Rights Reserved. 4 | %% 5 | %% This file is provided to you under the Apache License, 6 | %% Version 2.0 (the "License"); you may not use this file 7 | %% except in compliance with the License. You may obtain 8 | %% a copy of the License at 9 | %% 10 | %% http://www.apache.org/licenses/LICENSE-2.0 11 | %% 12 | %% Unless required by applicable law or agreed to in writing, 13 | %% software distributed under the License is distributed on an 14 | %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 15 | %% KIND, either express or implied. See the License for the 16 | %% specific language governing permissions and limitations 17 | %% under the License. 
18 | %% 19 | %% ------------------------------------------------------------------- 20 | 21 | %% @doc 22 | %% The sidejob_resource_sup manages the entire supervision hierarchy for 23 | %% a sidejob resource. Thus, there is one resource supervisor for each 24 | %% registered sidejob resource. 25 | %% 26 | %% The resource supervisor is the owner of a resource's limit and stats 27 | %% ETS tables, therefore ensuring the ETS tables survive crashes elsewhere 28 | %% in the resource hierarchy. 29 | %% 30 | %% The resource supervisor has two children: a {@link sidejob_worker_sup} 31 | %% that supervises the actual worker processes for a given resource, and 32 | %% a {@link sidejob_resource_stats} server that aggregates statistics 33 | %% reported by the worker processes. 34 | -module(sidejob_resource_sup). 35 | -behaviour(supervisor). 36 | 37 | %% API 38 | -export([start_link/2, stats_ets/1]). 39 | 40 | %% Supervisor callbacks 41 | -export([init/1]). 42 | 43 | %%%=================================================================== 44 | %%% API functions 45 | %%%=================================================================== 46 | 47 | start_link(Name, Mod) -> 48 | supervisor:start_link({local, Name}, ?MODULE, [Name, Mod]). 49 | 50 | stats_ets(Name) -> 51 | ETS = iolist_to_binary([atom_to_binary(Name, latin1), "_stats_ets"]), 52 | binary_to_atom(ETS, latin1). 
53 | 54 | %%%=================================================================== 55 | %%% Supervisor callbacks 56 | %%%=================================================================== 57 | 58 | init([Name, Mod]) -> 59 | Width = Name:width(), 60 | StatsETS = stats_ets(Name), 61 | StatsName = Name:stats(), 62 | WorkerNames = sidejob_worker:workers(Name, Width), 63 | 64 | _WorkerETS = [begin 65 | WorkerTab = ets:new(WorkerName, [named_table, 66 | public]), 67 | ets:insert(WorkerTab, [{usage, 0}, 68 | {full, 0}]), 69 | WorkerTab 70 | end || WorkerName <- WorkerNames], 71 | 72 | StatsTab = ets:new(StatsETS, [named_table, 73 | public, 74 | {read_concurrency,true}, 75 | {write_concurrency,true}]), 76 | sidejob_resource_stats:init_stats(StatsTab), 77 | 78 | WorkerSup = {sidejob_worker_sup, 79 | {sidejob_worker_sup, start_link, 80 | [Name, Width, StatsName, Mod]}, 81 | permanent, infinity, supervisor, [sidejob_worker_sup]}, 82 | StatsServer = {StatsName, 83 | {sidejob_resource_stats, start_link, [StatsName, StatsTab]}, 84 | permanent, 5000, worker, [sidejob_resource_stats]}, 85 | {ok, {{one_for_one, 10, 10}, [WorkerSup, StatsServer]}}. 86 | -------------------------------------------------------------------------------- /src/sidejob_stat.erl: -------------------------------------------------------------------------------- 1 | %% ------------------------------------------------------------------- 2 | %% 3 | %% Copyright (c) 2013 Basho Technologies, Inc. All Rights Reserved. 4 | %% 5 | %% This file is provided to you under the Apache License, 6 | %% Version 2.0 (the "License"); you may not use this file 7 | %% except in compliance with the License. 
You may obtain 8 | %% a copy of the License at 9 | %% 10 | %% http://www.apache.org/licenses/LICENSE-2.0 11 | %% 12 | %% Unless required by applicable law or agreed to in writing, 13 | %% software distributed under the License is distributed on an 14 | %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 15 | %% KIND, either express or implied. See the License for the 16 | %% specific language governing permissions and limitations 17 | %% under the License. 18 | %% 19 | %% ------------------------------------------------------------------- 20 | -module(sidejob_stat). 21 | -export([new/0, add/4, compute/1]). 22 | 23 | -record(stat, {rejected = 0, 24 | in_sum = 0, 25 | in_max = 0, 26 | out_sum = 0, 27 | out_max = 0, 28 | samples = 0}). 29 | 30 | -define(ADD(Field, Value), Field = Stat#stat.Field + Value). 31 | -define(MAX(Field, Value), Field = max(Stat#stat.Field, Value)). 32 | 33 | new() -> 34 | #stat{}. 35 | 36 | add(Rejected, In, Out, Stat) -> 37 | Stat#stat{?ADD(rejected, Rejected), 38 | ?ADD(in_sum, In), 39 | ?ADD(out_sum, Out), 40 | ?ADD(samples, 1), 41 | ?MAX(in_max, In), 42 | ?MAX(out_max, Out)}. 43 | 44 | compute(#stat{rejected=Rejected, in_sum=InSum, in_max=InMax, 45 | out_sum=OutSum, out_max=OutMax, samples=Samples}) -> 46 | InAvg = InSum div max(1,Samples), 47 | OutAvg = OutSum div max(1,Samples), 48 | {InSum, Rejected, InAvg, InMax, OutAvg, OutMax}. 49 | -------------------------------------------------------------------------------- /src/sidejob_sup.erl: -------------------------------------------------------------------------------- 1 | %% ------------------------------------------------------------------- 2 | %% 3 | %% Copyright (c) 2013 Basho Technologies, Inc. All Rights Reserved. 4 | %% 5 | %% This file is provided to you under the Apache License, 6 | %% Version 2.0 (the "License"); you may not use this file 7 | %% except in compliance with the License. 
You may obtain 8 | %% a copy of the License at 9 | %% 10 | %% http://www.apache.org/licenses/LICENSE-2.0 11 | %% 12 | %% Unless required by applicable law or agreed to in writing, 13 | %% software distributed under the License is distributed on an 14 | %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 15 | %% KIND, either express or implied. See the License for the 16 | %% specific language governing permissions and limitations 17 | %% under the License. 18 | %% 19 | %% ------------------------------------------------------------------- 20 | 21 | %% @doc 22 | %% The top-level supervisor for the sidejob application. 23 | %% 24 | %% When a new resource is created via {@link sidejob:new_resource/4}, 25 | %% a new {@link sidejob_resource_sup} is added to this supervisor. 26 | %% 27 | %% The actual resource supervisor manages a given resource's process 28 | %% hierarchy. This top-level supervisor simply ensures that all registered 29 | %% resource supervisors remain up. 30 | 31 | -module(sidejob_sup). 32 | -behaviour(supervisor). 33 | 34 | %% API 35 | -export([start_link/0, add_resource/2]). 36 | 37 | %% Supervisor callbacks 38 | -export([init/1]). 39 | 40 | %%%=================================================================== 41 | %%% API functions 42 | %%%=================================================================== 43 | 44 | start_link() -> 45 | supervisor:start_link({local, ?MODULE}, ?MODULE, []). 46 | 47 | add_resource(Name, Mod) -> 48 | Child = {Name, 49 | {sidejob_resource_sup, start_link, [Name, Mod]}, 50 | permanent, infinity, supervisor, [sidejob_resource_sup]}, 51 | supervisor:start_child(?MODULE, Child). 52 | 53 | %%%=================================================================== 54 | %%% Supervisor callbacks 55 | %%%=================================================================== 56 | 57 | init([]) -> 58 | {ok, {{one_for_one, 10, 10}, []}}. 
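The supervision flow described in the sidejob_sup documentation can be sketched from the caller's side. This is a minimal illustration and not part of the repository: the module name `sidejob_demo`, the resource name `demo_pool`, and the limit and width values are all hypothetical, and it assumes `sidejob_supervisor` is passed as the worker module. The expected reply shapes follow the `-spec` declarations in sidejob_supervisor.erl; the catch-all clause covers any other reply.

```erlang
%% Hypothetical demo module; not part of sidejob.
-module(sidejob_demo).
-export([run/0]).

run() ->
    {ok, _Started} = application:ensure_all_started(sidejob),
    %% Adds a sidejob_resource_sup child named demo_pool under sidejob_sup.
    %% 4 workers share a total limit of 100 (100 div 4 = 25 per worker).
    {ok, _SupPid} = sidejob:new_resource(demo_pool, sidejob_supervisor, 100, 4),
    %% The generated config module demo_pool now exists.
    true = sidejob:resource_exists(demo_pool),
    %% Ask the resource to run a short-lived child; the request is
    %% rejected with {error, overload} when every worker is at its limit.
    case sidejob_supervisor:spawn(demo_pool, fun() -> timer:sleep(100) end) of
        {ok, Pid} when is_pid(Pid) -> {started, Pid};
        {error, overload} -> overloaded;
        Other -> {unexpected, Other}
    end.
```

Because the resource supervisor is registered under the resource name, calling `run/0` a second time in the same node would likely fail the `new_resource` match with an already-started error; a real caller would probably guard creation with `sidejob:resource_exists/1`.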
59 | -------------------------------------------------------------------------------- /src/sidejob_supervisor.erl: -------------------------------------------------------------------------------- 1 | %% ------------------------------------------------------------------- 2 | %% 3 | %% Copyright (c) 2013 Basho Technologies, Inc. All Rights Reserved. 4 | %% 5 | %% This file is provided to you under the Apache License, 6 | %% Version 2.0 (the "License"); you may not use this file 7 | %% except in compliance with the License. You may obtain 8 | %% a copy of the License at 9 | %% 10 | %% http://www.apache.org/licenses/LICENSE-2.0 11 | %% 12 | %% Unless required by applicable law or agreed to in writing, 13 | %% software distributed under the License is distributed on an 14 | %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 15 | %% KIND, either express or implied. See the License for the 16 | %% specific language governing permissions and limitations 17 | %% under the License. 18 | %% 19 | %% ------------------------------------------------------------------- 20 | 21 | %% @doc 22 | %% This module implements a sidejob_worker behavior that operates as a 23 | %% parallel, capacity-limited supervisor of dynamic, transient children. 24 | 25 | -module(sidejob_supervisor). 26 | -behaviour(gen_server). 27 | 28 | %% API 29 | -export([start_child/4, spawn/2, spawn/4, which_children/1]). 30 | 31 | %% sidejob_worker callbacks 32 | -export([current_usage/1, rate/1]). 33 | 34 | %% gen_server callbacks 35 | -export([init/1, handle_call/3, handle_cast/2, handle_info/2, 36 | terminate/2, code_change/3]). 37 | 38 | -record(state, {name, 39 | children=sets:new(), 40 | spawned=0, 41 | died=0}). 42 | 43 | -type resource() :: atom(). 
44 | 45 | %%%=================================================================== 46 | %%% API 47 | %%%=================================================================== 48 | 49 | -spec start_child(resource(), module(), atom(), term()) -> {ok, pid()} | 50 | {error, overload} | 51 | {error, term()}. 52 | start_child(Name, Mod, Fun, Args) -> 53 | case sidejob:call(Name, {start_child, Mod, Fun, Args}, infinity) of 54 | overload -> 55 | {error, overload}; 56 | Other -> 57 | Other 58 | end. 59 | 60 | -spec spawn(resource(), function() | {module(), atom(), [term()]}) -> {ok, pid()} | {error, overload}. 61 | spawn(Name, Fun) -> 62 | case sidejob:call(Name, {spawn, Fun}, infinity) of 63 | overload -> 64 | {error, overload}; 65 | Other -> 66 | Other 67 | end. 68 | 69 | -spec spawn(resource(), module(), atom(), [term()]) -> {ok, pid()} | 70 | {error, overload}. 71 | spawn(Name, Mod, Fun, Args) -> 72 | ?MODULE:spawn(Name, {Mod, Fun, Args}). 73 | 74 | -spec which_children(resource()) -> [pid()]. 75 | which_children(Name) -> 76 | Workers = tuple_to_list(Name:workers()), 77 | Children = [gen_server:call(Worker, get_children) || Worker <- Workers], 78 | lists:flatten(Children). 79 | 80 | %%%=================================================================== 81 | %%% gen_server callbacks 82 | %%%=================================================================== 83 | 84 | init([Name]) -> 85 | process_flag(trap_exit, true), 86 | {ok, #state{name=Name}}. 
87 | 88 | handle_call(get_children, _From, State=#state{children=Children}) -> 89 | {reply, sets:to_list(Children), State}; 90 | 91 | handle_call({start_child, Mod, Fun, Args}, _From, State) -> 92 | Result = (catch apply(Mod, Fun, Args)), 93 | {Reply, State2} = case Result of 94 | {ok, Pid} when is_pid(Pid) -> 95 | {Result, add_child(Pid, State)}; 96 | {ok, Pid, _Info} when is_pid(Pid) -> 97 | {Result, add_child(Pid, State)}; 98 | ignore -> 99 | {{ok, undefined}, State}; 100 | {error, _} -> 101 | {Result, State}; 102 | Error -> 103 | {{error, Error}, State} 104 | end, 105 | {reply, Reply, State2}; 106 | 107 | handle_call({spawn, Fun}, _From, State) -> 108 | Pid = case Fun of 109 | _ when is_function(Fun) -> 110 | spawn_link(Fun); 111 | {M, F, A} -> 112 | spawn_link(M, F, A) 113 | end, 114 | State2 = add_child(Pid, State), 115 | {reply, Pid, State2}; 116 | 117 | handle_call(_Request, _From, State) -> 118 | {reply, ok, State}. 119 | 120 | handle_cast(_Msg, State) -> 121 | {noreply, State}. 122 | 123 | handle_info({'EXIT', Pid, Reason}, State=#state{children=Children, 124 | died=Died}) -> 125 | case sets:is_element(Pid, Children) of 126 | true -> 127 | Children2 = sets:del_element(Pid, Children), 128 | Died2 = Died + 1, 129 | State2 = State#state{children=Children2, died=Died2}, 130 | {noreply, State2}; 131 | false -> 132 | {stop, Reason, State} 133 | end; 134 | 135 | handle_info(_Info, State) -> 136 | {noreply, State}. 137 | 138 | terminate(_Reason, _State) -> 139 | ok. 140 | 141 | code_change(_OldVsn, State, _Extra) -> 142 | {ok, State}. 143 | 144 | current_usage(#state{children=Children}) -> 145 | {message_queue_len, Pending} = process_info(self(), message_queue_len), 146 | Current = sets:size(Children), 147 | Pending + Current. 148 | 149 | rate(State=#state{spawned=Spawned, died=Died}) -> 150 | State2 = State#state{spawned=0, 151 | died=0}, 152 | {Spawned, Died, State2}. 
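%% Usage sketch for the API above. This is an illustration, not part of
%% the original module. Assumptions: the sidejob application is already
%% running; `my_tasks' and `my_server' are hypothetical names; the
%% argument order of sidejob:new_resource/4 is taken to be
%% (Name, Mod, Limit, Workers), and the values 100 and 4 are arbitrary.
%%
%%   %% Create a capacity-limited supervisor resource.
%%   _ = sidejob:new_resource(my_tasks, sidejob_supervisor, 100, 4),
%%
%%   %% Spawn a linked child; the request is rejected when over capacity.
%%   case sidejob_supervisor:spawn(my_tasks, fun() -> ok end) of
%%       {error, overload} -> rejected;
%%       _Started          -> accepted
%%   end,
%%
%%   %% Start a child through an OTP-style start function.
%%   case sidejob_supervisor:start_child(my_tasks, my_server, start_link, []) of
%%       {ok, Pid}         -> {started, Pid};
%%       {error, overload} -> rejected;
%%       {error, Reason}   -> {failed, Reason}
%%   end,
%%
%%   %% List every child currently tracked across all workers.
%%   Pids = sidejob_supervisor:which_children(my_tasks).
%%
%% The spawn result is matched loosely on purpose: only the
%% {error, overload} shape is relied upon in this sketch.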
153 | 154 | %%%=================================================================== 155 | %%% Internal functions 156 | %%%=================================================================== 157 | 158 | add_child(Pid, State=#state{children=Children, spawned=Spawned}) -> 159 | Children2 = sets:add_element(Pid, Children), 160 | Spawned2 = Spawned + 1, 161 | State#state{children=Children2, spawned=Spawned2}. 162 | -------------------------------------------------------------------------------- /src/sidejob_worker.erl: -------------------------------------------------------------------------------- 1 | %% ------------------------------------------------------------------- 2 | %% 3 | %% Copyright (c) 2013 Basho Technologies, Inc. All Rights Reserved. 4 | %% 5 | %% This file is provided to you under the Apache License, 6 | %% Version 2.0 (the "License"); you may not use this file 7 | %% except in compliance with the License. You may obtain 8 | %% a copy of the License at 9 | %% 10 | %% http://www.apache.org/licenses/LICENSE-2.0 11 | %% 12 | %% Unless required by applicable law or agreed to in writing, 13 | %% software distributed under the License is distributed on an 14 | %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 15 | %% KIND, either express or implied. See the License for the 16 | %% specific language governing permissions and limitations 17 | %% under the License. 18 | %% 19 | %% ------------------------------------------------------------------- 20 | 21 | %% @doc 22 | %% This module implements the sidejob_worker logic used by all worker 23 | %% processes created to manage a sidejob resource. This code emulates 24 | %% the gen_server API, wrapping a provided user-specified module which 25 | %% implements the gen_server behavior. 26 | %% 27 | %% The primary purpose of this module is updating the usage information 28 | %% published in a given resource's ETS table, such that capacity limiting 29 | %% operates correctly. 
The sidejob_worker also cooperates with a given 30 | %% {@link sidejob_resource_stats} server to maintain statistics about a 31 | %% given resource. 32 | %% 33 | %% By default, a sidejob_worker calculates resource usage based on message 34 | %% queue size. However, the user-specified module can also choose to 35 | %% implement the `current_usage/1' and `rate/1' callbacks to change how 36 | %% usage is calculated. An example is the {@link sidejob_supervisor} module 37 | %% which reports usage as: queue size + num_children. 38 | 39 | -module(sidejob_worker). 40 | -behaviour(gen_server). 41 | 42 | %% API 43 | -export([start_link/6, reg_name/1, reg_name/2, workers/2]). 44 | 45 | %% gen_server callbacks 46 | -export([init/1, handle_call/3, handle_cast/2, handle_info/2, 47 | terminate/2, code_change/3]). 48 | 49 | -record(state, {id :: non_neg_integer(), 50 | ets :: term(), 51 | width :: pos_integer(), 52 | limit :: pos_integer(), 53 | reporter :: term(), 54 | mod :: module(), 55 | modstate :: term(), 56 | usage :: custom | default, 57 | last_mq_len = 0 :: non_neg_integer(), 58 | enqueue = 0 :: non_neg_integer(), 59 | dequeue = 0 :: non_neg_integer()}). 60 | 61 | %%%=================================================================== 62 | %%% API 63 | %%%=================================================================== 64 | 65 | reg_name(Id) -> 66 | IdBin = list_to_binary(integer_to_list(Id)), 67 | binary_to_atom(<<"sidejob_worker_", IdBin/binary>>, latin1). 68 | 69 | reg_name(Name, Id) when is_atom(Name) -> 70 | reg_name(atom_to_binary(Name, latin1), Id); 71 | reg_name(NameBin, Id) -> 72 | WorkerName = iolist_to_binary([NameBin, "_", integer_to_list(Id)]), 73 | binary_to_atom(WorkerName, latin1). 74 | 75 | workers(Name, Count) -> 76 | NameBin = atom_to_binary(Name, latin1), 77 | [reg_name(NameBin, Id) || Id <- lists:seq(1,Count)]. 
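%% The naming helpers above can be tried in a shell. A minimal session,
%% assuming only that this module is compiled and loaded (`my_res' is a
%% hypothetical resource name used for illustration):
%%
%%   1> sidejob_worker:reg_name(3).
%%   sidejob_worker_3
%%   2> sidejob_worker:reg_name(my_res, 3).
%%   my_res_3
%%   3> sidejob_worker:workers(my_res, 3).
%%   [my_res_1,my_res_2,my_res_3]
%%
%% Each call creates atoms via binary_to_atom/2, so worker counts are
%% best kept small and fixed.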
78 | 79 | start_link(RegName, ResName, Id, ETS, StatsName, Mod) -> 80 | gen_server:start_link({local, RegName}, ?MODULE, 81 | [ResName, Id, ETS, StatsName, Mod], []). 82 | 83 | %%%=================================================================== 84 | %%% gen_server callbacks 85 | %%%=================================================================== 86 | 87 | init([ResName, Id, ETS, StatsName, Mod]) -> 88 | %% TODO: Add ability to pass args 89 | case Mod:init([ResName]) of 90 | {ok, ModState} -> 91 | Exports = proplists:get_value(exports, Mod:module_info()), 92 | Usage = case lists:member({current_usage, 1}, Exports) of 93 | true -> 94 | custom; 95 | false -> 96 | default 97 | end, 98 | schedule_tick(), 99 | Width = ResName:width(), 100 | Limit = ResName:limit(), 101 | State = #state{id=Id, 102 | ets=ETS, 103 | mod=Mod, 104 | modstate=ModState, 105 | usage=Usage, 106 | width=Width, 107 | limit=Limit, 108 | reporter=StatsName}, 109 | ets:insert(ETS, [{usage,0}, {full,0}]), 110 | {ok, State}; 111 | Other -> 112 | Other 113 | end. 114 | 115 | handle_call(Request, From, State=#state{mod=Mod, 116 | modstate=ModState}) -> 117 | Result = Mod:handle_call(Request, From, ModState), 118 | {Pos, ModState2} = case Result of 119 | {reply,_Reply,NewState} -> 120 | {3, NewState}; 121 | {reply,_Reply,NewState,hibernate} -> 122 | {3, NewState}; 123 | {reply,_Reply,NewState,_Timeout} -> 124 | {3, NewState}; 125 | {noreply,NewState} -> 126 | {2, NewState}; 127 | {noreply,NewState,hibernate} -> 128 | {2, NewState}; 129 | {noreply,NewState,_Timeout} -> 130 | {2, NewState}; 131 | {stop,_Reason,_Reply,NewState} -> 132 | {4, NewState}; 133 | {stop,_Reason,NewState} -> 134 | {3, NewState} 135 | end, 136 | State2 = State#state{modstate=ModState2}, 137 | State3 = update_rate(update_usage(State2)), 138 | Return = setelement(Pos, Result, State3), 139 | Return. 
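%% Illustration of the re-wrapping step at the end of handle_call/3
%% (shell session; the atoms `mod_state' and `worker_state' are
%% placeholders for the real terms). The callback's result tuple is
%% returned unchanged except for the element that carries state, which
%% is swapped for the sidejob_worker state:
%%
%%   1> setelement(3, {reply, ok, mod_state}, worker_state).
%%   {reply,ok,worker_state}
%%   2> setelement(2, {noreply, mod_state, hibernate}, worker_state).
%%   {noreply,worker_state,hibernate}
%%   3> setelement(4, {stop, normal, ok, mod_state}, worker_state).
%%   {stop,normal,ok,worker_state}
%%
%% This is why each clause of the case expression records the tuple
%% position (2, 3 or 4) alongside the new callback state.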
140 | 141 | handle_cast(Request, State=#state{mod=Mod, 142 | modstate=ModState}) -> 143 | Result = Mod:handle_cast(Request, ModState), 144 | {Pos, ModState2} = case Result of 145 | {noreply,NewState} -> 146 | {2, NewState}; 147 | {noreply,NewState,hibernate} -> 148 | {2, NewState}; 149 | {noreply,NewState,_Timeout} -> 150 | {2, NewState}; 151 | {stop,_Reason,NewState} -> 152 | {3, NewState} 153 | end, 154 | State2 = State#state{modstate=ModState2}, 155 | State3 = update_rate(update_usage(State2)), 156 | Return = setelement(Pos, Result, State3), 157 | Return. 158 | 159 | handle_info('$sidejob_worker_tick', State) -> 160 | State2 = tick(State), 161 | schedule_tick(), 162 | {noreply, State2}; 163 | 164 | handle_info(Info, State=#state{mod=Mod, 165 | modstate=ModState}) -> 166 | Result = Mod:handle_info(Info, ModState), 167 | {Pos, ModState2} = case Result of 168 | {noreply,NewState} -> 169 | {2, NewState}; 170 | {noreply,NewState,hibernate} -> 171 | {2, NewState}; 172 | {noreply,NewState,_Timeout} -> 173 | {2, NewState}; 174 | {stop,_Reason,NewState} -> 175 | {3, NewState} 176 | end, 177 | State2 = State#state{modstate=ModState2}, 178 | State3 = update_rate(update_usage(State2)), 179 | Return = setelement(Pos, Result, State3), 180 | Return. 181 | 182 | terminate(_Reason, _State) -> 183 | ok. 184 | 185 | code_change(_OldVsn, State, _Extra) -> 186 | {ok, State}. 187 | 188 | %%%=================================================================== 189 | %%% Internal functions 190 | %%%=================================================================== 191 | 192 | schedule_tick() -> 193 | erlang:send_after(1000, self(), '$sidejob_worker_tick'). 194 | 195 | tick(State=#state{id=Id, reporter=Reporter}) -> 196 | Usage = current_usage(State), 197 | {In, Out, State2} = current_rate(State), 198 | sidejob_resource_stats:report(Reporter, Id, Usage, In, Out), 199 | State2. 
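%% The module documentation notes that a callback module may export
%% `current_usage/1' and `rate/1' to replace the default accounting based
%% on message queue length. A minimal sketch of such a module follows.
%% It is an illustration only: `my_counter_worker', its `st' record and
%% the checkout/checkin requests are hypothetical, not part of sidejob.
%%
%%   -module(my_counter_worker).
%%   -behaviour(gen_server).
%%   -export([init/1, handle_call/3, handle_cast/2, handle_info/2,
%%            terminate/2, code_change/3]).
%%   -export([current_usage/1, rate/1]).
%%
%%   -record(st, {active = 0, in = 0, out = 0}).
%%
%%   %% sidejob_worker calls Mod:init([ResName]).
%%   init([_ResName]) -> {ok, #st{}}.
%%
%%   handle_call(checkout, _From, St = #st{active = A, in = In}) ->
%%       {reply, ok, St#st{active = A + 1, in = In + 1}};
%%   handle_call(checkin, _From, St = #st{active = A, out = Out}) ->
%%       {reply, ok, St#st{active = max(0, A - 1), out = Out + 1}};
%%   handle_call(_Other, _From, St) ->
%%       {reply, {error, unknown_request}, St}.
%%
%%   handle_cast(_Msg, St) -> {noreply, St}.
%%   handle_info(_Info, St) -> {noreply, St}.
%%   terminate(_Reason, _St) -> ok.
%%   code_change(_OldVsn, St, _Extra) -> {ok, St}.
%%
%%   %% Usage = queued requests + items currently checked out.
%%   %% Runs inside the worker process, so self() is the worker.
%%   current_usage(#st{active = A}) ->
%%       {message_queue_len, Q} = process_info(self(), message_queue_len),
%%       Q + A.
%%
%%   %% Called once per tick; returns {In, Out, NewState} and resets
%%   %% the counters, mirroring sidejob_supervisor:rate/1.
%%   rate(St = #st{in = In, out = Out}) ->
%%       {In, Out, St#st{in = 0, out = 0}}.
%%
%% Because init/1 in this file selects `custom' mode by looking for
%% current_usage/1 in the module's exports, a module that exports
%% current_usage/1 should export rate/1 as well.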
200 | 201 | update_usage(State=#state{ets=ETS, width=Width, limit=Limit}) -> 202 | Usage = current_usage(State), 203 | Full = case Usage >= (Limit div Width) of 204 | true -> 205 | 1; 206 | false -> 207 | 0 208 | end, 209 | ets:insert(ETS, [{usage, Usage}, 210 | {full, Full}]), 211 | State. 212 | 213 | current_usage(#state{usage=default}) -> 214 | {message_queue_len, Len} = process_info(self(), message_queue_len), 215 | Len; 216 | current_usage(#state{usage=custom, mod=Mod, modstate=ModState}) -> 217 | Mod:current_usage(ModState). 218 | 219 | update_rate(State=#state{usage=custom}) -> 220 | %% Assume this is updated internally in the custom module 221 | State; 222 | update_rate(State=#state{usage=default, 223 | last_mq_len=LastLen}) -> 224 | {message_queue_len, Len} = process_info(self(), message_queue_len), 225 | Enqueue = Len - LastLen + 1, 226 | Dequeue = State#state.dequeue + 1, 227 | State#state{enqueue=Enqueue, dequeue=Dequeue}. 228 | 229 | %% TODO: Probably should rename since it resets rate 230 | current_rate(State=#state{usage=default, 231 | enqueue=Enqueue, 232 | dequeue=Dequeue}) -> 233 | State2 = State#state{enqueue=0, dequeue=0}, 234 | {Enqueue, Dequeue, State2}; 235 | current_rate(State=#state{usage=custom, mod=Mod, modstate=ModState}) -> 236 | {Enqueue, Dequeue, ModState2} = Mod:rate(ModState), 237 | State2 = State#state{modstate=ModState2}, 238 | {Enqueue, Dequeue, State2}. 239 | -------------------------------------------------------------------------------- /src/sidejob_worker_sup.erl: -------------------------------------------------------------------------------- 1 | %% ------------------------------------------------------------------- 2 | %% 3 | %% Copyright (c) 2013 Basho Technologies, Inc. All Rights Reserved. 4 | %% 5 | %% This file is provided to you under the Apache License, 6 | %% Version 2.0 (the "License"); you may not use this file 7 | %% except in compliance with the License. 
You may obtain 8 | %% a copy of the License at 9 | %% 10 | %% http://www.apache.org/licenses/LICENSE-2.0 11 | %% 12 | %% Unless required by applicable law or agreed to in writing, 13 | %% software distributed under the License is distributed on an 14 | %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 15 | %% KIND, either express or implied. See the License for the 16 | %% specific language governing permissions and limitations 17 | %% under the License. 18 | %% 19 | %% ------------------------------------------------------------------- 20 | -module(sidejob_worker_sup). 21 | -behaviour(supervisor). 22 | 23 | %% API 24 | -export([start_link/4]). 25 | 26 | %% Supervisor callbacks 27 | -export([init/1]). 28 | 29 | %%%=================================================================== 30 | %%% API functions 31 | %%%=================================================================== 32 | 33 | start_link(Name, NumWorkers, StatsName, Mod) -> 34 | NameBin = atom_to_binary(Name, latin1), 35 | RegName = binary_to_atom(<>, latin1), 36 | supervisor:start_link({local, RegName}, ?MODULE, 37 | [Name, NumWorkers, StatsName, Mod]). 38 | 39 | %%%=================================================================== 40 | %%% Supervisor callbacks 41 | %%%=================================================================== 42 | 43 | init([Name, NumWorkers, StatsName, Mod]) -> 44 | Children = [begin 45 | WorkerName = sidejob_worker:reg_name(Name, Id), 46 | {WorkerName, 47 | {sidejob_worker, start_link, 48 | [WorkerName, Name, Id, WorkerName, StatsName, Mod]}, 49 | permanent, 5000, worker, [sidejob_worker]} 50 | end || Id <- lists:seq(1, NumWorkers)], 51 | {ok, {{one_for_one, 10, 10}, Children}}. 
--------------------------------------------------------------------------------
/test/full_par_ce.eqc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/basho/sidejob/10abce4fc76054c8aad230943b5c1a31b67efc6f/test/full_par_ce.eqc

--------------------------------------------------------------------------------
/test/overload_children_ce.eqc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/basho/sidejob/10abce4fc76054c8aad230943b5c1a31b67efc6f/test/overload_children_ce.eqc

--------------------------------------------------------------------------------
/test/pool_full_par_ce.eqc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/basho/sidejob/10abce4fc76054c8aad230943b5c1a31b67efc6f/test/pool_full_par_ce.eqc

--------------------------------------------------------------------------------
/test/sidejob_eqc_prop_par_ce.eqc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/basho/sidejob/10abce4fc76054c8aad230943b5c1a31b67efc6f/test/sidejob_eqc_prop_par_ce.eqc

--------------------------------------------------------------------------------
/test/sidejob_eqc_prop_seq_ce.eqc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/basho/sidejob/10abce4fc76054c8aad230943b5c1a31b67efc6f/test/sidejob_eqc_prop_seq_ce.eqc

--------------------------------------------------------------------------------
/test/sidejob_eqc_prop_seq_ce_+s12:12.eqc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/basho/sidejob/10abce4fc76054c8aad230943b5c1a31b67efc6f/test/sidejob_eqc_prop_seq_ce_+s12:12.eqc

--------------------------------------------------------------------------------
/test/supervisor_race_overrun.eqc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/basho/sidejob/10abce4fc76054c8aad230943b5c1a31b67efc6f/test/supervisor_race_overrun.eqc

--------------------------------------------------------------------------------
/test/which_children_ce.eqc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/basho/sidejob/10abce4fc76054c8aad230943b5c1a31b67efc6f/test/which_children_ce.eqc

--------------------------------------------------------------------------------
/test/which_children_pulse_ce.eqc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/basho/sidejob/10abce4fc76054c8aad230943b5c1a31b67efc6f/test/which_children_pulse_ce.eqc

--------------------------------------------------------------------------------