├── .gitignore ├── .gitmodules ├── LICENSE ├── README.md ├── bin └── cleanstart.sh ├── docs ├── cql-query-execution.shapes ├── freshet-query-execution.png └── notes.md ├── freshet-core ├── LICENSE ├── README.md ├── doc │ ├── intro.md │ └── query-lang-data-models-papers.md ├── project.clj ├── resources │ ├── where-as-json.json │ └── wikipedia-activity-test-data.csv ├── src │ ├── clojure │ │ └── org │ │ │ └── pathirage │ │ │ └── freshet │ │ │ └── core.clj │ └── java │ │ └── org │ │ └── pathirage │ │ └── freshet │ │ ├── Constants.java │ │ ├── FreshetException.java │ │ ├── data │ │ ├── StreamDefinition.java │ │ └── StreamElement.java │ │ ├── operators │ │ ├── AggregateOperator.java │ │ ├── DeserializeOperator.java │ │ ├── FreshetOperator.java │ │ ├── FreshetOperatorType.java │ │ ├── GroupByOperator.java │ │ ├── IStreamOperator.java │ │ ├── MaterializeOperator.java │ │ ├── ProjectOperator.java │ │ ├── RStreamOperator.java │ │ ├── SelectOperator.java │ │ ├── WindowOperator.java │ │ ├── aggregate │ │ │ ├── AggregateFunction.java │ │ │ ├── AggregateFunctionFactory.java │ │ │ ├── AggregateType.java │ │ │ ├── Average.java │ │ │ ├── Count.java │ │ │ ├── Max.java │ │ │ ├── Min.java │ │ │ └── Sum.java │ │ └── select │ │ │ ├── Expression.java │ │ │ ├── ExpressionEvaluator.java │ │ │ ├── ExpressionType.java │ │ │ ├── OperatorType.java │ │ │ └── PredicateType.java │ │ ├── package-info.java │ │ ├── serde │ │ ├── AvroSerde.java │ │ ├── AvroSerdeFactory.java │ │ ├── QueueNodeSerde.java │ │ ├── QueueNodeSerdeFactory.java │ │ ├── StreamElementSerde.java │ │ └── StreamElementSerdeFactory.java │ │ └── utils │ │ ├── ExpressionSerde.java │ │ ├── KVStorageBackedEvictingQueue.java │ │ ├── QueueNode.java │ │ ├── Utilities.java │ │ ├── WikipediaFeedStreamTask.java │ │ └── system │ │ ├── WikipediaConsumer.java │ │ ├── WikipediaFeed.java │ │ └── WikipediaSystemFactory.java └── test │ ├── clojure │ └── org │ │ └── pathirage │ │ └── freshet │ │ ├── config_test.clj │ │ ├── expressioneval_test.clj │ 
│ ├── expresssionserde_test.clj │ │ └── helpers │ │ └── expressions.clj │ ├── java │ └── org │ │ └── pathirage │ │ └── freshet │ │ └── test │ │ ├── Constants.java │ │ └── ExpressionEvaluationTestUtils.java │ └── resources │ └── config-test.properties ├── freshet-dsl ├── .gitignore ├── LICENSE ├── README.md ├── doc │ ├── cql-to-samza.md │ ├── intro.md │ └── samples.md ├── project.clj ├── src │ └── clojure │ │ └── org │ │ └── pathirage │ │ └── freshet │ │ ├── dsl │ │ ├── compiler.clj │ │ ├── core.clj │ │ ├── helpers.clj │ │ └── samza.clj │ │ ├── samples │ │ ├── expressions.clj │ │ └── streams.clj │ │ └── utils │ │ └── config.clj └── test │ └── clojure │ └── org │ └── pathirage │ └── freshet │ └── expressiondsl_test.clj ├── freshet-helpers ├── .gitignore ├── LICENSE ├── README.md ├── doc │ └── wikipedia-activity-collector.md ├── project.clj ├── src │ ├── clojure │ │ └── org │ │ │ └── pathirage │ │ │ └── freshet │ │ │ └── helpers │ │ │ └── core.clj │ └── java │ │ └── org │ │ └── pathirage │ │ └── freshet │ │ └── helpers │ │ ├── KafkaMonitor.java │ │ ├── ParseWikipediaActivity.java │ │ └── WikipediaActivityFeed.java ├── wikipedia-actvities-2014-11-04T00:24:19.1.csv └── wikipedia-actvities-2014-11-04T00:26:14.32.csv ├── freshet-job-package ├── .gitignore ├── pom.xml └── src │ └── main │ ├── assembly │ └── src.xml │ └── resources │ └── log4j.xml ├── freshet-shell ├── bin │ ├── fshell │ ├── grid │ ├── log4j-console.xml │ ├── run-class.sh │ ├── run-job.sh │ └── setup.sh ├── conf │ ├── core-site.xml │ ├── freshet.conf │ ├── hdfs-site.xml │ └── yarn-site.xml ├── doc │ ├── intro.md │ └── shell-design.md └── project.clj └── references ├── art%3A10.1007%2Fs002360050095.pdf ├── atc14-paper-hu.pdf ├── bockermann_2014b.pdf ├── paper_199.pdf ├── rc25401.pdf ├── sacmat68-xing.pdf └── secret_vldbj13.pdf /.gitignore: -------------------------------------------------------------------------------- 1 | ### IDEA ### 2 | *.iml 3 | .idea 4 | ### Clojure ### 5 | #pom.xml 6 | #pom.xml.asc 7 | 
*jar 8 | /lib/ 9 | /classes/ 10 | /target/ 11 | /checkouts/ 12 | .lein-deps-sum 13 | .lein-repl-history 14 | .lein-plugins/ 15 | .lein-failures 16 | -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milinda/Freshet-Old/3f387e55f3fe62cc9dd4adc8287abdbecf292991/.gitmodules -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Freshet 2 | 3 | [CQL](http://dl.acm.org/citation.cfm?id=1146463) based Clojure DSL for Apache Samza. 4 | 5 | Freshet is the first step towards a complete implementation of the Kappa Architecture based on an extension to SQL [1] to support continuous queries. Freshet implements a subset (select, windowing, aggregates) of CQL on top of Apache Samza. It implements the *RStream* and *IStream* relation-to-stream operators, tuple- and time-based sliding windows to convert streams to relations, and basic relation-to-relation operators for implementing business logic. Following CQL, Freshet uses an *insert/delete* stream to model *instantaneous relations*. 
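To make the insert/delete model concrete, here is a minimal Java sketch that is not taken from Freshet's sources. The `InsertDeleteSketch` class, its `Change` type, and all method names are hypothetical; the sketch assumes tuples can be stood in for by plain strings and that a relation is a bag (list) of tuples. Each batch of insert/delete elements turns one instantaneous relation into the next; `istream` then emits what was added since the previous instant, and `rstream` emits the whole current relation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; not part of Freshet's code base. */
class InsertDeleteSketch {

    /** One element of an insert/delete stream: a tuple plus a delete flag. */
    static final class Change {
        final String tuple;
        final boolean delete;

        Change(String tuple, boolean delete) {
            this.tuple = tuple;
            this.delete = delete;
        }
    }

    /** Applies one batch of changes to a copy of the relation (a bag of tuples). */
    static List<String> apply(List<String> relation, List<Change> batch) {
        List<String> next = new ArrayList<String>(relation);
        for (Change c : batch) {
            if (c.delete) {
                next.remove(c.tuple); // removes a single occurrence, if present
            } else {
                next.add(c.tuple);
            }
        }
        return next;
    }

    /** IStream: tuples of the current relation that were not in the previous one (bag difference). */
    static List<String> istream(List<String> previous, List<String> current) {
        List<String> out = new ArrayList<String>(current);
        for (String t : previous) {
            out.remove(t);
        }
        return out;
    }

    /** RStream: every tuple of the current relation. */
    static List<String> rstream(List<String> current) {
        return new ArrayList<String>(current);
    }

    public static void main(String[] args) {
        List<String> r0 = new ArrayList<String>();
        List<String> r1 = apply(r0, Arrays.asList(new Change("a", false), new Change("b", false)));
        List<String> r2 = apply(r1, Arrays.asList(new Change("a", true), new Change("c", false)));
        System.out.println(istream(r1, r2)); // prints [c]
        System.out.println(rstream(r2));     // prints [b, c]
    }
}
```

In Freshet itself, the `delete` flag on `StreamElement` plays the role that `Change.delete` plays in this sketch.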
6 | 7 | ## Freshet Query DSL 8 | 9 | ### Defining Streams 10 | 11 | ```clojure 12 | (defstream wikipedia-feed 13 | (stream-fields [:title :string 14 | :user :string 15 | :diff-bytes :integer 16 | :is-talk :boolean 17 | :is-new :boolean 18 | :is-bot-edit :boolean 19 | :timestamp :long]) 20 | (ts :timestamp)) 21 | ``` 22 | 23 | ### SELECT Queries 24 | 25 | ```clojure 26 | (select wikipedia-feed 27 | (modifiers :istream) 28 | (window (unbounded)) 29 | (where (> :diff-bytes 200))) 30 | ``` 31 | 32 | ## Freshet Query Execution (Based on CQL Execution Semantics) 33 | 34 | docs/freshet-query-execution.png 35 | 36 | ![Freshet Query Execution Diagram](/docs/freshet-query-execution.png?raw=true "Freshet Query Execution") 37 | 38 | ## License 39 | 40 | Freshet is licensed under the Apache License, version 2.0 41 | 42 | [1] Arvind Arasu, Shivnath Babu, and Jennifer Widom. 2006. The CQL continuous query language: semantic foundations and query execution. The VLDB Journal 15, 2 (June 2006), 121-142. DOI=10.1007/s00778-004-0147-z http://dx.doi.org/10.1007/s00778-004-0147-z 43 | 44 | -------------------------------------------------------------------------------- /bin/cleanstart.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | freshet_version=0.1.0-SNAPSHOT 4 | 5 | home_dir=`pwd` 6 | 7 | base_dir=$(dirname $0)/.. 
8 | 9 | cd $base_dir 10 | base_dir=`pwd` 11 | cd $home_dir 12 | 13 | username=$(whoami) 14 | 15 | # Build and Install Freshet Core 16 | cd $base_dir/freshet-core 17 | lein install 18 | 19 | cd $base_dir/freshet-dsl 20 | lein install 21 | 22 | # Clean Freshet Job Package 23 | cd $base_dir/freshet-job-package 24 | mvn clean 25 | 26 | # Run Freshet Shell after local setup 27 | cd $base_dir/freshet-shell 28 | ./bin/setup.sh local 29 | ./bin/fshell 30 | -------------------------------------------------------------------------------- /docs/cql-query-execution.shapes: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milinda/Freshet-Old/3f387e55f3fe62cc9dd4adc8287abdbecf292991/docs/cql-query-execution.shapes -------------------------------------------------------------------------------- /docs/freshet-query-execution.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milinda/Freshet-Old/3f387e55f3fe62cc9dd4adc8287abdbecf292991/docs/freshet-query-execution.png -------------------------------------------------------------------------------- /docs/notes.md: -------------------------------------------------------------------------------- 1 | # Notes (01/07/2015) 2 | 3 | I have window and select operators working. Need to get join and aggregates working. Other than that, there are several 4 | issues related to the implementation. 5 | 6 | - The execution layer is not properly connected to the DSL. 7 | - Operators are designed as StreamTasks, which prevents us from implementing stream optimizations like **fusion** and **fission**. 8 | - The stream element definition assumes DB-row-like data. This prevents Freshet from supporting JSON/XML streams. 9 | - There is no proper representation for a streaming query. 10 | - There is no way of supporting user-defined functions, data types and operators. 
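One way to picture the fusion problem from the list above: if operators were plain objects instead of StreamTasks, two of them could be fused into a single processing step with no intermediate topic between them. The following is a minimal Java sketch under that assumption, using integer elements for brevity; `FusionSketch`, `Operator`, `greaterThan`, `scale` and `fuse` are hypothetical names and are not part of Freshet or Samza.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these types exist in Freshet or Samza. */
class FusionSketch {

    /** An operator decoupled from the execution layer: zero or more outputs per input. */
    interface Operator {
        List<Integer> process(Integer element);
    }

    /** A select-like operator: keeps elements greater than the threshold. */
    static Operator greaterThan(final int threshold) {
        return new Operator() {
            public List<Integer> process(Integer element) {
                List<Integer> out = new ArrayList<Integer>();
                if (element > threshold) {
                    out.add(element);
                }
                return out;
            }
        };
    }

    /** A map-like operator: multiplies each element by a constant factor. */
    static Operator scale(final int factor) {
        return new Operator() {
            public List<Integer> process(Integer element) {
                List<Integer> out = new ArrayList<Integer>();
                out.add(element * factor);
                return out;
            }
        };
    }

    /** Fusion: runs two operators back to back inside one step. */
    static Operator fuse(final Operator first, final Operator second) {
        return new Operator() {
            public List<Integer> process(Integer element) {
                List<Integer> out = new ArrayList<Integer>();
                for (Integer intermediate : first.process(element)) {
                    out.addAll(second.process(intermediate));
                }
                return out;
            }
        };
    }

    public static void main(String[] args) {
        Operator fused = fuse(greaterThan(200), scale(2));
        System.out.println(fused.process(250)); // prints [500]
        System.out.println(fused.process(100)); // prints []
    }
}
```

With operators shaped like this, the decision of how many of them run inside one StreamTask becomes a planning choice rather than something fixed by the class hierarchy.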
11 | 12 | 13 | Yi Pan's API for Samza StreamQL has several nice concepts, such as the separation of the operator layer from the execution layer. This 14 | can be used as a base for implementing the Freshet operator layer. 15 | 16 | If we separate the operator layer from the execution layer, we can easily support different back-ends based on different requirements. 17 | This will also allow us to do multiple levels of streaming optimizations. For example, 18 | 19 | - CQL-level optimizations at the operator layer, including how operators are assigned to StreamTasks. 20 | - Then, at the StreamTask level, we can do low-level optimizations. 21 | 22 | # Notes (12/02/2014) 23 | 24 | ## Stream Processing Language Calculus 25 | 26 | Main entities: 27 | 28 | - Inputs: These are queues 29 | - Outputs: These are queues 30 | - Operators: Take queues and variables as inputs and output queues and variables 31 | - Variables: Used to maintain state 32 | 33 | Execution Configuration: 34 | 35 | - Function name to implementation map 36 | - Variable name to variable value map 37 | - Queue name to actual queue map 38 | 39 | # TODOS (10/15/2014) 40 | 41 | * Write a summary about CQL 42 | * Define a minimal set of CQL constructs to support 43 | * Define a set of samples which show the usefulness of the above subset 44 | * Design the DSL based on the above 45 | * Define the internal representation of CQL 46 | * CQL to Execution Plan 47 | * Understand how IStream, DStream and RStream work and their semantics in CQL 48 | 49 | # KappaQL Query Layer Design Notes 50 | 51 | * The first problem is the serialization format of the events that come into Kafka from the outside world. For the 52 | prototype we can use flat JSON objects. 53 | * Then how are we going to define the stream: 54 | > Given that we choose JSON as the serialization format above, we can just use a mapping of fields to their types 55 | > as the stream definition. Then the problem is how we annotate the ID/Primary Key of this stream in the definition. 
56 | > And also which field contains the timestamp. In the first version it's mandatory to have a timestamp field. 57 | > We can use something like the following. 58 | 59 | > ```clojure 60 | > (defstream stream 61 | > (fields [:name :string :address :string :age :integer :timestamp :long]) 62 | > (pk :id) 63 | > (ts :timestamp)) 64 | > ``` 65 | 66 | * What the queries look like 67 | > ```clojure 68 | > (select stream 69 | > (fields [:name :firstname] :address :age) 70 | > (where {:age (less-than 34)})) 71 | > ``` 72 | 73 | ## Queries Supported in v0.1 74 | 75 | - Only **stream-to-stream** queries are supported. 76 | - **select** with a **where** clause and **projection** is supported. 77 | - *less-than*, *greater-than*, *equal*, *like*, *(greater-than|less-than)-or-equal* conditions composed with *AND* or *OR* are supported. 78 | 79 | 80 | ## Queries Supported in v0.2 81 | 82 | - Aggregates support with **stream-to-relation** queries. 83 | - In addition to v0.1 queries, **group-by** and **aggregate** are supported. 84 | 85 | > ```clojure 86 | > (select stream 87 | > (aggregate ) 88 | > (group-by )) 89 | > ``` 90 | -------------------------------------------------------------------------------- /freshet-core/README.md: -------------------------------------------------------------------------------- 1 | # freshet-core 2 | 3 | A Clojure library designed to ... well, that part is up to you. 4 | 5 | ## Usage 6 | 7 | FIXME 8 | 9 | ## License 10 | 11 | Copyright © 2014 Milinda Pathirage 12 | 13 | Distributed under the Apache License Version 2.0 or (at 14 | your option) any later version. 
15 | -------------------------------------------------------------------------------- /freshet-core/doc/intro.md: -------------------------------------------------------------------------------- 1 | # Introduction to kappaql-core 2 | 3 | TODO: write [great documentation](http://jacobian.org/writing/what-to-write/) 4 | -------------------------------------------------------------------------------- /freshet-core/doc/query-lang-data-models-papers.md: -------------------------------------------------------------------------------- 1 | # Query Languages and Data Models for Database Sequences and Data Streams 2 | 3 | Studies the limitations of **relational algebra** and **SQL** in supporting queries over data streams 4 | and presents an alternative query language and data model. 5 | 6 | Introduces the notion of **Nonblocking Queries**, the only continuous queries that can be supported on data streams. 7 | They are equivalent to monotonic queries. 8 | 9 | ## NB-Completeness 10 | 11 | RA's ability to express all monotonic queries expressible in RA using only the monotonic 12 | operators of RA. 13 | 14 | **RA is not NB-Complete and SQL is not more powerful than RA** 15 | 16 | ## Solutions 17 | 18 | - User-defined aggregates natively coded in SQL 19 | - A generalisation of the union operator to support the merging of multiple streams according to their timestamps 20 | 21 | ## Blocking Operators Are Not Allowed in Streaming Context 22 | 23 | - NOT IN 24 | - NOT EXISTS 25 | - ALL 26 | - EXCEPT 27 | 28 | **all monotonic queries, and only those, can be expressed using nonblocking computations** 29 | 30 | A cursor-based programming model cannot be supported in data stream management systems. 31 | 32 | **Presence of a timestamp is required for query completeness.** 33 | 34 | ## Related Work 35 | 36 | - Tapestry: Append-only databases supporting continuous queries 37 | 38 | 39 | ## Definitions 40 | 41 | - A sequence consists of ordered tuples, whereas the order is immaterial in relational tables. 
42 | - Streams are sequences of unbounded length, where the tuples are ordered by, and possibly time-stamped with, their arrival time. 43 | - A blocking query operator is a query operator that is unable to produce the first tuples of the output until it has seen the entire input. 44 | - A nonblocking query operator is one that produces all the tuples of output before it has detected the end of the input. 45 | 46 | **Query *Q* on a stream *S* can be implemented by a nonblocking operator iff *Q(S)* is monotonic with respect to *presequence*.** 47 | 48 | - *physically ordered relations*: those where only the relative positions of tuples in the sequence are of significance. 49 | - *unordered relations*: traditional db relations, called Codd's relations. 50 | - *logically ordered relations*: sequences where tuples are ordered by their timestamps or other logical keys. 51 | 52 | 53 | The *physical order* model is conducive to great expressive power, but cannot support binary operators as naturally as it does unary ones. For example, the 54 | SQL *union* of two tables T1 and T2 is normally implemented by first returning all the tuples in T1 and then all the tuples in T2. The resulting operator 55 | is not suitable for continuous queries, since it is partially blocking with respect to T1. These issues can be resolved by using Codd's relations or logically 56 | ordered relations. 57 | 58 | 59 | ## Open Problem 60 | 61 | **What generalizations of the relational data model, algebra, and query languages are needed to deal with sequences and streams?** 62 | -------------------------------------------------------------------------------- /freshet-core/project.clj: -------------------------------------------------------------------------------- 1 | (defproject org.pathirage.freshet/freshet-core "0.1.0-SNAPSHOT" 2 | :description "Freshet Core: CQL On Top Of Samza." 
3 | :url "http://github.com/milinda/Freshet" 4 | :license {:name "Apache License, Version 2.0" 5 | :url "http://www.apache.org/licenses/LICENSE-2.0.html"} 6 | :repositories [["codehaus" "http://repository.codehaus.org/org/codehaus"]] 7 | :dependencies [[org.clojure/clojure "1.6.0"] 8 | [org.apache.samza/samza-api "0.7.0"] 9 | [org.apache.samza/samza-serializers_2.10 "0.7.0"] 10 | [org.apache.samza/samza-core_2.10 "0.7.0"] 11 | [org.apache.samza/samza-yarn_2.10 "0.7.0"] 12 | [org.apache.samza/samza-kv_2.10 "0.7.0"] 13 | [org.apache.samza/samza-kafka_2.10 "0.7.0"] 14 | [org.apache.kafka/kafka_2.10 "0.8.1"] 15 | [org.slf4j/slf4j-api "1.6.2"] 16 | [org.slf4j/slf4j-log4j12 "1.6.2"] 17 | [com.google.guava/guava "18.0"] 18 | [com.esotericsoftware/kryo "3.0.0"] 19 | [org.codehaus.jackson/jackson-jaxrs "1.8.5"] 20 | [org.apache.avro/avro "1.7.7"] 21 | [org.schwering/irclib "1.10"] 22 | [commons-codec/commons-codec "1.4"]] 23 | :source-paths ["src/clojure"] 24 | :java-source-paths ["src/java"] 25 | :test-paths ["test/clojure" "test/java"] 26 | :javac-options ["-target" "1.6" "-source" "1.6" "-Xlint:-options"] 27 | :profiles {:test {:resource-paths ["test/resources"]}}) 28 | -------------------------------------------------------------------------------- /freshet-core/resources/where-as-json.json: -------------------------------------------------------------------------------- 1 | {"predicate": "AND", "lhs": {"predicate" : "==", "lhs" : {"type": "field", "value": "name"}, "rhs" : {"type": "value", "value": "Milinda"}}, "rhs": {"predicate" : ">", "lhs" : {"type": "field", "value": "age"}, "rhs" : {"type": "value", "value": 25}}} -------------------------------------------------------------------------------- /freshet-core/src/clojure/org/pathirage/freshet/core.clj: -------------------------------------------------------------------------------- 1 | ;; 2 | ;; 3 | ;; Copyright 2014 Milinda Pathirage 4 | ;; 5 | ;; Licensed under the Apache License, Version 2.0 (the "License"); 6 
| ;; you may not use this file except in compliance with the License. 7 | ;; You may obtain a copy of the License at 8 | ;; 9 | ;; http://www.apache.org/licenses/LICENSE-2.0 10 | ;; 11 | ;; Unless required by applicable law or agreed to in writing, software 12 | ;; distributed under the License is distributed on an "AS IS" BASIS, 13 | ;; WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | ;; See the License for the specific language governing permissions and 15 | ;; limitations under the License. 16 | ;; 17 | ;; 18 | 19 | (ns org.pathirage.freshet.core) 20 | 21 | (comment 22 | "Defining streams" 23 | (defstream stream 24 | (fields [:name :string :address :string :age :integer :timestamp :long]) 25 | (pk :id) 26 | (ts :timestamp)) 27 | 28 | "Querying" 29 | (select stream 30 | (fields [:name :firstname] :address :age) 31 | (window (range 30)) 32 | (where {:age (less-than 34)})) 33 | 34 | "Relational Algebraic Expression" 35 | (def query {:stream stock-ticks :project [name, xx] :select condition}) 36 | (def condition [:less-than :field-name value]) 37 | (def complex-condition [:and [:less-than :field-name value] [:equal :field-name value]])) 38 | 39 | (defmacro defstream 40 | "Define a stream representing a topic in Kafka, applying the functions in the body, which change the stream definition." 41 | [stream & body]) 42 | 43 | (defmacro select 44 | "Build a select query, apply any modifiers specified in the body and then generate and submit a DAG of Samza jobs 45 | which is the physical execution plan of the continuous query on the stream specified by `stream`. `stream` is a stream 46 | created by `defstream`. Returns a job identifier which can be used to monitor the query, or an error in case of a failure." 47 | [stream & body]) 48 | 49 | (defmacro do-until 50 | [& clauses] 51 | (when clauses 52 | (list 'clojure.core/when (first clauses) 53 | (if (next clauses) 54 | (second clauses) 55 | (throw (IllegalArgumentException. 
"do-until requires an even number of forms."))) 56 | (cons 'do-until (nnext clauses))))) 57 | 58 | (defmacro unless 59 | [condition & body] 60 | `(if (not ~condition) 61 | (do ~@body))) 62 | 63 | (declare handle-things) 64 | 65 | (defmacro domain 66 | [name & body] 67 | `{:tag :domain 68 | :attrs {:name (str '~name)} 69 | :content [~@body]}) 70 | 71 | (defmacro grouping 72 | [name & body] 73 | `{:tag :grouping 74 | :attrs {:name (str '~name)} 75 | :content [~@(handle-things body)]}) 76 | 77 | (declare grok-attrs grok-props) 78 | 79 | (defn handle-things [things] 80 | (for [t things] 81 | {:tag :thing 82 | :attrs (grok-attrs (take-while (comp not vector?) t)) 83 | :content (if-let [c (grok-props (drop-while (comp not vector?) t))] 84 | [c] 85 | [])})) 86 | 87 | (defn grok-attrs [attrs] 88 | (into {:name (str (first attrs))} 89 | (for [a (rest attrs)] 90 | (cond 91 | (list? a) [:isa (str (second a))] 92 | (string? a) [:comment a])))) 93 | 94 | (defn grok-props [props] 95 | (when props 96 | {:tag :properties, :attrs nil, 97 | :content (apply vector (for [p props] 98 | {:tag :property, 99 | :attrs {:name (str (first p))}, 100 | :content nil}))})) 101 | 102 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/Constants.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet; 19 | 20 | public class Constants { 21 | public static final String CONST_STR_UNDEFINED = "kappaql.undefined"; 22 | public static final String CONST_STR_DEFAULT_SYSTEM = "kafka"; 23 | 24 | public static final String CONF_QUERY_ID = "org.pathirage.kappaql.query.id"; 25 | public static final String CONF_SYSTEM = "org.pathirage.kappaql.system"; 26 | public static final String CONF_DOWN_STREAM_TOPIC = "org.pathirage.kappaql.downstream.topic"; 27 | 28 | public static final String CONF_SAMZA_TASK_INPUTS = "task.inputs"; 29 | public static final String CONF_SAMZA_TASK_CLASS = "task.class"; 30 | public static final String CONF_SAMZA_TASK_CHECKPOINT_FACTORY = "task.checkpoint.factory"; 31 | public static final String CONF_SAMZA_TASK_CHECKPOINT_SYSTEM = "task.checkpoint.system"; 32 | public static final String CONF_SAMZA_TASK_CHECKPOINT_REPLICATION_FACTOR = "task.checkpoint.replication.factor"; 33 | public static final String CONF_SAMZA_JOB_NAME = "job.name"; 34 | public static final String CONF_SAMZA_JOB_FACTORY_CLASS = "job.factory.class"; 35 | 36 | public static final String CONF_OPERATOR_INPUT_STREAMS = "org.pathirage.kappaql.input.streams."; 37 | public static final String CONF_OPERATOR_OUTPUT_STREAMS = "org.pathirage.kappaql.output.streams."; 38 | 39 | public static final String CONF_WINDOW_RANGE = "org.pathirage.kappaql.window.range"; 40 | public static final String CONF_WINDOW_RANGE_SLOT_SIZE = "org.pathirage.kappaql.window.range.slot.size"; 41 | public static final String CONF_WINDOW_ROWS = "org.pathirage.kappaql.window.rows"; 42 | 43 | public static final String CONF_SYSTEMS_WIKIPEDIA_FACTORY = "systems.wikipedia.samza.factory"; 44 | public static final String CONF_SYSTEMS_WIKIPEDIA_HOST = "systems.wikipedia.host"; 45 | public static final String CONF_SYSTEMS_WIKIPEDIA_PORT = 
"systems.wikipedia.port"; 46 | 47 | public static final String CONF_GROUPBY_FIELDS = "org.pathirage.kappaql.groupby.fields"; 48 | 49 | public static final String CONF_AGGREGATE_AGGREGATES = "org.pathirage.kappaql.aggregate.aggregrates."; 50 | public static final String CONF_AGGREGATE_TYPE = "type"; 51 | public static final String CONF_AGGREGATE_FIELD = "field"; 52 | public static final String CONF_AGGREGATE_ALIAS = "alias"; 53 | 54 | public static final String CONF_INPUT_STREAM = "org.pathirage.freshet.input.stream"; 55 | 56 | public static final String CONF_STREAM_AVRO_SCHEMA = "org.pathirage.freshet.stream.avro.schema"; 57 | 58 | public static final String CONF_SELECT_WHERE_EXPRESSION = "org.pathirage.freshet.select.where.expression"; 59 | 60 | public static final String ERROR_UNDEFINED_OUTPUT_STREAM = "Undefined output stream."; 61 | public static final String ERROR_UNABLE_TO_FIND_CONFIGURATION = "Unable to find the configuration."; 62 | public static final String ERROR_UNDEFINED_OPERATOR_TYPE = "Undefined operator type."; 63 | public static final String ERROR_UNDEFINED_GROUP_BY_FIELDS = "Undefined group by fields."; 64 | 65 | public static final String WARN_BOTH_ROWS_AND_RANGE_DEFINED = "Both time based and tuple based windows are defined. Priority goes to time based windows."; 66 | } 67 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/FreshetException.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 
6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet; 19 | 20 | public class FreshetException extends RuntimeException { 21 | public FreshetException() { 22 | super(); 23 | } 24 | 25 | public FreshetException(String message) { 26 | super(message); 27 | } 28 | 29 | public FreshetException(String message, Throwable cause) { 30 | super(message, cause); 31 | } 32 | 33 | public FreshetException(Throwable cause) { 34 | super(cause); 35 | } 36 | 37 | protected FreshetException(String message, Throwable cause, boolean enableSuppression, boolean writableStackTrace) { 38 | super(message, cause, enableSuppression, writableStackTrace); 39 | } 40 | } 41 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/data/StreamDefinition.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.data; 19 | 20 | import java.util.HashMap; 21 | import java.util.Map; 22 | import java.util.Set; 23 | 24 | public class StreamDefinition { 25 | 26 | private Map<String, FieldType> fieldTypeMap; 27 | 28 | public StreamDefinition(){ 29 | fieldTypeMap = new HashMap<String, FieldType>(); 30 | } 31 | 32 | public StreamDefinition(Map<String, FieldType> fieldTypeMap){ 33 | this.fieldTypeMap = fieldTypeMap; 34 | } 35 | 36 | public Set<String> getFields(){ 37 | return this.fieldTypeMap.keySet(); 38 | } 39 | 40 | public FieldType getType(String field){ 41 | return fieldTypeMap.get(field); 42 | } 43 | 44 | public boolean isValidField(String field){ 45 | return fieldTypeMap.containsKey(field); 46 | } 47 | 48 | public void setFieldTypeMap(Map<String, FieldType> fieldTypeMap) { 49 | this.fieldTypeMap.putAll(fieldTypeMap); 50 | } 51 | 52 | public enum FieldType { 53 | INTEGER, 54 | STRING, 55 | BOOLEAN, 56 | FLOAT, 57 | TIME, 58 | LONG 59 | } 60 | } 61 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/data/StreamElement.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.data; 19 | 20 | import com.google.common.collect.ImmutableMap; 21 | 22 | import java.util.HashMap; 23 | import java.util.Map; 24 | 25 | /* Represents an element of a stream in KappaQL. This is an immutable data structure. */ 26 | public class StreamElement { 27 | 28 | /* A system-wide notion of clock will be required in the future. */ 29 | private long globalClock; 30 | 31 | /* Time this element got introduced to the world. */ 32 | private long timestamp; 33 | 34 | /* Unique ID that may be used in the future to identify the element. */ 35 | private String id; 36 | 37 | /* Delete and insert elements are used to simulate relations. delete == true means removing the 38 | * element from the relation; otherwise it is an insert. */ 39 | private boolean delete; 40 | 41 | /* Actual contents of the stream element. */ 42 | private final Map<String, Object> fields; 43 | 44 | public StreamElement(Map<String, Object> fields, long globalClock, long timestamp, String id){ 45 | this.globalClock = globalClock; 46 | this.timestamp = timestamp; 47 | this.id = id; 48 | this.fields = fields; 49 | } 50 | 51 | public Object getField(String fieldName) { 52 | return fields.get(fieldName); 53 | } 54 | 55 | public String getStringField(String fieldName){ 56 | return (String)fields.get(fieldName); 57 | } 58 | 59 | public Integer getIntegerField(String fieldName){ 60 | return (Integer)fields.get(fieldName); 61 | } 62 | 63 | public Double getDoubleField(String fieldName){ 64 | return (Double)fields.get(fieldName); 65 | } 66 | 67 | public Float getFloatField(String fieldName){ 68 | return (Float)fields.get(fieldName); 69 | } 70 | 71 | public Long getLongField(String fieldName) { 72 | return (Long)fields.get(fieldName); 73 | } 74 | 75 | public Boolean getBoolField(String fieldName){ 76 | return (Boolean)fields.get(fieldName); 77 | } 78 | 79 | public StreamElement extend(Map<String, Object> newFields){ 80 | for(Map.Entry<String, Object> entry: fields.entrySet()){ 81 | if(!newFields.containsKey(entry.getKey())){ 82 | newFields.put(entry.getKey(), 
entry.getValue()); 83 | } 84 | } 85 | StreamElement newElement = new StreamElement(newFields, this.globalClock, this.timestamp, this.id); 86 | 87 | return newElement; 88 | } 89 | 90 | public boolean isDelete() { 91 | return delete; 92 | } 93 | 94 | public void setDelete(boolean delete) { 95 | this.delete = delete; 96 | } 97 | 98 | 99 | 100 | public long getGlobalClock() { 101 | return globalClock; 102 | } 103 | 104 | public long getTimestamp() { 105 | return timestamp; 106 | } 107 | 108 | public String getId() { 109 | return id; 110 | } 111 | 112 | public Map getFields(){ 113 | Map r = new HashMap(); 114 | 115 | for(Map.Entry e : fields.entrySet()){ 116 | r.put(e.getKey(), e.getValue()); 117 | } 118 | 119 | return r; 120 | } 121 | } 122 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/AggregateOperator.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators; 19 | 20 | import org.apache.samza.config.Config; 21 | import org.apache.samza.system.IncomingMessageEnvelope; 22 | import org.apache.samza.task.*; 23 | import org.pathirage.freshet.Constants; 24 | import org.pathirage.freshet.operators.aggregate.AggregateFunction; 25 | import org.pathirage.freshet.operators.aggregate.AggregateFunctionFactory; 26 | import org.slf4j.Logger; 27 | import org.slf4j.LoggerFactory; 28 | import java.util.ArrayList; 29 | import java.util.List; 30 | 31 | public class AggregateOperator extends FreshetOperator implements StreamTask, InitableTask { 32 | private static final Logger log = LoggerFactory.getLogger(AggregateOperator.class); 33 | 34 | /* Aggregate functions, in order. A single query can have multiple aggregates. Initialized eagerly so init() can add to it. */ 35 | private List<AggregateFunction> aggregates = new ArrayList<AggregateFunction>(); 36 | 37 | @Override 38 | public void init(Config config, TaskContext taskContext) throws Exception { 39 | this.config = config; 40 | 41 | initOperator(FreshetOperatorType.AGGREGATE); 42 | 43 | /* To specify the aggregates, let's assume we use properties prefixed with 0, 1, 2, ... to specify the order. 
*/ 44 | Config aggregatesConfig = config.subset(Constants.CONF_AGGREGATE_AGGREGATES); 45 | 46 | for(int i = 0; i < aggregatesConfig.size(); i++){ 47 | aggregates.add(i, AggregateFunctionFactory.buildAggregateFunction( 48 | aggregatesConfig.get(Integer.toString(i), Constants.CONST_STR_UNDEFINED), 49 | this.inputStreams)); 50 | } 51 | } 52 | 53 | @Override 54 | public void process(IncomingMessageEnvelope incomingMessageEnvelope, 55 | MessageCollector messageCollector, 56 | TaskCoordinator taskCoordinator) throws Exception { 57 | String incoming = incomingMessageEnvelope.getSystemStreamPartition().getStream(); 58 | } 59 | 60 | } 61 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/DeserializeOperator.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators; 19 | 20 | import org.apache.avro.Schema; 21 | import org.apache.avro.generic.GenericDatumReader; 22 | import org.apache.avro.generic.GenericRecord; 23 | import org.apache.avro.io.DatumReader; 24 | import org.apache.samza.config.Config; 25 | import org.apache.samza.system.IncomingMessageEnvelope; 26 | import org.apache.samza.system.OutgoingMessageEnvelope; 27 | import org.apache.samza.system.SystemStream; 28 | import org.apache.samza.task.*; 29 | import org.pathirage.freshet.Constants; 30 | import org.pathirage.freshet.FreshetException; 31 | import org.pathirage.freshet.data.StreamDefinition; 32 | import org.pathirage.freshet.data.StreamElement; 33 | import org.slf4j.Logger; 34 | import org.slf4j.LoggerFactory; 35 | 36 | public class DeserializeOperator extends FreshetOperator implements StreamTask, InitableTask{ 37 | private static final Logger log = LoggerFactory.getLogger(DeserializeOperator.class); 38 | 39 | private Schema inputStreamAvroSchema; 40 | 41 | private String inputStream; 42 | 43 | @Override 44 | public void init(Config config, TaskContext taskContext) throws Exception { 45 | this.config = config; 46 | 47 | initOperator(FreshetOperatorType.DESERIALIZE); 48 | 49 | String schemaStr = config.get(Constants.CONF_STREAM_AVRO_SCHEMA, Constants.CONST_STR_UNDEFINED); 50 | 51 | if(!schemaStr.equals(Constants.CONST_STR_UNDEFINED)){ 52 | this.inputStreamAvroSchema = new Schema.Parser().parse(schemaStr); 53 | } else { 54 | String errMsg = buildLogMessage("Cannot find Avro schema for stream elements."); 55 | log.error(errMsg); 56 | throw new FreshetException(errMsg); 57 | } 58 | 59 | String inputStream = config.get(Constants.CONF_INPUT_STREAM, Constants.CONST_STR_UNDEFINED); 60 | 61 | if(!inputStream.equals(Constants.CONST_STR_UNDEFINED)){ 62 | this.inputStream = inputStream; 63 | } else { 64 | String errMsg = buildLogMessage("Cannot find input stream in configuration."); 65 | log.error(errMsg); 66 | 
throw new FreshetException(errMsg); 67 | } 68 | } 69 | 70 | @Override 71 | public void process(IncomingMessageEnvelope incomingMessageEnvelope, MessageCollector messageCollector, TaskCoordinator taskCoordinator) throws Exception { 72 | String incomingStream = incomingMessageEnvelope.getSystemStreamPartition().getStream(); 73 | 74 | if(incomingStream.equals(inputStream)){ 75 | GenericRecord message = (GenericRecord)incomingMessageEnvelope.getMessage(); 76 | 77 | StreamDefinition streamDefinition = inputStreams.get(incomingStream); 78 | } 79 | } 80 | } 81 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/FreshetOperator.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators; 19 | 20 | import org.apache.samza.config.Config; 21 | import org.pathirage.freshet.Constants; 22 | import org.pathirage.freshet.FreshetException; 23 | import org.pathirage.freshet.data.StreamDefinition; 24 | import org.pathirage.freshet.utils.Utilities; 25 | import org.slf4j.Logger; 26 | import org.slf4j.LoggerFactory; 27 | 28 | import java.util.HashMap; 29 | import java.util.Map; 30 | import java.util.UUID; 31 | 32 | /* In KappaQL, a query is transformed into an execution plan consisting of a DAG of operators (Samza jobs) connected via 33 | * Kafka queues. */ 34 | public abstract class FreshetOperator { 35 | private static final Logger log = LoggerFactory.getLogger(FreshetOperator.class); 36 | 37 | /* Type of the query operator */ 38 | private FreshetOperatorType type; 39 | 40 | /* Identifies the Samza job specific to a query */ 41 | private String id; 42 | 43 | /* Query this job belongs to */ 44 | private String queryId; 45 | 46 | /* Topic to which downstream output is pushed. 
*/ 47 | protected String downStreamTopic; 48 | 49 | protected Config config; 50 | 51 | /* Samza System */ 52 | protected String system; 53 | 54 | /* Definitions of input streams for this operator */ 55 | protected Map<String, StreamDefinition> inputStreams = new HashMap<String, StreamDefinition>(); 56 | 57 | /* Definitions of output streams of this operator */ 58 | protected Map<String, StreamDefinition> outputStreams = new HashMap<String, StreamDefinition>(); 59 | 60 | protected void initOperator(FreshetOperatorType type){ 61 | if(config == null){ 62 | log.error(Constants.ERROR_UNABLE_TO_FIND_CONFIGURATION); 63 | throw new FreshetException(Constants.ERROR_UNABLE_TO_FIND_CONFIGURATION); 64 | } 65 | 66 | this.type = type; 67 | this.queryId = config.get(Constants.CONF_QUERY_ID, Constants.CONST_STR_UNDEFINED); 68 | 69 | if(type != null){ 70 | this.id = type + "-" + this.queryId + "-" + UUID.randomUUID(); 71 | } else { 72 | log.error(Constants.ERROR_UNDEFINED_OPERATOR_TYPE); 73 | throw new FreshetException(Constants.ERROR_UNDEFINED_OPERATOR_TYPE); 74 | } 75 | 76 | String downStreamTopic = config.get(Constants.CONF_DOWN_STREAM_TOPIC, Constants.CONST_STR_UNDEFINED); 77 | if (downStreamTopic.equals(Constants.CONST_STR_UNDEFINED)) { 78 | log.warn("Down stream topic undefined."); 79 | } 80 | 81 | this.downStreamTopic = downStreamTopic; 82 | 83 | this.system = config.get(Constants.CONF_SYSTEM, Constants.CONST_STR_DEFAULT_SYSTEM); 84 | 85 | Config inputStreamsConfig = config.subset(Constants.CONF_OPERATOR_INPUT_STREAMS); 86 | for(String inputStream : inputStreamsConfig.keySet()){ 87 | // TODO: How to handle undefined 88 | Map<String, String> fields = Utilities.parseMap(inputStreamsConfig.get(inputStream)); 89 | Map<String, StreamDefinition.FieldType> fieldTypes = new HashMap<String, StreamDefinition.FieldType>(); 90 | for(Map.Entry<String, String> e : fields.entrySet()){ 91 | fieldTypes.put(e.getKey(), StreamDefinition.FieldType.valueOf(e.getValue().toUpperCase())); 92 | } 93 | 94 | this.inputStreams.put(inputStream, new StreamDefinition(fieldTypes)); 95 | } 96 | 97 | Config outputStreamsConfig = config.subset(Constants.CONF_OPERATOR_OUTPUT_STREAMS); 98 | for(String outputStream 
: outputStreamsConfig.keySet()){ 99 | Map<String, String> fields = Utilities.parseMap(outputStreamsConfig.get(outputStream)); 100 | Map<String, StreamDefinition.FieldType> fieldTypes = new HashMap<String, StreamDefinition.FieldType>(); 101 | for(Map.Entry<String, String> e : fields.entrySet()){ 102 | fieldTypes.put(e.getKey(), StreamDefinition.FieldType.valueOf(e.getValue().toUpperCase())); 103 | } 104 | 105 | this.outputStreams.put(outputStream, new StreamDefinition(fieldTypes)); 106 | } 107 | } 108 | 109 | 110 | public FreshetOperatorType getType() { 111 | return type; 112 | } 113 | 114 | public String getId() { 115 | return id; 116 | } 117 | 118 | public String getQueryId() { 119 | return queryId; 120 | } 121 | 122 | public String buildLogMessage(String error){ 123 | return String.format("Query: %s, Operator Type: %s, Operator ID: %s, System: %s, Error: %s", queryId, type, id, system, error); 124 | } 125 | } 126 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/FreshetOperatorType.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators; 19 | 20 | public enum FreshetOperatorType { 21 | WINDOW, 22 | SELECT, 23 | PROJECT, 24 | GROUP_BY, 25 | AGGREGATE, 26 | DESERIALIZE 27 | } 28 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/GroupByOperator.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators; 19 | 20 | import com.google.common.base.Joiner; 21 | import com.google.common.collect.Ordering; 22 | import org.apache.samza.config.Config; 23 | import org.apache.samza.system.IncomingMessageEnvelope; 24 | import org.apache.samza.system.OutgoingMessageEnvelope; 25 | import org.apache.samza.system.SystemStream; 26 | import org.apache.samza.task.*; 27 | import org.pathirage.freshet.Constants; 28 | import org.pathirage.freshet.FreshetException; 29 | import org.pathirage.freshet.data.StreamElement; 30 | import org.slf4j.Logger; 31 | import org.slf4j.LoggerFactory; 32 | 33 | import java.util.*; 34 | 35 | /** 36 | * Divide input stream into multiple output streams based on the group by key. 37 | *

38 | * 10/08/2014 39 | * ---------- 40 | * The main issue with the group-by operator is the lack of support for dynamic routing. Because we don't know the cardinality 41 | * of the group-by attribute, it's hard to do static planning. The current solution is to use the Kafka topic's partitioning to 42 | * parallelize the execution among multiple downstream aggregators. 43 | */ 44 | public class GroupByOperator extends FreshetOperator implements StreamTask, InitableTask { 45 | private static final Logger log = LoggerFactory.getLogger(GroupByOperator.class); 46 | 47 | /* Order is important. */ 48 | private List<String> groupByFields; 49 | 50 | @Override 51 | public void init(Config config, TaskContext taskContext) throws Exception { 52 | this.config = config; 53 | initOperator(FreshetOperatorType.GROUP_BY); 54 | 55 | /* Comma separated values of group by fields */ 56 | String groupByFields = config.get(Constants.CONF_GROUPBY_FIELDS, Constants.CONST_STR_UNDEFINED); 57 | 58 | if (groupByFields.equals(Constants.CONST_STR_UNDEFINED)) { 59 | throw new FreshetException(Constants.ERROR_UNDEFINED_GROUP_BY_FIELDS); 60 | } 61 | 62 | this.groupByFields = Arrays.asList(groupByFields.split("\\s*,\\s*")); 63 | Collections.sort(this.groupByFields, Ordering.usingToString()); 64 | } 65 | 66 | @Override 67 | public void process(IncomingMessageEnvelope incomingMessageEnvelope, 68 | MessageCollector messageCollector, 69 | TaskCoordinator taskCoordinator) throws Exception { 70 | StreamElement se = (StreamElement) incomingMessageEnvelope.getMessage(); 71 | 72 | /* Based on the group-by fields we create a new key for the message. This key is used to route messages 73 | * from different groups to different partitions. 74 | * 75 | * 10/08/2014 76 | * ---------- 77 | * I assume Samza creates one partition for each group dynamically. 78 | * I assume this code maintains the order of fields. 79 | * TODO: Test the order maintenance. 
*/ 80 | List values = new LinkedList(); 81 | for (String f : groupByFields) { 82 | values.add(se.getField(f)); 83 | } 84 | 85 | String partitionKey = Joiner.on("-").skipNulls().join(values); 86 | 87 | messageCollector.send(new OutgoingMessageEnvelope(new SystemStream(system, downStreamTopic), 88 | partitionKey, 89 | partitionKey, 90 | se)); 91 | } 92 | } 93 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/IStreamOperator.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators; 19 | 20 | import org.apache.samza.config.Config; 21 | import org.apache.samza.system.IncomingMessageEnvelope; 22 | import org.apache.samza.task.*; 23 | 24 | public class IStreamOperator extends FreshetOperator implements StreamTask, InitableTask { 25 | @Override 26 | public void init(Config config, TaskContext taskContext) throws Exception { 27 | 28 | } 29 | 30 | @Override 31 | public void process(IncomingMessageEnvelope incomingMessageEnvelope, MessageCollector messageCollector, TaskCoordinator taskCoordinator) throws Exception { 32 | 33 | } 34 | } 35 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/MaterializeOperator.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators; 19 | 20 | /** 21 | * Materialize relations generated into a persistent storage for use by applications. 
22 | */ 23 | public class MaterializeOperator { 24 | } 25 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/ProjectOperator.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators; 19 | 20 | import org.apache.samza.config.Config; 21 | import org.apache.samza.system.IncomingMessageEnvelope; 22 | import org.apache.samza.task.*; 23 | 24 | public class ProjectOperator extends FreshetOperator implements StreamTask, InitableTask{ 25 | @Override 26 | public void init(Config config, TaskContext taskContext) throws Exception { 27 | 28 | } 29 | 30 | @Override 31 | public void process(IncomingMessageEnvelope incomingMessageEnvelope, MessageCollector messageCollector, TaskCoordinator taskCoordinator) throws Exception { 32 | 33 | } 34 | } 35 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/RStreamOperator.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 
3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators; 19 | 20 | import org.apache.samza.config.Config; 21 | import org.apache.samza.system.IncomingMessageEnvelope; 22 | import org.apache.samza.task.*; 23 | 24 | public class RStreamOperator extends FreshetOperator implements StreamTask, InitableTask { 25 | @Override 26 | public void init(Config config, TaskContext taskContext) throws Exception { 27 | 28 | } 29 | 30 | @Override 31 | public void process(IncomingMessageEnvelope incomingMessageEnvelope, MessageCollector messageCollector, TaskCoordinator taskCoordinator) throws Exception { 32 | 33 | } 34 | } 35 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/SelectOperator.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators; 19 | 20 | import org.apache.commons.codec.binary.Base64; 21 | import org.apache.samza.config.Config; 22 | import org.apache.samza.system.IncomingMessageEnvelope; 23 | import org.apache.samza.system.OutgoingMessageEnvelope; 24 | import org.apache.samza.system.SystemStream; 25 | import org.apache.samza.task.*; 26 | import org.pathirage.freshet.Constants; 27 | import org.pathirage.freshet.FreshetException; 28 | import org.pathirage.freshet.data.StreamDefinition; 29 | import org.pathirage.freshet.data.StreamElement; 30 | import org.pathirage.freshet.operators.select.Expression; 31 | import org.pathirage.freshet.operators.select.ExpressionEvaluator; 32 | import org.pathirage.freshet.utils.ExpressionSerde; 33 | import org.slf4j.Logger; 34 | import org.slf4j.LoggerFactory; 35 | 36 | public class SelectOperator extends FreshetOperator implements StreamTask, InitableTask{ 37 | private static final Logger log = LoggerFactory.getLogger(SelectOperator.class); 38 | 39 | private Expression whereClause; 40 | 41 | private ExpressionEvaluator expressionEvaluator; 42 | 43 | @Override 44 | public void init(Config config, TaskContext taskContext) throws Exception { 45 | this.config = config; 46 | 47 | initOperator(FreshetOperatorType.SELECT); 48 | 49 | this.expressionEvaluator = new ExpressionEvaluator(); 50 | 51 | // Read where clause from config and build the expression. 
52 | String expression = config.get(Constants.CONF_SELECT_WHERE_EXPRESSION, Constants.CONST_STR_UNDEFINED); 53 | if(!expression.equals(Constants.CONST_STR_UNDEFINED)){ 54 | Expression expr = ExpressionSerde.deserialize(new String(Base64.decodeBase64(expression.getBytes()))); 55 | if(!expr.isPredicate()){ 56 | String errMessage = "Unsupported expression type: " + expr.getType() + " expression: " + expression; 57 | log.error(errMessage); 58 | throw new FreshetException(errMessage); 59 | } 60 | 61 | this.whereClause = expr; 62 | } 63 | } 64 | 65 | @Override 66 | public void process(IncomingMessageEnvelope incomingMessageEnvelope, MessageCollector messageCollector, TaskCoordinator taskCoordinator) throws Exception { 67 | StreamElement se = (StreamElement)incomingMessageEnvelope.getMessage(); 68 | String inputStream = incomingMessageEnvelope.getSystemStreamPartition().getStream(); 69 | StreamDefinition sd = inputStreams.get(inputStream); 70 | 71 | if(sd == null){ 72 | String errMessage = "Unknown stream " + inputStream; 73 | log.error(errMessage); 74 | throw new FreshetException(errMessage); 75 | } 76 | 77 | if(expressionEvaluator.evalPredicate(se, sd, whereClause)){ 78 | messageCollector.send(new OutgoingMessageEnvelope(new SystemStream(system, downStreamTopic), se)); 79 | } 80 | 81 | // TODO: How down stream of select is handled. If it handled as insert/delete stream we need to modify select logic. 82 | // 12/02/2014: If select is done before window operator we don't need to handle insert/delete. 83 | } 84 | } 85 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/WindowOperator.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 
6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators; 19 | 20 | import org.apache.samza.config.Config; 21 | import org.apache.samza.metrics.Gauge; 22 | import org.apache.samza.storage.kv.KeyValueStore; 23 | import org.apache.samza.system.IncomingMessageEnvelope; 24 | import org.apache.samza.system.OutgoingMessageEnvelope; 25 | import org.apache.samza.system.SystemStream; 26 | import org.apache.samza.task.*; 27 | import org.pathirage.freshet.Constants; 28 | import org.pathirage.freshet.data.StreamElement; 29 | import org.pathirage.freshet.utils.KVStorageBackedEvictingQueue; 30 | import org.pathirage.freshet.utils.QueueNode; 31 | import org.slf4j.Logger; 32 | import org.slf4j.LoggerFactory; 33 | 34 | import java.util.concurrent.atomic.AtomicLong; 35 | 36 | /** 37 | * Sliding window operator reads the input stream's tuples from input queue, update the sliding-window 38 | * synopsis, and outputs the insertion and deletions to this window to the output queues. 39 | *

40 | * 10/08/2014 41 | * ---------- 42 | * - Only handles tuple-based sliding windows. 43 | * - Recovery is handled by persistent synopsis storage and the Kafka queue. If the operator goes down, the last successful 44 | * synopsis update will be in local storage and the last read tuple will be tracked internally by Samza. Upon 45 | * restart, reading of the input stream resumes from the last read tuple and the synopsis can be recovered from local storage, 46 | * assuming the operator gets restarted on the same node. 47 | */ 48 | public class WindowOperator extends FreshetOperator implements StreamTask, InitableTask { 49 | private static final Logger log = LoggerFactory.getLogger(WindowOperator.class); 50 | 51 | /* True if this is a time-based sliding window. */ 52 | private boolean timeBased; 53 | 54 | /* Range of the time-based sliding window in seconds. */ 55 | private long range; 56 | 57 | /* A sliding window can be divided into slots. */ 58 | private long slotSize; 59 | 60 | /* True if this is a tuple-based sliding window. */ 61 | private boolean tupleBased; 62 | 63 | /* Max tuples in the tuple-based sliding window */ 64 | private int rows; 65 | 66 | /* CQL uses a concept called synopses to implement windowing. This stores 67 | * the synopsis as key/value pairs. This assumes every stream element has a unique id. */ 68 | private KeyValueStore store; 69 | 70 | private KeyValueStore metadataStore; 71 | 72 | /* Window size gauge metric for reporting */ 73 | private Gauge windowSizeGauge; 74 | 75 | /* Current size of the window, used to add/drop events to/from the window as needed. */ 76 | private AtomicLong currentWindowSize = new AtomicLong(0); 77 | 78 | /* Window handler. 
*/ 79 | private WindowHandler windowHandler; 80 | 81 | @Override 82 | public void init(Config config, TaskContext taskContext) throws Exception { 83 | this.config = config; 84 | 85 | initOperator(FreshetOperatorType.WINDOW); 86 | 87 | String range = config.get(Constants.CONF_WINDOW_RANGE, Constants.CONST_STR_UNDEFINED); 88 | if (!range.equals(Constants.CONST_STR_UNDEFINED)) { 89 | this.range = Long.valueOf(range); 90 | 91 | String slotSize = config.get(Constants.CONF_WINDOW_RANGE_SLOT_SIZE, Constants.CONST_STR_UNDEFINED); 92 | if (!slotSize.equals(Constants.CONST_STR_UNDEFINED)) { 93 | this.slotSize = Long.valueOf(slotSize); 94 | } else { 95 | this.slotSize = this.range; 96 | } 97 | 98 | timeBasedWindow(true); 99 | } 100 | 101 | String rows = config.get(Constants.CONF_WINDOW_ROWS, Constants.CONST_STR_UNDEFINED); 102 | if (!rows.equals(Constants.CONST_STR_UNDEFINED) && range.equals(Constants.CONST_STR_UNDEFINED)) { 103 | this.rows = Integer.valueOf(rows); 104 | timeBasedWindow(false); 105 | } else if (!rows.equals(Constants.CONST_STR_UNDEFINED)) { 106 | timeBasedWindow(true); // both rows and range are defined: the range takes precedence 107 | log.warn(Constants.WARN_BOTH_ROWS_AND_RANGE_DEFINED); 108 | } 109 | 110 | this.store = (KeyValueStore) taskContext.getStore("windowing-synopses"); 111 | this.metadataStore = (KeyValueStore) taskContext.getStore("windowing-metadata"); 112 | 113 | // TODO: Implement time based sliding window handler. 
114 | if(this.tupleBased && !this.timeBased){ 115 | this.windowHandler = new TupleBasedSlidingWindowHandler(this.rows, store, metadataStore, this.system); 116 | } 117 | 118 | this.windowSizeGauge = taskContext.getMetricsRegistry().newGauge(getClass().getName(), "window-size", 0); 119 | } 120 | 121 | @Override 122 | public void process(IncomingMessageEnvelope incomingMessageEnvelope, 123 | MessageCollector messageCollector, TaskCoordinator taskCoordinator) throws Exception { 124 | windowHandler.handle((StreamElement) incomingMessageEnvelope.getMessage(), messageCollector); 125 | } 126 | 127 | private void timeBasedWindow(boolean b) { 128 | if (b) { 129 | this.timeBased = true; 130 | this.tupleBased = false; 131 | } else { 132 | this.timeBased = false; 133 | this.tupleBased = true; 134 | } 135 | } 136 | 137 | public interface WindowHandler { 138 | public void handle(StreamElement streamElement, MessageCollector messageCollector); 139 | } 140 | 141 | public class TupleBasedSlidingWindowHandler implements WindowHandler { 142 | private int maxSize; 143 | private KeyValueStore metadataStore; 144 | private KeyValueStore store; 145 | private KVStorageBackedEvictingQueue evictingQueue; 146 | private String system; 147 | 148 | public TupleBasedSlidingWindowHandler(int maxSize, 149 | KeyValueStore store, 150 | KeyValueStore metadataStore, 151 | String system) { 152 | this.maxSize = maxSize; 153 | this.metadataStore = metadataStore; 154 | this.store = store; 155 | this.evictingQueue = new KVStorageBackedEvictingQueue(maxSize, this.store, this.metadataStore); 156 | this.system = system; 157 | } 158 | 159 | public void handle(StreamElement streamElement, MessageCollector messageCollector) { 160 | log.info("Incoming stream element id: " + streamElement.getId()); 161 | log.info("Incoming stream element titles: " + streamElement.getStringField("title")); 162 | StreamElement evicted = evictingQueue.add(streamElement.getId(), streamElement); 163 | if (evicted != null) { 164 | /* 
Send the element evicted from the window downstream for processing. 165 | * The delete property of the StreamElement must be set to true. */ 166 | evicted.setDelete(true); 167 | messageCollector.send(new OutgoingMessageEnvelope(new SystemStream(system, downStreamTopic), 168 | evicted.getId(), evicted)); 169 | } 170 | 171 | /* Send the element inserted into the window downstream for processing. */ 172 | streamElement.setDelete(false); 173 | messageCollector.send(new OutgoingMessageEnvelope(new SystemStream(system, downStreamTopic), 174 | streamElement)); 175 | } 176 | } 177 | } 178 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/aggregate/AggregateFunction.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators.aggregate; 19 | 20 | import org.apache.samza.task.MessageCollector; 21 | import org.pathirage.freshet.data.StreamDefinition; 22 | import org.pathirage.freshet.data.StreamElement; 23 | 24 | import java.util.Map; 25 | 26 | public abstract class AggregateFunction { 27 | 28 | protected AggregateType type; 29 | 30 | protected String field; 31 | 32 | protected String alias; 33 | 34 | protected Map inputStreamDefs; 35 | 36 | public void handle(String stream, String key, StreamElement streamElement, MessageCollector messageCollector){} 37 | } 38 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/aggregate/AggregateFunctionFactory.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators.aggregate; 19 | 20 | import org.pathirage.freshet.Constants; 21 | import org.pathirage.freshet.FreshetException; 22 | import org.pathirage.freshet.data.StreamDefinition; 23 | import org.pathirage.freshet.utils.Utilities; 24 | 25 | import java.util.Map; 26 | 27 | public class AggregateFunctionFactory { 28 | public static AggregateFunction buildAggregateFunction(String config, Map inputStreamDefs){ 29 | Map aggregateConfig = Utilities.parseMap(config); 30 | 31 | AggregateType type = AggregateType.valueOf(aggregateConfig.get(Constants.CONF_AGGREGATE_TYPE)); 32 | String field = aggregateConfig.get(Constants.CONF_AGGREGATE_FIELD); 33 | String alias = aggregateConfig.get(Constants.CONF_AGGREGATE_ALIAS); 34 | 35 | switch (type) { 36 | case AVG: 37 | return new Average(field, alias, inputStreamDefs); 38 | case SUM: 39 | return new Sum(field, alias, inputStreamDefs); 40 | case MAX: 41 | return new Max(field, alias, inputStreamDefs); 42 | case MIN: 43 | return new Min(field, alias, inputStreamDefs); 44 | case COUNT: 45 | return new Count(field, alias, inputStreamDefs); 46 | default: 47 | throw new FreshetException("Unsupported aggregate type."); 48 | } 49 | } 50 | } -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/aggregate/AggregateType.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 
6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators.aggregate; 19 | 20 | public enum AggregateType { 21 | SUM, 22 | AVG, 23 | MIN, 24 | MAX, 25 | COUNT, 26 | COUNT_DISTINCT 27 | } 28 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/aggregate/Average.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators.aggregate; 19 | 20 | import org.apache.samza.task.MessageCollector; 21 | import org.pathirage.freshet.data.StreamDefinition; 22 | import org.pathirage.freshet.data.StreamElement; 23 | 24 | import java.util.Map; 25 | 26 | public class Average extends AggregateFunction { 27 | 28 | public Average(String field, String alias, Map inputStreamDefs){ 29 | this.field = field; 30 | this.alias = alias; 31 | this.type = AggregateType.AVG; 32 | this.inputStreamDefs = inputStreamDefs; 33 | } 34 | 35 | @Override 36 | public void handle(String stream, String key, StreamElement streamElement, MessageCollector messageCollector) { 37 | 38 | } 39 | } -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/aggregate/Count.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators.aggregate; 19 | 20 | import org.apache.samza.task.MessageCollector; 21 | import org.pathirage.freshet.data.StreamDefinition; 22 | import org.pathirage.freshet.data.StreamElement; 23 | 24 | import java.util.Map; 25 | 26 | public class Count extends AggregateFunction{ 27 | 28 | public Count(String field, String alias, Map inputStreamDefs){ 29 | this.field = field; 30 | this.alias = alias; 31 | this.type = AggregateType.COUNT; 32 | this.inputStreamDefs = inputStreamDefs; 33 | } 34 | 35 | @Override 36 | public void handle(String stream, String key, StreamElement streamElement, MessageCollector messageCollector) { 37 | 38 | } 39 | } 40 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/aggregate/Max.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators.aggregate; 19 | 20 | import org.apache.samza.task.MessageCollector; 21 | import org.pathirage.freshet.data.StreamDefinition; 22 | import org.pathirage.freshet.data.StreamElement; 23 | 24 | import java.util.Map; 25 | 26 | public class Max extends AggregateFunction { 27 | 28 | public Max(String field, String alias, Map inputStreamDefs){ 29 | this.field = field; 30 | this.alias = alias; 31 | this.type = AggregateType.MAX; 32 | this.inputStreamDefs = inputStreamDefs; 33 | } 34 | 35 | @Override 36 | public void handle(String stream, String key, StreamElement streamElement, MessageCollector messageCollector) { 37 | 38 | } 39 | } 40 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/aggregate/Min.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators.aggregate; 19 | 20 | import org.apache.samza.task.MessageCollector; 21 | import org.pathirage.freshet.data.StreamDefinition; 22 | import org.pathirage.freshet.data.StreamElement; 23 | 24 | import java.util.Map; 25 | 26 | public class Min extends AggregateFunction { 27 | 28 | public Min(String field, String alias, Map inputStreamDefs){ 29 | this.field = field; 30 | this.alias = alias; 31 | this.type = AggregateType.MIN; 32 | this.inputStreamDefs = inputStreamDefs; 33 | } 34 | 35 | @Override 36 | public void handle(String stream, String key, StreamElement streamElement, MessageCollector messageCollector) { 37 | 38 | } 39 | } 40 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/aggregate/Sum.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators.aggregate; 19 | 20 | import org.apache.samza.storage.kv.KeyValueStore; 21 | import org.apache.samza.task.MessageCollector; 22 | import org.pathirage.freshet.data.StreamDefinition; 23 | import org.pathirage.freshet.data.StreamElement; 24 | 25 | import java.util.Map; 26 | 27 | public class Sum extends AggregateFunction{ 28 | 29 | private KeyValueStore sumStore; 30 | 31 | public Sum(String field, String alias, Map inputStreamDefs){ 32 | this.field = field; 33 | this.alias = alias; 34 | this.type = AggregateType.SUM; 35 | this.inputStreamDefs = inputStreamDefs; 36 | } 37 | 38 | @Override 39 | public void handle(String stream, String key, StreamElement streamElement, MessageCollector messageCollector) { 40 | 41 | } 42 | } 43 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/select/Expression.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators.select; 19 | 20 | 21 | import org.codehaus.jackson.annotate.JsonIgnore; 22 | 23 | public class Expression { 24 | private ExpressionType type; 25 | 26 | // Not null if expression is a predicate 27 | private PredicateType predicate; 28 | 29 | // Not null if expression is numerical 30 | private OperatorType operator; 31 | 32 | // Not null if expression is a field 33 | private String field; 34 | 35 | // Not null if expression is a value 36 | private Object value; 37 | 38 | // Not null if binary or unary predicate 39 | private Expression lhs; 40 | 41 | // Not null if binary predicate 42 | private Expression rhs; 43 | 44 | public Expression(){ 45 | this.type = null; 46 | this.predicate = null; 47 | this.field = null; 48 | this.value = null; 49 | this.lhs = null; 50 | this.rhs = null; 51 | } 52 | public Expression(ExpressionType type){ 53 | this.type = type; 54 | this.predicate = null; 55 | this.field = null; 56 | this.value = null; 57 | this.lhs = null; 58 | this.rhs = null; 59 | } 60 | 61 | public ExpressionType getType() { 62 | return type; 63 | } 64 | 65 | public void setType(ExpressionType type) { 66 | this.type = type; 67 | } 68 | 69 | @JsonIgnore 70 | public boolean isPredicate(){ 71 | return type == ExpressionType.PREDICATE; 72 | } 73 | 74 | @JsonIgnore 75 | public boolean isField(){ 76 | return type == ExpressionType.FIELD; 77 | } 78 | 79 | @JsonIgnore 80 | public boolean isValue(){ 81 | return type == ExpressionType.VALUE; 82 | } 83 | 84 | @JsonIgnore 85 | public boolean isNumerical() { 86 | return type == ExpressionType.NUMERICAL; 87 | } 88 | 89 | public PredicateType getPredicate() { 90 | return predicate; 91 | } 92 | 93 | public OperatorType getOperator() { 94 | return operator; 95 | } 96 | 97 | public void setOperator(OperatorType operator) { 98 | this.operator = operator; 99 | } 100 | 101 | public void setPredicate(PredicateType predicate) { 102 | this.predicate = predicate; 103 | } 104 | 
105 | public String getField() { 106 | return field; 107 | } 108 | 109 | public void setField(String field) { 110 | this.field = field; 111 | } 112 | 113 | public Object getValue() { 114 | return value; 115 | } 116 | 117 | public void setValue(Object value) { 118 | this.value = value; 119 | } 120 | 121 | public Expression getLhs() { 122 | return lhs; 123 | } 124 | 125 | public void setLhs(Expression lhs) { 126 | this.lhs = lhs; 127 | } 128 | 129 | public Expression getRhs() { 130 | return rhs; 131 | } 132 | 133 | public void setRhs(Expression rhs) { 134 | this.rhs = rhs; 135 | } 136 | } 137 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/select/ExpressionEvaluator.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators.select; 19 | 20 | import org.pathirage.freshet.data.StreamDefinition; 21 | import org.pathirage.freshet.data.StreamElement; 22 | 23 | /** 24 | * Evaluate boolean expressions used in WHERE clause. 
25 | */ 26 | public class ExpressionEvaluator { 27 | 28 | public boolean evalPredicate(StreamElement se, 29 | StreamDefinition streamDefinition, 30 | Expression expression) { 31 | if (expression.isPredicate()) { 32 | PredicateType predicateType = expression.getPredicate(); 33 | 34 | if(predicateType == PredicateType.AND || predicateType == PredicateType.OR || predicateType == PredicateType.NOT){ 35 | Expression lhs = expression.getLhs(); 36 | Expression rhs = expression.getRhs(); // null for the unary NOT predicate 37 | 38 | if(lhs == null || !lhs.isPredicate() || 39 | (predicateType != PredicateType.NOT && (rhs == null || !rhs.isPredicate()))){ 40 | throw new ExpressionEvaluationException("Unsupported operands for operator: " + predicateType); 41 | } 42 | } 43 | 44 | if (predicateType == PredicateType.AND) { 45 | return evalPredicate(se, streamDefinition, expression.getLhs()) && evalPredicate(se, streamDefinition, expression.getRhs()); 46 | } else if (predicateType == PredicateType.OR) { 47 | return evalPredicate(se, streamDefinition, expression.getLhs()) || evalPredicate(se, streamDefinition, expression.getRhs()); 48 | } else if (predicateType == PredicateType.NOT) { 49 | return !evalPredicate(se, streamDefinition, expression.getLhs()); 50 | } else if (predicateType == PredicateType.EQUAL) { 51 | return compare(se, streamDefinition, expression.getLhs(), expression.getRhs()) == 0; 52 | } else if (predicateType == PredicateType.NOT_EQUAL){ 53 | return compare(se, streamDefinition, expression.getLhs(), expression.getRhs()) != 0; 54 | } else if (predicateType == PredicateType.GREATER_THAN){ 55 | return compare(se, streamDefinition, expression.getLhs(), expression.getRhs()) > 0; 56 | } else if (predicateType == PredicateType.LESS_THAN){ 57 | return compare(se, streamDefinition, expression.getLhs(), expression.getRhs()) < 0; 58 | } else if (predicateType == PredicateType.GREATER_THAN_OR_EQUAL){ 59 | return compare(se, streamDefinition, expression.getLhs(), expression.getRhs()) >= 0; 60 | } else if 
(predicateType == PredicateType.LESS_THAN_OR_EQUAL){ 61 | return compare(se, streamDefinition, expression.getLhs(), expression.getRhs()) <= 0; 62 | } 63 | } else { 64 | throw new ExpressionEvaluationException("Expression type " + expression.getType() + " is not valid at this state."); 65 | } 66 | 67 | return false; 68 | } 69 | 70 | public double compare(StreamElement se, StreamDefinition streamDefinition, Expression lhs, Expression rhs){ 71 | if(lhs == null || rhs == null){ 72 | throw new ExpressionEvaluationException("Compare operator requires two operands. lhs: " + lhs + " rhs: " + rhs); 73 | } 74 | 75 | Object lhsValue = evalExpValue(se, streamDefinition, lhs); 76 | Object rhsValue = evalExpValue(se, streamDefinition, rhs); 77 | 78 | if(lhsValue instanceof String || rhsValue instanceof String){ 79 | return (lhsValue.equals(rhsValue) ? 0 : -1); 80 | } else if(lhsValue instanceof Number && rhsValue instanceof Number){ 81 | return ((Number)lhsValue).doubleValue() - ((Number)rhsValue).doubleValue(); 82 | } else if(lhsValue instanceof Boolean && rhsValue instanceof Boolean) { 83 | if(lhsValue.equals(rhsValue)){ 84 | return 0; 85 | } else { 86 | return -1; 87 | } 88 | } else { 89 | throw new ExpressionEvaluationException("Unsupported expression."); 90 | } 91 | } 92 | 93 | public Double evalNumericalExpression(StreamElement se, StreamDefinition streamDefinition, Expression expression){ 94 | if(expression == null || expression.getOperator() == null){ 95 | throw new ExpressionEvaluationException("Undefined expression or empty operator."); 96 | } 97 | 98 | OperatorType operator = expression.getOperator(); 99 | 100 | Object lhsValue = evalExpValue(se, streamDefinition, expression.getLhs()); 101 | Object rhsValue = evalExpValue(se, streamDefinition, expression.getRhs()); 102 | 103 | if(!(lhsValue instanceof Double) || !(rhsValue instanceof Double)){ 104 | throw new ExpressionEvaluationException("At least one operand is not a number."); 105 | } 106 | 107 | 
if(operator == OperatorType.PLUS){ 108 | return (Double)lhsValue + (Double)rhsValue; 109 | } else if (operator == OperatorType.MINUS) { 110 | return (Double)lhsValue - (Double)rhsValue; 111 | } else if (operator == OperatorType.MULTIPLY){ 112 | return (Double)lhsValue * (Double)rhsValue; 113 | }else if (operator == OperatorType.DIVIDE){ 114 | return (Double)lhsValue / (Double)rhsValue; 115 | } else { 116 | throw new ExpressionEvaluationException("Unsupported operator: " + operator); 117 | } 118 | } 119 | 120 | public Object evalExpValue(StreamElement se, StreamDefinition streamDefinition, Expression expression) { 121 | if(expression == null){ 122 | throw new ExpressionEvaluationException("Empty expression."); 123 | } 124 | 125 | if (expression.isField()) { 126 | String fieldName = expression.getField(); 127 | 128 | if (fieldName == null || !streamDefinition.isValidField(fieldName)){ 129 | throw new ExpressionEvaluationException("Unknown field: " + fieldName); 130 | } 131 | 132 | StreamDefinition.FieldType fieldType = streamDefinition.getType(fieldName); 133 | 134 | if (fieldType == StreamDefinition.FieldType.STRING) { 135 | return se.getStringField(fieldName); 136 | } else if (fieldType == StreamDefinition.FieldType.INTEGER) { 137 | return se.getIntegerField(fieldName).doubleValue(); 138 | } else if (fieldType == StreamDefinition.FieldType.LONG) { 139 | return se.getLongField(fieldName).doubleValue(); 140 | } else if (fieldType == StreamDefinition.FieldType.BOOLEAN) { 141 | return se.getBoolField(fieldName); 142 | } else if (fieldType == StreamDefinition.FieldType.FLOAT) { 143 | return se.getFloatField(fieldName).doubleValue(); 144 | } else { 145 | throw new ExpressionEvaluationException("Unsupported field type " + fieldType + "!"); 146 | } 147 | } else if (expression.isValue()) { 148 | return expression.getValue(); 149 | } else if(expression.isNumerical()) { 150 | return evalNumericalExpression(se, streamDefinition, expression); 151 | } else { 152 | throw new 
ExpressionEvaluationException("Unsupported value expression type " + expression.getType()); 153 | } 154 | } 155 | 156 | public class ExpressionEvaluationException extends RuntimeException { 157 | public ExpressionEvaluationException() { 158 | super(); 159 | } 160 | 161 | public ExpressionEvaluationException(String message) { 162 | super(message); 163 | } 164 | 165 | public ExpressionEvaluationException(String message, Throwable cause) { 166 | super(message, cause); 167 | } 168 | 169 | public ExpressionEvaluationException(Throwable cause) { 170 | super(cause); 171 | } 172 | 173 | protected ExpressionEvaluationException(String message, Throwable cause, boolean enableSuppression, boolean writableStackTrace) { 174 | super(message, cause, enableSuppression, writableStackTrace); 175 | } 176 | } 177 | } 178 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/select/ExpressionType.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators.select; 19 | 20 | public enum ExpressionType { 21 | PREDICATE, 22 | NUMERICAL, 23 | FIELD, 24 | VALUE 25 | } 26 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/select/OperatorType.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators.select; 19 | 20 | public enum OperatorType { 21 | PLUS, 22 | MINUS, 23 | MULTIPLY, 24 | DIVIDE 25 | } 26 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/operators/select/PredicateType.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 
6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.operators.select; 19 | 20 | public enum PredicateType { 21 | AND, 22 | OR, 23 | EQUAL, 24 | NOT_EQUAL, 25 | GREATER_THAN, 26 | LESS_THAN, 27 | GREATER_THAN_OR_EQUAL, 28 | LESS_THAN_OR_EQUAL, 29 | NOT 30 | } 31 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/package-info.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | package org.pathirage.freshet; -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/serde/AvroSerde.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 
3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.serde; 19 | 20 | import org.apache.avro.Schema; 21 | import org.apache.avro.generic.GenericDatumReader; 22 | import org.apache.avro.generic.GenericDatumWriter; 23 | import org.apache.avro.generic.GenericRecord; 24 | import org.apache.avro.io.DecoderFactory; 25 | import org.apache.avro.io.EncoderFactory; 26 | import org.apache.samza.serializers.Serde; 27 | import org.slf4j.Logger; 28 | import org.slf4j.LoggerFactory; 29 | 30 | import java.io.ByteArrayOutputStream; 31 | import java.io.IOException; 32 | 33 | public class AvroSerde implements Serde { 34 | private static final Logger log = LoggerFactory.getLogger(AvroSerde.class); 35 | 36 | private Schema avroSchema; 37 | 38 | public AvroSerde(Schema avroSchema) { 39 | this.avroSchema = avroSchema; 40 | } 41 | 42 | @Override 43 | public GenericRecord fromBytes(byte[] bytes) { 44 | GenericDatumReader serveReader = new GenericDatumReader(avroSchema); 45 | try { 46 | return serveReader.read(null, DecoderFactory.get().binaryDecoder(bytes, null)); 47 | } catch (IOException e) { 48 | log.error("Cannot deserialize byte array to GenericRecord."); 49 | return null; 50 | } 51 | } 52 | 53 | @Override 54 | public byte[] toBytes(GenericRecord genericRecord) { 55 | GenericDatumWriter serveWriter = new GenericDatumWriter(avroSchema); 56 | ByteArrayOutputStream out = new ByteArrayOutputStream(); 57 
| try { 58 | // binaryEncoder() returns a buffered encoder, so it must be flushed before reading the bytes. 59 | org.apache.avro.io.BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null); 60 | serveWriter.write(genericRecord, encoder); 61 | encoder.flush(); 62 | return out.toByteArray(); 63 | } catch (IOException e) { 64 | log.error("Cannot serialize GenericRecord.", e); 65 | } 66 | return new byte[0]; 67 | } 68 | } 69 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/serde/AvroSerdeFactory.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.serde; 19 | 20 | import org.apache.avro.Schema; 21 | import org.apache.avro.generic.GenericRecord; 22 | import org.apache.samza.config.Config; 23 | import org.apache.samza.serializers.Serde; 24 | import org.apache.samza.serializers.SerdeFactory; 25 | import org.pathirage.freshet.Constants; 26 | import org.pathirage.freshet.FreshetException; 27 | import org.slf4j.Logger; 28 | import org.slf4j.LoggerFactory; 29 | 30 | public class AvroSerdeFactory implements SerdeFactory { 31 | private static final Logger log = LoggerFactory.getLogger(AvroSerdeFactory.class); 32 | 33 | private Schema inputStreamAvroSchema; 34 | 35 | @Override 36 | public Serde getSerde(String s, Config config) { 37 | String schemaStr = config.get(Constants.CONF_STREAM_AVRO_SCHEMA, Constants.CONST_STR_UNDEFINED); 38 | 39 | if(!schemaStr.equals(Constants.CONST_STR_UNDEFINED)){ 40 | this.inputStreamAvroSchema = new Schema.Parser().parse(schemaStr); 41 | } else { 42 | String errMsg = "Cannot find Avro schema for stream elements."; 43 | log.error(errMsg); 44 | throw new FreshetException(errMsg); 45 | } 46 | 47 | return new AvroSerde(inputStreamAvroSchema); 48 | } 49 | } 50 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/serde/QueueNodeSerde.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.serde; 19 | 20 | import com.esotericsoftware.kryo.Kryo; 21 | import com.esotericsoftware.kryo.io.Input; 22 | import com.esotericsoftware.kryo.io.Output; 23 | import org.apache.samza.serializers.Serde; 24 | import org.pathirage.freshet.utils.QueueNode; 25 | 26 | import java.io.ByteArrayOutputStream; 27 | 28 | public class QueueNodeSerde implements Serde<QueueNode> { 29 | private Kryo kryo; 30 | 31 | public QueueNodeSerde(Kryo kryo){ 32 | this.kryo = kryo; 33 | } 34 | 35 | @Override 36 | public QueueNode fromBytes(byte[] bytes) { 37 | Input input = new Input(bytes); 38 | return kryo.readObject(input, QueueNode.class); 39 | } 40 | 41 | @Override 42 | public byte[] toBytes(QueueNode queueNode) { 43 | Output output = new Output(new ByteArrayOutputStream(2048)); 44 | kryo.writeObject(output, queueNode); 45 | 46 | return output.toBytes(); 47 | } 48 | } 49 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/serde/QueueNodeSerdeFactory.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.serde; 19 | 20 | import com.esotericsoftware.kryo.Kryo; 21 | import org.apache.samza.config.Config; 22 | import org.apache.samza.serializers.Serde; 23 | import org.apache.samza.serializers.SerdeFactory; 24 | import org.pathirage.freshet.data.StreamElement; 25 | import org.pathirage.freshet.utils.QueueNode; 26 | 27 | public class QueueNodeSerdeFactory implements SerdeFactory{ 28 | private static Kryo kryo = new Kryo(); 29 | 30 | static { 31 | kryo.register(QueueNode.class); 32 | kryo.register(StreamElement.class); 33 | } 34 | 35 | @Override 36 | public Serde getSerde(String s, Config config) { 37 | return new QueueNodeSerde(kryo); 38 | } 39 | } 40 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/serde/StreamElementSerde.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.serde; 19 | 20 | import com.esotericsoftware.kryo.Kryo; 21 | import com.esotericsoftware.kryo.io.Input; 22 | import com.esotericsoftware.kryo.io.Output; 23 | import org.apache.samza.serializers.Serde; 24 | import org.pathirage.freshet.data.StreamElement; 25 | 26 | import java.io.ByteArrayOutputStream; 27 | 28 | public class StreamElementSerde implements Serde<StreamElement> { 29 | 30 | private Kryo kryo; 31 | 32 | public StreamElementSerde(Kryo kryo){ 33 | this.kryo = kryo; 34 | } 35 | 36 | @Override 37 | public StreamElement fromBytes(byte[] bytes) { 38 | Input input = new Input(bytes); 39 | return kryo.readObject(input, StreamElement.class); 40 | } 41 | 42 | @Override 43 | public byte[] toBytes(StreamElement streamElement) { 44 | Output output = new Output(new ByteArrayOutputStream(1024)); 45 | kryo.writeObject(output, streamElement); 46 | 47 | return output.toBytes(); 48 | } 49 | } 50 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/serde/StreamElementSerdeFactory.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.serde; 19 | 20 | import com.esotericsoftware.kryo.Kryo; 21 | import org.apache.samza.config.Config; 22 | import org.apache.samza.serializers.Serde; 23 | import org.apache.samza.serializers.SerdeFactory; 24 | import org.pathirage.freshet.data.StreamElement; 25 | 26 | public class StreamElementSerdeFactory implements SerdeFactory { 27 | 28 | private static Kryo kryo = new Kryo(); 29 | 30 | static { 31 | kryo.register(StreamElement.class); 32 | } 33 | 34 | @Override 35 | public Serde getSerde(String s, Config config) { 36 | return new StreamElementSerde(kryo); 37 | } 38 | } 39 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/utils/ExpressionSerde.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.utils; 19 | 20 | import org.codehaus.jackson.map.ObjectMapper; 21 | import org.pathirage.freshet.operators.select.Expression; 22 | import org.pathirage.freshet.operators.select.ExpressionType; 23 | import org.pathirage.freshet.operators.select.PredicateType; 24 | 25 | import java.io.IOException; 26 | import java.io.StringWriter; 27 | 28 | /** 29 | * Serialize/deserialize expressions to/from JSON. 
30 | */ 31 | public class ExpressionSerde { 32 | 33 | public static String serialize(Expression expression) throws IOException { 34 | ObjectMapper objectMapper = new ObjectMapper(); 35 | 36 | StringWriter sw = new StringWriter(); 37 | objectMapper.writeValue(sw, expression); 38 | 39 | return sw.toString(); 40 | } 41 | 42 | public static Expression deserialize(String expression) throws IOException { 43 | ObjectMapper objectMapper = new ObjectMapper(); 44 | 45 | return objectMapper.readValue(expression, Expression.class); 46 | } 47 | 48 | public static void main(String[] args) throws IOException { 49 | String exp = "{\"type\":\"PREDICATE\",\"predicate\":\"EQUAL\",\"field\":null,\"value\":null,\"lhs\":{\"type\":\"FIELD\",\"predicate\":null,\"field\":\"name\",\"value\":null,\"lhs\":null,\"rhs\":null},\"rhs\":{\"type\":\"VALUE\",\"predicate\":null,\"field\":null,\"value\":\"Milinda\",\"lhs\":null,\"rhs\":null}}\n"; 50 | Expression test = new Expression(ExpressionType.PREDICATE); 51 | test.setPredicate(PredicateType.EQUAL); 52 | 53 | Expression lhs = new Expression(ExpressionType.FIELD); 54 | lhs.setField("age"); 55 | test.setLhs(lhs); 56 | 57 | Expression rhs = new Expression(ExpressionType.VALUE); 58 | rhs.setValue(20); 59 | test.setRhs(rhs); 60 | 61 | System.out.println(ExpressionSerde.serialize(test)); 62 | 63 | 64 | Expression e = ExpressionSerde.deserialize(exp); 65 | System.out.println(ExpressionSerde.serialize(e)); 66 | } 67 | } 68 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/utils/KVStorageBackedEvictingQueue.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 
 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.utils; 19 | 20 | import org.apache.samza.storage.kv.KeyValueStore; 21 | import org.pathirage.freshet.data.StreamElement; 22 | 23 | import java.util.concurrent.atomic.AtomicInteger; 24 | 25 | public class KVStorageBackedEvictingQueue { 26 | 27 | private static final String CONST_HEAD = "kvstoragebacked-eq-head"; 28 | private static final String CONST_TAIL = "kvstoragebacked-eq-tail"; 29 | private static final String CONST_SIZE = "kvstoragebacked-eq-size"; 30 | private static final String CONST_MAX_SIZE = "kvstoragebacked-eq-max-size"; 31 | private static final String CONST_UNDEFINED = "kvstoragebacked-undefined"; 32 | 33 | private KeyValueStore<String, String> metadataStore; 34 | private KeyValueStore<String, QueueNode> store; 35 | private int maxSize; 36 | private AtomicInteger size; 37 | 38 | public KVStorageBackedEvictingQueue(int maxSize, 39 | KeyValueStore<String, QueueNode> store, 40 | KeyValueStore<String, String> metadataStore) { 41 | this.metadataStore = metadataStore; 42 | this.store = store; 43 | 44 | String persistedMaxSize = this.metadataStore.get(CONST_MAX_SIZE); 45 | if (persistedMaxSize != null) { 46 | this.maxSize = Integer.valueOf(persistedMaxSize); 47 | } else { 48 | this.maxSize = maxSize; 49 | this.metadataStore.put(CONST_MAX_SIZE, Integer.toString(maxSize)); 50 | } 51 | 52 | String size = this.metadataStore.get(CONST_SIZE); 53 | if (size != null) { 54 | this.size = new AtomicInteger(Integer.valueOf(size)); 55 | } else { 56 | this.size = new AtomicInteger(0); 57 | this.metadataStore.put(CONST_SIZE, Integer.toString(0)); 
 58 | } 59 | 60 | if (this.size.get() == 0) { 61 | this.metadataStore.put(CONST_HEAD, CONST_UNDEFINED); 62 | this.metadataStore.put(CONST_TAIL, CONST_UNDEFINED); 63 | } 64 | } 65 | 66 | public StreamElement add(String key, StreamElement value) { 67 | int newSize; 68 | QueueNode head; 69 | QueueNode tail; 70 | 71 | String headKey = metadataStore.get(CONST_HEAD); 72 | String tailKey = metadataStore.get(CONST_TAIL); 73 | 74 | QueueNode newElement = new QueueNode(); 75 | newElement.setValue(value); 76 | 77 | if(headKey != null) { 78 | newElement.setNext(headKey); 79 | } else { 80 | newElement.setNext(CONST_UNDEFINED); 81 | } 82 | 83 | newElement.setPrev(CONST_UNDEFINED); 84 | 85 | if(headKey != null && !CONST_UNDEFINED.equals(headKey)) { 86 | // Update the old head's prev to point to the new head 87 | head = store.get(headKey); 88 | head.setPrev(key); 89 | 90 | // Put the old head with updated info 91 | store.put(headKey, head); 92 | } 93 | 94 | // Put the new head. 95 | store.put(key, newElement); 96 | 97 | // Update metadata 98 | metadataStore.put(CONST_HEAD, key); 99 | if(tailKey == null || CONST_UNDEFINED.equals(tailKey)){ 100 | // In an empty queue, the newly added head also becomes the tail. 101 | metadataStore.put(CONST_TAIL, key); 102 | } 103 | 104 | if(size.get() < maxSize){ 105 | newSize = size.incrementAndGet(); 106 | metadataStore.put(CONST_SIZE, Integer.toString(newSize)); 107 | 108 | return null; 109 | } else { 110 | tail = store.get(tailKey); 111 | 112 | // Old tail's prev becomes new tail 113 | metadataStore.put(CONST_TAIL, tail.getPrev()); 114 | 115 | // Delete the old tail 116 | store.delete(tailKey); 117 | 118 | return tail.getValue(); 119 | } 120 | } 121 | } 122 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/utils/QueueNode.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 
3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.utils; 19 | 20 | 21 | import org.pathirage.freshet.data.StreamElement; 22 | 23 | public class QueueNode { 24 | private String next; 25 | private String prev; 26 | private StreamElement value; 27 | 28 | 29 | public String getNext() { 30 | return next; 31 | } 32 | 33 | public void setNext(String next) { 34 | this.next = next; 35 | } 36 | 37 | public String getPrev() { 38 | return prev; 39 | } 40 | 41 | public void setPrev(String prev) { 42 | this.prev = prev; 43 | } 44 | 45 | public StreamElement getValue() { 46 | return value; 47 | } 48 | 49 | public void setValue(StreamElement value) { 50 | this.value = value; 51 | } 52 | } -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/utils/Utilities.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 
6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.utils; 19 | 20 | import com.google.common.base.Splitter; 21 | 22 | import java.util.Map; 23 | 24 | public class Utilities { 25 | public static Map parseMap(String formattedMap) { 26 | return Splitter.on(",").withKeyValueSeparator("=").split(formattedMap); 27 | } 28 | } 29 | 30 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/utils/WikipediaFeedStreamTask.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.utils; 19 | 20 | import org.apache.samza.system.IncomingMessageEnvelope; 21 | import org.apache.samza.system.OutgoingMessageEnvelope; 22 | import org.apache.samza.system.SystemStream; 23 | import org.apache.samza.task.MessageCollector; 24 | import org.apache.samza.task.StreamTask; 25 | import org.apache.samza.task.TaskCoordinator; 26 | import org.pathirage.freshet.data.StreamElement; 27 | import org.pathirage.freshet.utils.system.WikipediaFeed; 28 | import org.slf4j.Logger; 29 | import org.slf4j.LoggerFactory; 30 | 31 | import java.util.Date; 32 | import java.util.HashMap; 33 | import java.util.Map; 34 | import java.util.UUID; 35 | import java.util.regex.Matcher; 36 | import java.util.regex.Pattern; 37 | 38 | public class WikipediaFeedStreamTask implements StreamTask { 39 | private static Logger log = LoggerFactory.getLogger(WikipediaFeedStreamTask.class); 40 | 41 | private static final SystemStream OUTPUT_STREAM = new SystemStream("kafka", "wikipedia-raw"); 42 | 43 | @Override 44 | public void process(IncomingMessageEnvelope incomingMessageEnvelope, MessageCollector messageCollector, TaskCoordinator taskCoordinator) throws Exception { 45 | StreamElement wikipediaFeedEvent = (StreamElement) incomingMessageEnvelope.getMessage(); 46 | 47 | try { 48 | Map parsedEvent = parse(wikipediaFeedEvent.getStringField("rawEvent")); 49 | 50 | parsedEvent.put("channel", wikipediaFeedEvent.getStringField("channel")); 51 | parsedEvent.put("source", wikipediaFeedEvent.getStringField("source")); 52 | 53 | StreamElement se = new StreamElement(parsedEvent, wikipediaFeedEvent.getLongField("time"), wikipediaFeedEvent.getLongField("time"), UUID.randomUUID().toString()); 54 | 55 | messageCollector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, se)); 56 | }catch (Exception e){ 57 | log.error("Unable to parse the wikipedia event.", e); 58 | } 59 | } 60 | 61 | public static Map parse(String line) { 62 | Pattern p = 
Pattern.compile("\\[\\[(.*)\\]\\]\\s(.*)\\s(.*)\\s\\*\\s(.*)\\s\\*\\s\\(\\+?(.\\d*)\\)\\s(.*)"); 63 | Matcher m = p.matcher(line); 64 | 65 | if (m.find() && m.groupCount() == 6) { 66 | String title = m.group(1); 67 | String flags = m.group(2); 68 | String diffUrl = m.group(3); 69 | String user = m.group(4); 70 | int byteDiff = Integer.parseInt(m.group(5)); 71 | String summary = m.group(6); 72 | 73 | Map root = new HashMap(); 74 | 75 | root.put("title", title); 76 | root.put("user", user); 77 | root.put("unparsed-flags", flags); 78 | root.put("diff-bytes", byteDiff); 79 | root.put("diff-url", diffUrl); 80 | root.put("summary", summary); 81 | 82 | root.put("is-minor", flags.contains("M")); 83 | root.put("is-new", flags.contains("N")); 84 | root.put("is-unpatrolled", flags.contains("!")); 85 | root.put("is-bot-edit", flags.contains("B")); 86 | root.put("is-special", title.startsWith("Special:")); 87 | root.put("is-talk", title.startsWith("Talk:")); 88 | 89 | 90 | return root; 91 | } else { 92 | throw new IllegalArgumentException("Illegal event " + line); 93 | } 94 | } 95 | } 96 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/utils/system/WikipediaConsumer.java: -------------------------------------------------------------------------------- 1 | /* 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. 
You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, 13 | * software distributed under the License is distributed on an 14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 15 | * KIND, either express or implied. See the License for the 16 | * specific language governing permissions and limitations 17 | * under the License. 18 | */ 19 | 20 | package org.pathirage.freshet.utils.system; 21 | 22 | import org.apache.samza.Partition; 23 | import org.apache.samza.metrics.MetricsRegistry; 24 | import org.apache.samza.system.IncomingMessageEnvelope; 25 | import org.apache.samza.system.SystemStreamPartition; 26 | import org.apache.samza.util.BlockingEnvelopeMap; 27 | import org.pathirage.freshet.data.StreamElement; 28 | import org.slf4j.Logger; 29 | import org.slf4j.LoggerFactory; 30 | 31 | import java.util.*; 32 | 33 | public class WikipediaConsumer extends BlockingEnvelopeMap implements WikipediaFeed.WikipediaFeedListener { 34 | private static final Logger log = LoggerFactory.getLogger(WikipediaConsumer.class); 35 | private final List<String> channels; 36 | private final String systemName; 37 | private final WikipediaFeed feed; 38 | 39 | public WikipediaConsumer(String systemName, WikipediaFeed feed, MetricsRegistry registry) { 40 | this.channels = new ArrayList<String>(); 41 | this.systemName = systemName; 42 | this.feed = feed; 43 | } 44 | 45 | public void onEvent(final WikipediaFeed.WikipediaFeedEvent event) { 46 | SystemStreamPartition systemStreamPartition = new SystemStreamPartition(systemName, event.getChannel(), new Partition(0)); 47 | 48 | Map fields = new HashMap(); 49 | fields.put("rawEvent", event.getRawEvent()); 50 | fields.put("channel", event.getChannel()); 51 | fields.put("source", event.getChannel()); 52 | fields.put("time", event.getTime()); 53 | 54 | Date now = new Date(); 55 | 56 | StreamElement se = new StreamElement(fields, now.getTime(), 
now.getTime(), null); 57 | 58 | try { 59 | put(systemStreamPartition, new IncomingMessageEnvelope(systemStreamPartition, null, null, se)); 60 | } catch (Exception e) { 61 | log.error("Error sending messages downstream.", e); 62 | } 63 | } 64 | 65 | @Override 66 | public void register(SystemStreamPartition systemStreamPartition, String startingOffset) { 67 | super.register(systemStreamPartition, startingOffset); 68 | 69 | channels.add(systemStreamPartition.getStream()); 70 | } 71 | 72 | @Override 73 | public void start() { 74 | feed.start(); 75 | 76 | for (String channel : channels) { 77 | feed.listen(channel, this); 78 | } 79 | } 80 | 81 | @Override 82 | public void stop() { 83 | for (String channel : channels) { 84 | feed.unlisten(channel, this); 85 | } 86 | 87 | feed.stop(); 88 | } 89 | } 90 | -------------------------------------------------------------------------------- /freshet-core/src/java/org/pathirage/freshet/utils/system/WikipediaSystemFactory.java: -------------------------------------------------------------------------------- 1 | /* 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, 13 | * software distributed under the License is distributed on an 14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 15 | * KIND, either express or implied. See the License for the 16 | * specific language governing permissions and limitations 17 | * under the License. 
18 | */ 19 | 20 | package org.pathirage.freshet.utils.system; 21 | 22 | import org.apache.samza.SamzaException; 23 | import org.apache.samza.config.Config; 24 | import org.apache.samza.metrics.MetricsRegistry; 25 | import org.apache.samza.system.SystemAdmin; 26 | import org.apache.samza.system.SystemConsumer; 27 | import org.apache.samza.system.SystemFactory; 28 | import org.apache.samza.system.SystemProducer; 29 | import org.apache.samza.util.SinglePartitionWithoutOffsetsSystemAdmin; 30 | 31 | public class WikipediaSystemFactory implements SystemFactory { 32 | @Override 33 | public SystemAdmin getAdmin(String systemName, Config config) { 34 | return new SinglePartitionWithoutOffsetsSystemAdmin(); 35 | } 36 | 37 | @Override 38 | public SystemConsumer getConsumer(String systemName, Config config, MetricsRegistry registry) { 39 | String host = config.get("systems." + systemName + ".host"); 40 | int port = config.getInt("systems." + systemName + ".port"); 41 | WikipediaFeed feed = new WikipediaFeed(host, port); 42 | 43 | return new WikipediaConsumer(systemName, feed, registry); 44 | } 45 | 46 | @Override 47 | public SystemProducer getProducer(String systemName, Config config, MetricsRegistry registry) { 48 | throw new SamzaException("You can't produce to a Wikipedia feed! 
How about making some edits to a Wiki, instead?"); 49 | } 50 | } 51 | -------------------------------------------------------------------------------- /freshet-core/test/clojure/org/pathirage/freshet/config_test.clj: -------------------------------------------------------------------------------- 1 | (ns org.pathirage.freshet.config-test 2 | (:import [org.apache.samza.config MapConfig]) 3 | (:require [clojure.test :refer :all] 4 | [clojure.java.io :as io])) 5 | 6 | 7 | (deftest properties-file-map-value-test 8 | (testing "Map as a value of a property in a properties file" 9 | (with-open [^java.io.Reader reader (-> "config-test.properties" io/resource io/file io/reader)] 10 | (let [props (java.util.Properties.)] 11 | (.load props reader) 12 | (let [map-config (MapConfig. props) 13 | stream-defs (.subset map-config "org.pathirage.kappaql.input")] 14 | (is (= "name=String,age=Integer" (.get stream-defs ".stream1"))) 15 | (is (= "orderId=String,Quantity=Integer" (.get stream-defs ".stream2")))))))) -------------------------------------------------------------------------------- /freshet-core/test/clojure/org/pathirage/freshet/expressioneval_test.clj: -------------------------------------------------------------------------------- 1 | (ns org.pathirage.freshet.expressioneval-test 2 | (:import (org.pathirage.freshet.operators.select Expression ExpressionType PredicateType ExpressionEvaluator) 3 | (org.pathirage.freshet.data StreamDefinition StreamDefinition$FieldType StreamElement)) 4 | (:require [clojure.test :refer :all] 5 | [clojure.java.io :as io] 6 | [org.pathirage.freshet.utils.expressions :as expressions])) 7 | 8 | 9 | ;; Expression Evaluator Test 10 | ;; 11 | ;; Tasks 12 | ;; - Collect some Wikipedia activity streams and build an in-memory stream 13 | ;; - Define some where conditions for Wikipedia activities 14 | 15 | 16 | 17 | (def wikipedia-activity-stream-definition 18 | (let [type-map (java.util.HashMap. 
{"channel" StreamDefinition$FieldType/STRING 19 | "source" StreamDefinition$FieldType/STRING 20 | "time" StreamDefinition$FieldType/LONG 21 | "title" StreamDefinition$FieldType/STRING 22 | "user" StreamDefinition$FieldType/STRING 23 | "diff-bytes" StreamDefinition$FieldType/INTEGER 24 | "diff-url" StreamDefinition$FieldType/STRING 25 | "summary" StreamDefinition$FieldType/STRING 26 | "is-minor" StreamDefinition$FieldType/BOOLEAN 27 | "is-talk" StreamDefinition$FieldType/BOOLEAN 28 | "is-bot-edit" StreamDefinition$FieldType/BOOLEAN 29 | "is-new" StreamDefinition$FieldType/BOOLEAN 30 | "is-unpatrolled" StreamDefinition$FieldType/BOOLEAN 31 | "is-special" StreamDefinition$FieldType/BOOLEAN 32 | "unparsed-flags" StreamDefinition$FieldType/STRING}) 33 | stream-def (StreamDefinition. type-map)] 34 | stream-def)) 35 | 36 | (def test-stream-element-with-diff-bytes->-100 37 | (let [fields (java.util.HashMap. {"channel" "#en.wikipedia" 38 | "source" "rc-pmtpa" 39 | "time" 1415078790283 40 | "title" "Xu Wanquan" 41 | "user" "G503" 42 | "diff-bytes" (Integer. 3205) 43 | "diff-url" "http://en.wikipedia.org/w/index.php?oldid=632381032&rcid=690940421" 44 | "summary" "[[WP:AES|←]]Created page with '{{Infobox football biography | name= Xu Wanquan
许万权 | birth_date = {{birth date and age|1993|4|19}} | birth_place = [[Dalian]], [[Liaoning]], China...'" 45 | "is-minor" false 46 | "is-talk" false 47 | "is-bot-edit" false 48 | "is-new" true 49 | "is-unpatrolled" true 50 | "is-special" false 51 | "unparsed-flags" "!N"}) 52 | stream-element (StreamElement. fields 1415078790283 1415078790283 "Xu Wanquan")] 53 | stream-element)) 54 | 55 | (deftest expression-evaluation-test 56 | (testing "Greater than predicate" 57 | (let [exp-evaluator (ExpressionEvaluator.)] 58 | (is (= 59 | true 60 | (.evalPredicate 61 | exp-evaluator 62 | test-stream-element-with-diff-bytes->-100 63 | wikipedia-activity-stream-definition 64 | expressions/where-diff-bytes->-100))))) 65 | (testing "Equal boolean" 66 | (let [exp-evaluator (ExpressionEvaluator.)] 67 | (is (= 68 | true 69 | (.evalPredicate 70 | exp-evaluator 71 | test-stream-element-with-diff-bytes->-100 72 | wikipedia-activity-stream-definition 73 | expressions/where-is-new-edit))))) 74 | (testing "AND operator" 75 | (let [exp-evaluator (ExpressionEvaluator.)] 76 | (is (= 77 | true 78 | (.evalPredicate 79 | exp-evaluator 80 | test-stream-element-with-diff-bytes->-100 81 | wikipedia-activity-stream-definition 82 | expressions/new-edit-and->100-diff))))) 83 | (testing "< operator" 84 | (let [exp-evaluator (ExpressionEvaluator.)] 85 | (is (= 86 | false 87 | (.evalPredicate 88 | exp-evaluator 89 | test-stream-element-with-diff-bytes->-100 90 | wikipedia-activity-stream-definition 91 | expressions/where-diff-bytes-<-100)))))) 92 | 93 | 94 | 95 | 96 | 97 | 98 | -------------------------------------------------------------------------------- /freshet-core/test/clojure/org/pathirage/freshet/expresssionserde_test.clj: -------------------------------------------------------------------------------- 1 | (ns org.pathirage.freshet.expresssionserde-test 2 | (:import (org.pathirage.freshet.utils ExpressionSerde)) 3 | (:require [clojure.test :refer :all] 4 | [clojure.java.io :as io] 5 | 
[org.pathirage.freshet.utils.expressions :as expressions])) 6 | 7 | (deftest expression-serde-test 8 | (testing "Expression serializing and deserializing" 9 | (let [serialized-expr (ExpressionSerde/serialize expressions/where-diff-bytes->-100) 10 | expr (ExpressionSerde/deserialize serialized-expr)] 11 | (is (= serialized-expr (ExpressionSerde/serialize expr)))))) 12 | 13 | 14 | -------------------------------------------------------------------------------- /freshet-core/test/clojure/org/pathirage/freshet/helpers/expressions.clj: -------------------------------------------------------------------------------- 1 | (ns org.pathirage.freshet.utils.expressions 2 | (:import (org.pathirage.freshet.operators.select ExpressionType PredicateType Expression))) 3 | 4 | (def where-diff-bytes->-100 5 | (let [lhs (doto (Expression. ExpressionType/FIELD) 6 | (.setField "diff-bytes")) 7 | rhs (doto (Expression. ExpressionType/VALUE) 8 | (.setValue 100))] 9 | (doto (Expression. ExpressionType/PREDICATE) 10 | (.setPredicate PredicateType/GREATER_THAN) 11 | (.setLhs lhs) 12 | (.setRhs rhs)))) 13 | 14 | (def where-diff-bytes-<-100 15 | (let [lhs (doto (Expression. ExpressionType/FIELD) 16 | (.setField "diff-bytes")) 17 | rhs (doto (Expression. ExpressionType/VALUE) 18 | (.setValue 100))] 19 | (doto (Expression. ExpressionType/PREDICATE) 20 | (.setPredicate PredicateType/LESS_THAN) 21 | (.setLhs lhs) 22 | (.setRhs rhs)))) 23 | 24 | (def where-is-new-edit 25 | (let [lhs (doto (Expression. ExpressionType/FIELD) 26 | (.setField "is-new")) 27 | rhs (doto (Expression. ExpressionType/VALUE) 28 | (.setValue true))] 29 | (doto (Expression. ExpressionType/PREDICATE) 30 | (.setPredicate PredicateType/EQUAL) 31 | (.setLhs lhs) 32 | (.setRhs rhs)))) 33 | 34 | (def new-edit-and->100-diff 35 | (doto (Expression. 
ExpressionType/PREDICATE) 36 | (.setPredicate PredicateType/AND) 37 | (.setLhs where-diff-bytes->-100) 38 | (.setRhs where-is-new-edit))) 39 | -------------------------------------------------------------------------------- /freshet-core/test/java/org/pathirage/freshet/test/Constants.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.test; 19 | 20 | public class Constants { 21 | } 22 | -------------------------------------------------------------------------------- /freshet-core/test/java/org/pathirage/freshet/test/ExpressionEvaluationTestUtils.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.test; 19 | 20 | public class ExpressionEvaluationTestUtils { 21 | } 22 | -------------------------------------------------------------------------------- /freshet-core/test/resources/config-test.properties: -------------------------------------------------------------------------------- 1 | org.pathirage.kappaql.input.stream1=name=String,age=Integer 2 | org.pathirage.kappaql.input.stream2=orderId=String,Quantity=Integer -------------------------------------------------------------------------------- /freshet-dsl/.gitignore: -------------------------------------------------------------------------------- 1 | /target 2 | /classes 3 | /checkouts 4 | pom.xml 5 | pom.xml.asc 6 | *.jar 7 | *.class 8 | /.lein-* 9 | /.nrepl-port 10 | -------------------------------------------------------------------------------- /freshet-dsl/README.md: -------------------------------------------------------------------------------- 1 | # freshet-dsl 2 | 3 | [CQL](http://research.microsoft.com/apps/pubs/default.aspx?id=77607) inspired Clojure DSL for querying streams. 4 | 5 | ## Usage 6 | 7 | FIXME 8 | 9 | ## License 10 | 11 | Copyright © 2014 Milinda Pathirage 12 | 13 | Distributed under the Apache License Version 2.0 or (at 14 | your option) any later version. 15 | 16 | 17 | -------------------------------------------------------------------------------- /freshet-dsl/doc/cql-to-samza.md: -------------------------------------------------------------------------------- 1 | # Freshet DSL Compilation 2 | 3 | ## Freshet DSL defaults (from CQL) 4 | 5 | - When a stream is referenced in a Freshet query where a relation is expected, an *Unbounded* window is applied to the stream by default. 6 | - An *Istream* operator is added by default whenever the query produces a *monotonic* relation.
A static monotonicity test is 7 | used: a base relation is monotonic if it is append-only, like ```(window (unbounded))```, and the join of two monotonic relations is also monotonic. 8 | - If we can't determine monotonicity, we depend on the query author. 9 | - For an inner subquery, we add an *Istream* operator by default whenever the subquery is monotonic. The other case is still ambiguous. 10 | - *Istream-Unbounded* is the default when the window specification is omitted. 11 | 12 | ## Common Patterns 13 | 14 | - **Filters** are implemented using an *Istream-Unbounded* window combination or an *Rstream-Now* window combination. 15 | - When a stream is joined with a relation, it is usually most meaningful to apply a *Now* window over the stream and an *Rstream* operator over the join result. 16 | 17 | ## Market feed stream definition 18 | 19 | ```clojure 20 | (defstream market-feed 21 | (stream-fields [:symbol :string 22 | :bid :float 23 | :bid-size :float 24 | :exchange :string 25 | :volume :float])) 26 | 27 | ``` 28 | 29 | ## Queries over market feed stream 30 | 31 | ### Select tuples with symbol "APPL" 32 | 33 | ```clojure 34 | (select market-feed 35 | (where (= :symbol "APPL"))) 36 | ``` 37 | 38 | As per the CQL defaults, this query gets transformed into a query like the one below: 39 | 40 | ```clojure 41 | (select market-feed 42 | (modifiers :istream) 43 | (window (unbounded)) 44 | (where (= :symbol "APPL"))) 45 | ``` 46 | 47 | After optimization, the query above is transformed into: 48 | 49 | ```clojure 50 | (select market-feed 51 | (modifiers :rstream) 52 | (window (now)) 53 | (where (= :symbol "APPL"))) 54 | ``` 55 | 56 | Finally, it is compiled into a Samza job graph that looks like the following: 57 | 58 | ```clojure 59 | {:operator :window :tuplebased true :range 1 :query-id "some-id" :input-stream "topic-from-stream-def" :output-stream "qid-window-out-market-feed" :input-streams [{:stream "market-feed" ...}] :output-streams [{:stream "qid-window-out-market-feed" ..}]} 60 | -> {:operator :select :where-exp 
"" :input-stream "qid-window-out-market-feed" :output-stream "qid-select-out-market-feed" :input-streams [{:stream "qid-window-out-market-feed"}] :output-streams [{:stream "qid-select-out-rstream-market-feed" ..}]} 61 | -> {:operator :rstream :input-stream "qid-select-out-market-feed"} 62 | ``` 63 | 64 | -------------------------------------------------------------------------------- /freshet-dsl/doc/intro.md: -------------------------------------------------------------------------------- 1 | # Introduction to freshet-dsl 2 | 3 | TODO: write [great documentation](http://jacobian.org/writing/what-to-write/) 4 | -------------------------------------------------------------------------------- /freshet-dsl/doc/samples.md: -------------------------------------------------------------------------------- 1 | # Freshet DSL Samples 2 | 3 | ## Defining streams 4 | 5 | ```clojure 6 | (defstream market-feed 7 | (stream-fields [:symbol :string 8 | :bid :float 9 | :ask :float 10 | :bid-size :float 11 | :ask-size :float 12 | :quote-time :time 13 | :trade-time :time 14 | :exchange :string 15 | :volume :float])) 16 | ``` 17 | 18 | ## Queries 19 | 20 | ### Project with renaming 21 | 22 | ```clojure 23 | (select (rstream [:symbol [:bid :as :b] :volume]) 24 | (from [(window market-feed (now))]) 25 | (where (= :symbol "APPL"))) 26 | ``` 27 | 28 | ### Aggregates 29 | 30 | ```clojure 31 | (select (istream [(avg :bid) :symbol]) 32 | (from [(window market-feed (range 400))])) 33 | ``` 34 | 35 | ### Joins 36 | 37 | ```clojure 38 | (select (istream [:*]) 39 | (from [(window s1 (rows 5)) (window s2 (rows 10))]) 40 | (where (= s1.a s2.a))) 41 | ``` 42 | 43 | 44 | ### Defaults 45 | 46 | ```clojure 47 | (select [:*] 48 | (from [market-feed]) 49 | (where (= :symbol "APPL"))) 50 | ``` -------------------------------------------------------------------------------- /freshet-dsl/project.clj: -------------------------------------------------------------------------------- 1 | (defproject 
org.pathirage.freshet/freshet-dsl "0.1.0-SNAPSHOT" 2 | :description "Freshet DSL: Clojure DSL based on CQL." 3 | :url "http://github.com/milinda/Freshet" 4 | :license {:name "Apache License, Version 2.0" 5 | :url "http://www.apache.org/licenses/LICENSE-2.0.html"} 6 | :dependencies [[org.clojure/clojure "1.6.0"] 7 | [org.pathirage.freshet/freshet-core "0.1.0-SNAPSHOT"] 8 | [clojurewerkz/propertied "1.2.0"] 9 | [org.apache.hadoop/hadoop-yarn-client "2.2.0"] 10 | [org.apache.hadoop/hadoop-yarn-common "2.2.0"] 11 | [org.apache.hadoop/hadoop-common "2.2.0"] 12 | [commons-codec/commons-codec "1.4"]] 13 | :source-paths ["src/clojure"] 14 | :java-source-paths ["src/java"] 15 | :test-paths ["test/clojure" "test/java"]) 16 | -------------------------------------------------------------------------------- /freshet-dsl/src/clojure/org/pathirage/freshet/dsl/compiler.clj: -------------------------------------------------------------------------------- 1 | (ns org.pathirage.freshet.dsl.compiler 2 | (:import (org.pathirage.freshet.operators.select Expression ExpressionType PredicateType)) 3 | (:gen-class)) 4 | 5 | (def freshet-type-map 6 | {:integer "int" 7 | :string "string" 8 | :long "long" 9 | :double "double" 10 | :float "float"}) 11 | 12 | (defn- freshet-type-to-avro-type 13 | [t] 14 | (let [at (get freshet-type-map t)] 15 | (if at 16 | at 17 | (throw (Exception. (str "Invalid type " t)))))) 18 | 19 | (defn- freshet-fields-to-avro-fields 20 | [fields] 21 | (let [fields-seq (seq fields)] 22 | (vec (map (fn [e] {"name" (first e) "type" (freshet-type-to-avro-type (second e))}) fields-seq)))) 23 | 24 | (defn stream-to-avro-schema 25 | "Generate avro schema from stream definition. Avro schema needs a namespace. Default is 'freshet'." 
26 | [stream] 27 | (let [fields (:fields stream) 28 | ns (:ns stream) 29 | name (str (:name stream))] 30 | {"namespace" ns 31 | "type" "record" 32 | "name" name 33 | "fields" (freshet-fields-to-avro-fields fields)})) 34 | 35 | (defn- pred-to-pred-type 36 | [pred] 37 | (case pred 38 | :and PredicateType/AND 39 | :or PredicateType/OR 40 | := PredicateType/EQUAL 41 | :not= PredicateType/NOT_EQUAL 42 | :not PredicateType/NOT 43 | :< PredicateType/LESS_THAN 44 | :<= PredicateType/LESS_THAN_OR_EQUAL 45 | :> PredicateType/GREATER_THAN 46 | :>= PredicateType/GREATER_THAN_OR_EQUAL 47 | (throw (Exception. (str "Unknown predicate type " pred))))) 48 | 49 | (defn compile-expression 50 | [expr] 51 | (let [pred (:pred expr) 52 | operator (:op expr) 53 | lhs (first (:args expr)) 54 | rhs (second (:args expr)) 55 | lhs-expr (cond 56 | (keyword? lhs) (doto (Expression. ExpressionType/FIELD) 57 | (.setField (str lhs))) 58 | (or (number? lhs) (string? lhs)) (doto (Expression. ExpressionType/VALUE) 59 | (.setValue lhs)) 60 | (map? lhs) (compile-expression lhs) 61 | :else (throw (Exception. (str "Unknown argument: " lhs " expression: " expr)))) 62 | rhs-expr (if (not (= pred :not)) 63 | (cond 64 | (keyword? rhs) (doto (Expression. ExpressionType/FIELD) 65 | (.setField (str rhs))) 66 | (or (number? rhs) (string? rhs)) (doto (Expression. ExpressionType/VALUE) 67 | (.setValue rhs)) 68 | (map? rhs) (compile-expression rhs) 69 | :else (throw (Exception. (str "Unknown argument: " rhs)))))] 70 | (cond 71 | (and pred (not (= pred :not))) (doto (Expression. ExpressionType/PREDICATE) 72 | (.setPredicate (pred-to-pred-type pred)) 73 | (.setLhs lhs-expr) 74 | (.setRhs rhs-expr)) 75 | (and pred (= pred :not)) (doto (Expression. ExpressionType/PREDICATE) 76 | (.setPredicate (pred-to-pred-type pred)) 77 | (.setLhs lhs-expr)) 78 | operator (throw (Exception. "Operators are not yet supported at DSL level.")) 79 | :else (throw (Exception. 
(str "Unsupported expression: " expr)))))) 80 | 81 | (defn create-raexp 82 | [id] 83 | {:id id 84 | :is-project false 85 | :is-select true 86 | :result-fields [] 87 | :select nil 88 | :from nil}) 89 | 90 | (defn uuid [] (str (java.util.UUID/randomUUID))) 91 | 92 | (defn- query-id 93 | [stream] 94 | (let [ns (str (:ns stream)) 95 | name (str (:name stream))] 96 | (str "query-on-" ns "-" name "-" (uuid)))) 97 | 98 | (defn- is-* 99 | [fields-from-def fields-in-query] 100 | (every? true? (map (fn [f] (contains? fields-from-def f)) fields-in-query))) 101 | 102 | (defn handle-fields 103 | "Handle projections. 104 | 105 | Limitations 106 | - Doesn't support renaming" 107 | [raexp query] 108 | (let [fields (:fields query) 109 | fields-from-def (:fields (:stream (first (:from query))))] 110 | (cond 111 | (= :* (first fields)) (let [r1 (assoc raexp :is-project false) 112 | r2 (assoc r1 :is-select true)] 113 | (assoc r2 :result-fields (not-empty (keys fields-from-def)))) 114 | (> (count (remove #(= % :*) fields)) 1) (let [eq-* (is-* fields-from-def (remove #(= % :*) fields))] 115 | (if eq-* 116 | (let [r1 (assoc raexp :is-project false) 117 | r2 (assoc r1 :is-select true)] 118 | (assoc r2 :result-fields (not-empty (keys fields-from-def)))) 119 | (let [r1 (assoc raexp :is-project true) 120 | r2 (assoc r1 :is-select false)] 121 | (assoc r2 :result-fields (remove #(= % :*) fields))))) 122 | :else (throw (Exception. 
(str "Unsupported projection/selection " fields)))))) 123 | 124 | (defn handle-where 125 | [raexp query] 126 | (let [where-exp (:where query) 127 | compiled-where (compile-expression where-exp)] 128 | (assoc raexp :select compiled-where))) 129 | 130 | (defn handle-windows 131 | [raexp query] 132 | (let [window (:window query)] 133 | (assoc raexp :window window))) 134 | 135 | (defn handle-from 136 | [raexp query] 137 | (let [stream (:stream (first (:from query)))] 138 | (assoc raexp :from stream))) 139 | 140 | (defn sql-to-raexp 141 | "Converts SQL statement to relational algebra expression. 142 | 143 | Fundamental property: 144 | - Every operator in the algebra accepts (one or two) relation instances as arguments and returns a relation instance 145 | as the result." 146 | [query] 147 | (let [raexp (create-raexp (query-id (:stream (first (:from query)))))] 148 | (-> raexp (handle-from query) (handle-fields query) (handle-where query) (handle-windows query)))) 149 | 150 | -------------------------------------------------------------------------------- /freshet-dsl/src/clojure/org/pathirage/freshet/dsl/core.clj: -------------------------------------------------------------------------------- 1 | (ns org.pathirage.freshet.dsl.core 2 | (:refer-clojure :exclude [range]) 3 | (:require [clojure.walk :as walk] 4 | [clojure.set :as set]) 5 | (:gen-class)) 6 | 7 | (comment "Most of the DSL constructs are inspired by SQLKorma(http://sqlkorma.com) library by Chris Ganger.") 8 | 9 | (defn create-stream 10 | "Create a stream representing a topic in Kafka." 11 | [name] 12 | {:stream name 13 | :name name 14 | :ns "freshet" 15 | :pk :id 16 | :fields [] 17 | :ts :timestamp}) 18 | 19 | (defn stream-fields 20 | "Fields in a stream. These will get retrieved by default in select query if there aren't any projections." 
21 | [stream fields] 22 | (assoc stream :fields (apply array-map fields))) 23 | 24 | (defn pk 25 | [stream k] 26 | (assoc stream :pk (keyword k))) 27 | 28 | (defn ts 29 | [stream s] 30 | (assoc stream :ts (keyword s))) 31 | 32 | (defn namespace 33 | [stream ns] 34 | (assoc stream :ns ns)) 35 | 36 | (defmacro defstream 37 | "Define a stream representing a topic in Kafka, applying functions in the body which changes the stream definition." 38 | [stream & body] 39 | `(let [s# (-> (create-stream ~(name stream)) ~@body)] 40 | (def ~stream s#))) 41 | 42 | (defn select* 43 | "Creates the base query configuration for the given stream." 44 | [r2s-with-fields] 45 | (let [fields (cond 46 | (map? r2s-with-fields) (:fields r2s-with-fields) 47 | (vector? r2s-with-fields) r2s-with-fields 48 | :else (throw (Exception. (str "Unsupported fields spec: " r2s-with-fields)))) 49 | r2s (if (map? r2s-with-fields) 50 | (:r2s-operator r2s-with-fields) 51 | :istream)] 52 | {:type :select 53 | :fields fields 54 | :r2s r2s 55 | :from [] 56 | :modifiers [] 57 | :window {:type :window :window-type :unbounded} 58 | :where [] 59 | :having [] 60 | :aliases #{} 61 | :group [] 62 | :aggregate [] 63 | :joins []})) 64 | 65 | (defn- update-fields 66 | [query fields] 67 | (let [[first-in-current] (:fields query)] 68 | (if (= first-in-current :*) 69 | (assoc query :fields fields) 70 | (update-in query [:fields] (fn [v1 v2] (vec (concat v1 v2))) fields)))) 71 | 72 | (defn fields 73 | "Set fields which should be selected by the query. Fields can be a keyword 74 | or pair of keywords in a vector [field alias] 75 | 76 | ex: (fields [:name :username] :address :age)" 77 | [query & fields] 78 | (let [aliases (set (map second (filter vector? fields)))] 79 | (-> query 80 | (update-in [:aliases] set/union aliases) 81 | (update-fields fields)))) 82 | 83 | ;; TODO: use named parameters for configuring sliding windows. 
84 | (defn window-range 85 | [window seconds] 86 | (let [window (assoc window :window-type :range)] 87 | (assoc window :range seconds))) 88 | 89 | (defn window-rows 90 | [window count] 91 | (let [window (assoc window :window-type :rows)] 92 | (assoc window :rows count))) 93 | 94 | (defn window-now 95 | [window] 96 | (assoc window :window-type :now)) 97 | 98 | (defn window-unbounded 99 | [window] 100 | (assoc window :window-type :unbounded)) 101 | 102 | (defn window*_ 103 | [] 104 | {:type :window}) 105 | 106 | ;; TODO: How to handle the multiple-stream situation. We need to change how the from clause is specified in the DSL. 107 | (defmacro window_ 108 | "Set windowing method for stream-to-relational mapping. 109 | ex: (window (range 30))" 110 | [query & wm] 111 | `(let [window# (-> (window*_) ~@wm)] 112 | (update-in ~query [:window] merge window#))) 113 | 114 | (defn window* 115 | [stream] 116 | {:stream stream}) 117 | 118 | (defmacro window 119 | [stream & wspec] 120 | `(-> (window* ~stream) ~@wspec)) 121 | 122 | (defn modifiers 123 | "Set modifiers on the select query to filter which results are returned. 124 | 125 | ex: (select wikipedia-stream 126 | (modifiers :distinct) 127 | (window (range 60)))" 128 | [query & m] 129 | (update-in query [:modifiers] conj m)) 130 | 131 | (comment 132 | "How where clauses should be transformed" 133 | 134 | (or (> :delta 100) (= :newPage "True")) 135 | 136 | {::pred or ::args [{::pred > ::args [:delta 100]} {::pred = ::args [:newPage "True"]}]} 137 | 138 | "Binding based approach used in Korma is needed to implement aliases and table prefixes.") 139 | 140 | ;; TODO: Difference between where and having is important. Where is executed before performing any aggregations -- 141 | ;; TODO: basically to filter rows in a relation before performing group by and aggregations -- having is executed after 142 | ;; TODO: aggregations are done. 
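;; Illustrative sketch (an addition, not original Freshet code): the where-clause
;; rewrite described in the comment above, reproduced with plain clojure.walk.
;; `example-or`, `example->`, `example-=` and `example-predicates` are hypothetical
;; stand-ins for the `pred-*` builders and the `predicates` table in this namespace.
(comment
  (require '[clojure.walk :as walk])

  (defn example-or [l r] {:pred :or :args [l r]})
  (defn example->  [l r] {:pred :>  :args [l r]})
  (defn example-=  [l r] {:pred :=  :args [l r]})

  (def example-predicates
    {'or 'example-or, '> 'example->, '= 'example-=})

  ;; Step 1: swap predicate symbols for builder symbols; the result is still code.
  (walk/postwalk-replace example-predicates
                         '(or (> :delta 100) (= :newPage "True")))
  ;; yields (example-or (example-> :delta 100) (example-= :newPage "True"))

  ;; Step 2: evaluating the rewritten form builds the nested predicate map.
  (example-or (example-> :delta 100) (example-= :newPage "True"))
  ;; yields {:pred :or, :args [{:pred :>, :args [:delta 100]}
  ;;                           {:pred :=, :args [:newPage "True"]}]}
  )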
143 | 144 | (def predicates 145 | {'and 'org.pathirage.freshet.dsl.core/pred-and 146 | 'or 'org.pathirage.freshet.dsl.core/pred-or 147 | '= 'org.pathirage.freshet.dsl.core/pred-= 148 | 'not= 'org.pathirage.freshet.dsl.core/pred-not= 149 | '< 'org.pathirage.freshet.dsl.core/pred-< 150 | '> 'org.pathirage.freshet.dsl.core/pred-> 151 | '<= 'org.pathirage.freshet.dsl.core/pred-<= 152 | '>= 'org.pathirage.freshet.dsl.core/pred->= 153 | 'not 'org.pathirage.freshet.dsl.core/pred-not}) 154 | 155 | (defn pred-and 156 | [l r] 157 | {:pred :and :args [l r]}) 158 | 159 | (defn pred-or 160 | [l r] 161 | {:pred :or :args [l r]}) 162 | 163 | (defn pred-= 164 | [l r] 165 | {:pred := :args [l r]}) 166 | 167 | (defn pred-not= 168 | [l r] 169 | {:pred :not= :args [l r]}) 170 | 171 | (defn pred-< 172 | [l r] 173 | {:pred :< :args [l r]}) 174 | 175 | (defn pred-> 176 | [l r] 177 | {:pred :> :args [l r]}) 178 | 179 | (defn pred-<= 180 | [l r] 181 | {:pred :<= :args [l r]}) 182 | 183 | (defn pred->= 184 | [l r] 185 | {:pred :>= :args [l r]}) 186 | 187 | (defn pred-not 188 | [l] 189 | {:pred :not :args [l]}) 190 | 191 | (defn pred-conj 192 | [l r] 193 | (if (empty? l) 194 | r 195 | {:pred :and :args [l r]})) 196 | 197 | (defn parse-where 198 | [form] 199 | (walk/postwalk-replace predicates form)) 200 | 201 | (defn- handle-where-or-having-clauses 202 | [where*or-having* query form] 203 | `(let [q# ~query] 204 | (~where*or-having* q# ~(parse-where `~form)))) 205 | 206 | (defn where* 207 | "Add where clauses to the query. Clauses are a map and will be joined together via AND to the existing clauses." 208 | [query clause] 209 | (update-in query [:where] pred-conj clause)) 210 | 211 | (defmacro where 212 | "Add where clauses to the query. Clauses can be expressed in Clojure, with keywords referring to the stream fields. 
213 | 214 | ex: (where query (> :delta 100)) 215 | 216 | 217 | Supported predicates: and, or, =, not=, <, >, <=, >=, not" 218 | [query form] 219 | (handle-where-or-having-clauses #'where* query form)) 220 | 221 | (defn having* 222 | "Add having clauses to the query. Clauses are a map and will be joined together via AND to the existing clauses." 223 | [query clause] 224 | (update-in query [:having] pred-conj clause)) 225 | 226 | (defmacro having 227 | "Add having clauses to the query. Clauses can be expressed in Clojure, with keywords referring to the stream fields. 228 | 229 | ex: (having query (> :delta 100)) 230 | 231 | 232 | Supported predicates: and, or, =, not=, <, >, <=, >=, not" 233 | [query form] 234 | (handle-where-or-having-clauses #'having* query form)) 235 | 236 | (defn istream 237 | [fields-with-renaming] 238 | {:r2s-operator :istream :fields fields-with-renaming}) 239 | 240 | (defn rstream 241 | [fields-with-renaming] 242 | {:r2s-operator :rstream :fields fields-with-renaming}) 243 | 244 | (defn from* 245 | [query f] 246 | (let [normalized-f (vec (map #(if (contains? % :window-type) % {:window-type :unbounded :stream %}) f))] 247 | (update-in query [:from] into normalized-f))) 248 | 249 | (defmacro from 250 | [query s2r] 251 | `(from* ~query ~s2r)) 252 | 253 | (defn execute-query 254 | "Execute a continuous query. The query is first converted to an extension of relational algebra, then 255 | to a physical query plan, before being deployed to the stream processing engine." 256 | [query] 257 | (prn query)) 258 | 259 | (defmacro select 260 | "Build a select query, apply any modifiers specified in the body and then generate and submit a DAG of Samza jobs 261 | which is the physical execution plan of the continuous query on the stream specified by `stream`. `stream` is a stream 262 | created by `defstream`. Returns a job identifier which can be used to monitor the query, or an error in case of a failure. 
263 | 264 | ex: (select stock-ticks 265 | (fields :symbol :bid :ask) 266 | (where {:symbol 'APPL'}))" 267 | [fwm & body] 268 | `(let [query# (-> (select* ~fwm) ~@body)] 269 | query#)) 270 | 271 | 272 | -------------------------------------------------------------------------------- /freshet-dsl/src/clojure/org/pathirage/freshet/dsl/helpers.clj: -------------------------------------------------------------------------------- 1 | (ns org.pathirage.freshet.dsl.helpers 2 | (:refer-clojure :exclude [range]) 3 | (:import [org.pathirage.freshet Constants] 4 | [org.pathirage.freshet.operators.select Expression ExpressionType PredicateType OperatorType] 5 | [org.apache.samza.config.factories PropertiesConfigFactory] 6 | [org.apache.samza.job JobRunner] 7 | [java.net URI] 8 | [java.io File FileInputStream] 9 | [java.util Properties] 10 | [org.apache.commons.codec.binary Base64]) 11 | (:require [clojurewerkz.propertied.properties :as props] 12 | [org.pathirage.freshet.dsl.core :refer [defstream ts stream-fields]] 13 | [clojure.string :as string] [org.pathirage.freshet.utils.config :refer [yarn-package-path]]) 14 | (:gen-class)) 15 | 16 | (defn default-without-mterics-props 17 | "Create a map of default properties for Freshet Samza jobs. 18 | 19 | zookeeper-list should look like - zk1.example.com:2181,zk2.example.com:2181,.. 20 | broker-list should look like - kafka1.example.com:9092,kafka2.example.com:9092, .." 
21 | [zookeeper-list broker-list yarn-package-path] 22 | {"job.factory.class" "org.apache.samza.job.yarn.YarnJobFactory" 23 | "yarn.package.path" yarn-package-path 24 | "systems.kafka.samza.factory" "org.apache.samza.system.kafka.KafkaSystemFactory" 25 | "serializers.registry.streamelement.class" "org.pathirage.freshet.serde.StreamElementSerdeFactory" 26 | "systems.kafka.samza.msg.serde" "streamelement" 27 | "systems.kafka.consumer.zookeeper.connect" zookeeper-list 28 | "systems.kafka.consumer.auto.offset.reset" "largest" 29 | "systems.kafka.producer.metadata.broker.list" broker-list 30 | "systems.kafka.producer.producer.type" "sync" 31 | "systems.kafka.producer.batch.num.messages" "1"}) 32 | 33 | (defn wikipedia-stream-def [] 34 | (defstream wikipedia-activity 35 | (stream-fields [:title :string 36 | :user :string 37 | :diff-bytes :integer 38 | :is-talk :boolean 39 | :is-new :boolean 40 | :is-bot-edit :boolean 41 | :timestamp :long]) 42 | (ts :timestamp))) 43 | 44 | (defn wikipedia-raw-def [] 45 | (defstream wikipedia-raw 46 | (stream-fields [:title :string 47 | :user :string 48 | :diff-bytes :integer 49 | :diff-url :string 50 | :unparsed-flags :string 51 | :summary :string 52 | :is-minor :boolean 53 | :is-unpatrolled :boolean 54 | :is-special :boolean 55 | :is-talk :boolean 56 | :is-new :boolean 57 | :is-bot-edit :boolean 58 | :timestamp :long]) 59 | (ts :timestamp))) 60 | 61 | ; TODO: Clojure maps describing wikipedia activity feed and window operator jobs. Use samza default conf. 62 | (defn wikipedia-activity-feed-job [zookeeper kafka-brokers] 63 | {:job-name "wikipedia-feed" 64 | :inputs "wikipedia.#en.wikipedia,wikipedia.#en.wiktionary,wikipedia.#en.wikinews" 65 | :zookeeper zookeeper 66 | :broker kafka-brokers 67 | :yarn-package (yarn-package-path) 68 | :task-class "org.pathirage.freshet.utils.WikipediaFeedStreamTask"}) 69 | 70 | (defn- file-path-to-uri 71 | [path] 72 | (let [f (File. 
path)] 73 | (if (.exists f) 74 | (.toString (.toURI f)) 75 | path))) 76 | 77 | (defn gene-wikipedia-feed-job-props 78 | [op-config] 79 | (let [wikiprops (default-without-mterics-props (:zookeeper op-config) (:broker op-config) (file-path-to-uri (:yarn-package op-config))) 80 | wikiprops (-> (assoc wikiprops Constants/CONF_SAMZA_JOB_NAME (:job-name op-config)) 81 | (assoc Constants/CONF_SAMZA_TASK_CLASS (:task-class op-config)) 82 | (assoc Constants/CONF_SAMZA_TASK_INPUTS (:inputs op-config)) 83 | (assoc Constants/CONF_SYSTEMS_WIKIPEDIA_FACTORY "org.pathirage.freshet.utils.system.WikipediaSystemFactory") 84 | (assoc Constants/CONF_SYSTEMS_WIKIPEDIA_HOST "irc.wikimedia.org") 85 | (assoc Constants/CONF_SYSTEMS_WIKIPEDIA_PORT "6667") 86 | (assoc "serializers.registry.json.class" "org.apache.samza.serializers.JsonSerdeFactory")) 87 | properties-file-name (str "helper-" (:job-name op-config)) 88 | properties-file (java.io.File/createTempFile properties-file-name ".properties")] 89 | (props/store-to wikiprops properties-file) 90 | (.toString (.toURI properties-file)))) -------------------------------------------------------------------------------- /freshet-dsl/src/clojure/org/pathirage/freshet/samples/expressions.clj: -------------------------------------------------------------------------------- 1 | (ns org.pathirage.freshet.samples.expressions 2 | (:import [org.pathirage.freshet.operators.select Expression ExpressionType PredicateType OperatorType])) 3 | 4 | (def where-diff-bytes->-100 5 | (let [lhs (doto (Expression. ExpressionType/FIELD) 6 | (.setField "diff-bytes")) 7 | rhs (doto (Expression. ExpressionType/VALUE) 8 | (.setValue 100))] 9 | (doto (Expression. ExpressionType/PREDICATE) 10 | (.setPredicate PredicateType/GREATER_THAN) 11 | (.setLhs lhs) 12 | (.setRhs rhs)))) 13 | 14 | (def where-diff-bytes-<-100 15 | (let [lhs (doto (Expression. ExpressionType/FIELD) 16 | (.setField "diff-bytes")) 17 | rhs (doto (Expression. 
ExpressionType/VALUE) 18 | (.setValue 100))] 19 | (doto (Expression. ExpressionType/PREDICATE) 20 | (.setPredicate PredicateType/LESS_THAN) 21 | (.setLhs lhs) 22 | (.setRhs rhs)))) 23 | 24 | (def where-is-new-edit 25 | (let [lhs (doto (Expression. ExpressionType/FIELD) 26 | (.setField "is-new")) 27 | rhs (doto (Expression. ExpressionType/VALUE) 28 | (.setValue true))] 29 | (doto (Expression. ExpressionType/PREDICATE) 30 | (.setPredicate PredicateType/EQUAL) 31 | (.setLhs lhs) 32 | (.setRhs rhs)))) 33 | 34 | (def new-edit-and->100-diff 35 | (doto (Expression. ExpressionType/PREDICATE) 36 | (.setPredicate PredicateType/AND) 37 | (.setLhs where-diff-bytes->-100) 38 | (.setRhs where-is-new-edit))) 39 | 40 | -------------------------------------------------------------------------------- /freshet-dsl/src/clojure/org/pathirage/freshet/samples/streams.clj: -------------------------------------------------------------------------------- 1 | (ns org.pathirage.freshet.samples.streams 2 | (:require [org.pathirage.freshet.dsl.core :refer [defstream ts stream-fields]])) 3 | 4 | (defstream wikipedia-raw 5 | (stream-fields [:title :string 6 | :user :string 7 | :diff-bytes :integer 8 | :diff-url :string 9 | :unparsed-flags :string 10 | :summary :string 11 | :is-minor :boolean 12 | :is-unpatrolled :boolean 13 | :is-special :boolean 14 | :is-talk :boolean 15 | :is-new :boolean 16 | :is-bot-edit :boolean 17 | :timestamp :long]) 18 | (ts :timestamp)) 19 | -------------------------------------------------------------------------------- /freshet-dsl/src/clojure/org/pathirage/freshet/utils/config.clj: -------------------------------------------------------------------------------- 1 | (ns org.pathirage.freshet.utils.config 2 | (:import [java.io File FileInputStream] 3 | [java.util Properties] 4 | [org.pathirage.freshet Constants] 5 | [org.apache.commons.codec.binary Base64]) 6 | (:require [clojure.string :as string])) 7 | 8 | (defmacro read-property-from-freshet-configuration 9 | 
[property] 10 | `(let [freshet-home# (System/getenv "FRESHET_HOME") 11 | freshet-conf# (str freshet-home# "/deploy/freshet/conf/freshet.conf") 12 | freshet-conf-in# (FileInputStream. freshet-conf#) 13 | props# (Properties.)] 14 | (.load props# freshet-conf-in#) 15 | (.get props# ~property))) 16 | 17 | (defn yarn-package-path 18 | "Read freshet YARN package path from freshet.conf. 19 | 20 | Freshet configuration is read relative to the FRESHET_HOME directory." 21 | [] 22 | (read-property-from-freshet-configuration "freshet.yarn.package.path")) 23 | 24 | (defn zookeeper-node-list 25 | "Read Zookeeper node list from freshet.conf." 26 | [] 27 | (read-property-from-freshet-configuration "freshet.kafka.zookeeper.connect")) 28 | 29 | (defn kafka-broker-list 30 | "Read Kafka broker list from freshet.conf" 31 | [] 32 | (read-property-from-freshet-configuration "freshet.kafka.broker.list")) 33 | 34 | (defn serialize-streamdef 35 | "Serialize stream definition to a string representation." 36 | [stream] 37 | (let [fields (:fields stream)] 38 | (string/join "," (map (fn [kv] (str (name (key kv)) "=" (name (val kv)))) fields)))) 39 | 40 | (defn stream-to-streamdef-prop 41 | "Generate stream definition as a Samza job config property." 42 | [stream] 43 | {(str Constants/CONF_OPERATOR_INPUT_STREAMS (:name stream)) (serialize-streamdef stream)}) 44 | 45 | (defn streams-to-streamdef-props 46 | "Generate list of stream definition properties" 47 | [streams] 48 | (reduce merge (map stream-to-streamdef-prop streams))) 49 | 50 | (defn base64-encode 51 | [^String str] 52 | (let [original-bytes (.getBytes str)] 53 | (String. 
(Base64/encodeBase64 original-bytes)))) -------------------------------------------------------------------------------- /freshet-dsl/test/clojure/org/pathirage/freshet/expressiondsl_test.clj: -------------------------------------------------------------------------------- 1 | (ns org.pathirage.freshet.expressiondsl-test 2 | (:require [org.pathirage.freshet.dsl.core :as fcore] 3 | [clojure.test :refer :all])) 4 | 5 | (comment 6 | "Tests for where clause building in DSL") 7 | 8 | (deftest expression-building-text 9 | (testing "Simple expression" 10 | (let [e (fcore/pred-= :delta 100)] 11 | (prn (str "expression: " e)) 12 | (is (= (fcore/pred-= :delta 100) {:pred := :args [:delta 100]})))) 13 | (testing "Complex expression" 14 | (let [e (fcore/pred-and (fcore/pred-< :delta 100) (fcore/pred-> :beta 340))] 15 | (prn (str "expression" e)) 16 | (is (= (fcore/pred-and (fcore/pred-< :delta 100) (fcore/pred-> :beta 340)) 17 | {:pred :and :args [{:pred :< :args [:delta 100]} {:pred :> :args [:beta 340]}]}))))) -------------------------------------------------------------------------------- /freshet-helpers/.gitignore: -------------------------------------------------------------------------------- 1 | /target 2 | /classes 3 | /checkouts 4 | pom.xml 5 | pom.xml.asc 6 | *.jar 7 | *.class 8 | /.lein-* 9 | /.nrepl-port 10 | -------------------------------------------------------------------------------- /freshet-helpers/README.md: -------------------------------------------------------------------------------- 1 | # freshet-utils 2 | 3 | A Clojure library designed to ... well, that part is up to you. 4 | 5 | ## Usage 6 | 7 | FIXME 8 | 9 | ## License 10 | 11 | Copyright © 2014 FIXME 12 | 13 | Distributed under the Eclipse Public License either version 1.0 or (at 14 | your option) any later version. 
15 | -------------------------------------------------------------------------------- /freshet-helpers/doc/wikipedia-activity-collector.md: -------------------------------------------------------------------------------- 1 | # Collecting Wikipedia Activities for Building a Test Data Set 2 | 3 | This collector is based on the Samza examples project. It listens to the Wikipedia IRC channel for activity messages, 4 | parses each message, and appends it to a CSV file. 5 | 6 | ## CSV format 7 | 8 | channel,source,time,title,user,diff-bytes,diff-url,summary,is-minor,is-talk,is-bot-edit,is-new,is-unpatrolled,is-special,unparsed-flags 9 | -------------------------------------------------------------------------------- /freshet-helpers/project.clj: -------------------------------------------------------------------------------- 1 | (defproject org.pathirage.freshet/freshet-helpers "0.1.0-SNAPSHOT" 2 | :description "Freshet Utils: Tools and Utilities of Freshet project." 3 | :url "http://github.com/milinda/Freshet" 4 | :license {:name "Apache License, Version 2.0" 5 | :url "http://www.apache.org/licenses/LICENSE-2.0.html"} 6 | :repositories [["codehaus" "http://repository.codehaus.org/org/codehaus"]] 7 | :dependencies [[org.clojure/clojure "1.6.0"] 8 | [org.schwering/irclib "1.10"] 9 | [org.slf4j/slf4j-api "1.6.2"] 10 | [org.slf4j/slf4j-log4j12 "1.6.2"] 11 | [com.fasterxml.jackson.core/jackson-core "2.4.0"] 12 | [com.fasterxml.jackson.core/jackson-databind "2.4.0"] 13 | [net.sf.opencsv/opencsv "2.0"] 14 | [org.clojure/tools.cli "0.3.1"] 15 | [org.apache.samza/samza-api "0.7.0"] 16 | [org.apache.samza/samza-serializers_2.10 "0.7.0"] 17 | [org.apache.samza/samza-core_2.10 "0.7.0"] 18 | [org.apache.samza/samza-yarn_2.10 "0.7.0"] 19 | [org.apache.samza/samza-kv_2.10 "0.7.0"] 20 | [org.apache.samza/samza-kafka_2.10 "0.7.0"] 21 | [org.pathirage.freshet/freshet-core "0.1.0-SNAPSHOT"]] 22 | :main org.pathirage.freshet.utils.core 23 | :source-paths ["src/clojure"] 24 | :java-source-paths 
["src/java"] 25 | :test-paths ["test/clojure" "test/java"]) 26 | -------------------------------------------------------------------------------- /freshet-helpers/src/clojure/org/pathirage/freshet/helpers/core.clj: -------------------------------------------------------------------------------- 1 | (ns org.pathirage.freshet.utils.core 2 | (:import (helpers WikipediaActivityFeed WikipediaActivityFeed$WikipediaActivitiesToCSV)) 3 | (:require [clojure.tools.cli :as cli]) 4 | (:gen-class)) 5 | 6 | (def cli-options 7 | [["-t" "--time SECONDS" "Data Collection Time" 8 | :default 60 9 | :parse-fn #(Integer/parseInt %) 10 | :validate [#(< 1 %) "Must be greater than 1 second"]]]) 11 | 12 | (defn -main 13 | [& args] 14 | (let [opts (cli/parse-opts args cli-options) 15 | feed (WikipediaActivityFeed. "irc.wikimedia.org" 6667)] 16 | (prn (:time (:options opts))) 17 | (.start feed) 18 | (.listen feed "#en.wikipedia" (WikipediaActivityFeed$WikipediaActivitiesToCSV.)) 19 | (Thread/sleep (* 1000 (:time (:options opts)))) 20 | (.stop feed))) 21 | 22 | 23 | -------------------------------------------------------------------------------- /freshet-helpers/src/java/org/pathirage/freshet/helpers/KafkaMonitor.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.helpers; 19 | 20 | import kafka.consumer.Consumer; 21 | import kafka.consumer.ConsumerConfig; 22 | import kafka.consumer.ConsumerIterator; 23 | import kafka.consumer.KafkaStream; 24 | import kafka.javaapi.consumer.ConsumerConnector; 25 | import org.pathirage.freshet.data.StreamElement; 26 | import org.pathirage.freshet.serde.StreamElementSerde; 27 | import org.pathirage.freshet.serde.StreamElementSerdeFactory; 28 | 29 | import java.util.*; 30 | import java.util.concurrent.ExecutorService; 31 | import java.util.concurrent.Executors; 32 | 33 | public class KafkaMonitor { 34 | private final ConsumerConnector consumer; 35 | private ExecutorService executor; 36 | 37 | 38 | public KafkaMonitor(String zk){ 39 | this.consumer = Consumer.createJavaConsumerConnector(createConsumerConfig(zk, UUID.randomUUID().toString())); 40 | this.executor = Executors.newCachedThreadPool(); 41 | } 42 | 43 | public void registerTopic(String topic, int partitions){ 44 | Map<String, Integer> topicCountMap = new HashMap<String, Integer>(); 45 | topicCountMap.put(topic, partitions); 46 | 47 | Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap); 48 | 49 | List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic); 50 | 51 | int threadNumber = 0; 52 | for (final KafkaStream stream : streams) { 53 | executor.submit(new MessageConsumer(stream, new StreamHandler(topic), threadNumber)); 54 | threadNumber++; 55 | } 56 | } 57 | 58 | private static ConsumerConfig createConsumerConfig(String zk, String groupId) { 59 | Properties props = new Properties(); 60 | props.put("zookeeper.connect", zk); 61 | props.put("group.id", groupId); 62 | props.put("zookeeper.session.timeout.ms", "400"); 63 | props.put("zookeeper.sync.time.ms", "200"); 64 | props.put("auto.commit.interval.ms", "1000"); 65 | 66 | return new ConsumerConfig(props); 67 | } 68 | 69 | public static void main(String[] args) { 70 | KafkaMonitor kafkaMonitor = new KafkaMonitor("localhost:2181"); 71 | 
kafkaMonitor.registerTopic("wikipedia-selectnew", 1); 72 | } 73 | 74 | public class MessageConsumer implements Runnable { 75 | private KafkaStream ks; 76 | private StreamHandler sh; 77 | int tid; 78 | private StreamElementSerde seSerde; 79 | 80 | public MessageConsumer(KafkaStream ks, StreamHandler sh, int threadNumber){ 81 | this.ks = ks; 82 | this.sh = sh; 83 | this.tid = threadNumber; 84 | this.seSerde = (StreamElementSerde)(new StreamElementSerdeFactory().getSerde(null, null)); 85 | } 86 | 87 | @Override 88 | public void run() { 89 | ConsumerIterator<byte[], byte[]> itr = ks.iterator(); 90 | 91 | while(itr.hasNext()){ 92 | StreamElement se = this.seSerde.fromBytes(itr.next().message()); 93 | sh.handle(se); 94 | } 95 | } 96 | } 97 | 98 | public class StreamHandler { 99 | private String topic; 100 | public StreamHandler(String topic){ 101 | this.topic = topic; 102 | } 103 | private Map elements = new HashMap<>(); 104 | 105 | // public void handle(StreamElement se){ 106 | // if(!se.isDelete()){ 107 | // elements.put(se.getId(), se.getStringField("title")); 108 | // } else { 109 | // String s = elements.remove(se.getId()); 110 | // if(s != null){ 111 | // System.out.println("Deleting item already seen: " + se.getId()); 112 | // } 113 | // } 114 | // } 115 | 116 | public void handle(StreamElement se){ 117 | System.out.println(topic + " - " + se.getIntegerField("diff-bytes") + ": "+ se.getStringField("title")); 118 | } 119 | 120 | } 121 | 122 | 123 | } 124 | -------------------------------------------------------------------------------- /freshet-helpers/src/java/org/pathirage/freshet/helpers/ParseWikipediaActivity.java: -------------------------------------------------------------------------------- 1 | /* 2 | * (C) Copyright 2014 Milinda Pathirage. 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 
6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | * 16 | */ 17 | 18 | package org.pathirage.freshet.helpers; 19 | 20 | import java.util.HashMap; 21 | import java.util.Map; 22 | import java.util.regex.Matcher; 23 | import java.util.regex.Pattern; 24 | 25 | public class ParseWikipediaActivity { 26 | public static Map<String, Object> parse(String line) { 27 | System.out.println(line); 28 | Pattern p = Pattern.compile("\\[\\[(.*)\\]\\]\\s(.*)\\s(.*)\\s\\*\\s(.*)\\s\\*\\s\\(\\+?(.\\d*)\\)\\s(.*)"); 29 | Matcher m = p.matcher(line); 30 | 31 | if (m.find() && m.groupCount() == 6) { 32 | String title = m.group(1); 33 | String flags = m.group(2); 34 | String diffUrl = m.group(3); 35 | String user = m.group(4); 36 | int byteDiff = Integer.parseInt(m.group(5)); 37 | String summary = m.group(6); 38 | 39 | Map<String, Boolean> flagMap = new HashMap<String, Boolean>(); 40 | 41 | flagMap.put("is-minor", flags.contains("M")); 42 | flagMap.put("is-new", flags.contains("N")); 43 | flagMap.put("is-unpatrolled", flags.contains("!")); 44 | flagMap.put("is-bot-edit", flags.contains("B")); 45 | flagMap.put("is-special", title.startsWith("Special:")); 46 | flagMap.put("is-talk", title.startsWith("Talk:")); 47 | 48 | Map<String, Object> root = new HashMap<String, Object>(); 49 | 50 | root.put("title", title); 51 | root.put("user", user); 52 | root.put("unparsed-flags", flags); 53 | root.put("diff-bytes", byteDiff); 54 | root.put("diff-url", diffUrl); 55 | root.put("summary", summary); 56 | root.put("flags", flagMap); 57 | 58 | return root; 59 | } else { 60 | return null; 61 | } 62 | } 63 | } 64 | 
-------------------------------------------------------------------------------- /freshet-helpers/wikipedia-actvities-2014-11-04T00:24:19.1.csv: -------------------------------------------------------------------------------- 1 | "#en.wikipedia","rc-pmtpa","1415078669891","04 New England Patriots season","AMLNet49","4","http://en.wikipedia.org/w/index.php?diff=632380887&oldid=632380148","/* ""The Night That Courage Wore Orange"" */","false","false","false","false","false","false","" 2 | "#en.wikipedia","rc-pmtpa","1415078671133","User:BrianGroen","BrianGroen","1169","http://en.wikipedia.org/w/index.php?diff=632380888&oldid=632147595","","false","false","false","false","false","false","" 3 | "#en.wikipedia","rc-pmtpa","1415078672907","Marvin Agustin","01:9:7080:7E3:5918:A762:20A2:FFB5","-43","http://en.wikipedia.org/w/index.php?diff=632380889&oldid=632144887","/* Television */","false","false","false","false","false","false","" 4 | "#en.wikipedia","rc-pmtpa","1415078672954","Greaser (subculture)","Madreterra","98","http://en.wikipedia.org/w/index.php?diff=632380890&oldid=632380645","/* Portrayals in popular culture */","false","false","false","false","false","false","" 5 | "#en.wikipedia","rc-pmtpa","1415078674268","The New Batman Adventures",".227.204.194","15","http://en.wikipedia.org/w/index.php?diff=632380891&oldid=629943503","Antagonists","false","false","false","false","false","false","" 6 | "#en.wikipedia","rc-pmtpa","1415078674450","Powder (hundred)","SoledadKabocha","21","http://en.wikipedia.org/w/index.php?diff=632380893&oldid=419055800","R to list entry","true","false","false","false","false","false","M" 7 | "#en.wikipedia","rc-pmtpa","1415078674599","Athens International Airport",".71.162.176","-2","http://en.wikipedia.org/w/index.php?diff=632380892&oldid=632280102","","false","false","false","false","false","false","" 8 | "#en.wikipedia","rc-pmtpa","1415078674762","User:Viriditas/Psychedelics and 
ecology","Viriditas","339","http://en.wikipedia.org/w/index.php?diff=632380894&oldid=632380028","/* Background */ add temporary note.","false","false","false","false","false","false","" 9 | "#en.wikipedia","rc-pmtpa","1415078678415","Dhvani Desai","BG19bot","39","http://en.wikipedia.org/w/index.php?diff=632380895&oldid=632374186","/* Heading text */Add empty section tag. Do [[Wikipedia:GENFIXES|general fixes]] if a problem exists. -, added Empty section (1) tag using [[Project:AWB|AWB]] (10480)","true","false","true","false","false","false","MB" 10 | "#en.wikipedia","rc-pmtpa","1415078680099","Draft:Ram Upendra Das","Geetikas","-1","http://en.wikipedia.org/w/index.php?diff=632380896&oldid=632380755","/* = Books */","false","false","false","false","false","false","" 11 | -------------------------------------------------------------------------------- /freshet-job-package/.gitignore: -------------------------------------------------------------------------------- 1 | *.iml 2 | target/ 3 | 4 | ## File-based project format: 5 | *.ipr 6 | *.iws 7 | 8 | .idea/ 9 | -------------------------------------------------------------------------------- /freshet-job-package/pom.xml: -------------------------------------------------------------------------------- 1 | 2 | 20 | 22 | 4.0.0 23 | 24 | org.pathirage.freshet 25 | freshet-job-package 26 | 0.1.0-SNAPSHOT 27 | Freshet distribution to use when submitting jobs 28 | jar 29 | 30 | 31 | 32 | clojars.org 33 | http://clojars.org/repo 34 | 35 | 36 | 37 | 38 | 39 | org.apache.samza 40 | samza-shell 41 | 0.7.0 42 | dist 43 | tgz 44 | runtime 45 | 46 | 47 | org.apache.samza 48 | samza-core_2.10 49 | 0.7.0 50 | runtime 51 | 52 | 53 | org.apache.samza 54 | samza-serializers_2.10 55 | 0.7.0 56 | runtime 57 | 58 | 59 | org.apache.samza 60 | samza-yarn_2.10 61 | 0.7.0 62 | runtime 63 | 64 | 65 | asm 66 | asm 67 | 68 | 69 | 70 | 71 | org.apache.samza 72 | samza-kv_2.10 73 | 0.7.0 74 | runtime 75 | 76 | 77 | org.apache.samza 78 | 
samza-kafka_2.10 79 | 0.7.0 80 | runtime 81 | 82 | 83 | org.apache.kafka 84 | kafka_2.10 85 | 0.8.1 86 | runtime 87 | 88 | 89 | org.slf4j 90 | slf4j-log4j12 91 | 1.6.2 92 | runtime 93 | 94 | 95 | org.apache.hadoop 96 | hadoop-hdfs 97 | 2.2.0 98 | runtime 99 | 100 | 101 | asm 102 | asm 103 | 104 | 105 | 106 | 107 | 108 | 109 | org.codehaus.jackson 110 | jackson-jaxrs 111 | 1.8.5 112 | runtime 113 | 114 | 115 | com.esotericsoftware 116 | kryo 117 | 3.0.0 118 | runtime 119 | 120 | 121 | org.apache.avro 122 | avro 123 | 1.7.7 124 | runtime 125 | 126 | 127 | org.pathirage.freshet 128 | freshet-core 129 | ${freshet.version} 130 | runtime 131 | 132 | 133 | org.pathirage.freshet 134 | freshet-dsl 135 | ${freshet.version} 136 | runtime 137 | 138 | 139 | clojurewerkz 140 | propertied 141 | 1.2.0 142 | 143 | 144 | 145 | 146 | org.clojure 147 | clojure 148 | 1.6.0 149 | 150 | 151 | reply 152 | reply 153 | 0.3.5 154 | 155 | 156 | org.clojure 157 | clojure 158 | 159 | 160 | 161 | 162 | jline 163 | jline 164 | 2.12 165 | 166 | 167 | org.thnetos 168 | cd-client 169 | 0.3.6 170 | 171 | 172 | clj-stacktrace 173 | clj-stacktrace 174 | 0.2.7 175 | 176 | 177 | org.clojure 178 | tools.nrepl 179 | 0.2.6 180 | 181 | 182 | org.clojure 183 | clojure 184 | 185 | 186 | 187 | 188 | org.clojure 189 | tools.cli 190 | 0.3.1 191 | 192 | 193 | com.cemerick 194 | drawbridge 195 | 0.0.6 196 | 197 | 198 | org.clojure 199 | tools.nrepl 200 | 201 | 202 | 203 | 204 | trptcolin 205 | versioneer 206 | 0.1.1 207 | 208 | 209 | clojure-complete 210 | clojure-complete 211 | 0.2.3 212 | 213 | 214 | org.clojure 215 | clojure 216 | 217 | 218 | 219 | 220 | net.cgrand 221 | sjacket 222 | 0.1.1 223 | 224 | 225 | org.clojure 226 | clojure 227 | 228 | 229 | 230 | 231 | commons-codec 232 | commons-codec 233 | 1.4 234 | 235 | 236 | 237 | 238 | 239 | Apache License 2.0 240 | http://www.apache.org/licenses/LICENSE-2.0.html 241 | repo 242 | 243 | 244 | 245 | 246 | 0.1.0-SNAPSHOT 247 | 248 | 249 | 250 | 251 | 252 | 253 | 
maven-assembly-plugin 254 | 2.3 255 | 256 | 257 | src/main/assembly/src.xml 258 | 259 | 260 | 261 | 262 | make-assembly 263 | package 264 | 265 | single 266 | 267 | 268 | 269 | 270 | 271 | 272 | 273 | -------------------------------------------------------------------------------- /freshet-job-package/src/main/assembly/src.xml: -------------------------------------------------------------------------------- 1 | 2 | 12 | 13 | 17 | dist 18 | 19 | tar.gz 20 | 21 | false 22 | 23 | 24 | ${basedir}/.. 25 | 26 | README* 27 | LICENSE* 28 | NOTICE* 29 | 30 | 31 | 32 | 33 | 34 | ${basedir}/src/main/resources/log4j.xml 35 | lib 36 | 37 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | bin 58 | 59 | org.apache.samza:samza-shell:tgz:dist:* 60 | 61 | 0744 62 | true 63 | 64 | 65 | lib 66 | 67 | org.apache.samza:samza-core_2.10 68 | org.apache.samza:samza-kafka_2.10 69 | org.apache.samza:samza-serializers_2.10 70 | org.apache.samza:samza-yarn_2.10 71 | org.apache.samza:samza-kv_2.10 72 | org.slf4j:slf4j-log4j12 73 | org.apache.kafka:kafka_2.10 74 | org.apache.hadoop:hadoop-hdfs 75 | org.pathirage.freshet:freshet-core 76 | org.pathirage.freshet:freshet-dsl 77 | com.esotericsoftware:kryo 78 | org.apache.avro:avro 79 | org.pathirage.freshet:freshet-core 80 | org.pathirage.freshet:freshet-dsl 81 | org.codehaus.jackson:jackson-jaxrs 82 | org.schwering:irclib 83 | clojurewerkz:propertied 84 | org.apache.hadoop:hadoop-yarn-client 85 | org.apache.hadoop:hadoop-yarn-common 86 | org.apache.hadoop:hadoop-common 87 | jline:jline 88 | org.thnetos:cd-client 89 | clj-stacktrace:clj-stacktrace 90 | org.clojure:tools.nrepl 91 | org.clojure:tools.cli 92 | com.cemerick:drawbridge 93 | org.clojure:clojure 94 | trptcolin:versioneer 95 | clojure-complete:clojure-complete 96 | net.cgrand:sjacket 97 | reply:reply 98 | commons-codec:commons-codec 99 | 100 | true 101 | 102 | 103 | 104 | 
-------------------------------------------------------------------------------- /freshet-job-package/src/main/resources/log4j.xml: -------------------------------------------------------------------------------- 1 | 2 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | -------------------------------------------------------------------------------- /freshet-shell/bin/fshell: -------------------------------------------------------------------------------- 1 | #!/bin/bash -e 2 | 3 | HOME_DIR=`pwd` 4 | BASE_DIR=$(dirname $0)/.. 5 | 6 | cd $BASE_DIR 7 | BASE_DIR=`pwd` 8 | cd $HOME_DIR 9 | 10 | export FRESHET_HOME=$BASE_DIR 11 | HADOOP_YARN_HOME="${HADOOP_YARN_HOME:-$BASE_DIR/deploy/yarn}" 12 | HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_YARN_HOME/etc/hadoop}" 13 | CP=$HADOOP_CONF_DIR 14 | DEFAULT_LOG4J_FILE=$BASE_DIR/deploy/freshet/lib/log4j.xml 15 | 16 | if [ -z "$USER_CP" ]; then 17 | USER_CP="" 18 | fi 19 | 20 | # We don't need below in classpath at this stage. Every dependency is bundled into deploy/freshet/lib directory. 21 | 22 | #CP="$BASEDIR"/../src/clj/:\ 23 | #"$BASEDIR"/../classes/:\ 24 | #"$BASEDIR"/../target/classes/ 25 | 26 | for j in "$BASE_DIR"/deploy/freshet/lib/*.jar; do 27 | CP=$CP:$j 28 | done 29 | 30 | java -Dfile.encoding=UTF-8 -cp "$USER_CP":"$CP" reply.ReplyMain "$@" 31 | 32 | # for debugging: 33 | # java -Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n \ 34 | # -Djline.internal.Log.debug=true \ 35 | # -Dfile.encoding=UTF-8 -cp $CP reply.ReplyMain "$@" 36 | 37 | 38 | -------------------------------------------------------------------------------- /freshet-shell/bin/grid: -------------------------------------------------------------------------------- 1 | #!/bin/bash -e 2 | # Licensed to the Apache Software Foundation (ASF) under one 3 | # or more contributor license agreements. See the NOTICE file 4 | # distributed with this work for additional information 5 | # regarding copyright ownership. 
The ASF licenses this file 6 | # to you under the Apache License, Version 2.0 (the 7 | # "License"); you may not use this file except in compliance 8 | # with the License. You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, 13 | # software distributed under the License is distributed on an 14 | # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 15 | # KIND, either express or implied. See the License for the 16 | # specific language governing permissions and limitations 17 | # under the License. 18 | 19 | # This script will download, setup, start, and stop servers for Kafka, YARN, and ZooKeeper, 20 | # as well as downloading, building and locally publishing Samza 21 | 22 | if [ -z "$JAVA_HOME" ]; then 23 | if [ -x /usr/libexec/java_home ]; then 24 | export JAVA_HOME="$(/usr/libexec/java_home)" 25 | else 26 | echo "JAVA_HOME not set. Exiting." 27 | exit 1 28 | fi 29 | fi 30 | 31 | DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" 32 | BASE_DIR=$(dirname $DIR) 33 | DEPLOY_ROOT_DIR=$BASE_DIR/deploy 34 | DOWNLOAD_CACHE_DIR=$HOME/.samza/download 35 | COMMAND=$1 36 | SYSTEM=$2 37 | 38 | DOWNLOAD_KAFKA=http://www.us.apache.org/dist/kafka/0.8.1.1/kafka_2.9.2-0.8.1.1.tgz 39 | DOWNLOAD_YARN=https://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz 40 | DOWNLOAD_ZOOKEEPER=http://archive.apache.org/dist/zookeeper/zookeeper-3.4.3/zookeeper-3.4.3.tar.gz 41 | 42 | bootstrap() { 43 | echo "Bootstrapping the system..." 44 | setup_passphraseless_ssh 45 | stop_all 46 | rm -rf "$DEPLOY_ROOT_DIR" 47 | mkdir "$DEPLOY_ROOT_DIR" 48 | install_all 49 | start_all 50 | exit 0 51 | } 52 | 53 | setup_passphraseless_ssh() { 54 | if [ ! 
-f "$HOME/.ssh/id_dsa" ]; then 55 | ssh-keygen -t dsa -P '' -f $HOME/.ssh/id_dsa 56 | cat $HOME/.ssh/id_dsa.pub >> $HOME/.ssh/authorized_keys 57 | else 58 | cat $HOME/.ssh/id_dsa.pub >> $HOME/.ssh/authorized_keys 59 | fi 60 | } 61 | 62 | install_all() { 63 | $DIR/grid install zookeeper 64 | $DIR/grid install yarn 65 | $DIR/grid install kafka 66 | } 67 | 68 | install_zookeeper() { 69 | mkdir -p "$DEPLOY_ROOT_DIR" 70 | install zookeeper $DOWNLOAD_ZOOKEEPER zookeeper-3.4.3 71 | cp "$DEPLOY_ROOT_DIR/zookeeper/conf/zoo_sample.cfg" "$DEPLOY_ROOT_DIR/zookeeper/conf/zoo.cfg" 72 | } 73 | 74 | install_yarn() { 75 | mkdir -p "$DEPLOY_ROOT_DIR" 76 | install yarn $DOWNLOAD_YARN hadoop-2.2.0 77 | cp "$BASE_DIR/conf/yarn-site.xml" "$DEPLOY_ROOT_DIR/yarn/etc/hadoop/yarn-site.xml" 78 | cp "$BASE_DIR/conf/core-site.xml" "$DEPLOY_ROOT_DIR/yarn/etc/hadoop/core-site.xml" 79 | cp "$BASE_DIR/conf/hdfs-site.xml" "$DEPLOY_ROOT_DIR/yarn/etc/hadoop/hdfs-site.xml" 80 | 81 | if [ ! -f "$HOME/.samza/conf/yarn-site.xml" ]; then 82 | mkdir -p "$HOME/.samza/conf" 83 | cp "$BASE_DIR/conf/yarn-site.xml" "$HOME/.samza/conf/yarn-site.xml" 84 | fi 85 | 86 | if [ ! -f "$HOME/.samza/conf/core-site.xml" ]; then 87 | mkdir -p "$HOME/.samza/conf" 88 | cp "$BASE_DIR/conf/core-site.xml" "$HOME/.samza/conf/core-site.xml" 89 | fi 90 | if [ ! -f "$HOME/.samza/conf/hdfs-site.xml" ]; then 91 | mkdir -p "$HOME/.samza/conf" 92 | cp "$BASE_DIR/conf/hdfs-site.xml" "$HOME/.samza/conf/hdfs-site.xml" 93 | fi 94 | } 95 | 96 | install_kafka() { 97 | mkdir -p "$DEPLOY_ROOT_DIR" 98 | install kafka $DOWNLOAD_KAFKA kafka_2.9.2-0.8.1.1 99 | # have to use SIGTERM since nohup appears to ignore SIGINT 100 | # and Kafka switched to SIGINT in KAFKA-1031. 
101 | sed -i.bak 's/SIGINT/SIGTERM/g' $DEPLOY_ROOT_DIR/kafka/bin/kafka-server-stop.sh 102 | # in order to simplify the wikipedia-stats example job, set topic to have just 1 partition by default 103 | sed -i.bak 's/^num\.partitions *=.*/num.partitions=1/' $DEPLOY_ROOT_DIR/kafka/config/server.properties 104 | } 105 | 106 | install() { 107 | DESTINATION_DIR="$DEPLOY_ROOT_DIR/$1" 108 | DOWNLOAD_URL=$2 109 | PACKAGE_DIR="$DOWNLOAD_CACHE_DIR/$3" 110 | PACKAGE_FILE="$DOWNLOAD_CACHE_DIR/$(basename $DOWNLOAD_URL)" 111 | if [ -f "$PACKAGE_FILE" ]; then 112 | echo "Using previously downloaded file $PACKAGE_FILE" 113 | else 114 | echo "Downloading $(basename $DOWNLOAD_URL)..." 115 | mkdir -p $DOWNLOAD_CACHE_DIR 116 | curl "$DOWNLOAD_URL" > "${PACKAGE_FILE}.tmp" 117 | mv "${PACKAGE_FILE}.tmp" "$PACKAGE_FILE" 118 | fi 119 | rm -rf "$DESTINATION_DIR" "$PACKAGE_DIR" 120 | tar -xf "$PACKAGE_FILE" -C $DOWNLOAD_CACHE_DIR 121 | mv "$PACKAGE_DIR" "$DESTINATION_DIR" 122 | } 123 | 124 | start_all() { 125 | $DIR/grid start zookeeper 126 | $DIR/grid start yarn 127 | $DIR/grid start kafka 128 | } 129 | 130 | start_zookeeper() { 131 | if [ -f $DEPLOY_ROOT_DIR/$SYSTEM/bin/zkServer.sh ]; then 132 | cd $DEPLOY_ROOT_DIR/$SYSTEM 133 | bin/zkServer.sh start 134 | cd - > /dev/null 135 | else 136 | echo 'Zookeeper is not installed. Run: bin/grid install zookeeper' 137 | fi 138 | } 139 | 140 | start_yarn() { 141 | rm -rf /tmp/hadoop/data/name 142 | rm -rf /tmp/hadoop/data/data 143 | mkdir -p /tmp/hadoop/data/name 144 | mkdir -p /tmp/hadoop/data/data 145 | if [ -f $DEPLOY_ROOT_DIR/$SYSTEM/bin/hdfs ]; then 146 | $DEPLOY_ROOT_DIR/$SYSTEM/bin/hdfs namenode -format 147 | else 148 | echo 'YARN is not installed. Run: bin/grid install yarn' 149 | fi 150 | 151 | if [ -f $DEPLOY_ROOT_DIR/$SYSTEM/sbin/start-dfs.sh ]; then 152 | $DEPLOY_ROOT_DIR/$SYSTEM/sbin/start-dfs.sh 153 | else 154 | echo 'YARN is not installed. 
Run: bin/grid install yarn' 155 | fi 156 | 157 | if [ -f $DEPLOY_ROOT_DIR/$SYSTEM/sbin/yarn-daemon.sh ]; then 158 | $DEPLOY_ROOT_DIR/$SYSTEM/sbin/yarn-daemon.sh start resourcemanager 159 | $DEPLOY_ROOT_DIR/$SYSTEM/sbin/yarn-daemon.sh start nodemanager 160 | else 161 | echo 'YARN is not installed. Run: bin/grid install yarn' 162 | fi 163 | } 164 | 165 | 166 | 167 | start_kafka() { 168 | if [ -f $DEPLOY_ROOT_DIR/$SYSTEM/bin/kafka-server-start.sh ]; then 169 | mkdir -p $DEPLOY_ROOT_DIR/$SYSTEM/logs 170 | cd $DEPLOY_ROOT_DIR/$SYSTEM 171 | nohup bin/kafka-server-start.sh config/server.properties > logs/kafka.log 2>&1 & 172 | cd - > /dev/null 173 | else 174 | echo 'Kafka is not installed. Run: bin/grid install kafka' 175 | fi 176 | } 177 | 178 | stop_all() { 179 | $DIR/grid stop kafka 180 | $DIR/grid stop yarn 181 | $DIR/grid stop zookeeper 182 | } 183 | 184 | stop_zookeeper() { 185 | if [ -f $DEPLOY_ROOT_DIR/$SYSTEM/bin/zkServer.sh ]; then 186 | cd $DEPLOY_ROOT_DIR/$SYSTEM 187 | bin/zkServer.sh stop 188 | cd - > /dev/null 189 | else 190 | echo 'Zookeeper is not installed. Run: bin/grid install zookeeper' 191 | fi 192 | } 193 | 194 | stop_hdfs() { 195 | $DEPLOY_ROOT_DIR/$SYSTEM/sbin/stop-dfs.sh 196 | rm -rf /tmp/hadoop/data/name 197 | rm -rf /tmp/hadoop/data/data 198 | } 199 | 200 | stop_yarn() { 201 | if [ -f $DEPLOY_ROOT_DIR/$SYSTEM/sbin/yarn-daemon.sh ]; then 202 | $DEPLOY_ROOT_DIR/$SYSTEM/sbin/yarn-daemon.sh stop resourcemanager 203 | $DEPLOY_ROOT_DIR/$SYSTEM/sbin/yarn-daemon.sh stop nodemanager 204 | else 205 | echo 'YARN is not installed. Run: bin/grid install yarn' 206 | fi 207 | 208 | if [ -f $DEPLOY_ROOT_DIR/$SYSTEM/sbin/stop-dfs.sh ]; then 209 | $DEPLOY_ROOT_DIR/$SYSTEM/sbin/stop-dfs.sh 210 | else 211 | echo 'YARN is not installed. 
Run: bin/grid install yarn' 212 | fi 213 | } 214 | 215 | stop_kafka() { 216 | if [ -f $DEPLOY_ROOT_DIR/$SYSTEM/bin/kafka-server-stop.sh ]; then 217 | cd $DEPLOY_ROOT_DIR/$SYSTEM 218 | bin/kafka-server-stop.sh || true # tolerate nonzero exit status if Kafka isn't running 219 | cd - > /dev/null 220 | else 221 | echo 'Kafka is not installed. Run: bin/grid install kafka' 222 | fi 223 | } 224 | 225 | # Check arguments 226 | if [ "$COMMAND" == "bootstrap" ] && test -z "$SYSTEM"; then 227 | bootstrap 228 | exit 0 229 | elif (test -z "$COMMAND" && test -z "$SYSTEM") \ 230 | || ( [ "$COMMAND" == "help" ] || test -z "$COMMAND" || test -z "$SYSTEM"); then 231 | echo 232 | echo " Usage.." 233 | echo 234 | echo " $ grid" 235 | echo " $ grid bootstrap" 236 | echo " $ grid install [yarn|kafka|zookeeper|all]" 237 | echo " $ grid start [hdfs|yarn|kafka|zookeeper|all]" 238 | echo " $ grid stop [hdfs|yarn|kafka|zookeeper|all]" 239 | echo 240 | exit 1 241 | else 242 | echo "EXECUTING: $COMMAND $SYSTEM" 243 | 244 | "$COMMAND"_"$SYSTEM" 245 | fi -------------------------------------------------------------------------------- /freshet-shell/bin/log4j-console.xml: -------------------------------------------------------------------------------- 1 | 2 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | -------------------------------------------------------------------------------- /freshet-shell/bin/run-class.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Licensed to the Apache Software Foundation (ASF) under one 3 | # or more contributor license agreements. See the NOTICE file 4 | # distributed with this work for additional information 5 | # regarding copyright ownership. The ASF licenses this file 6 | # to you under the Apache License, Version 2.0 (the 7 | # "License"); you may not use this file except in compliance 8 | # with the License. 
You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

if [ $# -lt 1 ];
then
  echo "USAGE: $0 classname [opts]"
  exit 1
fi

home_dir=`pwd`
base_dir=$(dirname $0)/..
cd $base_dir
base_dir=`pwd`
cd $home_dir

if [ ! -d "$base_dir/lib" ]; then
  echo "Unable to find $base_dir/lib, which is required to run."
  exit 1
fi

HADOOP_YARN_HOME="${HADOOP_YARN_HOME:-$base_dir/deploy/yarn}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_YARN_HOME/etc/hadoop}"
CLASSPATH=$HADOOP_CONF_DIR
GC_LOG_ROTATION_OPTS="-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10241024"
DEFAULT_LOG4J_FILE=$base_dir/lib/log4j.xml

for file in $base_dir/lib/*.[jw]ar;
do
  CLASSPATH=$CLASSPATH:$file
done

if [ -z "$JAVA_HOME" ]; then
  JAVA="java"
else
  JAVA="$JAVA_HOME/bin/java"
fi

if [ -z "$SAMZA_LOG_DIR" ]; then
  SAMZA_LOG_DIR="$base_dir"
fi

# add usercache directory
mkdir -p $base_dir/tmp
JAVA_TEMP_DIR=$base_dir/tmp

# Check whether the JVM supports GC Log rotation, and enable it if so.
function check_and_enable_gc_log_rotation {
  `$JAVA -Xloggc:/dev/null $GC_LOG_ROTATION_OPTS -version 2> /dev/null`
  if [ $? -eq 0 ] ; then
    JAVA_OPTS="$JAVA_OPTS $GC_LOG_ROTATION_OPTS"
  fi
}

# Try and use 64-bit mode if available in JVM_OPTS
function check_and_enable_64_bit_mode {
  `$JAVA -d64 -version`
  if [ $? -eq 0 ] ; then
    JAVA_OPTS="$JAVA_OPTS -d64"
  fi
}

### Inherit JVM_OPTS from task.opts configuration, and initialize defaults ###

# Check if log4j configuration is specified. If not - set to lib/log4j.xml
[[ $JAVA_OPTS != *-Dlog4j.configuration* && -f $DEFAULT_LOG4J_FILE ]] && JAVA_OPTS="$JAVA_OPTS -Dlog4j.configuration=file:$DEFAULT_LOG4J_FILE"

# Check if samza.log.dir is specified. If not - set to environment variable if it is set
[[ $JAVA_OPTS != *-Dsamza.log.dir* && ! -z "$SAMZA_LOG_DIR" ]] && JAVA_OPTS="$JAVA_OPTS -Dsamza.log.dir=$SAMZA_LOG_DIR"

# Check if java.io.tmpdir is specified. If not - set to tmp in the base_dir
[[ $JAVA_OPTS != *-Djava.io.tmpdir* ]] && JAVA_OPTS="$JAVA_OPTS -Djava.io.tmpdir=$JAVA_TEMP_DIR"

# Check if a max-heap size is specified. If not - set a 768M heap
[[ $JAVA_OPTS != *-Xmx* ]] && JAVA_OPTS="$JAVA_OPTS -Xmx768M"

# Check if the GC related flags are specified. If not - add the respective flags to JVM_OPTS.
[[ $JAVA_OPTS != *PrintGCDateStamps* && $JAVA_OPTS != *-Xloggc* ]] && JAVA_OPTS="$JAVA_OPTS -XX:+PrintGCDateStamps -Xloggc:$SAMZA_LOG_DIR/gc.log"

# Check if GC log rotation is already enabled. If not - add the respective flags to JVM_OPTS
[[ $JAVA_OPTS != *UseGCLogFileRotation* ]] && check_and_enable_gc_log_rotation

# Check if 64 bit is set. If not - try and set it if it's supported
[[ $JAVA_OPTS != *-d64* ]] && check_and_enable_64_bit_mode

# Quote "$@" so arguments containing spaces reach the JVM intact.
echo $JAVA $JAVA_OPTS -cp $CLASSPATH "$@"
exec $JAVA $JAVA_OPTS -cp $CLASSPATH "$@"
--------------------------------------------------------------------------------
/freshet-shell/bin/run-job.sh:
--------------------------------------------------------------------------------
#!/bin/bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

[[ $JAVA_OPTS != *-Dlog4j.configuration* ]] && export JAVA_OPTS="$JAVA_OPTS -Dlog4j.configuration=file:$(dirname $0)/log4j-console.xml"

exec $(dirname $0)/run-class.sh org.apache.samza.job.JobRunner "$@"
--------------------------------------------------------------------------------
/freshet-shell/bin/setup.sh:
--------------------------------------------------------------------------------
#!/bin/bash
# bash rather than sh: the script relies on the 'function' keyword.

freshet_version=0.1.0-SNAPSHOT

home_dir=`pwd`

base_dir=$(dirname $0)/..

cd $base_dir
base_dir=`pwd`
cd $home_dir

username=$(whoami)
COMMAND=$1

freshet_job_package_parent="$(dirname $(readlink -e $base_dir/../freshet-job-package))/$(basename $base_dir/../freshet-job-package)"
freshet_job_package="$freshet_job_package_parent/target/freshet-job-package-$freshet_version-dist.tar.gz"

echo "Parent directory of Freshet Job Package: $freshet_job_package_parent"
echo "Freshet Job Package Path: $freshet_job_package"

function install() {

    # Setting up YARN, Kafka and Zookeeper
    $base_dir/bin/grid bootstrap

    if [ ! -f "$freshet_job_package" ]; then
        cd $freshet_job_package_parent
        mvn clean package
    fi

    if [ -f "$freshet_job_package" ]; then
        # Extracting Job Package to 'deploy/freshet'
        mkdir -p $base_dir/deploy/freshet
        tar xvf $freshet_job_package -C $base_dir/deploy/freshet

        # Upload the Freshet Job Package to HDFS
        #$base_dir/deploy/yarn/bin/hdfs dfs -mkdir /user
        #$base_dir/deploy/yarn/bin/hdfs dfs -mkdir /user/$username
        $base_dir/deploy/yarn/bin/hdfs dfs -mkdir /freshet

        # TODO: Do this programmatically at shell startup
        $base_dir/deploy/yarn/bin/hdfs dfs -put $freshet_job_package /freshet

        mkdir -p $base_dir/deploy/freshet/conf

        echo "freshet.yarn.package.path=hdfs://localhost:9000/freshet/$(basename $freshet_job_package)" > $base_dir/deploy/freshet/conf/freshet.conf
        echo "freshet.kafka.zookeeper.connect=localhost:2181/" >> $base_dir/deploy/freshet/conf/freshet.conf
        echo "freshet.kafka.broker.list=localhost:9092" >> $base_dir/deploy/freshet/conf/freshet.conf
    else
        echo "Cannot find $freshet_job_package. Looks like freshet-job-package build failed."
    fi
}

# '=' is the portable string comparison for '[' and works under both sh and bash.
if [ "$COMMAND" = "local" ]; then
    install
    exit 0
else
    echo "Unknown command: $COMMAND"
    exit 1
fi
--------------------------------------------------------------------------------
/freshet-shell/conf/core-site.xml:
--------------------------------------------------------------------------------
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
--------------------------------------------------------------------------------
/freshet-shell/conf/freshet.conf:
--------------------------------------------------------------------------------
yarn.package.path=hdfs://localhost:9000/freshet/freshet-job-package-0.1.0-SNAPSHOT-dist.tar.gz
--------------------------------------------------------------------------------
/freshet-shell/conf/hdfs-site.xml:
--------------------------------------------------------------------------------
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/tmp/hadoop/data/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/tmp/hadoop/data/data</value>
  </property>
</configuration>
--------------------------------------------------------------------------------
/freshet-shell/conf/yarn-site.xml:
--------------------------------------------------------------------------------
<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>10</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>127.0.0.1</value>
  </property>
</configuration>
--------------------------------------------------------------------------------
/freshet-shell/doc/intro.md:
--------------------------------------------------------------------------------
# Introduction to freshet-shell

TODO: write [great documentation](http://jacobian.org/writing/what-to-write/)
--------------------------------------------------------------------------------
/freshet-shell/doc/shell-design.md:
--------------------------------------------------------------------------------
# Freshet Shell Design

## Workflow

### Single Node YARN

- Clone or download the Freshet code.
- Run 'setup.sh local' inside freshet-shell/bin. 'setup.sh local' takes care of running 'grid bootstrap' from freshet-shell/bin.
- Run fshell inside freshet-shell/bin and start writing queries.

### Multiple Node YARN

- Clone or download the Freshet code.
- Copy yarn-site.xml, core-site.xml and hdfs-site.xml to freshet-shell/conf.
- Run 'setup.sh remote' inside freshet-shell/bin.
- Run fshell inside freshet-shell/bin and start writing queries.
--------------------------------------------------------------------------------
/freshet-shell/project.clj:
--------------------------------------------------------------------------------
(defproject org.pathirage.freshet/freshet-shell "0.1.0-SNAPSHOT"
  :description "Freshet Shell: REPL for querying and interacting with streams using Freshet."
  :url "http://github.com/milinda/Freshet"
  :license {:name "Apache License, Version 2.0"
            :url "http://www.apache.org/licenses/LICENSE-2.0.html"}
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [org.pathirage.freshet/freshet-core "0.1.0-SNAPSHOT"]
                 [org.pathirage.freshet/freshet-dsl "0.1.0-SNAPSHOT"]
                 [reply "0.3.5" :exclusions [org.clojure/clojure]]]
  :source-paths ["src/clojure"]
  :java-source-paths ["src/java"]
  :test-paths ["test/clojure" "test/java"])
--------------------------------------------------------------------------------
/references/art%3A10.1007%2Fs002360050095.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/milinda/Freshet-Old/3f387e55f3fe62cc9dd4adc8287abdbecf292991/references/art%3A10.1007%2Fs002360050095.pdf
--------------------------------------------------------------------------------
/references/atc14-paper-hu.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/milinda/Freshet-Old/3f387e55f3fe62cc9dd4adc8287abdbecf292991/references/atc14-paper-hu.pdf
--------------------------------------------------------------------------------
/references/bockermann_2014b.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/milinda/Freshet-Old/3f387e55f3fe62cc9dd4adc8287abdbecf292991/references/bockermann_2014b.pdf
--------------------------------------------------------------------------------
/references/paper_199.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/milinda/Freshet-Old/3f387e55f3fe62cc9dd4adc8287abdbecf292991/references/paper_199.pdf
--------------------------------------------------------------------------------
/references/rc25401.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/milinda/Freshet-Old/3f387e55f3fe62cc9dd4adc8287abdbecf292991/references/rc25401.pdf
--------------------------------------------------------------------------------
/references/sacmat68-xing.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/milinda/Freshet-Old/3f387e55f3fe62cc9dd4adc8287abdbecf292991/references/sacmat68-xing.pdf
--------------------------------------------------------------------------------
/references/secret_vldbj13.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/milinda/Freshet-Old/3f387e55f3fe62cc9dd4adc8287abdbecf292991/references/secret_vldbj13.pdf
--------------------------------------------------------------------------------