├── .gitattributes
├── .github
│   └── FUNDING.yml
├── .gitignore
├── README.md
├── build.gradle
├── collect-stream-logs
│   ├── README.md
│   ├── collect-stream-logs-flow.xml
│   ├── dashboard
│   │   ├── draw.html
│   │   ├── log.html
│   │   ├── reqscanvas.js
│   │   └── testEB.html
│   ├── data
│   │   ├── in
│   │   │   └── example.log
│   │   └── out
│   │       └── .gitkeep
│   ├── log-generator
│   │   ├── build.gradle
│   │   └── src
│   │       └── main
│   │           ├── groovy
│   │           │   └── com.crossbusiness.loggen
│   │           │       ├── AccessLogGenerator.groovy
│   │           │       └── AppLogGenerator.groovy
│   │           ├── java
│   │           │   └── .gitkeep
│   │           └── resources
│   │               ├── example.log
│   │               ├── logback-access.xml
│   │               └── logback.xml
│   ├── logs-demo.png
│   └── logs-flow.png
├── csv-to-json
│   ├── README.md
│   └── csv-to-json-flow.xml
├── decompression
│   ├── README.md
│   └── decompression-circular-flow.xml
├── gradle.properties
├── gradle
│   └── wrapper
│       ├── gradle-wrapper.jar
│       └── gradle-wrapper.properties
├── gradlew
├── gradlew.bat
├── http-get-route
│   ├── README.md
│   └── simple-httpget-route-flow.xml
├── invoke-http-route
│   ├── README.md
│   └── invokeHttp-and-route-original-on-status-flow.xml
├── iot-activity-tracker
│   ├── README.md
│   ├── dashboard
│   │   └── heartrate.html
│   ├── iot-demo.png
│   ├── iot-flow.png
│   └── iot-flow.xml
├── oltp-cdc-olap
│   ├── Dockerfile
│   ├── README.md
│   ├── cdc-architecture.jpg
│   ├── cdc-flow.png
│   ├── cdc-flow.xml
│   ├── kafka
│   │   ├── README.md
│   │   ├── kafka
│   │   │   └── .gitignore
│   │   ├── server.properties
│   │   ├── zookeeper.properties
│   │   └── zookeeper
│   │       └── .gitignore
│   ├── maxwell
│   │   ├── .gitignore
│   │   └── config.properties
│   └── mysql
│       ├── README.md
│       ├── data
│       │   └── .gitignore
│       └── my.cnf
├── retry
│   ├── README.md
│   └── retry-count-loop.xml
├── settings.gradle
├── split-route
│   ├── README.md
│   ├── data
│   │   ├── in
│   │   │   └── sample-input.txt
│   │   └── out
│   │       └── .gitkeep
│   └── split-route-merge-flow.xml
├── twitter-garden-hose
│   ├── README.md
│   └── pull-from-twitter-garden-hose-flow.xml
└── twitter-solr
    ├── README.md
    └── twitter-solr-flow.xml

/.gitattributes:
--------------------------------------------------------------------------------
* text=auto
CONTRIBUTING.md export-ignore

/.github/FUNDING.yml:
--------------------------------------------------------------------------------
# These are supported funding model platforms

github: [xmlking]
open_collective: xmlking

/.gitignore:
--------------------------------------------------------------------------------
# Created by https://www.gitignore.io/api/gradle,osx,windows

## App ##
logs
*.log
NOTES.md

### Gradle ###
.gradle
build/

# Ignore Gradle GUI config
gradle-app.setting

# Avoid ignoring Gradle wrapper jar file (.jar files are usually ignored)
!gradle-wrapper.jar


### Intellij ###
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio

*.iml

## Directory-based project format:
.idea/
# if you remove the above rule, at least ignore the following:

# User-specific stuff:
# .idea/workspace.xml
# .idea/tasks.xml
# .idea/dictionaries

# Sensitive or high-churn files:
# .idea/dataSources.ids
# .idea/dataSources.xml
# .idea/sqlDataSources.xml
# .idea/dynamic.xml
# .idea/uiDesigner.xml

# Gradle:
# .idea/gradle.xml
# .idea/libraries

# Mongo Explorer plugin:
# .idea/mongoSettings.xml

## File-based project format:
*.ipr
*.iws

## Plugin-specific files:

# IntelliJ
/out/

# mpeltonen/sbt-idea plugin
.idea_modules/

# JIRA plugin
atlassian-ide-plugin.xml

# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties


### OSX ###
.DS_Store
.AppleDouble
.LSOverride

# Icon must end with two \r
Icon

# Thumbnails
._*

# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns

# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk


### Windows ###
# Windows image file caches
Thumbs.db
ehthumbs.db

# Folder config file
Desktop.ini

# Recycle Bin used on file shares
$RECYCLE.BIN/

# Windows Installer files
*.cab
*.msi
*.msm
*.msp

# Windows shortcuts
*.lnk

/README.md:
--------------------------------------------------------------------------------
NiFi Examples
=================

Apache NiFi example flows.

#### collect-stream-logs

This [flow](./collect-stream-logs/) shows a workflow for log collection, aggregation, storage, and display.

1. Ingest logs from folders.
2. Listen for syslogs on a UDP port (a quick smoke test is sketched below).
3. Merge syslogs and drop-in logs, and persist the merged logs to Solr for historical search.
4. Dashboard: stream real-time log events to a dashboard and enable cross-filter search on historical log data.
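The UDP listener in item 2 can be smoke-tested without a real syslog source by pushing a hand-crafted syslog line at it from the command line. A minimal sketch, assuming the flow's ListenSyslog/ListenUDP processor is bound to localhost:514; adjust host and port to whatever the flow is actually configured with:

```bash
# Send one RFC 3164-style syslog message over UDP (host and port are assumptions)
echo "<34>Oct 11 22:14:15 myhost app: test log event" | nc -u -w1 localhost 514
```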
#### iot-activity-tracker

This [flow](./iot-activity-tracker/) shows how to bring IoT data into the enterprise.

1. Ingest IoT data over WebSocket and HTTP.
2. Store all data to Hadoop (HDFS) and summary data to NoSQL (MarkLogic) for historical data search.
3. Route data based on pre-set thresholds (vital signs like `pulse rate` and `blood pressure`) to alert users and physicians.
4. Inactivity reporting.


#### oltp-cdc-olap

A low-latency *Change Data Capture* [flow](./oltp-cdc-olap/) to continuously replicate data from OLTP (MySQL) to OLAP (NoSQL) systems with no impact on the source.

1. Multi-tenant: can contain data from many different databases and supports multiple consumers.
2. Flexible CDC: capture changes from many data sources and types.
    1. Source consistency preservation; no impact on the source.
    2. Both DML (INSERT/UPDATE/DELETE) and DDL (ALTER/CREATE/DROP) are captured non-invasively.
    3. Produces Logical Change Records (LCRs) in JSON format.
    4. Commits at the source are grouped by transaction.
3. Flexible consumer dataflows: consumers can be implemented in Apache NiFi, Flink, Spark, or Apex.
    1. Parallel data filtering, transformation, and loading.
4. Flexible databus: store LCRs in **Kafka** streams for durability and pub-sub semantics (a consumer sketch follows this list).
    1. Use *only* Kafka as the input for all consumer dataflows.
        1. Feed data to many client types (real-time, slow/catch-up, full bootstrap).
        2. Consumption from an arbitrary time point in the change stream, including full bootstrap of the entire data set.
        3. Guaranteed in-commit-order and at-least-once delivery.
        4. Partitioned consumption (data partitioned to different Kafka topics by database name, table, or any field of the LCR).
        5. Both batch and near-real-time delivery.
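Because the LCRs land in Kafka as plain JSON, the console consumer that ships with Kafka is enough to watch the change stream. A sketch, assuming a local ZooKeeper/Kafka (matching the bundled `zookeeper.properties`/`server.properties`) and Maxwell's default `maxwell` topic; the sample record reflects Maxwell's general output shape, not actual data from this flow:

```bash
# Tail the change stream (broker address and topic name are assumptions)
kafka-console-consumer.sh --zookeeper localhost:2181 --topic maxwell

# An INSERT arrives as one JSON document per changed row, roughly:
# {"database":"shop","table":"orders","type":"insert","ts":1449786310,
#  "xid":940752,"commit":true,"data":{"id":7,"status":"NEW"}}
```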
#### csv-to-json

This [flow](./csv-to-json/) shows how to convert a CSV entry to a JSON document using ExtractText and ReplaceText (see the sketch below).
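The conversion hinges on two processor settings: a dynamic property on ExtractText whose regex capture groups become FlowFile attributes, and a ReplaceText replacement value that rebuilds the content as JSON from those attributes. A minimal sketch for a three-field line; the property name, field names, and regex here are illustrative, not the exact values in the flow XML:

```
# ExtractText: add a dynamic property (attribute prefix -> regex)
#   csv = (.+),(.+),(.+)
# capture groups land in attributes csv.1, csv.2, csv.3

# ReplaceText: Replacement Value (NiFi Expression Language)
#   {"first":"${csv.1}","second":"${csv.2}","third":"${csv.3}"}
```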
#### decompression

This [flow](./decompression/) demonstrates taking an archive created with several levels of compression and decompressing it in a loop until the original file is fully extracted.

#### http-get-route

This [flow](./http-get-route/) pulls from a web service (the example is NiFi itself), extracts text from a specific section, makes a routing decision on that extracted value, and prepares to write to disk using PutFile.

#### invoke-http-route

This [flow](./invoke-http-route/) demonstrates how to call an HTTP service based on an incoming FlowFile and route the original FlowFile based on the status code returned from the invocation. In this example, a FlowFile is produced every 30 seconds, an attribute that sets q=nifi is added to it, google.com is invoked for that FlowFile, and any response with a 200 status is routed to a relationship called 200.

#### retry-count-loop

This [process group](./retry/) can be used to maintain a count of how many times a FlowFile passes through it. If the count reaches a configured threshold, the FlowFile is routed to a 'Limit Exceeded' relationship; otherwise it is routed to 'retry'. Great for processes that you only want to run X times before giving up.

#### split-route

This [flow](./split-route/) demonstrates splitting a file on line boundaries, routing the splits based on a regex in the content, merging the less important files together for storage somewhere, and sending the higher-priority files down another path for immediate action.

#### twitter-garden-hose

This [flow](./twitter-garden-hose/) pulls from Twitter using the garden hose setting; it pulls some basic attributes out of the JSON and then routes only those items that are actually tweets.

#### twitter-solr

This [flow](./twitter-solr/) shows how to index tweets with Solr using NiFi. Prerequisites for this flow are NiFi 0.3.0 or later, the creation of a Twitter application, and a running instance of Solr 5.1 or later with a `tweets` collection (a creation sketch follows).
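One plausible way to stand up that collection with Solr 5.x's bundled scripts is sketched below; the configset choice is an assumption, and only the collection name `tweets` is dictated by the flow:

```bash
# Start Solr in SolrCloud mode, then create the tweets collection (Solr 5.x)
bin/solr start -c
bin/solr create_collection -c tweets -d data_driven_schema_configs
```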
### Install NiFi
1. Manual: download the [Apache NiFi](https://nifi.apache.org/download.html) binaries and unpack them to a folder.
2. On Mac: `brew install nifi`

### Run NiFi
```bash
cd /Developer/Applications/nifi
./bin/nifi.sh start
./bin/nifi.sh stop
```
On Mac
```bash
# nifi start|stop|run|restart|status|dump|install
nifi start
nifi status
nifi stop
# Working Directory: /usr/local/Cellar/nifi/0.3.0/libexec
```

/build.gradle:
--------------------------------------------------------------------------------
allprojects {
    repositories {
        mavenCentral()
    }
    version = rootVersion
}

subprojects {

}

task wrapper(type: Wrapper) {
    description = 'Generates gradlew[.bat] scripts'
    gradleVersion = '2.7'
}

/collect-stream-logs/README.md:
--------------------------------------------------------------------------------
collect-stream-logs
===================

1. Ingest logs from folders.
2. Listen for syslogs on a UDP port.
3. Merge syslogs and drop-in logs, and persist the merged logs to Solr for historical search.
4. Dashboard: stream real-time log events to a dashboard and enable cross-filter search on historical log data.

Note: this flow depends on the **nifi-websocket** module; download the [nar](https://github.com/xmlking/nifi-websocket/releases/download/0.1.0/nifi-websocket-0.1.0-SNAPSHOT.nar) and copy it to `$NIFI_HOME/lib`.

### Run log generator
```bash
gradle :collect-stream-logs:log-generator:run
```

### Flow
![Flow](./logs-flow.png)

### Demo
![Demo](./logs-demo.png)

### Reference
1. [Collecting Logs with Apache NiFi](http://bryanbende.com/development/2015/05/17/collecting-logs-with-apache-nifi/)

/collect-stream-logs/dashboard/draw.html:
--------------------------------------------------------------------------------
(file contents not captured in this listing)