├── README.md
├── pom.xml
└── src
    ├── main
    │   └── java
    │       └── com
    │           └── urey
    │               └── flume
    │                   ├── MultiLineExecSource.java
    │                   └── MultiLineExecSourceConfigurationConstants.java
    └── test
        └── java
            └── com
                └── urey
                    └── flume
                        └── AppTest.java
/README.md:
--------------------------------------------------------------------------------
1 | # Flume Plugin: MultiLineExecSource
2 |
3 | Flume-NG's ExecSource turns every line of a log file such as `xxx.log` into one Flume event. By default a line ends at '$', the end of the line. In some situations, however, a single log record spans multiple lines; error logs, for instance, are mostly multiline because of stack traces. So I have developed MultiLineExecSource, which is based on ExecSource.
4 |
5 | **NOTE 1: MultiLineExecSource plugin is built for Flume-NG and will not work on Flume-OG**
6 |
7 | **NOTE 2: It lacks comprehensive test coverage. Contributions that make it more stable and useful are welcome.**
8 |
9 | ## Compilation
10 |
11 | The project is built with [Maven](http://maven.apache.org/).
12 |
13 | ## Installation instructions
14 |
15 | After compiling, copy the resulting jar `flume-source-plugin-1.0-SNAPSHOT.jar` to `$FLUME_HOME/flume-ng/lib/`. Then edit `flume.conf` to use MultiLineExecSource instead of the default ExecSource.
16 |
17 | A brief overview of MultiLineExecSource with usage instructions follows.
18 |
19 | ## Sources
20 |
21 | ### MultiLineExecSource
22 |
23 | MultiLineExecSource generates a single Flume event from multiple lines of a log. It inspects every line to see whether the line starts with a marker that signals a new log record. The marker is any text that matches a configurable regex.
24 |
25 | For instance, an entry in the HDFS datanode log usually starts with a timestamp such as '2016-03-18 17:53:40,278', which can be expressed with the regex `\s?\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d,\d\d\d`. MultiLineExecSource tests every line against this regex. A line that starts with a match begins a new record; any other line belongs to the previous record.
26 |
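The grouping rule described above can be sketched in a few lines of Java. This is a minimal illustration, not the plugin's actual code: `LineGrouper` and `group` are hypothetical names, the sample log lines are made up, and the use of `Matcher.lookingAt()` (match anchored at the start of the line) is an assumption about how "starts with" is checked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: a hypothetical helper, not the plugin's implementation. */
final class LineGrouper {

    /**
     * Groups raw lines into records. A line whose beginning matches
     * lineStart opens a new record; any other line is appended to the
     * record that is currently being built.
     */
    static List<String> group(List<String> lines, Pattern lineStart) {
        List<String> events = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        for (String line : lines) {
            // lookingAt() succeeds only if the match begins at the first character.
            boolean startsNewRecord = lineStart.matcher(line).lookingAt();
            if (startsNewRecord && !buffer.isEmpty()) {
                // The previous record is complete: emit it as one event body.
                events.add(String.join("\n", buffer));
                buffer.clear();
            }
            buffer.add(line);
        }
        if (!buffer.isEmpty()) {
            events.add(String.join("\n", buffer));
        }
        return events;
    }

    public static void main(String[] args) {
        // Same regex as in the README, written as a Java string literal.
        Pattern lineStart = Pattern.compile(
            "\\s?\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d:\\d\\d,\\d\\d\\d");
        // Made-up sample input: one error with a stack trace, then one info line.
        List<String> lines = Arrays.asList(
            "2016-03-18 17:53:40,278 ERROR something failed",
            "java.io.IOException: broken pipe",
            "\tat com.example.Foo.bar(Foo.java:42)",
            "2016-03-18 17:53:41,001 INFO recovered");
        for (String event : group(lines, lineStart)) {
            System.out.println("--- event ---");
            System.out.println(event);
        }
    }
}
```

A source that reads a live stream cannot know that the last record is finished until the next start line arrives, so it would also need to flush the buffer on a timeout. That part is left out of the sketch.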
27 | MultiLineExecSource is based on the regular exec source and accepts the same parameters. It adds one parameter of its own:
28 |
29 | * **lineStartRegex**: The regex that marks the start of a new log record. A line that does not start with a match is treated as part of the previous record.
30 |
31 |
32 | Example config:
33 |
34 | ```
35 | agent.sources.hdfs_namenode_src.type = com.urey.flume.MultiLineExecSource
36 | agent.sources.hdfs_namenode_src.lineStartRegex = \\s?\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d:\\d\\d,\\d\\d\\d
37 | ```
38 |
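The backslashes in the example config are doubled. Assuming the agent configuration is parsed as a Java properties file (which the doubled backslashes suggest), the sketch below shows why: `java.util.Properties` turns `\\` into a single backslash and silently drops a backslash in front of a character such as `s` or `d`. `RegexEscapingCheck` and its methods are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Pattern;

/** Illustrative only: checks how a regex survives properties-file parsing. */
final class RegexEscapingCheck {

    /** Doubles every backslash so the value can be written into a properties file. */
    static String escapeForProperties(String regex) {
        return regex.replace("\\", "\\\\");
    }

    /** Parses the given properties text and returns the value stored under key. */
    static String loadValue(String fileContent, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContent));
        } catch (IOException e) {
            // A StringReader does not fail in practice; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String key = "agent.sources.hdfs_namenode_src.lineStartRegex";
        // The regex as the source should finally see it (Java string literal form).
        String regex = "\\s?\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d:\\d\\d,\\d\\d\\d";

        // Backslashes doubled in the file: the regex comes back intact.
        String doubled = loadValue(key + " = " + escapeForProperties(regex), key);
        // Backslashes left single in the file: they are dropped while loading.
        String single = loadValue(key + " = " + regex, key);

        System.out.println("doubled in file: " + doubled);
        System.out.println("single in file:  " + single);
        System.out.println("doubled value matches timestamp: "
            + Pattern.compile(doubled).matcher("2016-03-18 17:53:40,278").lookingAt());
    }
}
```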
--------------------------------------------------------------------------------
/pom.xml:
--------------------------------------------------------------------------------
1 |
/**
 * A {@link org.apache.flume.Source} implementation that executes a Unix process and turns each
 * line of text into an event.
 *
 * This source runs a given Unix command on start up and expects that process to
 * continuously produce data on standard out (stderr ignored by default). Unless
 * told to restart, if the process exits for any reason, the source also exits and
 * will produce no further data. This means configurations such as {@code cat [named pipe]}
 * or {@code tail -F [file]} are going to produce the desired results where as
 * {@code date} will probably not - the former two commands produce streams of
 * data where as the latter produces a single event and exits.
 *
 * The ExecSource is meant for situations where one must integrate with
 * existing systems without modifying code. It is a compatibility gateway built
 * to allow simple, stop-gap integration and doesn't necessarily offer all of
 * the benefits or guarantees of native integration with Flume. If one has the
 * option of using the AvroSource, for instance, that would be greatly
 * preferred to this source as it (and similarly implemented sources) can
 * maintain the transactional guarantees that exec can not.
 *
 * Why doesn't ExecSource offer transactional guarantees?
 *
 * The problem with ExecSource and other asynchronous sources is that
 * the source can not guarantee that if there is a failure to put the event into
 * the {@link org.apache.flume.Channel} the client knows about it. As a for instance, one of the
 * most commonly requested features is the {@code tail -F [file]}-like use case
 * where an application writes to a log file on disk and Flume tails the file,
 * sending each line as an event. While this is possible, there's an obvious
 * problem; what happens if the channel fills up and Flume can't send an event?
 * Flume has no way of indicating to the application writing the log file that
 * it needs to retain the log or that the event hasn't been sent, for some
 * reason. If this doesn't make sense, you need only know this: Your
 * application can never guarantee data has been received when using a
 * unidirectional asynchronous interface such as ExecSource! As an extension
 * of this warning - and to be completely clear - there is absolutely zero
 * guarantee of event delivery when using this source. You have been warned.
 *
 * Configuration options
 *
 * | Parameter       | Description                                                                                                      | Unit / Type | Default         |
 * |-----------------|------------------------------------------------------------------------------------------------------------------|-------------|-----------------|
 * | command         | The command to execute                                                                                           | String      | none (required) |
 * | restart         | Whether to restart the command when it exits                                                                     | Boolean     | false           |
 * | restartThrottle | How long in milliseconds to wait before restarting the command                                                   | Long        | 10000           |
 * | logStderr       | Whether to log or discard the standard error stream of the command                                               | Boolean     | false           |
 * | batchSize       | The number of events to commit to channel at a time.                                                             | integer     | 20              |
 * | batchTimeout    | Amount of time (in milliseconds) to wait, if the buffer size was not reached, before data is pushed downstream.  | long        | 3000            |
 *
 * Metrics
 *
 * TODO
143 | */ 144 | /** 145 | * Created by ureyqiao on 2016/3/21. 146 | * contact me: qiaowei@pku.edu.cn 147 | */ 148 | public class MultiLineExecSource extends AbstractSource implements EventDrivenSource, Configurable { 149 | 150 | private static final Logger logger = LoggerFactory.getLogger(MultiLineExecSource.class); 151 | 152 | private String shell; 153 | private String command; 154 | private SourceCounter sourceCounter; 155 | private ExecutorService executor; 156 | private Future> runnerFuture; 157 | private long restartThrottle; 158 | private boolean restart; 159 | private boolean logStderr; 160 | private Integer bufferCount; 161 | private long batchTimeout; 162 | private ExecRunnable runner; 163 | private Charset charset; 164 | 165 | private String regex; 166 | 167 | @Override 168 | public void start() { 169 | logger.info("Exec source starting with command:{}", command); 170 | 171 | executor = Executors.newSingleThreadExecutor(); 172 | 173 | runner = new ExecRunnable(shell, command, getChannelProcessor(), sourceCounter, 174 | restart, restartThrottle, logStderr, bufferCount, batchTimeout, charset, regex); 175 | 176 | // FIXME: Use a callback-like executor / future to signal us upon failure. 177 | runnerFuture = executor.submit(runner); 178 | 179 | /* 180 | * NB: This comes at the end rather than the beginning of the method because 181 | * it sets our state to running. We want to make sure the executor is alive 182 | * and well first. 
183 | */ 184 | sourceCounter.start(); 185 | super.start(); 186 | 187 | logger.debug("Exec source started"); 188 | } 189 | 190 | @Override 191 | public void stop() { 192 | logger.info("Stopping exec source with command:{}", command); 193 | if(runner != null) { 194 | runner.setRestart(false); 195 | runner.kill(); 196 | } 197 | 198 | if (runnerFuture != null) { 199 | logger.debug("Stopping exec runner"); 200 | runnerFuture.cancel(true); 201 | logger.debug("Exec runner stopped"); 202 | } 203 | executor.shutdown(); 204 | 205 | while (!executor.isTerminated()) { 206 | logger.debug("Waiting for exec executor service to stop"); 207 | try { 208 | executor.awaitTermination(500, TimeUnit.MILLISECONDS); 209 | } catch (InterruptedException e) { 210 | logger.debug("Interrupted while waiting for exec executor service " 211 | + "to stop. Just exiting."); 212 | Thread.currentThread().interrupt(); 213 | } 214 | } 215 | 216 | sourceCounter.stop(); 217 | super.stop(); 218 | 219 | logger.debug("Exec source with command:{} stopped. 
Metrics:{}", command, 220 | sourceCounter); 221 | } 222 | 223 | @Override 224 | public void configure(Context context) { 225 | command = context.getString("command"); 226 | 227 | Preconditions.checkState(command != null, 228 | "The parameter command must be specified"); 229 | 230 | restartThrottle = context.getLong(ExecSourceConfigurationConstants.CONFIG_RESTART_THROTTLE, 231 | ExecSourceConfigurationConstants.DEFAULT_RESTART_THROTTLE); 232 | 233 | restart = context.getBoolean(ExecSourceConfigurationConstants.CONFIG_RESTART, 234 | ExecSourceConfigurationConstants.DEFAULT_RESTART); 235 | 236 | logStderr = context.getBoolean(ExecSourceConfigurationConstants.CONFIG_LOG_STDERR, 237 | ExecSourceConfigurationConstants.DEFAULT_LOG_STDERR); 238 | 239 | bufferCount = context.getInteger(ExecSourceConfigurationConstants.CONFIG_BATCH_SIZE, 240 | ExecSourceConfigurationConstants.DEFAULT_BATCH_SIZE); 241 | 242 | batchTimeout = context.getLong(ExecSourceConfigurationConstants.CONFIG_BATCH_TIME_OUT, 243 | ExecSourceConfigurationConstants.DEFAULT_BATCH_TIME_OUT); 244 | 245 | charset = Charset.forName(context.getString(ExecSourceConfigurationConstants.CHARSET, 246 | ExecSourceConfigurationConstants.DEFAULT_CHARSET)); 247 | 248 | shell = context.getString(ExecSourceConfigurationConstants.CONFIG_SHELL, null); 249 | 250 | regex = context.getString(MultiLineExecSourceConfigurationConstants.REGEX, MultiLineExecSourceConfigurationConstants.DEFAULT_REGEX); 251 | 252 | if (sourceCounter == null) { 253 | sourceCounter = new SourceCounter(getName()); 254 | } 255 | } 256 | 257 | private static class ExecRunnable implements Runnable { 258 | 259 | public ExecRunnable(String shell, String command, ChannelProcessor channelProcessor, 260 | SourceCounter sourceCounter, boolean restart, long restartThrottle, 261 | boolean logStderr, int bufferCount, long batchTimeout, Charset charset, String regex) { 262 | this.command = command; 263 | this.channelProcessor = channelProcessor; 264 | 
this.sourceCounter = sourceCounter; 265 | this.restartThrottle = restartThrottle; 266 | this.bufferCount = bufferCount; 267 | this.batchTimeout = batchTimeout; 268 | this.restart = restart; 269 | this.logStderr = logStderr; 270 | this.charset = charset; 271 | this.shell = shell; 272 | this.regex = regex; 273 | this.pattern = Pattern.compile(regex); 274 | } 275 | 276 | private final String shell; 277 | private final String command; 278 | private final ChannelProcessor channelProcessor; 279 | private final SourceCounter sourceCounter; 280 | private volatile boolean restart; 281 | private final long restartThrottle; 282 | private final int bufferCount; 283 | private long batchTimeout; 284 | private final boolean logStderr; 285 | private final Charset charset; 286 | private Process process = null; 287 | private SystemClock systemClock = new SystemClock(); 288 | private Long lastPushToChannel = systemClock.currentTimeMillis(); 289 | ScheduledExecutorService timedFlushService; 290 | ScheduledFuture> future; 291 | ///multiline setting start 292 | private String regex; 293 | private Pattern pattern; 294 | List