├── .gitignore ├── src ├── main │ └── java │ │ └── org │ │ └── apache │ │ └── storm │ │ └── hdfs │ │ ├── common │ │ ├── rotation │ │ │ ├── RotationAction.java │ │ │ └── MoveFileAction.java │ │ └── security │ │ │ └── HdfsSecurityUtil.java │ │ ├── trident │ │ ├── HdfsUpdater.java │ │ ├── HdfsStateFactory.java │ │ ├── format │ │ │ ├── SequenceFormat.java │ │ │ ├── RecordFormat.java │ │ │ ├── DefaultSequenceFormat.java │ │ │ ├── FileNameFormat.java │ │ │ ├── DefaultFileNameFormat.java │ │ │ └── DelimitedRecordFormat.java │ │ ├── rotation │ │ │ ├── NoRotationPolicy.java │ │ │ ├── TimedRotationPolicy.java │ │ │ ├── FileRotationPolicy.java │ │ │ └── FileSizeRotationPolicy.java │ │ ├── sync │ │ │ ├── SyncPolicy.java │ │ │ └── CountSyncPolicy.java │ │ └── HdfsState.java │ │ └── bolt │ │ ├── format │ │ ├── SequenceFormat.java │ │ ├── RecordFormat.java │ │ ├── DefaultSequenceFormat.java │ │ ├── FileNameFormat.java │ │ ├── DefaultFileNameFormat.java │ │ └── DelimitedRecordFormat.java │ │ ├── rotation │ │ ├── NoRotationPolicy.java │ │ ├── TimedRotationPolicy.java │ │ ├── FileRotationPolicy.java │ │ └── FileSizeRotationPolicy.java │ │ ├── sync │ │ ├── SyncPolicy.java │ │ └── CountSyncPolicy.java │ │ ├── HdfsBolt.java │ │ ├── SequenceFileBolt.java │ │ └── AbstractHdfsBolt.java └── test │ └── java │ └── org │ └── apache │ └── storm │ └── hdfs │ ├── trident │ ├── FixedBatchSpout.java │ ├── TridentFileTopology.java │ └── TridentSequenceTopology.java │ └── bolt │ ├── SequenceFileTopology.java │ └── HdfsFileTopology.java ├── pom.xml ├── LICENSE └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | target/ 2 | *.iml 3 | *.ipr 4 | *.iws -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/common/rotation/RotationAction.java: -------------------------------------------------------------------------------- 1 | package org.apache.storm.hdfs.common.rotation; 2 | 3 | 4 | import org.apache.hadoop.fs.FileSystem; 5 | import org.apache.hadoop.fs.Path; 6 | 7 | import java.io.IOException; 8 | import java.io.Serializable; 9 | 10 | public interface RotationAction extends Serializable { 11 | void execute(FileSystem fileSystem, Path filePath) throws IOException; 12 | } 13 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/trident/HdfsUpdater.java: -------------------------------------------------------------------------------- 1 | package org.apache.storm.hdfs.trident; 2 | 3 | import storm.trident.operation.TridentCollector; 4 | import storm.trident.state.BaseStateUpdater; 5 | import storm.trident.tuple.TridentTuple; 6 | 7 | import java.util.List; 8 | 9 | public class HdfsUpdater extends BaseStateUpdater<HdfsState>{ 10 | @Override 11 | public void updateState(HdfsState state, List<TridentTuple> tuples, TridentCollector collector) { 12 | state.updateState(tuples, collector); 13 | } 14 | } 15 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/common/rotation/MoveFileAction.java: -------------------------------------------------------------------------------- 1 | package org.apache.storm.hdfs.common.rotation; 2 | 3 | import org.apache.hadoop.fs.FileSystem; 4 | import org.apache.hadoop.fs.Path; 5 | import org.slf4j.Logger; 6 | import org.slf4j.LoggerFactory; 7 | 8 | import java.io.IOException; 9 | 10 | public class MoveFileAction implements RotationAction { 11 | private static final Logger LOG =
LoggerFactory.getLogger(MoveFileAction.class); 12 | 13 | private String destination; 14 | 15 | public MoveFileAction toDestination(String destDir){ 16 | destination = destDir; 17 | return this; 18 | } 19 | 20 | @Override 21 | public void execute(FileSystem fileSystem, Path filePath) throws IOException { 22 | Path destPath = new Path(destination, filePath.getName()); 23 | LOG.info("Moving file {} to {}", filePath, destPath); 24 | if (!fileSystem.rename(filePath, destPath)) { 25 | throw new IOException("Failed to move file " + filePath + " to " + destPath); 26 | } 27 | } 28 | } 29 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/trident/HdfsStateFactory.java: -------------------------------------------------------------------------------- 1 | package org.apache.storm.hdfs.trident; 2 | 3 | import backtype.storm.task.IMetricsContext; 4 | import org.slf4j.Logger; 5 | import org.slf4j.LoggerFactory; 6 | import storm.trident.state.State; 7 | import storm.trident.state.StateFactory; 8 | 9 | import java.util.Map; 10 | 11 | public class HdfsStateFactory implements StateFactory { 12 | private static final Logger LOG = LoggerFactory.getLogger(HdfsStateFactory.class); 13 | private HdfsState.Options options; 14 | 15 | public HdfsStateFactory(){} 16 | 17 | public HdfsStateFactory withOptions(HdfsState.Options options){ 18 | this.options = options; 19 | return this; 20 | } 21 | 22 | @Override 23 | public State makeState(Map conf, IMetricsContext metrics, int partitionIndex, int numPartitions) { 24 | LOG.info("makeState(partitionIndex={}, numPartitions={})", partitionIndex, numPartitions); 25 | HdfsState state = new HdfsState(this.options); 26 | state.prepare(conf, metrics, partitionIndex, numPartitions); 27 | return state; 28 | } 29 | } 30 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/bolt/format/SequenceFormat.java: -------------------------------------------------------------------------------- 1 | package org.apache.storm.hdfs.bolt.format; 2 | 3 | import backtype.storm.tuple.Tuple; 4 | import org.apache.hadoop.io.Writable; 5 | 6 | import java.io.Serializable; 7 | 8 | /** 9 | * Interface for converting Tuple objects to HDFS sequence file key-value pairs. 10 | * 11 | */ 12 | public interface SequenceFormat extends Serializable { 13 | /** 14 | * Key class used by implementation (e.g. IntWritable.class, etc.) 15 | * 16 | * @return 17 | */ 18 | Class keyClass(); 19 | 20 | /** 21 | * Value class used by implementation (e.g. Text.class, etc.) 22 | * @return 23 | */ 24 | Class valueClass(); 25 | 26 | /** 27 | * Given a tuple, return the key that should be written to the sequence file. 28 | * 29 | * @param tuple 30 | * @return 31 | */ 32 | Writable key(Tuple tuple); 33 | 34 | /** 35 | * Given a tuple, return the value that should be written to the sequence file. 36 | * @param tuple 37 | * @return 38 | */ 39 | Writable value(Tuple tuple); 40 | } 41 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/trident/format/SequenceFormat.java: -------------------------------------------------------------------------------- 1 | package org.apache.storm.hdfs.trident.format; 2 | 3 | import org.apache.hadoop.io.Writable; 4 | import storm.trident.tuple.TridentTuple; 5 | 6 | import java.io.Serializable; 7 | 8 | /** 9 | * Interface for converting TridentTuple objects to HDFS sequence file key-value pairs.
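 *
 * Implementations are supplied to the sequence-file writer; for example (a
 * sketch only, with illustrative field names taken from the test topologies):
 *
 *     SequenceFormat format = new DefaultSequenceFormat("key", "sentence");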
10 | * 11 | */ 12 | public interface SequenceFormat extends Serializable { 13 | /** 14 | * Key class used by implementation (e.g. IntWritable.class, etc.) 15 | * 16 | * @return 17 | */ 18 | Class keyClass(); 19 | 20 | /** 21 | * Value class used by implementation (e.g. Text.class, etc.) 22 | * @return 23 | */ 24 | Class valueClass(); 25 | 26 | /** 27 | * Given a tuple, return the key that should be written to the sequence file. 28 | * 29 | * @param tuple 30 | * @return 31 | */ 32 | Writable key(TridentTuple tuple); 33 | 34 | /** 35 | * Given a tuple, return the value that should be written to the sequence file. 36 | * @param tuple 37 | * @return 38 | */ 39 | Writable value(TridentTuple tuple); 40 | } 41 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/bolt/format/RecordFormat.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.bolt.format; 19 | 20 | 21 | import backtype.storm.tuple.Tuple; 22 | 23 | import java.io.Serializable; 24 | 25 | /** 26 | * Formats a Tuple object into a byte array 27 | * that will be written to HDFS. 28 | * 29 | */ 30 | public interface RecordFormat extends Serializable { 31 | byte[] format(Tuple tuple); 32 | } 33 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/trident/format/RecordFormat.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.trident.format; 19 | 20 | 21 | import storm.trident.tuple.TridentTuple; 22 | 23 | import java.io.Serializable; 24 | 25 | /** 26 | * Formats a Tuple object into a byte array 27 | * that will be written to HDFS. 
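 *
 * A minimal usage sketch with the DelimitedRecordFormat implementation from
 * this package (the field names are illustrative). Note that the Trident
 * version requires an explicit field list via withFields():
 *
 *     RecordFormat format = new DelimitedRecordFormat()
 *             .withFields(new Fields("sentence", "key"))
 *             .withFieldDelimiter("|");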
28 | * 29 | */ 30 | public interface RecordFormat extends Serializable { 31 | byte[] format(TridentTuple tuple); 32 | } 33 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/bolt/rotation/NoRotationPolicy.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.bolt.rotation; 19 | 20 | import backtype.storm.tuple.Tuple; 21 | 22 | /** 23 | * File rotation policy that will never rotate... 24 | * Just one big file. Intended for testing purposes. 25 | */ 26 | public class NoRotationPolicy implements FileRotationPolicy { 27 | @Override 28 | public boolean mark(Tuple tuple, long offset) { 29 | return false; 30 | } 31 | 32 | @Override 33 | public void reset() { 34 | } 35 | } 36 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/trident/rotation/NoRotationPolicy.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.trident.rotation; 19 | 20 | import storm.trident.tuple.TridentTuple; 21 | 22 | /** 23 | * File rotation policy that will never rotate... 24 | * Just one big file. Intended for testing purposes. 
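 * In production topologies, prefer a policy such as FileSizeRotationPolicy
 * or TimedRotationPolicy so that files are eventually rotated and closed.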
25 | */ 26 | public class NoRotationPolicy implements FileRotationPolicy { 27 | @Override 28 | public boolean mark(TridentTuple tuple, long offset) { 29 | return false; 30 | } 31 | 32 | @Override 33 | public void reset() { 34 | } 35 | } 36 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/bolt/rotation/TimedRotationPolicy.java: -------------------------------------------------------------------------------- 1 | package org.apache.storm.hdfs.bolt.rotation; 2 | 3 | import backtype.storm.tuple.Tuple; 4 | 5 | public class TimedRotationPolicy implements FileRotationPolicy { 6 | 7 | public static enum TimeUnit { 8 | 9 | SECONDS((long)1000), 10 | MINUTES((long)1000*60), 11 | HOURS((long)1000*60*60), 12 | DAYS((long)1000*60*60*24); 13 | 14 | private long milliSeconds; 15 | 16 | private TimeUnit(long milliSeconds){ 17 | this.milliSeconds = milliSeconds; 18 | } 19 | 20 | public long getMilliSeconds(){ 21 | return milliSeconds; 22 | } 23 | } 24 | 25 | private long interval; 26 | 27 | public TimedRotationPolicy(float count, TimeUnit units){ 28 | this.interval = (long)(count * units.getMilliSeconds()); 29 | } 30 | 31 | 32 | /** 33 | * Called for every tuple the HdfsBolt executes. 34 | * 35 | * @param tuple The tuple executed. 36 | * @param offset current offset of file being written 37 | * @return true if a file rotation should be performed 38 | */ 39 | @Override 40 | public boolean mark(Tuple tuple, long offset) { 41 | return false; 42 | } 43 | 44 | /** 45 | * Called after the HdfsBolt rotates a file. 46 | */ 47 | @Override 48 | public void reset() { 49 | 50 | } 51 | 52 | public long getInterval(){ 53 | return this.interval; 54 | } 55 | } 56 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/trident/rotation/TimedRotationPolicy.java: -------------------------------------------------------------------------------- 1 | package org.apache.storm.hdfs.trident.rotation; 2 | 3 | import storm.trident.tuple.TridentTuple; 4 | 5 | 6 | public class TimedRotationPolicy implements FileRotationPolicy { 7 | 8 | public static enum TimeUnit { 9 | 10 | SECONDS((long)1000), 11 | MINUTES((long)1000*60), 12 | HOURS((long)1000*60*60), 13 | DAYS((long)1000*60*60*24); 14 | 15 | private long milliSeconds; 16 | 17 | private TimeUnit(long milliSeconds){ 18 | this.milliSeconds = milliSeconds; 19 | } 20 | 21 | public long getMilliSeconds(){ 22 | return milliSeconds; 23 | } 24 | } 25 | 26 | private long interval; 27 | 28 | public TimedRotationPolicy(float count, TimeUnit units){ 29 | this.interval = (long)(count * units.getMilliSeconds()); 30 | } 31 | /** 32 | * Called for every tuple the HdfsBolt executes. 33 | * 34 | * @param tuple The tuple executed. 35 | * @param offset current offset of file being written 36 | * @return true if a file rotation should be performed 37 | */ 38 | @Override 39 | public boolean mark(TridentTuple tuple, long offset) { 40 | return false; 41 | } 42 | 43 | /** 44 | * Called after the HdfsBolt rotates a file. 
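 * For a timed policy there is no per-tuple state to clear, so this
 * implementation is a no-op; rotation is driven by the elapsed-time
 * interval exposed via getInterval().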
45 | */ 46 | @Override 47 | public void reset() { 48 | 49 | } 50 | 51 | public long getInterval(){ 52 | return this.interval; 53 | } 54 | } 55 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/bolt/format/DefaultSequenceFormat.java: -------------------------------------------------------------------------------- 1 | package org.apache.storm.hdfs.bolt.format; 2 | 3 | import backtype.storm.tuple.Tuple; 4 | import org.apache.hadoop.io.LongWritable; 5 | import org.apache.hadoop.io.Text; 6 | import org.apache.hadoop.io.Writable; 7 | 8 | /** 9 | * Basic SequenceFormat implementation that uses 10 | * LongWritable for keys and Text for values. 11 | * 12 | */ 13 | public class DefaultSequenceFormat implements SequenceFormat { 14 | private transient LongWritable key; 15 | private transient Text value; 16 | 17 | private String keyField; 18 | private String valueField; 19 | 20 | public DefaultSequenceFormat(String keyField, String valueField){ 21 | this.keyField = keyField; 22 | this.valueField = valueField; 23 | } 24 | 25 | @Override 26 | public Class keyClass() { 27 | return LongWritable.class; 28 | } 29 | 30 | @Override 31 | public Class valueClass() { 32 | return Text.class; 33 | } 34 | 35 | @Override 36 | public Writable key(Tuple tuple) { 37 | if(this.key == null){ 38 | this.key = new LongWritable(); 39 | } 40 | this.key.set(tuple.getLongByField(this.keyField)); 41 | return this.key; 42 | } 43 | 44 | @Override 45 | public Writable value(Tuple tuple) { 46 | if(this.value == null){ 47 | this.value = new Text(); 48 | } 49 | this.value.set(tuple.getStringByField(this.valueField)); 50 | return this.value; 51 | } 52 | } 53 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/trident/format/DefaultSequenceFormat.java: -------------------------------------------------------------------------------- 1 | package org.apache.storm.hdfs.trident.format; 2 | 3 | import org.apache.hadoop.io.LongWritable; 4 | import org.apache.hadoop.io.Text; 5 | import org.apache.hadoop.io.Writable; 6 | import storm.trident.tuple.TridentTuple; 7 | 8 | /** 9 | * Basic SequenceFormat implementation that uses 10 | * LongWritable for keys and Text for values. 
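 *
 * The tuple's key field must contain a Long and its value field a String;
 * for example (a sketch with illustrative field names):
 *
 *     SequenceFormat format = new DefaultSequenceFormat("key", "sentence");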
11 | * 12 | */ 13 | public class DefaultSequenceFormat implements SequenceFormat { 14 | private transient LongWritable key; 15 | private transient Text value; 16 | 17 | private String keyField; 18 | private String valueField; 19 | 20 | public DefaultSequenceFormat(String keyField, String valueField){ 21 | this.keyField = keyField; 22 | this.valueField = valueField; 23 | } 24 | 25 | 26 | 27 | @Override 28 | public Class keyClass() { 29 | return LongWritable.class; 30 | } 31 | 32 | @Override 33 | public Class valueClass() { 34 | return Text.class; 35 | } 36 | 37 | @Override 38 | public Writable key(TridentTuple tuple) { 39 | if(this.key == null){ 40 | this.key = new LongWritable(); 41 | } 42 | this.key.set(tuple.getLongByField(this.keyField)); 43 | return this.key; 44 | } 45 | 46 | @Override 47 | public Writable value(TridentTuple tuple) { 48 | if(this.value == null){ 49 | this.value = new Text(); 50 | } 51 | this.value.set(tuple.getStringByField(this.valueField)); 52 | return this.value; 53 | } 54 | } 55 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/trident/format/FileNameFormat.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.trident.format; 19 | 20 | import java.io.Serializable; 21 | import java.util.Map; 22 | 23 | /** 24 | * Formatter interface for determining HDFS file names. 25 | * 26 | */ 27 | public interface FileNameFormat extends Serializable { 28 | 29 | void prepare(Map conf, int partitionIndex, int numPartitions); 30 | 31 | /** 32 | * Returns the filename the HdfsBolt will create. 33 | * @param rotation the current file rotation number (incremented on every rotation) 34 | * @param timeStamp current time in milliseconds when the rotation occurs 35 | * @return 36 | */ 37 | String getName(long rotation, long timeStamp); 38 | 39 | String getPath(); 40 | } 41 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/bolt/sync/SyncPolicy.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. 
You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.bolt.sync; 19 | 20 | import backtype.storm.tuple.Tuple; 21 | 22 | import java.io.Serializable; 23 | 24 | /** 25 | * Interface for controlling when the HdfsBolt 26 | * syncs and flushes the filesystem. 27 | * 28 | */ 29 | public interface SyncPolicy extends Serializable { 30 | /** 31 | * Called for every tuple the HdfsBolt executes. 32 | * 33 | * @param tuple The tuple executed. 34 | * @param offset current offset for the file being written 35 | * @return true if a sync should be performed 36 | */ 37 | boolean mark(Tuple tuple, long offset); 38 | 39 | 40 | /** 41 | * Called after the HdfsBolt performs a sync. 42 | * 43 | */ 44 | void reset(); 45 | 46 | } 47 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/bolt/sync/CountSyncPolicy.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.bolt.sync; 19 | 20 | 21 | import backtype.storm.tuple.Tuple; 22 | 23 | /** 24 | * SyncPolicy implementation that will trigger a 25 | * file system sync after a certain number of tuples 26 | * have been processed. 27 | */ 28 | public class CountSyncPolicy implements SyncPolicy { 29 | private int count; 30 | private int executeCount = 0; 31 | 32 | public CountSyncPolicy(int count){ 33 | this.count = count; 34 | } 35 | 36 | @Override 37 | public boolean mark(Tuple tuple, long offset) { 38 | this.executeCount++; 39 | return this.executeCount >= this.count; 40 | } 41 | 42 | @Override 43 | public void reset() { 44 | this.executeCount = 0; 45 | } 46 | } 47 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/bolt/format/FileNameFormat.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. 
The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.bolt.format; 19 | 20 | import backtype.storm.task.TopologyContext; 21 | 22 | import java.io.Serializable; 23 | import java.util.Map; 24 | 25 | /** 26 | * Formatter interface for determining HDFS file names. 27 | * 28 | */ 29 | public interface FileNameFormat extends Serializable { 30 | 31 | void prepare(Map conf, TopologyContext topologyContext); 32 | 33 | /** 34 | * Returns the filename the HdfsBolt will create. 35 | * @param rotation the current file rotation number (incremented on every rotation) 36 | * @param timeStamp current time in milliseconds when the rotation occurs 37 | * @return 38 | */ 39 | String getName(long rotation, long timeStamp); 40 | 41 | String getPath(); 42 | } 43 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/trident/sync/SyncPolicy.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.trident.sync; 19 | 20 | import storm.trident.tuple.TridentTuple; 21 | 22 | import java.io.Serializable; 23 | 24 | /** 25 | * Interface for controlling when the HdfsBolt 26 | * syncs and flushes the filesystem. 27 | * 28 | */ 29 | public interface SyncPolicy extends Serializable { 30 | /** 31 | * Called for every tuple the HdfsBolt executes. 32 | * 33 | * @param tuple The tuple executed. 34 | * @param offset current offset for the file being written 35 | * @return true if a sync should be performed 36 | */ 37 | boolean mark(TridentTuple tuple, long offset); 38 | 39 | 40 | /** 41 | * Called after the HdfsBolt performs a sync. 42 | * 43 | */ 44 | void reset(); 45 | 46 | } 47 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/trident/sync/CountSyncPolicy.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. 
See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.trident.sync; 19 | 20 | 21 | import storm.trident.tuple.TridentTuple; 22 | 23 | /** 24 | * SyncPolicy implementation that will trigger a 25 | * file system sync after a certain number of tuples 26 | * have been processed. 27 | */ 28 | public class CountSyncPolicy implements SyncPolicy { 29 | private int count; 30 | private int executeCount = 0; 31 | 32 | public CountSyncPolicy(int count){ 33 | this.count = count; 34 | } 35 | 36 | @Override 37 | public boolean mark(TridentTuple tuple, long offset) { 38 | this.executeCount++; 39 | return this.executeCount >= this.count; 40 | } 41 | 42 | @Override 43 | public void reset() { 44 | this.executeCount = 0; 45 | } 46 | } 47 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/bolt/rotation/FileRotationPolicy.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.bolt.rotation; 19 | 20 | 21 | import backtype.storm.tuple.Tuple; 22 | 23 | import java.io.Serializable; 24 | 25 | /** 26 | * Used by the HdfsBolt to decide when to rotate files. 27 | * 28 | * The HdfsBolt will call the mark() method for every 29 | * tuple received. If the mark() method returns 30 | * true the HdfsBolt will perform a file rotation. 31 | * 32 | * After file rotation, the HdfsBolt will call the reset() 33 | * method. 34 | */ 35 | public interface FileRotationPolicy extends Serializable { 36 | /** 37 | * Called for every tuple the HdfsBolt executes. 38 | * 39 | * @param tuple The tuple executed. 40 | * @param offset current offset of file being written 41 | * @return true if a file rotation should be performed 42 | */ 43 | boolean mark(Tuple tuple, long offset); 44 | 45 | 46 | /** 47 | * Called after the HdfsBolt rotates a file. 
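 * Implementations should clear any internal counters or offsets used to
 * make the rotation decision, as FileSizeRotationPolicy does.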
48 | * 49 | */ 50 | void reset(); 51 | } 52 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/trident/rotation/FileRotationPolicy.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.trident.rotation; 19 | 20 | import storm.trident.tuple.TridentTuple; 21 | 22 | import java.io.Serializable; 23 | 24 | /** 25 | * Used by the HdfsBolt to decide when to rotate files. 26 | * 27 | * The HdfsBolt will call the mark() method for every 28 | * tuple received. If the mark() method returns 29 | * true the HdfsBolt will perform a file rotation. 30 | * 31 | * After file rotation, the HdfsBolt will call the reset() 32 | * method. 33 | */ 34 | public interface FileRotationPolicy extends Serializable { 35 | /** 36 | * Called for every tuple the HdfsBolt executes. 37 | * 38 | * @param tuple The tuple executed. 39 | * @param offset current offset of file being written 40 | * @return true if a file rotation should be performed 41 | */ 42 | boolean mark(TridentTuple tuple, long offset); 43 | 44 | 45 | /** 46 | * Called after the HdfsBolt rotates a file. 47 | * 48 | */ 49 | void reset(); 50 | } 51 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/common/security/HdfsSecurityUtil.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 
17 | */ 18 | package org.apache.storm.hdfs.common.security; 19 | 20 | import org.apache.hadoop.conf.Configuration; 21 | import org.apache.hadoop.security.SecurityUtil; 22 | import org.apache.hadoop.security.UserGroupInformation; 23 | 24 | import java.io.IOException; 25 | import java.util.Map; 26 | 27 | /** 28 | * This class provides util methods for storm-hdfs connector communicating 29 | * with secured HDFS. 30 | */ 31 | public class HdfsSecurityUtil { 32 | public static final String STORM_KEYTAB_FILE_KEY = "hdfs.keytab.file"; 33 | public static final String STORM_USER_NAME_KEY = "hdfs.kerberos.principal"; 34 | 35 | public static void login(Map conf, Configuration hdfsConfig) throws IOException { 36 | if (UserGroupInformation.isSecurityEnabled()) { 37 | String keytab = (String) conf.get(STORM_KEYTAB_FILE_KEY); 38 | if (keytab != null) { 39 | hdfsConfig.set(STORM_KEYTAB_FILE_KEY, keytab); 40 | } 41 | String userName = (String) conf.get(STORM_USER_NAME_KEY); 42 | if (userName != null) { 43 | hdfsConfig.set(STORM_USER_NAME_KEY, userName); 44 | } 45 | SecurityUtil.login(hdfsConfig, STORM_KEYTAB_FILE_KEY, STORM_USER_NAME_KEY); 46 | } 47 | } 48 | } 49 | -------------------------------------------------------------------------------- /src/test/java/org/apache/storm/hdfs/trident/FixedBatchSpout.java: -------------------------------------------------------------------------------- 1 | package org.apache.storm.hdfs.trident; 2 | 3 | import backtype.storm.Config; 4 | import backtype.storm.task.TopologyContext; 5 | import backtype.storm.tuple.Fields; 6 | import storm.trident.operation.TridentCollector; 7 | import storm.trident.spout.IBatchSpout; 8 | 9 | import java.util.ArrayList; 10 | import java.util.HashMap; 11 | import java.util.List; 12 | import java.util.Map; 13 | 14 | public class FixedBatchSpout implements IBatchSpout { 15 | 16 | Fields fields; 17 | List<Object>[] outputs; 18 | int maxBatchSize; 19 | HashMap<Long, List<List<Object>>> batches = new HashMap<Long, List<List<Object>>>(); 20 | 21 | public FixedBatchSpout(Fields fields, int maxBatchSize, List<Object>... outputs) { 22 | this.fields = fields; 23 | this.outputs = outputs; 24 | this.maxBatchSize = maxBatchSize; 25 | } 26 | 27 | int index = 0; 28 | boolean cycle = false; 29 | 30 | public void setCycle(boolean cycle) { 31 | this.cycle = cycle; 32 | } 33 | 34 | @Override 35 | public void open(Map conf, TopologyContext context) { 36 | index = 0; 37 | } 38 | 39 | @Override 40 | public void emitBatch(long batchId, TridentCollector collector) { 41 | List<List<Object>> batch = this.batches.get(batchId); 42 | if(batch == null){ 43 | batch = new ArrayList<List<Object>>(); 44 | if(index>=outputs.length && cycle) { 45 | index = 0; 46 | } 47 | for(int i=0; i < maxBatchSize; index++, i++) { 48 | if(index == outputs.length){ 49 | index=0; 50 | } 51 | batch.add(outputs[index]); 52 | } 53 | this.batches.put(batchId, batch); 54 | } 55 | for(List<Object> list : batch){ 56 | collector.emit(list); 57 | } 58 | } 59 | 60 | @Override 61 | public void ack(long batchId) { 62 | this.batches.remove(batchId); 63 | } 64 | 65 | @Override 66 | public void close() { 67 | } 68 | 69 | @Override 70 | public Map getComponentConfiguration() { 71 | Config conf = new Config(); 72 | conf.setMaxTaskParallelism(1); 73 | return conf; 74 | } 75 | 76 | @Override 77 | public Fields getOutputFields() { 78 | return fields; 79 | } 80 | } -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/trident/format/DefaultFileNameFormat.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.trident.format; 19 | 20 | import java.util.Map; 21 | 22 | 23 | /** 24 | * Creates file names with the following format: 25 | *
26 |  *     {prefix}-{partitionId}-{rotationNum}-{timestamp}{extension}
27 |  * 
28 | * For example: 29 | *
30 |  *     MyBolt-5-7-1390579837830.txt
31 |  * 
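 * where "MyBolt" is the prefix, 5 the partition index, 7 the rotation
 * number, and 1390579837830 the timestamp in milliseconds.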
32 | * 33 | * By default, prefix is empty and extension is ".txt". 34 | * 35 | */ 36 | public class DefaultFileNameFormat implements FileNameFormat { 37 | private int partitionIndex; 38 | private String path = "/storm"; 39 | private String prefix = ""; 40 | private String extension = ".txt"; 41 | 42 | /** 43 | * Overrides the default prefix. 44 | * 45 | * @param prefix 46 | * @return 47 | */ 48 | public DefaultFileNameFormat withPrefix(String prefix){ 49 | this.prefix = prefix; 50 | return this; 51 | } 52 | 53 | /** 54 | * Overrides the default file extension. 55 | * 56 | * @param extension 57 | * @return 58 | */ 59 | public DefaultFileNameFormat withExtension(String extension){ 60 | this.extension = extension; 61 | return this; 62 | } 63 | 64 | public DefaultFileNameFormat withPath(String path){ 65 | this.path = path; 66 | return this; 67 | } 68 | 69 | @Override 70 | public void prepare(Map conf, int partitionIndex, int numPartitions) { 71 | this.partitionIndex = partitionIndex; 72 | 73 | } 74 | 75 | @Override 76 | public String getName(long rotation, long timeStamp) { 77 | return this.prefix + "-" + this.partitionIndex + "-" + rotation + "-" + timeStamp + this.extension; 78 | } 79 | 80 | public String getPath(){ 81 | return this.path; 82 | } 83 | } 84 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/bolt/rotation/FileSizeRotationPolicy.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.bolt.rotation; 19 | 20 | 21 | import backtype.storm.tuple.Tuple; 22 | import org.slf4j.Logger; 23 | import org.slf4j.LoggerFactory; 24 | 25 | /** 26 | * File rotation policy that will rotate files when a certain 27 | * file size is reached. 28 | * 29 | * For example: 30 | *
31 |  *     // rotate when files reach 5MB
32 |  *     FileSizeRotationPolicy policy =
33 |  *          new FileSizeRotationPolicy(5.0f, Units.MB);
34 |  * 
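 *
 * The offset passed to mark() is used to track bytes written since the last
 * rotation, so the threshold applies per rotated file rather than to the
 * absolute stream offset.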
35 | * 36 | */ 37 | public class FileSizeRotationPolicy implements FileRotationPolicy { 38 | private static final Logger LOG = LoggerFactory.getLogger(FileSizeRotationPolicy.class); 39 | 40 | public static enum Units { 41 | 42 | KB((long)Math.pow(2, 10)), 43 | MB((long)Math.pow(2, 20)), 44 | GB((long)Math.pow(2, 30)), 45 | TB((long)Math.pow(2, 40)); 46 | 47 | private long byteCount; 48 | 49 | private Units(long byteCount){ 50 | this.byteCount = byteCount; 51 | } 52 | 53 | public long getByteCount(){ 54 | return byteCount; 55 | } 56 | } 57 | 58 | private long maxBytes; 59 | 60 | private long lastOffset = 0; 61 | private long currentBytesWritten = 0; 62 | 63 | public FileSizeRotationPolicy(float count, Units units){ 64 | this.maxBytes = (long)(count * units.getByteCount()); 65 | } 66 | 67 | @Override 68 | public boolean mark(Tuple tuple, long offset) { 69 | long diff = offset - this.lastOffset; 70 | this.currentBytesWritten += diff; 71 | this.lastOffset = offset; 72 | return this.currentBytesWritten >= this.maxBytes; 73 | } 74 | 75 | @Override 76 | public void reset() { 77 | this.currentBytesWritten = 0; 78 | this.lastOffset = 0; 79 | } 80 | 81 | } 82 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/trident/rotation/FileSizeRotationPolicy.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.trident.rotation; 19 | 20 | 21 | import org.slf4j.Logger; 22 | import org.slf4j.LoggerFactory; 23 | import storm.trident.tuple.TridentTuple; 24 | 25 | /** 26 | * File rotation policy that will rotate files when a certain 27 | * file size is reached. 28 | * 29 | * For example: 30 | *
31 |  *     // rotate when files reach 5MB
32 |  *     FileSizeRotationPolicy policy =
33 |  *          new FileSizeRotationPolicy(5.0f, Units.MB);
34 |  * 
35 | * 36 | */ 37 | public class FileSizeRotationPolicy implements FileRotationPolicy { 38 | private static final Logger LOG = LoggerFactory.getLogger(FileSizeRotationPolicy.class); 39 | 40 | public static enum Units { 41 | 42 | KB((long)Math.pow(2, 10)), 43 | MB((long)Math.pow(2, 20)), 44 | GB((long)Math.pow(2, 30)), 45 | TB((long)Math.pow(2, 40)); 46 | 47 | private long byteCount; 48 | 49 | private Units(long byteCount){ 50 | this.byteCount = byteCount; 51 | } 52 | 53 | public long getByteCount(){ 54 | return byteCount; 55 | } 56 | } 57 | 58 | private long maxBytes; 59 | 60 | private long lastOffset = 0; 61 | private long currentBytesWritten = 0; 62 | 63 | public FileSizeRotationPolicy(float count, Units units){ 64 | this.maxBytes = (long)(count * units.getByteCount()); 65 | } 66 | 67 | @Override 68 | public boolean mark(TridentTuple tuple, long offset) { 69 | long diff = offset - this.lastOffset; 70 | this.currentBytesWritten += diff; 71 | this.lastOffset = offset; 72 | return this.currentBytesWritten >= this.maxBytes; 73 | } 74 | 75 | @Override 76 | public void reset() { 77 | this.currentBytesWritten = 0; 78 | this.lastOffset = 0; 79 | } 80 | 81 | } 82 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/bolt/format/DefaultFileNameFormat.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.bolt.format; 19 | 20 | import backtype.storm.task.TopologyContext; 21 | 22 | import java.util.Map; 23 | 24 | 25 | /** 26 | * Creates file names with the following format: 27 | *
28 |  *     {prefix}{componentId}-{taskId}-{rotationNum}-{timestamp}{extension}
29 |  * 
30 | * For example: 31 | *
32 |  *     MyBolt-5-7-1390579837830.txt
33 |  * 
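 * where "MyBolt" is the component id, 5 the task id, 7 the rotation
 * number, and 1390579837830 the timestamp in milliseconds (with an
 * empty prefix).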
34 | * 35 | * By default, prefix is empty and extension is ".txt". 36 | * 37 | */ 38 | public class DefaultFileNameFormat implements FileNameFormat { 39 | private String componentId; 40 | private int taskId; 41 | private String path = "/storm"; 42 | private String prefix = ""; 43 | private String extension = ".txt"; 44 | 45 | /** 46 | * Overrides the default prefix. 47 | * 48 | * @param prefix 49 | * @return 50 | */ 51 | public DefaultFileNameFormat withPrefix(String prefix){ 52 | this.prefix = prefix; 53 | return this; 54 | } 55 | 56 | /** 57 | * Overrides the default file extension. 58 | * 59 | * @param extension 60 | * @return 61 | */ 62 | public DefaultFileNameFormat withExtension(String extension){ 63 | this.extension = extension; 64 | return this; 65 | } 66 | 67 | public DefaultFileNameFormat withPath(String path){ 68 | this.path = path; 69 | return this; 70 | } 71 | 72 | @Override 73 | public void prepare(Map conf, TopologyContext topologyContext) { 74 | this.componentId = topologyContext.getThisComponentId(); 75 | this.taskId = topologyContext.getThisTaskId(); 76 | } 77 | 78 | @Override 79 | public String getName(long rotation, long timeStamp) { 80 | return this.prefix + this.componentId + "-" + this.taskId + "-" + rotation + "-" + timeStamp + this.extension; 81 | } 82 | 83 | public String getPath(){ 84 | return this.path; 85 | } 86 | } 87 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/trident/format/DelimitedRecordFormat.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.trident.format; 19 | 20 | import backtype.storm.tuple.Fields; 21 | import storm.trident.tuple.TridentTuple; 22 | 23 | /** 24 | * RecordFormat implementation that uses field and record delimiters. 25 | * By default uses a comma (",") as the field delimiter and a 26 | * newline ("\n") as the record delimiter. 27 | * 28 | * Note that this Trident implementation requires withFields() to be called; 29 | * format() assumes a field list has been supplied. 30 | */ 31 | public class DelimitedRecordFormat implements RecordFormat { 32 | public static final String DEFAULT_FIELD_DELIMITER = ","; 33 | public static final String DEFAULT_RECORD_DELIMITER = "\n"; 34 | private String fieldDelimiter = DEFAULT_FIELD_DELIMITER; 35 | private String recordDelimiter = DEFAULT_RECORD_DELIMITER; 36 | private Fields fields = null; 37 | 38 | /** 39 | * Only output the specified fields. 40 | * 41 | * @param fields 42 | * @return 43 | */ 44 | public DelimitedRecordFormat withFields(Fields fields){ 45 | this.fields = fields; 46 | return this; 47 | } 48 | 49 | /** 50 | * Overrides the default field delimiter.
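 * For example, withFieldDelimiter("|") produces pipe-delimited records.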
50 | * 51 | * @param delimiter 52 | * @return 53 | */ 54 | public DelimitedRecordFormat withFieldDelimiter(String delimiter){ 55 | this.fieldDelimiter = delimiter; 56 | return this; 57 | } 58 | 59 | /** 60 | * Overrides the default record delimiter. 61 | * 62 | * @param delimiter 63 | * @return 64 | */ 65 | public DelimitedRecordFormat withRecordDelimiter(String delimiter){ 66 | this.recordDelimiter = delimiter; 67 | return this; 68 | } 69 | 70 | @Override 71 | public byte[] format(TridentTuple tuple) { 72 | StringBuilder sb = new StringBuilder(); 73 | int size = this.fields.size(); 74 | for(int i = 0; i < size; i++){ 75 | sb.append(tuple.getValueByField(fields.get(i))); 76 | if(i != size - 1){ 77 | sb.append(this.fieldDelimiter); 78 | } 79 | } 80 | sb.append(this.recordDelimiter); 81 | return sb.toString().getBytes(); 82 | } 83 | } 84 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/bolt/format/DelimitedRecordFormat.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.bolt.format; 19 | 20 | import backtype.storm.tuple.Fields; 21 | import backtype.storm.tuple.Tuple; 22 | 23 | /** 24 | * RecordFormat implementation that uses field and record delimiters. 25 | * By default uses a comma (",") as the field delimiter and a 26 | * newline ("\n") as the record delimiter. 27 | * 28 | * Also by default, this implementation will output all the 29 | * field values in the tuple in the order they were declared. To 30 | * override this behavior, call withFields() to 31 | * specify which tuple fields to output. 32 | * 33 | */ 34 | public class DelimitedRecordFormat implements RecordFormat { 35 | public static final String DEFAULT_FIELD_DELIMITER = ","; 36 | public static final String DEFAULT_RECORD_DELIMITER = "\n"; 37 | private String fieldDelimiter = DEFAULT_FIELD_DELIMITER; 38 | private String recordDelimiter = DEFAULT_RECORD_DELIMITER; 39 | private Fields fields = null; 40 | 41 | /** 42 | * Only output the specified fields. 43 | * 44 | * @param fields 45 | * @return 46 | */ 47 | public DelimitedRecordFormat withFields(Fields fields){ 48 | this.fields = fields; 49 | return this; 50 | } 51 | 52 | /** 53 | * Overrides the default field delimiter. 54 | * 55 | * @param delimiter 56 | * @return 57 | */ 58 | public DelimitedRecordFormat withFieldDelimiter(String delimiter){ 59 | this.fieldDelimiter = delimiter; 60 | return this; 61 | } 62 | 63 | /** 64 | * Overrides the default record delimiter. 
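 * For example, withRecordDelimiter("\r\n") terminates each record with CRLF.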
65 | * 66 | * @param delimiter 67 | * @return 68 | */ 69 | public DelimitedRecordFormat withRecordDelimiter(String delimiter){ 70 | this.recordDelimiter = delimiter; 71 | return this; 72 | } 73 | 74 | @Override 75 | public byte[] format(Tuple tuple) { 76 | StringBuilder sb = new StringBuilder(); 77 | Fields fields = this.fields == null ? tuple.getFields() : this.fields; 78 | int size = fields.size(); 79 | for(int i = 0; i < size; i++){ 80 | sb.append(tuple.getValueByField(fields.get(i))); 81 | if(i != size - 1){ 82 | sb.append(this.fieldDelimiter); 83 | } 84 | } 85 | sb.append(this.recordDelimiter); 86 | return sb.toString().getBytes(); 87 | } 88 | } 89 | -------------------------------------------------------------------------------- /src/test/java/org/apache/storm/hdfs/trident/TridentFileTopology.java: -------------------------------------------------------------------------------- 1 | package org.apache.storm.hdfs.trident; 2 | 3 | import backtype.storm.Config; 4 | import backtype.storm.LocalCluster; 5 | import backtype.storm.StormSubmitter; 6 | import backtype.storm.generated.StormTopology; 7 | import backtype.storm.tuple.Fields; 8 | import backtype.storm.tuple.Values; 9 | import org.apache.storm.hdfs.common.rotation.MoveFileAction; 10 | import org.apache.storm.hdfs.trident.format.*; 11 | import org.apache.storm.hdfs.trident.rotation.FileRotationPolicy; 12 | import org.apache.storm.hdfs.trident.rotation.FileSizeRotationPolicy; 13 | import storm.trident.Stream; 14 | import storm.trident.TridentState; 15 | import storm.trident.TridentTopology; 16 | import storm.trident.operation.BaseFunction; 17 | import storm.trident.operation.TridentCollector; 18 | import storm.trident.state.StateFactory; 19 | import storm.trident.tuple.TridentTuple; 20 | 21 | public class TridentFileTopology { 22 | 23 | public static StormTopology buildTopology(String hdfsUrl){ 24 | FixedBatchSpout spout = new FixedBatchSpout(new Fields("sentence", "key"), 1000, new Values("the cow jumped over the moon", 1l), 25 | new Values("the man went to the store and bought some candy", 2l), new Values("four score and seven years ago", 3l), 26 | new Values("how many apples can you eat", 4l), new Values("to be or not to be the person", 5l)); 27 | spout.setCycle(true); 28 | 29 | TridentTopology topology = new TridentTopology(); 30 | Stream stream = topology.newStream("spout1", spout); 31 | 32 | Fields hdfsFields = new Fields("sentence", "key"); 33 | 34 | FileNameFormat fileNameFormat = new DefaultFileNameFormat() 35 | .withPath("/trident") 36 | .withPrefix("trident") 37 | .withExtension(".txt"); 38 | 39 | RecordFormat recordFormat = new DelimitedRecordFormat() 40 | .withFields(hdfsFields); 41 | 42 | FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, FileSizeRotationPolicy.Units.MB); 43 | 44 | HdfsState.Options options = new HdfsState.HdfsFileOptions() 45 | .withFileNameFormat(fileNameFormat) 46 | .withRecordFormat(recordFormat) 47 | .withRotationPolicy(rotationPolicy) 48 | .withFsUrl(hdfsUrl); 49 | 50 | StateFactory factory = new HdfsStateFactory().withOptions(options); 51 | 52 | TridentState state = stream 53 | .partitionPersist(factory, hdfsFields, new HdfsUpdater(), new Fields()); 54 | 55 | return topology.build(); 56 | } 57 | 58 | public static void main(String[] args) throws Exception { 59 | Config conf = new Config(); 60 | conf.setMaxSpoutPending(5); 61 | if (args.length == 1) { 62 | LocalCluster cluster = new LocalCluster(); 63 | cluster.submitTopology("wordCounter", conf, buildTopology(args[0])); 64 | 
Thread.sleep(120 * 1000);
65 |         }
66 |         else if(args.length == 2) {
67 |             conf.setNumWorkers(3);
68 |             StormSubmitter.submitTopology(args[1], conf, buildTopology(args[0]));
69 |         } else{
70 |             System.out.println("Usage: TridentFileTopology <hdfs url> [topology name]");
71 |         }
72 |     }
73 | }
74 | 

--------------------------------------------------------------------------------
/pom.xml:
--------------------------------------------------------------------------------
1 | <?xml version="1.0" encoding="UTF-8"?>
2 | 
18 | <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
19 |     <modelVersion>4.0.0</modelVersion>
20 | 
21 |     <parent>
22 |         <groupId>org.sonatype.oss</groupId>
23 |         <artifactId>oss-parent</artifactId>
24 |         <version>7</version>
25 |     </parent>
26 | 
27 |     <groupId>com.github.ptgoetz</groupId>
28 |     <artifactId>storm-hdfs</artifactId>
29 |     <version>0.1.3-SNAPSHOT</version>
30 | 
31 |     <licenses>
32 |         <license>
33 |             <name>The Apache Software License, Version 2.0</name>
34 |             <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
35 |         </license>
36 |     </licenses>
37 |     <scm>
38 |         <connection>scm:git:git@github.com:ptgoetz/storm-hdfs.git</connection>
39 |         <developerConnection>scm:git:git@github.com:ptgoetz/storm-hdfs.git</developerConnection>
40 |         <url>:git@github.com:ptgoetz/storm-hdfs.git</url>
41 |     </scm>
42 | 
43 |     <developers>
44 |         <developer>
45 |             <id>ptgoetz</id>
46 |             <name>P. Taylor Goetz</name>
47 |             <email>ptgoetz@gmail.com</email>
48 |         </developer>
49 |     </developers>
50 | 
51 |     <dependencies>
52 |         <dependency>
53 |             <groupId>org.apache.storm</groupId>
54 |             <artifactId>storm-core</artifactId>
55 |             <version>0.9.1-incubating</version>
56 |             <scope>provided</scope>
57 |         </dependency>
58 |         <dependency>
59 |             <groupId>org.apache.hadoop</groupId>
60 |             <artifactId>hadoop-client</artifactId>
61 |             <version>2.2.0</version>
62 |             <exclusions>
63 |                 <exclusion>
64 |                     <groupId>org.slf4j</groupId>
65 |                     <artifactId>slf4j-log4j12</artifactId>
66 |                 </exclusion>
67 |             </exclusions>
68 |         </dependency>
69 |         <dependency>
70 |             <groupId>org.apache.hadoop</groupId>
71 |             <artifactId>hadoop-hdfs</artifactId>
72 |             <version>2.2.0</version>
73 |             <exclusions>
74 |                 <exclusion>
75 |                     <groupId>org.slf4j</groupId>
76 |                     <artifactId>slf4j-log4j12</artifactId>
77 |                 </exclusion>
78 |             </exclusions>
79 |         </dependency>
80 |     </dependencies>
81 | </project>

--------------------------------------------------------------------------------
/src/test/java/org/apache/storm/hdfs/trident/TridentSequenceTopology.java:
--------------------------------------------------------------------------------
1 | package org.apache.storm.hdfs.trident;
2 | 
3 | import backtype.storm.Config;
4 | import backtype.storm.LocalCluster;
5 | import backtype.storm.StormSubmitter;
6 | import backtype.storm.generated.StormTopology;
7 | import backtype.storm.tuple.Fields;
8 | import backtype.storm.tuple.Values;
9 | import org.apache.storm.hdfs.common.rotation.MoveFileAction;
10 | import org.apache.storm.hdfs.trident.format.*;
11 | import org.apache.storm.hdfs.trident.rotation.FileRotationPolicy;
12 | import org.apache.storm.hdfs.trident.rotation.FileSizeRotationPolicy;
13 | import storm.trident.Stream;
14 | import storm.trident.TridentState;
15 | import storm.trident.TridentTopology;
16 | import storm.trident.operation.BaseFunction;
17 | import storm.trident.operation.TridentCollector;
18 | import storm.trident.state.StateFactory;
19 | import storm.trident.tuple.TridentTuple;
20 | 
21 | public class TridentSequenceTopology {
22 | 
23 |     public static StormTopology buildTopology(String hdfsUrl){
24 |         FixedBatchSpout spout = new FixedBatchSpout(new Fields("sentence", "key"), 1000, new Values("the cow jumped over the moon", 1l),
25 |                 new Values("the man went to the store and bought some candy", 2l), new Values("four score and seven years ago", 3l),
26 |                 new Values("how many apples can you eat", 4l), new Values("to be or not to be the person", 5l));
27 |         spout.setCycle(true);
28 | 
29 |         TridentTopology topology = new TridentTopology();
30 |         Stream stream = topology.newStream("spout1", spout);
31 | 
32 |         Fields hdfsFields = new Fields("sentence", "key");
33 | 
34 |         FileNameFormat fileNameFormat = new DefaultFileNameFormat()
35 |                 .withPath("/trident")
36 |                 .withPrefix("trident")
37 |                 .withExtension(".seq");
38 | 
39 |         FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, FileSizeRotationPolicy.Units.MB);
40 | 
41 |         HdfsState.Options seqOpts = new HdfsState.SequenceFileOptions()
42 |                 .withFileNameFormat(fileNameFormat)
43 |                 .withSequenceFormat(new DefaultSequenceFormat("key", "sentence"))
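                // "key" and "sentence" are the tuple fields written as the sequence file key and value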
                .withRotationPolicy(rotationPolicy)
45 |                 .withFsUrl(hdfsUrl)
46 |                 .addRotationAction(new MoveFileAction().toDestination("/dest2/"));
47 | 
48 |         StateFactory factory = new HdfsStateFactory().withOptions(seqOpts);
49 | 
50 |         TridentState state = stream
51 |                 .partitionPersist(factory, hdfsFields, new HdfsUpdater(), new Fields());
52 | 
53 |         return topology.build();
54 |     }
55 | 
56 |     public static void main(String[] args) throws Exception {
57 |         Config conf = new Config();
58 |         conf.setMaxSpoutPending(5);
59 |         if (args.length == 1) {
60 |             LocalCluster cluster = new LocalCluster();
61 |             cluster.submitTopology("wordCounter", conf, buildTopology(args[0]));
62 |             Thread.sleep(120 * 1000);
63 |         }
64 |         else if(args.length == 2) {
65 |             conf.setNumWorkers(3);
66 |             StormSubmitter.submitTopology(args[1], conf, buildTopology(args[0]));
67 |         } else{
68 |             System.out.println("Usage: TridentSequenceTopology <hdfs url> [topology name]");
69 |         }
70 |     }
71 | }
72 | 

--------------------------------------------------------------------------------
/src/main/java/org/apache/storm/hdfs/bolt/HdfsBolt.java:
--------------------------------------------------------------------------------
1 | /**
2 |  * Licensed to the Apache Software Foundation (ASF) under one
3 |  * or more contributor license agreements. See the NOTICE file
4 |  * distributed with this work for additional information
5 |  * regarding copyright ownership. The ASF licenses this file
6 |  * to you under the Apache License, Version 2.0 (the
7 |  * "License"); you may not use this file except in compliance
8 |  * with the License. You may obtain a copy of the License at
9 |  *
10 |  * http://www.apache.org/licenses/LICENSE-2.0
11 |  *
12 |  * Unless required by applicable law or agreed to in writing, software
13 |  * distributed under the License is distributed on an "AS IS" BASIS,
14 |  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15 |  * See the License for the specific language governing permissions and
16 |  * limitations under the License.
17 | */ 18 | package org.apache.storm.hdfs.bolt; 19 | 20 | import backtype.storm.task.OutputCollector; 21 | import backtype.storm.task.TopologyContext; 22 | import backtype.storm.tuple.Tuple; 23 | import org.apache.hadoop.fs.FSDataOutputStream; 24 | import org.apache.hadoop.fs.FileSystem; 25 | import org.apache.hadoop.fs.Path; 26 | import org.apache.hadoop.hdfs.client.HdfsDataOutputStream; 27 | import org.apache.hadoop.hdfs.client.HdfsDataOutputStream.SyncFlag; 28 | import org.apache.storm.hdfs.bolt.format.FileNameFormat; 29 | import org.apache.storm.hdfs.bolt.format.RecordFormat; 30 | import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy; 31 | import org.apache.storm.hdfs.bolt.sync.SyncPolicy; 32 | import org.apache.storm.hdfs.common.rotation.RotationAction; 33 | import org.slf4j.Logger; 34 | import org.slf4j.LoggerFactory; 35 | 36 | import java.io.IOException; 37 | import java.net.URI; 38 | import java.util.EnumSet; 39 | import java.util.Map; 40 | 41 | public class HdfsBolt extends AbstractHdfsBolt{ 42 | private static final Logger LOG = LoggerFactory.getLogger(HdfsBolt.class); 43 | 44 | private transient FSDataOutputStream out; 45 | private RecordFormat format; 46 | private long offset = 0; 47 | 48 | public HdfsBolt withFsUrl(String fsUrl){ 49 | this.fsUrl = fsUrl; 50 | return this; 51 | } 52 | 53 | public HdfsBolt withConfigKey(String configKey){ 54 | this.configKey = configKey; 55 | return this; 56 | } 57 | 58 | public HdfsBolt withFileNameFormat(FileNameFormat fileNameFormat){ 59 | this.fileNameFormat = fileNameFormat; 60 | return this; 61 | } 62 | 63 | public HdfsBolt withRecordFormat(RecordFormat format){ 64 | this.format = format; 65 | return this; 66 | } 67 | 68 | public HdfsBolt withSyncPolicy(SyncPolicy syncPolicy){ 69 | this.syncPolicy = syncPolicy; 70 | return this; 71 | } 72 | 73 | public HdfsBolt withRotationPolicy(FileRotationPolicy rotationPolicy){ 74 | this.rotationPolicy = rotationPolicy; 75 | return this; 76 | } 77 | 78 | public HdfsBolt addRotationAction(RotationAction action){ 79 | this.rotationActions.add(action); 80 | return this; 81 | } 82 | 83 | @Override 84 | public void doPrepare(Map conf, TopologyContext topologyContext, OutputCollector collector) throws IOException { 85 | LOG.info("Preparing HDFS Bolt..."); 86 | this.fs = FileSystem.get(URI.create(this.fsUrl), hdfsConfig); 87 | } 88 | 89 | @Override 90 | public void execute(Tuple tuple) { 91 | try { 92 | byte[] bytes = this.format.format(tuple); 93 | synchronized (this.writeLock) { 94 | out.write(bytes); 95 | this.offset += bytes.length; 96 | 97 | if (this.syncPolicy.mark(tuple, this.offset)) { 98 | if (this.out instanceof HdfsDataOutputStream) { 99 | ((HdfsDataOutputStream) this.out).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH)); 100 | } else { 101 | this.out.hsync(); 102 | } 103 | this.syncPolicy.reset(); 104 | } 105 | } 106 | 107 | this.collector.ack(tuple); 108 | 109 | if(this.rotationPolicy.mark(tuple, this.offset)){ 110 | rotateOutputFile(); // synchronized 111 | this.offset = 0; 112 | this.rotationPolicy.reset(); 113 | } 114 | } catch (IOException e) { 115 | LOG.warn("write/sync failed.", e); 116 | this.collector.fail(tuple); 117 | } 118 | } 119 | 120 | @Override 121 | void closeOutputFile() throws IOException { 122 | this.out.close(); 123 | } 124 | 125 | @Override 126 | Path createOutputFile() throws IOException { 127 | Path path = new Path(this.fileNameFormat.getPath(), this.fileNameFormat.getName(this.rotation, System.currentTimeMillis())); 128 | this.out = this.fs.create(path); 129 | return 
path; 130 | } 131 | } 132 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/bolt/SequenceFileBolt.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | */ 18 | package org.apache.storm.hdfs.bolt; 19 | 20 | import backtype.storm.task.OutputCollector; 21 | import backtype.storm.task.TopologyContext; 22 | import backtype.storm.tuple.Tuple; 23 | import org.apache.hadoop.fs.FileSystem; 24 | import org.apache.hadoop.fs.Path; 25 | import org.apache.hadoop.io.SequenceFile; 26 | import org.apache.hadoop.io.compress.CompressionCodecFactory; 27 | import org.apache.storm.hdfs.bolt.format.FileNameFormat; 28 | import org.apache.storm.hdfs.bolt.format.SequenceFormat; 29 | import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy; 30 | import org.apache.storm.hdfs.bolt.sync.SyncPolicy; 31 | import org.apache.storm.hdfs.common.rotation.RotationAction; 32 | import org.slf4j.Logger; 33 | import org.slf4j.LoggerFactory; 34 | 35 | import java.io.IOException; 36 | import java.net.URI; 37 | import java.util.Map; 38 | 39 | public class SequenceFileBolt extends AbstractHdfsBolt { 40 | private static final Logger LOG = LoggerFactory.getLogger(SequenceFileBolt.class); 41 | 42 | private SequenceFormat format; 43 | private SequenceFile.CompressionType compressionType = SequenceFile.CompressionType.RECORD; 44 | private transient SequenceFile.Writer writer; 45 | 46 | private String compressionCodec = "default"; 47 | private transient CompressionCodecFactory codecFactory; 48 | 49 | public SequenceFileBolt() { 50 | } 51 | 52 | public SequenceFileBolt withCompressionCodec(String codec){ 53 | this.compressionCodec = codec; 54 | return this; 55 | } 56 | 57 | public SequenceFileBolt withFsUrl(String fsUrl) { 58 | this.fsUrl = fsUrl; 59 | return this; 60 | } 61 | 62 | public SequenceFileBolt withConfigKey(String configKey){ 63 | this.configKey = configKey; 64 | return this; 65 | } 66 | 67 | public SequenceFileBolt withFileNameFormat(FileNameFormat fileNameFormat) { 68 | this.fileNameFormat = fileNameFormat; 69 | return this; 70 | } 71 | 72 | public SequenceFileBolt withSequenceFormat(SequenceFormat format) { 73 | this.format = format; 74 | return this; 75 | } 76 | 77 | public SequenceFileBolt withSyncPolicy(SyncPolicy syncPolicy) { 78 | this.syncPolicy = syncPolicy; 79 | return this; 80 | } 81 | 82 | public SequenceFileBolt withRotationPolicy(FileRotationPolicy rotationPolicy) { 83 | this.rotationPolicy = rotationPolicy; 84 | return this; 85 | } 86 | 87 | public SequenceFileBolt withCompressionType(SequenceFile.CompressionType compressionType){ 88 | this.compressionType = 
compressionType; 89 | return this; 90 | } 91 | 92 | public SequenceFileBolt addRotationAction(RotationAction action){ 93 | this.rotationActions.add(action); 94 | return this; 95 | } 96 | 97 | @Override 98 | public void doPrepare(Map conf, TopologyContext topologyContext, OutputCollector collector) throws IOException { 99 | LOG.info("Preparing Sequence File Bolt..."); 100 | if (this.format == null) throw new IllegalStateException("SequenceFormat must be specified."); 101 | 102 | this.fs = FileSystem.get(URI.create(this.fsUrl), hdfsConfig); 103 | this.codecFactory = new CompressionCodecFactory(hdfsConfig); 104 | } 105 | 106 | @Override 107 | public void execute(Tuple tuple) { 108 | try { 109 | long offset; 110 | synchronized (this.writeLock) { 111 | this.writer.append(this.format.key(tuple), this.format.value(tuple)); 112 | offset = this.writer.getLength(); 113 | 114 | if (this.syncPolicy.mark(tuple, offset)) { 115 | this.writer.hsync(); 116 | this.syncPolicy.reset(); 117 | } 118 | } 119 | 120 | this.collector.ack(tuple); 121 | if (this.rotationPolicy.mark(tuple, offset)) { 122 | rotateOutputFile(); // synchronized 123 | this.rotationPolicy.reset(); 124 | } 125 | } catch (IOException e) { 126 | LOG.warn("write/sync failed.", e); 127 | this.collector.fail(tuple); 128 | } 129 | 130 | } 131 | 132 | Path createOutputFile() throws IOException { 133 | Path p = new Path(this.fsUrl + this.fileNameFormat.getPath(), this.fileNameFormat.getName(this.rotation, System.currentTimeMillis())); 134 | this.writer = SequenceFile.createWriter( 135 | this.hdfsConfig, 136 | SequenceFile.Writer.file(p), 137 | SequenceFile.Writer.keyClass(this.format.keyClass()), 138 | SequenceFile.Writer.valueClass(this.format.valueClass()), 139 | SequenceFile.Writer.compression(this.compressionType, this.codecFactory.getCodecByName(this.compressionCodec)) 140 | ); 141 | return p; 142 | } 143 | 144 | void closeOutputFile() throws IOException { 145 | this.writer.close(); 146 | } 147 | 148 | 149 | } 150 | -------------------------------------------------------------------------------- /src/main/java/org/apache/storm/hdfs/bolt/AbstractHdfsBolt.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 
17 | */ 18 | package org.apache.storm.hdfs.bolt; 19 | 20 | import backtype.storm.task.OutputCollector; 21 | import backtype.storm.task.TopologyContext; 22 | import backtype.storm.topology.OutputFieldsDeclarer; 23 | import backtype.storm.topology.base.BaseRichBolt; 24 | import org.apache.hadoop.conf.Configuration; 25 | import org.apache.hadoop.fs.FileSystem; 26 | import org.apache.hadoop.fs.Path; 27 | import org.apache.storm.hdfs.bolt.format.FileNameFormat; 28 | import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy; 29 | import org.apache.storm.hdfs.bolt.rotation.TimedRotationPolicy; 30 | import org.apache.storm.hdfs.bolt.sync.SyncPolicy; 31 | import org.apache.storm.hdfs.common.rotation.RotationAction; 32 | import org.apache.storm.hdfs.common.security.HdfsSecurityUtil; 33 | import org.slf4j.Logger; 34 | import org.slf4j.LoggerFactory; 35 | 36 | import java.io.IOException; 37 | import java.util.ArrayList; 38 | import java.util.Map; 39 | import java.util.Timer; 40 | import java.util.TimerTask; 41 | 42 | public abstract class AbstractHdfsBolt extends BaseRichBolt { 43 | private static final Logger LOG = LoggerFactory.getLogger(AbstractHdfsBolt.class); 44 | 45 | protected ArrayList rotationActions = new ArrayList(); 46 | private Path currentFile; 47 | protected OutputCollector collector; 48 | protected transient FileSystem fs; 49 | protected SyncPolicy syncPolicy; 50 | protected FileRotationPolicy rotationPolicy; 51 | protected FileNameFormat fileNameFormat; 52 | protected int rotation = 0; 53 | protected String fsUrl; 54 | protected String configKey; 55 | protected transient Object writeLock; 56 | protected transient Timer rotationTimer; // only used for TimedRotationPolicy 57 | 58 | protected transient Configuration hdfsConfig; 59 | 60 | protected void rotateOutputFile() throws IOException { 61 | LOG.info("Rotating output file..."); 62 | long start = System.currentTimeMillis(); 63 | synchronized (this.writeLock) { 64 | closeOutputFile(); 65 | this.rotation++; 66 | 67 | Path newFile = createOutputFile(); 68 | LOG.info("Performing {} file rotation actions.", this.rotationActions.size()); 69 | for (RotationAction action : this.rotationActions) { 70 | action.execute(this.fs, this.currentFile); 71 | } 72 | this.currentFile = newFile; 73 | } 74 | long time = System.currentTimeMillis() - start; 75 | LOG.info("File rotation took {} ms.", time); 76 | } 77 | 78 | /** 79 | * Marked as final to prevent override. Subclasses should implement the doPrepare() method. 
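 * Validates the configured policies and file system URL, copies any settings found under configKey into the
 * HDFS configuration, performs the Kerberos login via HdfsSecurityUtil, opens the initial output file, and
 * schedules a rotation timer when a TimedRotationPolicy is used.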
80 | * @param conf 81 | * @param topologyContext 82 | * @param collector 83 | */ 84 | public final void prepare(Map conf, TopologyContext topologyContext, OutputCollector collector){ 85 | this.writeLock = new Object(); 86 | if (this.syncPolicy == null) throw new IllegalStateException("SyncPolicy must be specified."); 87 | if (this.rotationPolicy == null) throw new IllegalStateException("RotationPolicy must be specified."); 88 | if (this.fsUrl == null) { 89 | throw new IllegalStateException("File system URL must be specified."); 90 | } 91 | 92 | this.collector = collector; 93 | this.fileNameFormat.prepare(conf, topologyContext); 94 | this.hdfsConfig = new Configuration(); 95 | Map map = (Map)conf.get(this.configKey); 96 | if(map != null){ 97 | for(String key : map.keySet()){ 98 | this.hdfsConfig.set(key, String.valueOf(map.get(key))); 99 | } 100 | } 101 | 102 | 103 | try{ 104 | HdfsSecurityUtil.login(conf, hdfsConfig); 105 | doPrepare(conf, topologyContext, collector); 106 | this.currentFile = createOutputFile(); 107 | 108 | } catch (Exception e){ 109 | throw new RuntimeException("Error preparing HdfsBolt: " + e.getMessage(), e); 110 | } 111 | 112 | if(this.rotationPolicy instanceof TimedRotationPolicy){ 113 | long interval = ((TimedRotationPolicy)this.rotationPolicy).getInterval(); 114 | this.rotationTimer = new Timer(true); 115 | TimerTask task = new TimerTask() { 116 | @Override 117 | public void run() { 118 | try { 119 | rotateOutputFile(); 120 | } catch(IOException e){ 121 | LOG.warn("IOException during scheduled file rotation.", e); 122 | } 123 | } 124 | }; 125 | this.rotationTimer.scheduleAtFixedRate(task, interval, interval); 126 | } 127 | } 128 | 129 | @Override 130 | public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) { 131 | } 132 | 133 | abstract void closeOutputFile() throws IOException; 134 | 135 | abstract Path createOutputFile() throws IOException; 136 | 137 | abstract void doPrepare(Map conf, TopologyContext topologyContext, OutputCollector collector) throws IOException; 138 | 139 | } 140 | -------------------------------------------------------------------------------- /src/test/java/org/apache/storm/hdfs/bolt/SequenceFileTopology.java: -------------------------------------------------------------------------------- 1 | /** 2 | * Licensed to the Apache Software Foundation (ASF) under one 3 | * or more contributor license agreements. See the NOTICE file 4 | * distributed with this work for additional information 5 | * regarding copyright ownership. The ASF licenses this file 6 | * to you under the Apache License, Version 2.0 (the 7 | * "License"); you may not use this file except in compliance 8 | * with the License. You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 
17 | */ 18 | package org.apache.storm.hdfs.bolt; 19 | 20 | import backtype.storm.Config; 21 | import backtype.storm.LocalCluster; 22 | import backtype.storm.StormSubmitter; 23 | import backtype.storm.spout.SpoutOutputCollector; 24 | import backtype.storm.task.OutputCollector; 25 | import backtype.storm.task.TopologyContext; 26 | import backtype.storm.topology.OutputFieldsDeclarer; 27 | import backtype.storm.topology.TopologyBuilder; 28 | import backtype.storm.topology.base.BaseRichBolt; 29 | import backtype.storm.topology.base.BaseRichSpout; 30 | import backtype.storm.tuple.Fields; 31 | import backtype.storm.tuple.Tuple; 32 | import backtype.storm.tuple.Values; 33 | import org.apache.storm.hdfs.bolt.format.*; 34 | import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy; 35 | import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy; 36 | import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units; 37 | import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy; 38 | import org.apache.storm.hdfs.bolt.sync.SyncPolicy; 39 | import org.apache.storm.hdfs.common.rotation.MoveFileAction; 40 | 41 | import org.apache.hadoop.io.SequenceFile; 42 | 43 | import java.util.HashMap; 44 | import java.util.Map; 45 | import java.util.UUID; 46 | import java.util.concurrent.ConcurrentHashMap; 47 | 48 | public class SequenceFileTopology { 49 | static final String SENTENCE_SPOUT_ID = "sentence-spout"; 50 | static final String BOLT_ID = "my-bolt"; 51 | static final String TOPOLOGY_NAME = "test-topology"; 52 | 53 | public static void main(String[] args) throws Exception { 54 | Config config = new Config(); 55 | config.setNumWorkers(1); 56 | 57 | SentenceSpout spout = new SentenceSpout(); 58 | 59 | // sync the filesystem after every 1k tuples 60 | SyncPolicy syncPolicy = new CountSyncPolicy(1000); 61 | 62 | // rotate files when they reach 5MB 63 | FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB); 64 | 65 | FileNameFormat fileNameFormat = new DefaultFileNameFormat() 66 | .withPath("/source/") 67 | .withExtension(".seq"); 68 | 69 | // create sequence format instance. 
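        // DefaultSequenceFormat writes the "timestamp" field as the sequence file key and the "sentence" field as the value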
70 | DefaultSequenceFormat format = new DefaultSequenceFormat("timestamp", "sentence"); 71 | 72 | SequenceFileBolt bolt = new SequenceFileBolt() 73 | .withFsUrl(args[0]) 74 | .withFileNameFormat(fileNameFormat) 75 | .withSequenceFormat(format) 76 | .withRotationPolicy(rotationPolicy) 77 | .withSyncPolicy(syncPolicy) 78 | .withCompressionType(SequenceFile.CompressionType.RECORD) 79 | .withCompressionCodec("deflate") 80 | .addRotationAction(new MoveFileAction().toDestination("/dest/")); 81 | 82 | 83 | 84 | 85 | TopologyBuilder builder = new TopologyBuilder(); 86 | 87 | builder.setSpout(SENTENCE_SPOUT_ID, spout, 1); 88 | // SentenceSpout --> MyBolt 89 | builder.setBolt(BOLT_ID, bolt, 4) 90 | .shuffleGrouping(SENTENCE_SPOUT_ID); 91 | 92 | 93 | if (args.length == 1) { 94 | LocalCluster cluster = new LocalCluster(); 95 | 96 | cluster.submitTopology(TOPOLOGY_NAME, config, builder.createTopology()); 97 | waitForSeconds(120); 98 | cluster.killTopology(TOPOLOGY_NAME); 99 | cluster.shutdown(); 100 | System.exit(0); 101 | } else if(args.length == 2) { 102 | StormSubmitter.submitTopology(args[1], config, builder.createTopology()); 103 | } 104 | } 105 | 106 | public static void waitForSeconds(int seconds) { 107 | try { 108 | Thread.sleep(seconds * 1000); 109 | } catch (InterruptedException e) { 110 | } 111 | } 112 | 113 | 114 | public static class SentenceSpout extends BaseRichSpout { 115 | 116 | 117 | private ConcurrentHashMap pending; 118 | private SpoutOutputCollector collector; 119 | private String[] sentences = { 120 | "my dog has fleas", 121 | "i like cold beverages", 122 | "the dog ate my homework", 123 | "don't have a cow man", 124 | "i don't think i like fleas" 125 | }; 126 | private int index = 0; 127 | private int count = 0; 128 | private long total = 0L; 129 | 130 | public void declareOutputFields(OutputFieldsDeclarer declarer) { 131 | declarer.declare(new Fields("sentence", "timestamp")); 132 | } 133 | 134 | public void open(Map config, TopologyContext context, 135 | SpoutOutputCollector collector) { 136 | this.collector = collector; 137 | this.pending = new ConcurrentHashMap(); 138 | } 139 | 140 | public void nextTuple() { 141 | Values values = new Values(sentences[index], System.currentTimeMillis()); 142 | UUID msgId = UUID.randomUUID(); 143 | this.pending.put(msgId, values); 144 | this.collector.emit(values, msgId); 145 | index++; 146 | if (index >= sentences.length) { 147 | index = 0; 148 | } 149 | count++; 150 | total++; 151 | if(count > 20000){ 152 | count = 0; 153 | System.out.println("Pending count: " + this.pending.size() + ", total: " + this.total); 154 | } 155 | Thread.yield(); 156 | } 157 | 158 | public void ack(Object msgId) { 159 | // System.out.println("ACK"); 160 | this.pending.remove(msgId); 161 | } 162 | 163 | public void fail(Object msgId) { 164 | System.out.println("**** RESENDING FAILED TUPLE"); 165 | this.collector.emit(this.pending.get(msgId), msgId); 166 | } 167 | } 168 | 169 | 170 | public static class MyBolt extends BaseRichBolt { 171 | 172 | private HashMap counts = null; 173 | private OutputCollector collector; 174 | 175 | public void prepare(Map config, TopologyContext context, OutputCollector collector) { 176 | this.counts = new HashMap(); 177 | this.collector = collector; 178 | } 179 | 180 | public void execute(Tuple tuple) { 181 | collector.ack(tuple); 182 | } 183 | 184 | 185 | public void declareOutputFields(OutputFieldsDeclarer declarer) { 186 | // this bolt does not emit anything 187 | } 188 | 189 | @Override 190 | public void cleanup() { 191 | } 192 | } 
193 | }
194 | 

--------------------------------------------------------------------------------
/src/test/java/org/apache/storm/hdfs/bolt/HdfsFileTopology.java:
--------------------------------------------------------------------------------
1 | /**
2 |  * Licensed to the Apache Software Foundation (ASF) under one
3 |  * or more contributor license agreements. See the NOTICE file
4 |  * distributed with this work for additional information
5 |  * regarding copyright ownership. The ASF licenses this file
6 |  * to you under the Apache License, Version 2.0 (the
7 |  * "License"); you may not use this file except in compliance
8 |  * with the License. You may obtain a copy of the License at
9 |  *
10 |  * http://www.apache.org/licenses/LICENSE-2.0
11 |  *
12 |  * Unless required by applicable law or agreed to in writing, software
13 |  * distributed under the License is distributed on an "AS IS" BASIS,
14 |  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15 |  * See the License for the specific language governing permissions and
16 |  * limitations under the License.
17 |  */
18 | package org.apache.storm.hdfs.bolt;
19 | 
20 | import backtype.storm.Config;
21 | import backtype.storm.LocalCluster;
22 | import backtype.storm.StormSubmitter;
23 | import backtype.storm.spout.SpoutOutputCollector;
24 | import backtype.storm.task.OutputCollector;
25 | import backtype.storm.task.TopologyContext;
26 | import backtype.storm.topology.OutputFieldsDeclarer;
27 | import backtype.storm.topology.TopologyBuilder;
28 | import backtype.storm.topology.base.BaseRichBolt;
29 | import backtype.storm.topology.base.BaseRichSpout;
30 | import backtype.storm.tuple.Fields;
31 | import backtype.storm.tuple.Tuple;
32 | import backtype.storm.tuple.Values;
33 | import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
34 | import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
35 | import org.apache.storm.hdfs.bolt.format.FileNameFormat;
36 | import org.apache.storm.hdfs.bolt.format.RecordFormat;
37 | import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy;
38 | import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
39 | import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
40 | import org.apache.storm.hdfs.bolt.rotation.TimedRotationPolicy;
41 | import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
42 | import org.apache.storm.hdfs.bolt.sync.SyncPolicy;
43 | import org.apache.storm.hdfs.common.rotation.MoveFileAction;
44 | import org.yaml.snakeyaml.Yaml;
45 | 
46 | import java.io.FileInputStream;
47 | import java.io.InputStream;
48 | import java.util.HashMap;
49 | import java.util.Map;
50 | import java.util.UUID;
51 | import java.util.concurrent.ConcurrentHashMap;
52 | 
53 | public class HdfsFileTopology {
54 |     static final String SENTENCE_SPOUT_ID = "sentence-spout";
55 |     static final String BOLT_ID = "my-bolt";
56 |     static final String TOPOLOGY_NAME = "test-topology";
57 | 
58 |     public static void main(String[] args) throws Exception {
59 |         Config config = new Config();
60 |         config.setNumWorkers(1);
61 | 
62 |         SentenceSpout spout = new SentenceSpout();
63 | 
64 |         // sync the filesystem after every 1k tuples
65 |         SyncPolicy syncPolicy = new CountSyncPolicy(1000);
66 | 
67 |         // rotate files every minute
68 |         FileRotationPolicy rotationPolicy = new TimedRotationPolicy(1.0f, TimedRotationPolicy.TimeUnit.MINUTES);
69 | 
70 |         FileNameFormat fileNameFormat = new DefaultFileNameFormat()
71 |                 .withPath("/foo/")
72 |                 .withExtension(".txt");
73 | 
74 | 
75 | 
76 |         // use
"|" instead of "," for field delimiter 77 | RecordFormat format = new DelimitedRecordFormat() 78 | .withFieldDelimiter("|"); 79 | 80 | Yaml yaml = new Yaml(); 81 | InputStream in = new FileInputStream(args[1]); 82 | Map yamlConf = (Map) yaml.load(in); 83 | in.close(); 84 | config.put("hdfs.config", yamlConf); 85 | 86 | HdfsBolt bolt = new HdfsBolt() 87 | .withConfigKey("hdfs.config") 88 | .withFsUrl(args[0]) 89 | .withFileNameFormat(fileNameFormat) 90 | .withRecordFormat(format) 91 | .withRotationPolicy(rotationPolicy) 92 | .withSyncPolicy(syncPolicy) 93 | .addRotationAction(new MoveFileAction().toDestination("/dest2/")); 94 | 95 | TopologyBuilder builder = new TopologyBuilder(); 96 | 97 | builder.setSpout(SENTENCE_SPOUT_ID, spout, 1); 98 | // SentenceSpout --> MyBolt 99 | builder.setBolt(BOLT_ID, bolt, 4) 100 | .shuffleGrouping(SENTENCE_SPOUT_ID); 101 | 102 | if (args.length == 2) { 103 | LocalCluster cluster = new LocalCluster(); 104 | 105 | cluster.submitTopology(TOPOLOGY_NAME, config, builder.createTopology()); 106 | waitForSeconds(120); 107 | cluster.killTopology(TOPOLOGY_NAME); 108 | cluster.shutdown(); 109 | System.exit(0); 110 | } else if (args.length == 3) { 111 | StormSubmitter.submitTopology(args[0], config, builder.createTopology()); 112 | } else{ 113 | System.out.println("Usage: HdfsFileTopology [topology name] "); 114 | } 115 | } 116 | 117 | public static void waitForSeconds(int seconds) { 118 | try { 119 | Thread.sleep(seconds * 1000); 120 | } catch (InterruptedException e) { 121 | } 122 | } 123 | 124 | public static class SentenceSpout extends BaseRichSpout { 125 | private ConcurrentHashMap pending; 126 | private SpoutOutputCollector collector; 127 | private String[] sentences = { 128 | "my dog has fleas", 129 | "i like cold beverages", 130 | "the dog ate my homework", 131 | "don't have a cow man", 132 | "i don't think i like fleas" 133 | }; 134 | private int index = 0; 135 | private int count = 0; 136 | private long total = 0L; 137 | 138 | public void declareOutputFields(OutputFieldsDeclarer declarer) { 139 | declarer.declare(new Fields("sentence", "timestamp")); 140 | } 141 | 142 | public void open(Map config, TopologyContext context, 143 | SpoutOutputCollector collector) { 144 | this.collector = collector; 145 | this.pending = new ConcurrentHashMap(); 146 | } 147 | 148 | public void nextTuple() { 149 | Values values = new Values(sentences[index], System.currentTimeMillis()); 150 | UUID msgId = UUID.randomUUID(); 151 | this.pending.put(msgId, values); 152 | this.collector.emit(values, msgId); 153 | index++; 154 | if (index >= sentences.length) { 155 | index = 0; 156 | } 157 | count++; 158 | total++; 159 | if(count > 20000){ 160 | count = 0; 161 | System.out.println("Pending count: " + this.pending.size() + ", total: " + this.total); 162 | } 163 | Thread.yield(); 164 | } 165 | 166 | public void ack(Object msgId) { 167 | this.pending.remove(msgId); 168 | } 169 | 170 | public void fail(Object msgId) { 171 | System.out.println("**** RESENDING FAILED TUPLE"); 172 | this.collector.emit(this.pending.get(msgId), msgId); 173 | } 174 | } 175 | 176 | public static class MyBolt extends BaseRichBolt { 177 | 178 | private HashMap counts = null; 179 | private OutputCollector collector; 180 | 181 | public void prepare(Map config, TopologyContext context, OutputCollector collector) { 182 | this.counts = new HashMap(); 183 | this.collector = collector; 184 | } 185 | 186 | public void execute(Tuple tuple) { 187 | collector.ack(tuple); 188 | } 189 | 190 | public void 
declareOutputFields(OutputFieldsDeclarer declarer) { 191 | // this bolt does not emit anything 192 | } 193 | 194 | @Override 195 | public void cleanup() { 196 | } 197 | } 198 | } 199 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. 
For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the
150 |       appropriateness of using or redistributing the Work and assume any
151 |       risks associated with Your exercise of permissions under this License.
152 | 
153 |    8. Limitation of Liability. In no event and under no legal theory,
154 |       whether in tort (including negligence), contract, or otherwise,
155 |       unless required by applicable law (such as deliberate and grossly
156 |       negligent acts) or agreed to in writing, shall any Contributor be
157 |       liable to You for damages, including any direct, indirect, special,
158 |       incidental, or consequential damages of any character arising as a
159 |       result of this License or out of the use or inability to use the
160 |       Work (including but not limited to damages for loss of goodwill,
161 |       work stoppage, computer failure or malfunction, or any and all
162 |       other commercial damages or losses), even if such Contributor
163 |       has been advised of the possibility of such damages.
164 | 
165 |    9. Accepting Warranty or Additional Liability. While redistributing
166 |       the Work or Derivative Works thereof, You may choose to offer,
167 |       and charge a fee for, acceptance of support, warranty, indemnity,
168 |       or other liability obligations and/or rights consistent with this
169 |       License. However, in accepting such obligations, You may act only
170 |       on Your own behalf and on Your sole responsibility, not on behalf
171 |       of any other Contributor, and only if You agree to indemnify,
172 |       defend, and hold each Contributor harmless for any liability
173 |       incurred by, or claims asserted against, such Contributor by reason
174 |       of your accepting any such warranty or additional liability.
175 | 
176 |    END OF TERMS AND CONDITIONS
177 | 
178 |    APPENDIX: How to apply the Apache License to your work.
179 | 
180 |       To apply the Apache License to your work, attach the following
181 |       boilerplate notice, with the fields enclosed by brackets "{}"
182 |       replaced with your own identifying information. (Don't include
183 |       the brackets!) The text should be enclosed in the appropriate
184 |       comment syntax for the file format. We also recommend that a
185 |       file or class name and description of purpose be included on the
186 |       same "printed page" as the copyright notice for easier
187 |       identification within third-party archives.
188 | 
189 |    Copyright {yyyy} {name of copyright owner}
190 | 
191 |    Licensed under the Apache License, Version 2.0 (the "License");
192 |    you may not use this file except in compliance with the License.
193 |    You may obtain a copy of the License at
194 | 
195 |        http://www.apache.org/licenses/LICENSE-2.0
196 | 
197 |    Unless required by applicable law or agreed to in writing, software
198 |    distributed under the License is distributed on an "AS IS" BASIS,
199 |    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 |    See the License for the specific language governing permissions and
201 |    limitations under the License.
202 | 
203 | 

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Storm HDFS
2 | 
3 | Storm components for interacting with HDFS file systems
4 | 
5 | 
6 | ## Usage
7 | The following example will write pipe("|")-delimited files to the HDFS path hdfs://localhost:54310/foo. After every
8 | 1,000 tuples it will sync the filesystem, making that data visible to other HDFS clients. It will rotate files when they
9 | reach 5 megabytes in size.
10 | 
11 | ```java
12 | // use "|" instead of "," for field delimiter
13 | RecordFormat format = new DelimitedRecordFormat()
14 |         .withFieldDelimiter("|");
15 | 
16 | // sync the filesystem after every 1k tuples
17 | SyncPolicy syncPolicy = new CountSyncPolicy(1000);
18 | 
19 | // rotate files when they reach 5MB
20 | FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB);
21 | 
22 | FileNameFormat fileNameFormat = new DefaultFileNameFormat()
23 |         .withPath("/foo/");
24 | 
25 | HdfsBolt bolt = new HdfsBolt()
26 |         .withFsUrl("hdfs://localhost:54310")
27 |         .withFileNameFormat(fileNameFormat)
28 |         .withRecordFormat(format)
29 |         .withRotationPolicy(rotationPolicy)
30 |         .withSyncPolicy(syncPolicy);
31 | ```
32 | 
33 | ### Packaging a Topology
34 | When packaging your topology, it's important that you use the [maven-shade-plugin](https://maven.apache.org/plugins/maven-shade-plugin/) as opposed to the
35 | [maven-assembly-plugin](https://maven.apache.org/plugins/maven-assembly-plugin/).
36 | 
37 | The shade plugin provides facilities for merging JAR manifest entries, which the Hadoop client leverages for URL scheme
38 | resolution.
39 | 
40 | If you experience errors such as the following:
41 | 
42 | ```
43 | java.lang.RuntimeException: Error preparing HdfsBolt: No FileSystem for scheme: hdfs
44 | ```
45 | 
46 | it's an indication that your topology jar file isn't packaged properly.
47 | 
48 | If you are using maven to create your topology jar, you should use the following `maven-shade-plugin` configuration to
49 | create your topology jar:
50 | 
51 | ```xml
52 | <plugin>
53 |     <groupId>org.apache.maven.plugins</groupId>
54 |     <artifactId>maven-shade-plugin</artifactId>
55 |     <version>1.4</version>
56 |     <configuration>
57 |         <createDependencyReducedPom>true</createDependencyReducedPom>
58 |     </configuration>
59 |     <executions>
60 |         <execution>
61 |             <phase>package</phase>
62 |             <goals>
63 |                 <goal>shade</goal>
64 |             </goals>
65 |             <configuration>
66 |                 <transformers>
67 |                     <transformer
68 |                             implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
69 |                     <transformer
70 |                             implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
71 |                         <mainClass></mainClass>
72 |                     </transformer>
73 |                 </transformers>
74 |             </configuration>
75 |         </execution>
76 |     </executions>
77 | </plugin>
78 | 
79 | ```
80 | 
81 | ### Specifying a Hadoop Version
82 | By default, storm-hdfs uses the following Hadoop dependencies:
83 | 
84 | ```xml
85 | <dependency>
86 |     <groupId>org.apache.hadoop</groupId>
87 |     <artifactId>hadoop-client</artifactId>
88 |     <version>2.2.0</version>
89 |     <exclusions>
90 |         <exclusion>
91 |             <groupId>org.slf4j</groupId>
92 |             <artifactId>slf4j-log4j12</artifactId>
93 |         </exclusion>
94 |     </exclusions>
95 | </dependency>
96 | <dependency>
97 |     <groupId>org.apache.hadoop</groupId>
98 |     <artifactId>hadoop-hdfs</artifactId>
99 |     <version>2.2.0</version>
100 |     <exclusions>
101 |         <exclusion>
102 |             <groupId>org.slf4j</groupId>
103 |             <artifactId>slf4j-log4j12</artifactId>
104 |         </exclusion>
105 |     </exclusions>
106 | </dependency>
107 | ```
108 | 
109 | If you are using a different version of Hadoop, you should exclude the Hadoop libraries from the storm-hdfs dependency
110 | and add the dependencies for your preferred version in your pom.
111 | 
112 | Hadoop client version incompatibilities can manifest as errors like:
113 | 
114 | ```
115 | com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero)
116 | ```
117 | 
118 | ## Customization
119 | 
120 | ### Record Formats
121 | Record format can be controlled by providing an implementation of the `org.apache.storm.hdfs.format.RecordFormat`
122 | interface:
123 | 
124 | ```java
125 | public interface RecordFormat extends Serializable {
126 |     byte[] format(Tuple tuple);
127 | }
128 | ```
129 | 
130 | The provided `org.apache.storm.hdfs.format.DelimitedRecordFormat` is capable of producing formats such as CSV and
131 | tab-delimited files.
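
If you need a structure the delimited format can't produce, you can implement the interface directly. The following is
a minimal sketch (the `JsonRecordFormat` name is hypothetical, not a class shipped with this project) that writes one
JSON-style record per line; a real implementation would also need to escape quotes and handle non-string values:

```java
public class JsonRecordFormat implements RecordFormat {
    @Override
    public byte[] format(Tuple tuple) {
        StringBuilder sb = new StringBuilder("{");
        Fields fields = tuple.getFields();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(",");
            // quote every value for simplicity
            sb.append('"').append(fields.get(i)).append("\":\"")
                    .append(tuple.getValueByField(fields.get(i))).append('"');
        }
        sb.append("}\n");
        return sb.toString().getBytes();
    }
}
```

Like the provided formats, an implementation only needs to be serializable and stateless; the bolt calls `format()`
once per tuple and appends the returned bytes to the current output file.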
132 | 
133 | 
134 | ### File Naming
135 | File naming can be controlled by providing an implementation of the `org.apache.storm.hdfs.format.FileNameFormat`
136 | interface:
137 | 
138 | ```java
139 | public interface FileNameFormat extends Serializable {
140 |     void prepare(Map conf, TopologyContext topologyContext);
141 |     String getName(long rotation, long timeStamp);
142 |     String getPath();
143 | }
144 | ```
145 | 
146 | The provided `org.apache.storm.hdfs.format.DefaultFileNameFormat` will create file names with the following format:
147 | 
148 |      {prefix}{componentId}-{taskId}-{rotationNum}-{timestamp}{extension}
149 | 
150 | For example:
151 | 
152 |      MyBolt-5-7-1390579837830.txt
153 | 
154 | By default, the prefix is empty and the extension is ".txt".
155 | 
156 | 
157 | 
158 | ### Sync Policies
159 | Sync policies allow you to control when buffered data is flushed to the underlying filesystem (thus making it available
160 | to clients reading the data) by implementing the `org.apache.storm.hdfs.sync.SyncPolicy` interface:
161 | 
162 | ```java
163 | public interface SyncPolicy extends Serializable {
164 |     boolean mark(Tuple tuple, long offset);
165 |     void reset();
166 | }
167 | ```
168 | The `HdfsBolt` will call the `mark()` method for every tuple it processes. Returning `true` will trigger the `HdfsBolt`
169 | to perform a sync/flush, after which it will call the `reset()` method.
170 | 
171 | The `org.apache.storm.hdfs.sync.CountSyncPolicy` class simply triggers a sync after the specified number of tuples have
172 | been processed.
173 | 
174 | ### File Rotation Policies
175 | Similar to sync policies, file rotation policies allow you to control when data files are rotated by providing an
176 | implementation of the `org.apache.storm.hdfs.rotation.FileRotationPolicy` interface:
177 | 
178 | ```java
179 | public interface FileRotationPolicy extends Serializable {
180 |     boolean mark(Tuple tuple, long offset);
181 |     void reset();
182 | }
183 | ```
184 | 
185 | The `org.apache.storm.hdfs.rotation.FileSizeRotationPolicy` implementation allows you to trigger file rotation when
186 | data files reach a specific file size:
187 | 
188 | ```java
189 | FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB);
190 | ```
191 | 
192 | ### File Rotation Actions
193 | Both the HDFS bolt and Trident State implementation allow you to register any number of `RotationAction`s.
194 | `RotationAction`s provide a hook for performing some action right after a file is rotated. For
195 | example, moving a file to a different location or renaming it.
196 | 
197 | 
198 | ```java
199 | public interface RotationAction extends Serializable {
200 |     void execute(FileSystem fileSystem, Path filePath) throws IOException;
201 | }
202 | ```
203 | 
204 | Storm-HDFS includes a simple action that will move a file after rotation:
205 | 
206 | ```java
207 | public class MoveFileAction implements RotationAction {
208 |     private static final Logger LOG = LoggerFactory.getLogger(MoveFileAction.class);
209 | 
210 |     private String destination;
211 | 
212 |     public MoveFileAction toDestination(String destDir){
213 |         destination = destDir;
214 |         return this;
215 |     }
216 | 
217 |     @Override
218 |     public void execute(FileSystem fileSystem, Path filePath) throws IOException {
219 |         Path destPath = new Path(destination, filePath.getName());
220 |         LOG.info("Moving file {} to {}", filePath, destPath);
221 |         boolean success = fileSystem.rename(filePath, destPath);
222 |         return;
223 |     }
224 | }
225 | ```
226 | 
227 | If you are using Trident and sequence files you can do something like this:
228 | 
229 | ```java
230 | HdfsState.Options seqOpts = new HdfsState.SequenceFileOptions()
231 |         .withFileNameFormat(fileNameFormat)
232 |         .withSequenceFormat(new DefaultSequenceFormat("key", "data"))
233 |         .withRotationPolicy(rotationPolicy)
234 |         .withFsUrl("hdfs://localhost:54310")
235 |         .addRotationAction(new MoveFileAction().toDestination("/dest2/"));
236 | ```
237 | 
238 | 
239 | ## Support for HDFS Sequence Files
240 | 
241 | The `org.apache.storm.hdfs.bolt.SequenceFileBolt` class allows you to write Storm data to HDFS sequence files:
242 | 
243 | ```java
244 | // sync the filesystem after every 1k tuples
245 | SyncPolicy syncPolicy = new CountSyncPolicy(1000);
246 | 
247 | // rotate files when they reach 5MB
248 | FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB);
249 | 
250 | FileNameFormat fileNameFormat = new DefaultFileNameFormat()
251 |         .withExtension(".seq")
252 |         .withPath("/data/");
253 | 
254 | // create sequence format instance.
255 | DefaultSequenceFormat format = new DefaultSequenceFormat("timestamp", "sentence");
256 | 
257 | SequenceFileBolt bolt = new SequenceFileBolt()
258 |         .withFsUrl("hdfs://localhost:54310")
259 |         .withFileNameFormat(fileNameFormat)
260 |         .withSequenceFormat(format)
261 |         .withRotationPolicy(rotationPolicy)
262 |         .withSyncPolicy(syncPolicy)
263 |         .withCompressionType(SequenceFile.CompressionType.RECORD)
264 |         .withCompressionCodec("deflate");
265 | ```
266 | 
267 | The `SequenceFileBolt` requires that you provide a `org.apache.storm.hdfs.bolt.format.SequenceFormat` that maps tuples to
268 | key/value pairs:
269 | 
270 | ```java
271 | public interface SequenceFormat extends Serializable {
272 |     Class keyClass();
273 |     Class valueClass();
274 | 
275 |     Writable key(Tuple tuple);
276 |     Writable value(Tuple tuple);
277 | }
278 | ```
279 | 
280 | ## Trident API
281 | storm-hdfs also includes a Trident `state` implementation for writing data to HDFS, with an API that closely mirrors
282 | that of the bolts.
283 | 284 | ```java 285 | Fields hdfsFields = new Fields("field1", "field2"); 286 | 287 | FileNameFormat fileNameFormat = new DefaultFileNameFormat() 288 | .withPath("/trident") 289 | .withPrefix("trident") 290 | .withExtension(".txt"); 291 | 292 | RecordFormat recordFormat = new DelimitedRecordFormat() 293 | .withFields(hdfsFields); 294 | 295 | FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, FileSizeRotationPolicy.Units.MB); 296 | 297 | HdfsState.Options options = new HdfsState.HdfsFileOptions() 298 | .withFileNameFormat(fileNameFormat) 299 | .withRecordFormat(recordFormat) 300 | .withRotationPolicy(rotationPolicy) 301 | .withFsUrl("hdfs://localhost:54310"); 302 | 303 | StateFactory factory = new HdfsStateFactory().withOptions(options); 304 | 305 | TridentState state = stream 306 | .partitionPersist(factory, hdfsFields, new HdfsUpdater(), new Fields()); 307 | ``` 308 | 309 | To use the sequence file `State` implementation, use the `HdfsState.SequenceFileOptions`: 310 | 311 | ```java 312 | HdfsState.Options seqOpts = new HdfsState.SequenceFileOptions() 313 | .withFileNameFormat(fileNameFormat) 314 | .withSequenceFormat(new DefaultSequenceFormat("key", "data")) 315 | .withRotationPolicy(rotationPolicy) 316 | .withFsUrl("hdfs://localhost:54310") 317 | .addRotationAction(new MoveFileAction().toDestination("/dest2/")); 318 | ``` 319 | 320 | 321 | ## License 322 | 323 | Licensed to the Apache Software Foundation (ASF) under one 324 | or more contributor license agreements. See the NOTICE file 325 | distributed with this work for additional information 326 | regarding copyright ownership. The ASF licenses this file 327 | to you under the Apache License, Version 2.0 (the 328 | "License"); you may not use this file except in compliance 329 | with the License. You may obtain a copy of the License at 330 | 331 | http://www.apache.org/licenses/LICENSE-2.0 332 | 333 | Unless required by applicable law or agreed to in writing, 334 | software distributed under the License is distributed on an 335 | "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 336 | KIND, either express or implied. See the License for the 337 | specific language governing permissions and limitations 338 | under the License. 
## License

Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
--------------------------------------------------------------------------------
/src/main/java/org/apache/storm/hdfs/trident/HdfsState.java:
--------------------------------------------------------------------------------
package org.apache.storm.hdfs.trident;

import backtype.storm.task.IMetricsContext;
import backtype.storm.topology.FailedException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.storm.hdfs.common.rotation.RotationAction;
import org.apache.storm.hdfs.common.security.HdfsSecurityUtil;
import org.apache.storm.hdfs.trident.format.FileNameFormat;
import org.apache.storm.hdfs.trident.format.RecordFormat;
import org.apache.storm.hdfs.trident.format.SequenceFormat;
import org.apache.storm.hdfs.trident.rotation.FileRotationPolicy;
import org.apache.storm.hdfs.trident.rotation.TimedRotationPolicy;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import storm.trident.operation.TridentCollector;
import storm.trident.state.State;
import storm.trident.tuple.TridentTuple;

import java.io.IOException;
import java.io.Serializable;
import java.net.URI;
import java.util.*;

public class HdfsState implements State {

    public static abstract class Options implements Serializable {

        protected String fsUrl;
        protected String configKey;
        protected transient FileSystem fs;
        private Path currentFile;
        protected FileRotationPolicy rotationPolicy;
        protected FileNameFormat fileNameFormat;
        protected int rotation = 0;
        protected transient Configuration hdfsConfig;
        protected ArrayList<RotationAction> rotationActions = new ArrayList<RotationAction>();
        protected transient Object writeLock;
        protected transient Timer rotationTimer;

        abstract void closeOutputFile() throws IOException;

        abstract Path createOutputFile() throws IOException;

        abstract void execute(List<TridentTuple> tuples) throws IOException;

        abstract void doPrepare(Map conf, int partitionIndex, int numPartitions) throws IOException;

        protected void rotateOutputFile() throws IOException {
            LOG.info("Rotating output file...");
            long start = System.currentTimeMillis();
            synchronized (this.writeLock) {
                closeOutputFile();
                this.rotation++;

                Path newFile = createOutputFile();
                LOG.info("Performing {} file rotation actions.", this.rotationActions.size());
                for (RotationAction action : this.rotationActions) {
                    action.execute(this.fs, this.currentFile);
                }
                this.currentFile = newFile;
            }
            long time = System.currentTimeMillis() - start;
            LOG.info("File rotation took {} ms.", time);
        }

        void prepare(Map conf, int partitionIndex, int numPartitions){
            this.writeLock = new Object();
            if (this.rotationPolicy == null) throw new IllegalStateException("RotationPolicy must be specified.");
            if (this.fsUrl == null) {
                throw new IllegalStateException("File system URL must be specified.");
            }
            this.fileNameFormat.prepare(conf, partitionIndex, numPartitions);
            this.hdfsConfig = new Configuration();
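            // Copy any user-supplied settings registered under 'configKey' in the
            // topology configuration into the HDFS client configuration.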
            Map<String, Object> map = (Map<String, Object>)conf.get(this.configKey);
            if(map != null){
                for(String key : map.keySet()){
                    this.hdfsConfig.set(key, String.valueOf(map.get(key)));
                }
            }
            try{
                HdfsSecurityUtil.login(conf, hdfsConfig);
                doPrepare(conf, partitionIndex, numPartitions);
                this.currentFile = createOutputFile();
            } catch (Exception e){
                throw new RuntimeException("Error preparing HdfsState: " + e.getMessage(), e);
            }

            if(this.rotationPolicy instanceof TimedRotationPolicy){
                long interval = ((TimedRotationPolicy)this.rotationPolicy).getInterval();
                this.rotationTimer = new Timer(true);
                TimerTask task = new TimerTask() {
                    @Override
                    public void run() {
                        try {
                            rotateOutputFile();
                        } catch(IOException e){
                            LOG.warn("IOException during scheduled file rotation.", e);
                        }
                    }
                };
                this.rotationTimer.scheduleAtFixedRate(task, interval, interval);
            }
        }
    }

    public static class HdfsFileOptions extends Options {

        private transient FSDataOutputStream out;
        protected RecordFormat format;
        private long offset = 0;

        public HdfsFileOptions withFsUrl(String fsUrl){
            this.fsUrl = fsUrl;
            return this;
        }

        public HdfsFileOptions withConfigKey(String configKey){
            this.configKey = configKey;
            return this;
        }

        public HdfsFileOptions withFileNameFormat(FileNameFormat fileNameFormat){
            this.fileNameFormat = fileNameFormat;
            return this;
        }

        public HdfsFileOptions withRecordFormat(RecordFormat format){
            this.format = format;
            return this;
        }

        public HdfsFileOptions withRotationPolicy(FileRotationPolicy rotationPolicy){
            this.rotationPolicy = rotationPolicy;
            return this;
        }

        public HdfsFileOptions addRotationAction(RotationAction action){
            this.rotationActions.add(action);
            return this;
        }

        @Override
        void doPrepare(Map conf, int partitionIndex, int numPartitions) throws IOException {
            LOG.info("Preparing HDFS file state...");
            this.fs = FileSystem.get(URI.create(this.fsUrl), hdfsConfig);
        }

        @Override
        void closeOutputFile() throws IOException {
            this.out.close();
        }

        @Override
        Path createOutputFile() throws IOException {
            Path path = new Path(this.fileNameFormat.getPath(), this.fileNameFormat.getName(this.rotation, System.currentTimeMillis()));
            this.out = this.fs.create(path);
            return path;
        }
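        // Write out every tuple in the batch, rotating the file whenever the
        // rotation policy fires. If no rotation happened, hsync once at the end
        // of the batch so readers can see the data; UPDATE_LENGTH additionally
        // refreshes the file length at the NameNode for HDFS output streams.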
        @Override
        public void execute(List<TridentTuple> tuples) throws IOException {
            boolean rotated = false;
            synchronized (this.writeLock) {
                for (TridentTuple tuple : tuples) {
                    byte[] bytes = this.format.format(tuple);
                    out.write(bytes);
                    this.offset += bytes.length;

                    if (this.rotationPolicy.mark(tuple, this.offset)) {
                        rotateOutputFile();
                        this.offset = 0;
                        this.rotationPolicy.reset();
                        rotated = true;
                    }
                }
                if (!rotated) {
                    if (this.out instanceof HdfsDataOutputStream) {
                        ((HdfsDataOutputStream) this.out).hsync(EnumSet.of(HdfsDataOutputStream.SyncFlag.UPDATE_LENGTH));
                    } else {
                        this.out.hsync();
                    }
                }
            }
        }
    }

    public static class SequenceFileOptions extends Options {
        private SequenceFormat format;
        private SequenceFile.CompressionType compressionType = SequenceFile.CompressionType.RECORD;
        private transient SequenceFile.Writer writer;
        private String compressionCodec = "default";
        private transient CompressionCodecFactory codecFactory;

        public SequenceFileOptions withCompressionCodec(String codec){
            this.compressionCodec = codec;
            return this;
        }

        public SequenceFileOptions withFsUrl(String fsUrl) {
            this.fsUrl = fsUrl;
            return this;
        }

        public SequenceFileOptions withConfigKey(String configKey){
            this.configKey = configKey;
            return this;
        }

        public SequenceFileOptions withFileNameFormat(FileNameFormat fileNameFormat) {
            this.fileNameFormat = fileNameFormat;
            return this;
        }

        public SequenceFileOptions withSequenceFormat(SequenceFormat format) {
            this.format = format;
            return this;
        }

        public SequenceFileOptions withRotationPolicy(FileRotationPolicy rotationPolicy) {
            this.rotationPolicy = rotationPolicy;
            return this;
        }

        public SequenceFileOptions withCompressionType(SequenceFile.CompressionType compressionType){
            this.compressionType = compressionType;
            return this;
        }

        public SequenceFileOptions addRotationAction(RotationAction action){
            this.rotationActions.add(action);
            return this;
        }

        @Override
        void doPrepare(Map conf, int partitionIndex, int numPartitions) throws IOException {
            LOG.info("Preparing Sequence File State...");
            if (this.format == null) throw new IllegalStateException("SequenceFormat must be specified.");

            this.fs = FileSystem.get(URI.create(this.fsUrl), hdfsConfig);
            this.codecFactory = new CompressionCodecFactory(hdfsConfig);
        }

        @Override
        Path createOutputFile() throws IOException {
            Path p = new Path(this.fsUrl + this.fileNameFormat.getPath(), this.fileNameFormat.getName(this.rotation, System.currentTimeMillis()));
            this.writer = SequenceFile.createWriter(
                    this.hdfsConfig,
                    SequenceFile.Writer.file(p),
                    SequenceFile.Writer.keyClass(this.format.keyClass()),
                    SequenceFile.Writer.valueClass(this.format.valueClass()),
                    SequenceFile.Writer.compression(this.compressionType, this.codecFactory.getCodecByName(this.compressionCodec))
            );
            return p;
        }

        @Override
        void closeOutputFile() throws IOException {
            this.writer.close();
        }

        @Override
        public void execute(List<TridentTuple> tuples) throws IOException {
            long offset;
            for(TridentTuple tuple : tuples) {
                synchronized (this.writeLock) {
                    this.writer.append(this.format.key(tuple), this.format.value(tuple));
                    offset = this.writer.getLength();
                }

                if (this.rotationPolicy.mark(tuple, offset)) {
                    rotateOutputFile();
                    this.rotationPolicy.reset();
                }
            }
        }
    }

    public static final Logger LOG = LoggerFactory.getLogger(HdfsState.class);
    private Options options;

    HdfsState(Options options){
        this.options = options;
    }

    void prepare(Map conf, IMetricsContext metrics, int partitionIndex, int numPartitions){
        this.options.prepare(conf, partitionIndex, numPartitions);
    }

    @Override
    public void beginCommit(Long txId) {
    }

    @Override
    public void commit(Long txId) {
    }
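    // Called by HdfsUpdater once per batch. An IOException fails the whole
    // batch with FailedException so that Trident will replay it.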
    public void updateState(List<TridentTuple> tuples, TridentCollector tridentCollector){
        try{
            this.options.execute(tuples);
        } catch (IOException e){
            LOG.warn("Failing batch due to IOException.", e);
            throw new FailedException(e);
        }
    }
}
--------------------------------------------------------------------------------