├── .nojekyll
├── en
│   ├── case_study
│   │   └── 1.md
│   ├── images
│   │   └── .gitkeep
│   ├── configuration
│   │   ├── input-plugin.md
│   │   ├── filter-plugin.md
│   │   ├── output-plugin.md
│   │   ├── input-plugins
│   │   │   ├── Socket.md
│   │   │   ├── S3.md
│   │   │   ├── Hdfs.md
│   │   │   ├── File.md
│   │   │   ├── MySQL.md
│   │   │   ├── Jdbc.md
│   │   │   ├── Alluxio.md
│   │   │   └── RedisStream.md
│   │   ├── filter-plugins
│   │   │   ├── Remove.md
│   │   │   ├── Uuid.md
│   │   │   ├── Add.md
│   │   │   ├── Repartition.md
│   │   │   ├── Uppercase.md
│   │   │   ├── Drop.md
│   │   │   ├── Lowercase.md
│   │   │   ├── Sample.md
│   │   │   ├── Convert.md
│   │   │   ├── Truncate.md
│   │   │   ├── Watermark.md
│   │   │   ├── Checksum.md
│   │   │   ├── Join.md
│   │   │   ├── Replace.md
│   │   │   ├── Sql.md
│   │   │   └── Split.md
│   │   └── output-plugins
│   │       ├── Stdout.md
│   │       ├── MySQL.md
│   │       ├── Kafka.md
│   │       ├── Jdbc.md
│   │       ├── File.md
│   │       ├── S3.md
│   │       └── Hdfs.md
│   └── _sidebar.md
├── zh-cn
│   ├── images
│   │   ├── .gitkeep
│   │   ├── hdfs2ch.jpg
│   │   ├── hive-logo.png
│   │   ├── sina-logo.png
│   │   ├── wd-struct.png
│   │   ├── flink
│   │   │   ├── yarn.jpg
│   │   │   ├── standalone.jpg
│   │   │   └── flink-console.png
│   │   ├── sougou-logo.png
│   │   ├── wd-workflow.png
│   │   ├── qutoutiao-logo.jpg
│   │   ├── bytedance-logo.jpeg
│   │   ├── fendan-keji-logo.jpeg
│   │   ├── shuidichou-logo.jpg
│   │   ├── yonghuiyunchuang-logo.png
│   │   ├── wechat-qrcode
│   │   │   ├── kid-xiong.jpeg
│   │   │   ├── rickyhuo.jpeg
│   │   │   └── garyelephant.jpeg
│   │   └── zhejiang_lekong_xinxi_keji-logo.jpg
│   ├── v1
│   │   ├── performance-tunning.md
│   │   ├── roadmap.md
│   │   ├── configuration
│   │   │   ├── input-plugins
│   │   │   │   ├── File.docs
│   │   │   │   ├── S3Stream.docs
│   │   │   │   ├── FileStream.docs
│   │   │   │   ├── HdfsStream.docs
│   │   │   │   ├── SocketStream.docs
│   │   │   │   ├── Hive.docs
│   │   │   │   ├── Kudu.docs
│   │   │   │   ├── KafkaStream.docs
│   │   │   │   ├── MongoDB.docs
│   │   │   │   ├── FakeStream.docs
│   │   │   │   ├── SocketStream.md
│   │   │   │   ├── S3Stream.md
│   │   │   │   ├── Kudu.md
│   │   │   │   ├── FileStream.md
│   │   │   │   ├── HdfsStream.md
│   │   │   │   ├── Hive.md
│   │   │   │   ├── Tidb.md
│   │   │   │   ├── Redis.md
│   │   │   │   ├── File.md
│   │   │   │   ├── Elasticsearch.md
│   │   │   │   ├── Hdfs.md
│   │   │   │   ├── RedisStream.md
│   │   │   │   ├── MongoDB.md
│   │   │   │   └── MySQL.md
│   │   │   ├── filter-plugins
│   │   │   │   ├── Remove.docs
│   │   │   │   ├── Repartition.docs
│   │   │   │   ├── Uuid.docs
│   │   │   │   ├── Add.docs
│   │   │   │   ├── Convert.docs
│   │   │   │   ├── Rename.docs
│   │   │   │   ├── Sql.docs
│   │   │   │   ├── Sample.docs
│   │   │   │   ├── Drop.docs
│   │   │   │   ├── Lowercase.docs
│   │   │   │   ├── Uppercase.docs
│   │   │   │   ├── Checksum.docs
│   │   │   │   ├── Truncate.docs
│   │   │   │   ├── Split.docs
│   │   │   │   ├── Replace.docs
│   │   │   │   ├── Json.docs
│   │   │   │   ├── Grok.docs
│   │   │   │   ├── Script.docs
│   │   │   │   ├── Kv.docs
│   │   │   │   ├── Date.docs
│   │   │   │   ├── Remove.md
│   │   │   │   ├── Repartition.md
│   │   │   │   ├── Uuid.md
│   │   │   │   ├── Add.md
│   │   │   │   ├── Table.docs
│   │   │   │   ├── Drop.md
│   │   │   │   ├── Sample.md
│   │   │   │   ├── Rename.md
│   │   │   │   ├── Lowercase.md
│   │   │   │   ├── Uppercase.md
│   │   │   │   ├── Convert.md
│   │   │   │   ├── Urldecode.md
│   │   │   │   ├── Urlencode.md
│   │   │   │   ├── Truncate.md
│   │   │   │   ├── Checksum.md
│   │   │   │   ├── Join.md
│   │   │   │   ├── Replace.md
│   │   │   │   ├── Sql.md
│   │   │   │   ├── Watermark.md
│   │   │   │   ├── Split.md
│   │   │   │   ├── Table.md
│   │   │   │   └── Script.md
│   │   │   ├── output-plugins
│   │   │   │   ├── Kafka.docs
│   │   │   │   ├── MongoDB.docs
│   │   │   │   ├── MySQL.docs
│   │   │   │   ├── Kudu.docs
│   │   │   │   ├── Elasticsearch.docs
│   │   │   │   ├── Jdbc.docs
│   │   │   │   ├── Clickhouse.docs
│   │   │   │   ├── Stdout.md
│   │   │   │   ├── Kudu.md
│   │   │   │   ├── MySQL.md
│   │   │   │   ├── File.md
│   │   │   │   ├── S3.md
│   │   │   │   ├── Tidb.md
│   │   │   │   ├── Hdfs.md
│   │   │   │   └── Hive.md
│   │   │   ├── output-plugin.md
│   │   │   ├── input-plugin.md
│   │   │   └── filter-plugin.md
│   │   ├── _sidebar.md
│   │   ├── installation.md
│   │   ├── internal.md
│   │   ├── deployment.md
│   │   └── contribution.md
│   ├── v2
│   │   ├── flink
│   │   │   ├── README.md
│   │   │   ├── commands
│   │   │   │   ├── README.md
│   │   │   │   ├── _sidebar.md
│   │   │   │   └── start-waterdrop-flink.sh.md
│   │   │   ├── developing-plugin.md
│   │   │   ├── configuration
│   │   │   │   ├── transform-plugins
│   │   │   │   │   ├── Sql.md
│   │   │   │   │   ├── Split.md
│   │   │   │   │   ├── _sidebar.md
│   │   │   │   │   └── README.md
│   │   │   │   ├── sink-plugins
│   │   │   │   │   ├── Console.md
│   │   │   │   │   ├── README.md
│   │   │   │   │   ├── File.md
│   │   │   │   │   ├── Kafka.md
│   │   │   │   │   ├── _sidebar.md
│   │   │   │   │   ├── Jdbc.md
│   │   │   │   │   └── Elasticsearch.md
│   │   │   │   ├── source-plugins
│   │   │   │   │   ├── Fake.md
│   │   │   │   │   ├── Socket.md
│   │   │   │   │   ├── README.md
│   │   │   │   │   ├── Jdbc.md
│   │   │   │   │   ├── _sidebar.md
│   │   │   │   │   └── File.md
│   │   │   │   ├── _sidebar.md
│   │   │   │   └── ConfigExamples.md
│   │   │   ├── deployment.md
│   │   │   ├── _sidebar.md
│   │   │   └── installation.md
│   │   ├── spark
│   │   │   ├── README.md
│   │   │   ├── commands
│   │   │   │   ├── README.md
│   │   │   │   ├── start-waterdrop-spark.sh.md
│   │   │   │   └── _sidebar.md
│   │   │   ├── developing-plugin.md
│   │   │   ├── configuration
│   │   │   │   ├── ConfigExamples.md
│   │   │   │   ├── source-plugins
│   │   │   │   │   ├── Fake.md
│   │   │   │   │   ├── README.md
│   │   │   │   │   ├── SocketStream.md
│   │   │   │   │   ├── FakeStream.md
│   │   │   │   │   ├── Phoenix.md
│   │   │   │   │   ├── Hive.md
│   │   │   │   │   ├── KafkaStream.md
│   │   │   │   │   ├── Elasticsearch.md
│   │   │   │   │   ├── _sidebar.md
│   │   │   │   │   └── Jdbc.md
│   │   │   │   ├── sink-plugins
│   │   │   │   │   ├── README.md
│   │   │   │   │   ├── Console.md
│   │   │   │   │   ├── Phoenix.md
│   │   │   │   │   ├── _sidebar.md
│   │   │   │   │   ├── File.md
│   │   │   │   │   └── Hdfs.md
│   │   │   │   ├── _sidebar.md
│   │   │   │   └── transform-plugins
│   │   │   │       ├── _sidebar.md
│   │   │   │       ├── README.md
│   │   │   │       ├── Sql.md
│   │   │   │       └── Split.md
│   │   │   ├── installation.md
│   │   │   ├── _sidebar.md
│   │   │   └── deployment.md
│   │   ├── roadmap.md
│   │   ├── internal.md
│   │   ├── _sidebar.md
│   │   ├── contribution.md
│   │   └── diff_v1_v2.md
│   └── case_study
│       ├── _sidebar.md
│       └── README.md
├── start-doc.sh
├── .gitignore
├── _navbar.md
├── docsify_guide.md
├── _coverpage.md
└── index.html

--------------------------------------------------------------------------------
/.nojekyll:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/en/case_study/1.md:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/en/images/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/zh-cn/images/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/start-doc.sh:
--------------------------------------------------------------------------------
python -m SimpleHTTPServer 8090
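# Note: SimpleHTTPServer is Python 2 only; on Python 3 the equivalent
# built-in server is: python3 -m http.server 8090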
--------------------------------------------------------------------------------
/zh-cn/v1/performance-tunning.md:
--------------------------------------------------------------------------------
# 性能调优

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# Intellij Idea files
.idea/
*.iml

--------------------------------------------------------------------------------
/zh-cn/v2/flink/README.md:
--------------------------------------------------------------------------------
## seatunnel v2.x For Flink

--------------------------------------------------------------------------------
/zh-cn/v2/spark/README.md:
--------------------------------------------------------------------------------
## seatunnel v2.x For Spark

--------------------------------------------------------------------------------
/en/configuration/input-plugin.md:
--------------------------------------------------------------------------------
# Input Plugin

## Introduction

## list

--------------------------------------------------------------------------------
/en/configuration/filter-plugin.md:
--------------------------------------------------------------------------------
# Filter Plugin

## Introduction

## list

--------------------------------------------------------------------------------
/en/configuration/output-plugin.md:
--------------------------------------------------------------------------------
# Output Plugin

## Introduction

## list

--------------------------------------------------------------------------------
/zh-cn/images/hdfs2ch.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/hdfs2ch.jpg

--------------------------------------------------------------------------------
/zh-cn/v2/flink/commands/README.md:
--------------------------------------------------------------------------------
## 命令使用说明

1. bin/start-seatunnel-flink.sh seatunnel 的 flink 启动命令

--------------------------------------------------------------------------------
/zh-cn/v2/spark/commands/README.md:
--------------------------------------------------------------------------------
## 命令使用说明

1. bin/start-seatunnel-spark.sh seatunnel 的 spark 启动命令

--------------------------------------------------------------------------------
/zh-cn/images/hive-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/hive-logo.png

--------------------------------------------------------------------------------
/zh-cn/images/sina-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/sina-logo.png

--------------------------------------------------------------------------------
/zh-cn/images/wd-struct.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/wd-struct.png

--------------------------------------------------------------------------------
/zh-cn/images/flink/yarn.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/flink/yarn.jpg

--------------------------------------------------------------------------------
/zh-cn/images/sougou-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/sougou-logo.png

--------------------------------------------------------------------------------
/zh-cn/images/wd-workflow.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/wd-workflow.png

--------------------------------------------------------------------------------
/_navbar.md:
--------------------------------------------------------------------------------
- [v1.x 中文文档](/zh-cn/v1/)
- [v2.x 中文文档](/zh-cn/v2/)
- [v1.x En Docs](/en/)
- [v2.x En Docs](/en/)
--------------------------------------------------------------------------------
/zh-cn/images/qutoutiao-logo.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/qutoutiao-logo.jpg

--------------------------------------------------------------------------------
/zh-cn/images/bytedance-logo.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/bytedance-logo.jpeg

--------------------------------------------------------------------------------
/zh-cn/images/fendan-keji-logo.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/fendan-keji-logo.jpeg

--------------------------------------------------------------------------------
/zh-cn/images/flink/standalone.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/flink/standalone.jpg

--------------------------------------------------------------------------------
/zh-cn/images/shuidichou-logo.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/shuidichou-logo.jpg

--------------------------------------------------------------------------------
/zh-cn/images/flink/flink-console.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/flink/flink-console.png

--------------------------------------------------------------------------------
/zh-cn/images/yonghuiyunchuang-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/yonghuiyunchuang-logo.png

--------------------------------------------------------------------------------
/docsify_guide.md:
--------------------------------------------------------------------------------
## docsify 使用技巧

#### 如何实现多级目录折叠?

#### 为什么要在每级目录下放 README.md?

#### 如何实现path路由与navbar中的标题高亮相呼应?

--------------------------------------------------------------------------------
/zh-cn/images/wechat-qrcode/kid-xiong.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/wechat-qrcode/kid-xiong.jpeg

--------------------------------------------------------------------------------
/zh-cn/images/wechat-qrcode/rickyhuo.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/wechat-qrcode/rickyhuo.jpeg

--------------------------------------------------------------------------------
/zh-cn/images/wechat-qrcode/garyelephant.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/wechat-qrcode/garyelephant.jpeg

--------------------------------------------------------------------------------
/zh-cn/images/zhejiang_lekong_xinxi_keji-logo.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InterestingLab/seatunnel-docs/HEAD/zh-cn/images/zhejiang_lekong_xinxi_keji-logo.jpg

--------------------------------------------------------------------------------
/zh-cn/v2/flink/developing-plugin.md:
--------------------------------------------------------------------------------
# 插件开发


## 插件体系介绍

seatunnel 的Flink插件分为三部分,**Source**、**Transform**和**Sink**,近期将发布插件开发的文档,敬请期待。

--------------------------------------------------------------------------------
/zh-cn/v2/spark/developing-plugin.md:
--------------------------------------------------------------------------------
# 插件开发


## 插件体系介绍

seatunnel 的Spark插件分为三部分,**Source**、**Transform**和**Sink**,近期将发布插件开发的文档,敬请期待。

--------------------------------------------------------------------------------
/zh-cn/v1/roadmap.md:
--------------------------------------------------------------------------------
# Roadmap

* 支持离线数据处理

* 支持Apache Flink / Apache Beam

* 支持更丰富的插件: 如国内的ipip.net IP库解析;输出数据到HBase, MongoDB的插件。

* 支持流式机器学习

* 性能优化

* ...
--------------------------------------------------------------------------------
/zh-cn/v2/roadmap.md:
--------------------------------------------------------------------------------
# Roadmap

* 支持交互式Shell或者WebUI,通过它们来配置和发布任务

* 根据社区反馈,开发各种插件

* 支持插件化体系化开发

* 完善项目的CI/CD

* 接入DSS或者EasyScheduler。

* 接入Hudi或者Delta Lake

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/input-plugins/File.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup input
@pluginName File
@pluginDesc "从文件中读取原始数据"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption
string path yes "文件路径"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Remove.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Remove
@pluginDesc "删除数据中的字段"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption array source_field yes "需要删除的字段列表"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/input-plugins/S3Stream.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup input
@pluginName S3
@pluginDesc "从S3云存储上读取原始数据"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption
string path yes "S3云存储路径"

--------------------------------------------------------------------------------
/zh-cn/v2/spark/configuration/ConfigExamples.md:
--------------------------------------------------------------------------------
## 完整配置文件案例 [Spark]

[配置示例1 : Streaming 流式计算](https://github.com/InterestingLab/seatunnel/blob/wd-v2-baseline/config/spark.streaming.conf.template)

[配置示例2 : Batch 离线批处理](https://github.com/InterestingLab/seatunnel/blob/wd-v2-baseline/config/spark.batch.conf.template)

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/input-plugins/FileStream.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup input
@pluginName FileStream
@pluginDesc "从文件中读取原始数据"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption
string path yes "文件路径"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/input-plugins/HdfsStream.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup input
@pluginName Hdfs
@pluginDesc "从HDFS中读取原始数据"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption
string path yes "Hadoop集群上文件路径"

--------------------------------------------------------------------------------
/_coverpage.md:
--------------------------------------------------------------------------------
# seatunnel

> 一个简单易用,高性能,能够应对海量数据的数据处理产品

- 简单易用,灵活配置,无需开发
- 实时流式处理
- 高性能
- 海量数据处理能力
- 模块化和插件化,易于扩展
- 支持利用SQL做数据处理和聚合
- 支持spark 2.x


[GitHub](https://github.com/InterestingLab/seatunnel/)
[Get Started](/zh-cn/v1/)


![color](#C5EFF7)

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Repartition.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Repartition
@pluginDesc "重新给Dataframe分区"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption number num_partitions="-" yes "分区个数"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Uuid.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Uuid
@pluginDesc "为原始数据集新增自增id字段"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string target_field="uuid" no "自增id字段,若不配置默认为'uuid'"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Add.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Add
@pluginDesc "在源数据中新增一个字段"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string target_field yes "新增的字段名"
@pluginOption string value yes "新增字段的值"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Convert.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Convert
@pluginDesc "对指定字段进行类型转换"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string source_field yes "源字段"
@pluginOption string new_type yes "需要转换的结果类型"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Rename.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Rename
@pluginDesc "重命名数据中的字段"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption array source_field yes "需要重命名的字段"
@pluginOption array target_field yes "变更之后的字段名"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Sql.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Sql
@pluginDesc "在原始数据集Dataframe的基础上执行SQL"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string table="-" no "表名,可为任意字符串"
@pluginOption string sql="-" yes "SQL语句"
--------------------------------------------------------------------------------
/zh-cn/v1/configuration/output-plugins/Kafka.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup output
@pluginName Kafka
@pluginDesc "输出Dataframe到Kafka"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string topic="-" yes "Kafka Topic"
@pluginOption string bootstrap.servers="-" yes "Kafka Brokers"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/input-plugins/SocketStream.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup input
@pluginName Socket
@pluginDesc "Socket作为数据源"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string host="localhost" no "socket server hostname"
@pluginOption number port="9999" no "socket server port"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Sample.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Sample
@pluginDesc "对原始数据集进行抽样"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption number fraction="0.1" no "数据采样的比例,例如fraction=0.8,就是抽取其中80%的数据"
@pluginOption number limit="-1" no "数据采样后的条数,其中`-1`代表不限制"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/input-plugins/Hive.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup input
@pluginName Hive
@pluginDesc "从hive读取原始数据"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption
string pre_sql yes "进行预处理的sql, 如果不需要预处理,可以使用select * from hive_db.hive_table"
string table_name yes "预处理sql得到的数据注册成的临时表名"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Drop.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Drop
@pluginDesc "丢弃掉符合指定条件的Event"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string condition yes "条件表达式,符合此条件表达式的Event将被丢弃。条件表达式语法即sql中where条件中的条件表达式,如 `name = 'garyelephant'`, `status = '200' and resp_time > 100`"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Lowercase.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Lowercase
@pluginDesc "将指定字段内容全部转换为小写字母"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string source_field="raw_message" no "源字段,若不配置默认为'raw_message'"
@pluginOption string target_field="lowercased" no "目标字段,若不配置默认为'lowercased'"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Uppercase.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Uppercase
@pluginDesc "将指定字段内容全部转换为大写字母"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string source_field="raw_message" no "源字段,若不配置默认为'raw_message'"
@pluginOption string target_field="uppercased" no "目标字段,若不配置默认为'uppercased'"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Checksum.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Checksum
@pluginDesc "获取指定字段的校验码"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string source_field="raw_message" no "源字段"
@pluginOption string target_field="checksum" no "转换后的字段"
@pluginOption string method="SHA1" no "校验方法"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/input-plugins/Kudu.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup input
@pluginName Kudu
@pluginDesc "从[Apache Kudu](https://kudu.apache.org) 表中读取数据"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption
string kudu_master yes "kudu的master,多个master以逗号隔开"
string kudu_table yes "kudu要读取的表名"
string table_name yes "获取到的数据注册成的临时表名"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/output-plugins/MongoDB.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup output
@pluginName MongoDB
@pluginDesc "写入数据到[MongoDB](https://www.mongodb.com/)"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption
string readConfig.uri yes "mongoDB uri"
string readConfig.database yes "要写入的database"
string readConfig.collection yes "要写入的collection"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/input-plugins/KafkaStream.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup input
@pluginName Kafka
@pluginDesc "Kafka作为数据源"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string topic yes "Kafka Topic"
@pluginOption string consumer.zookeeper.connect yes "Kafka zookeeper broker"
@pluginOption string consumer.group.id yes "Kafka consumer group id"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/output-plugins/MySQL.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup output
@pluginName Mysql
@pluginDesc "输出数据到MySQL"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string url="-" yes ""
@pluginOption string table="-" yes ""
@pluginOption string user="-" yes ""
@pluginOption string password="-" yes ""
@pluginOption string save_mode="append" no ""

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Truncate.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Truncate
@pluginDesc "对指定字段进行字符串截取"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string source_field="raw_message" no "源字段,若不配置默认为'raw_message'"
@pluginOption string target_field="truncated" no "目标字段,若不配置默认为'truncated'"
@pluginOption number max_length="256" no "截取字符串的最大长度"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/output-plugins/Kudu.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup output
@pluginName Kudu
@pluginDesc "写入数据到[Apache Kudu](https://kudu.apache.org)表中"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption
string kudu_master yes "kudu的master,多个master以逗号隔开"
string kudu_table yes "kudu中要写入的表名,表必须已经存在"
string mode="insert" no "写入kudu模式 insert|update|upsert|insertIgnore"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/input-plugins/MongoDB.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup input
@pluginName MongoDB
@pluginDesc "从[MongoDB](https://www.mongodb.com/)读取数据"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption
string readConfig.uri yes "mongoDB uri"
string readConfig.database yes "要读取的database"
string readConfig.collection yes "要读取的collection"
string table_name yes "读取数据注册成的临时表名"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/output-plugins/Elasticsearch.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup output
@pluginName Elasticsearch
@pluginDesc "输出Dataframe到Elasticsearch"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption array hosts="-" yes "Elasticsearch集群地址,格式为host:port"
@pluginOption string index="seatunnel" yes "Elasticsearch index"
@pluginOption string index_type="log" yes "Elasticsearch index type"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Split.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Split
@pluginDesc "根据delimiter对字符串拆分"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string source_field="raw_message" no "源字段,若不配置默认为'raw_message'"
@pluginOption string target_field="_ROOT_" no "目标字段,若不配置默认为'_ROOT_'"
@pluginOption string delimiter yes "分隔符"
@pluginOption list fields yes "分割后的字段"

--------------------------------------------------------------------------------
/zh-cn/v2/spark/configuration/source-plugins/Fake.md:
--------------------------------------------------------------------------------
## Source plugin : Fake [Spark]

* Author: InterestingLab
* Homepage: https://interestinglab.github.io/seatunnel-docs
* Version: 2.0.0

### Description

`Fake` 主要用于快速上手运行一个 seatunnel 应用

### Options

##### common options [string]

`Source` 插件通用参数,详情参照 [Source Plugin](/zh-cn/v2/spark/configuration/source-plugins/)

### Examples

```
Fake {
    result_table_name = "my_dataset"
}
```

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/output-plugins/Jdbc.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup output
@pluginName Jdbc
@pluginDesc "通过JDBC输出数据到外部数据源"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string driver="-" yes ""
@pluginOption string url="-" yes ""
@pluginOption string table="-" yes ""
@pluginOption string user="-" yes ""
@pluginOption string password="-" yes ""
@pluginOption string save_mode="append" no ""

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Replace.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Replace
@pluginDesc "将指定字段内容根据正则表达式进行替换"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string source_field="raw_message" no "源字段,若不配置默认为'raw_message'"
@pluginOption string target_field="replaced" no "目标字段,若不配置默认为'replaced'"
@pluginOption string pattern="-" yes "正则表达式"
@pluginOption string replacement="-" yes "替换的字符串"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Json.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Json
@pluginDesc "对原始数据集指定字段进行Json解析"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string source_field="raw_message" no "源字段,若不配置默认为'raw_message'"
@pluginOption string target_field="__root__" no "目标字段,若不配置默认为'__root__'"
@pluginOption string schema_dir="-" no "json schema文件夹路径"
@pluginOption string schema_file="-" no "json schema文件名"

--------------------------------------------------------------------------------
/en/configuration/input-plugins/Socket.md:
--------------------------------------------------------------------------------
## Input plugin : Socket

* Author: InterestingLab
* Homepage: https://interestinglab.github.io/seatunnel-docs
* Version: 1.1.0

### Description

Read data over a TCP socket

### Options

| name | type | required | default value |
| --- | --- | --- | --- |
| [host](#host-string) | string | no | localhost |
| [port](#port-number) | number | no | 9999 |

##### host [string]

Socket server hostname

##### port [number]

Socket server port

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Grok.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Grok
@pluginDesc "对指定字段进行正则解析"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string pattern="" yes "正则表达式"
@pluginOption string patterns_dir="-" no "patterns文件路径"
@pluginOption boolean named_captures_only="true" no "If true, only store named captures from grok."
@pluginOption string source_field="raw_message" no "数据源字段"
@pluginOption string target_field="__root__" no "目标字段"

--------------------------------------------------------------------------------
/en/configuration/input-plugins/S3.md:
--------------------------------------------------------------------------------
## Input plugin : S3

* Author: InterestingLab
* Homepage: https://interestinglab.github.io/seatunnel-docs
* Version: 1.1.0

### Description

Read raw data from AWS S3 storage.

### Options

| name | type | required | default value |
| --- | --- | --- | --- |
| [path](#path-string) | string | yes | - |

##### path [string]

File path on S3, supported path formats are **s3://**, **s3a://**, **s3n://**

### Example

```
s3 {
    path = "s3n://bucket/access.log"
}
```

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/input-plugins/FakeStream.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup input
@pluginName Fake
@pluginDesc "生成测试数据以供逻辑测试使用"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string data_format="text" no "测试数据类型,支持text以及json"
@pluginOption string text_delimeter="," no "文本数据分隔符,当'data_format'为text时使用"
@pluginOption array json_keys no "json数据key列表,当'data_format'为json时使用"
@pluginOption number num_of_fields="10" no "字段个数,当'data_format'为text时使用"
@pluginOption number rate="1" yes "每秒生成测试用例个数"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Script.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Script
@pluginDesc "解析并执行自定义脚本中逻辑"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string object_name="event" no "脚本内置JSONObject的引用名,不设置默认为'event'"
@pluginOption string script_name yes "脚本名称"
@pluginOption boolean errorList no "是否需要输出的错误信息List"
@pluginOption boolean isCache no "是否使用Cache中的指令集"
@pluginOption boolean isTrace no "是否输出所有的跟踪信息,同时还需要log级别是DEBUG级"
@pluginOption boolean isPrecise no "是否需要高精度的计算"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/output-plugins/Clickhouse.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup output
@pluginName Clickhouse
@pluginDesc "输出Row到Clickhouse"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption array fields="" yes ""
@pluginOption string hostname="" yes "Clickhouse hostname"
@pluginOption string database="" yes "Clickhouse database"
@pluginOption string table="" yes "Clickhouse table"
@pluginOption string username="" no "Clickhouse auth username"
@pluginOption string password="" no "Clickhouse auth password"

--------------------------------------------------------------------------------
/zh-cn/v2/spark/configuration/source-plugins/README.md:
--------------------------------------------------------------------------------
# Source Plugin

### Source Common Options

| name | type | required | default value |
| --- | --- | --- | --- |
| [result_table_name](#result_table_name-string) | string | yes | - |

##### result_table_name [string]

不指定 `result_table_name` 时,此插件处理后的数据,不会被注册为一个可供其他插件直接访问的数据集(dataset),或者被称为临时表(table);

指定 `result_table_name` 时,此插件处理后的数据,会被注册为一个可供其他插件直接访问的数据集(dataset),或者被称为临时表(table)。此处注册的数据集(dataset),其他插件可通过指定 `source_table_name` 来直接访问。


### Examples

```
fake {
    result_table_name = "view_table_2"
}
```
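下面再给出一个串联 source 与 transform 的最小示意,展示通过 `result_table_name` 注册的临时表如何被下游插件以 `source_table_name` 访问(示例中的表名均为假设,且假设 `sql` 插件同样支持这组通用参数):

```
source {
    Fake {
        result_table_name = "view_table_1"
    }
}

transform {
    sql {
        source_table_name = "view_table_1"
        sql = "select * from view_table_1"
        result_table_name = "view_table_2"
    }
}
```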
--------------------------------------------------------------------------------
/zh-cn/v1/configuration/output-plugin.md:
--------------------------------------------------------------------------------
# Output 插件

### Output插件通用参数

| name | type | required | default value |
| --- | --- | --- | --- |
| [source_table_name](#source_table_name-string) | string | no | - |

##### source_table_name [string]

不指定 `source_table_name` 时,当前插件处理的就是配置文件中上一个插件输出的数据集(dataset);

指定 `source_table_name` 的时候,当前插件处理的就是此参数对应的数据集。


### 使用样例

```
stdout {
    source_table_name = "view_table_2"
}
```

> 将名为 `view_table_2` 的临时表输出。

```
stdout {}
```

> 若不配置`source_table_name`, 将配置文件中最后一个 `Filter` 插件的处理结果输出

--------------------------------------------------------------------------------
/en/configuration/filter-plugins/Remove.md:
--------------------------------------------------------------------------------
## Filter plugin : Remove

* Author: InterestingLab
* Homepage: https://interestinglab.github.io/seatunnel-docs
* Version: 1.0.0

### Description

Remove the specified fields from Rows.

### Options

| name | type | required | default value |
| --- | --- | --- | --- |
| [source_field](#source_field-array) | array | yes | - |

##### source_field [array]

Array of fields to be removed.

### Examples

```
remove {
    source_field = ["field1", "field2"]
}
```

> Remove `field1` and `field2` from Rows.

--------------------------------------------------------------------------------
/en/configuration/filter-plugins/Uuid.md:
--------------------------------------------------------------------------------
## Filter plugin : Uuid

* Author: InterestingLab
* Homepage: https://interestinglab.github.io/seatunnel-docs
* Version: 1.0.0

### Description

Uses the Spark function `monotonically_increasing_id()` to add a globally unique and auto-incrementing UUID field.


### Options

| name | type | required | default value |
| --- | --- | --- | --- |
| [target_field](#target_field-string) | string | no | uuid |

##### target_field [string]

New field name, default is `uuid`.

### Example

```
uuid {
    target_field = "id"
}
```

--------------------------------------------------------------------------------
/zh-cn/v1/_sidebar.md:
--------------------------------------------------------------------------------
- [介绍](/zh-cn/v1/)

- [快速开始](/zh-cn/v1/quick-start)

- [下载、安装](/zh-cn/v1/installation)

- [行业应用案例](/zh-cn/case_study/)

- [配置](/zh-cn/v1/configuration/base)
  - [通用配置](/zh-cn/v1/configuration/base)
  - [Input插件](/zh-cn/v1/configuration/input-plugin)
  - [Filter插件](/zh-cn/v1/configuration/filter-plugin)
  - [Output插件](/zh-cn/v1/configuration/output-plugin)

- [部署与运行](/zh-cn/v1/deployment)

- [监控](/zh-cn/v1/monitoring)

- [插件开发](/zh-cn/v1/developing-plugin)

- [深入seatunnel](/zh-cn/v1/internal)

- [Roadmap](/zh-cn/v1/roadmap)

- [贡献代码](/zh-cn/v1/contribution.md)

--------------------------------------------------------------------------------
/en/_sidebar.md:
--------------------------------------------------------------------------------
- [Introduction](/en/README)
- [Quick Start](/en/quick-start)
- [Installation](/en/installation)
- [Configuration](/en/configuration/base)
  - [Base Configuration](/en/configuration/base)
  - [Input Plugin](/en/configuration/input-plugin)
  - [Filter Plugin](/en/configuration/filter-plugin)
  - [Output Plugin](/en/configuration/output-plugin)
  - [Serializer Plugin](/en/configuration/serializer-plugin)
- [Deployment](/en/deployment)
- [Monitoring](/en/monitoring)
- [Performance Tuning](/en/performance-tunning)
- [Developing Plugin](/en/developing-plugin)
- [Roadmap](/en/roadmap)
- [Contribution](/en/contribution.md)

--------------------------------------------------------------------------------
/zh-cn/v2/spark/configuration/sink-plugins/README.md:
--------------------------------------------------------------------------------
# Sink Plugin

### Sink Common Options

| name | type | required | default value |
| --- | --- | --- | --- |
| [source_table_name](#source_table_name-string) | string | no | - |

##### source_table_name [string]

不指定 `source_table_name` 时,当前插件处理的就是配置文件中上一个插件输出的数据集(dataset);

指定 `source_table_name` 的时候,当前插件处理的就是此参数对应的数据集。


### Examples

```
stdout {
    source_table_name = "view_table_2"
}
```

> 将名为 `view_table_2` 的临时表输出。

```
stdout {}
```

> 若不配置`source_table_name`, 将配置文件中最后一个 `Transform` 插件的处理结果输出

--------------------------------------------------------------------------------
/en/configuration/input-plugins/Hdfs.md:
--------------------------------------------------------------------------------
## Input plugin : Hdfs

* Author: InterestingLab
* Homepage: https://interestinglab.github.io/seatunnel-docs
* Version: 1.1.0

### Description

Read raw data from HDFS.

### Options

| name | type | required | default value |
| --- | --- | --- | --- |
| [path](#path-string) | string | yes | - |

##### path [string]

File path on Hadoop cluster.

### Example

```
hdfs {
    path = "hdfs:///access.log"
}
```

or you can specify an HDFS name service:

```
hdfs {
    path = "hdfs://m2:8022/access.log"
}
```

--------------------------------------------------------------------------------
/zh-cn/v2/spark/commands/start-waterdrop-spark.sh.md:
--------------------------------------------------------------------------------
## start-seatunnel-spark.sh 使用方法


### 使用说明

```bash
bin/start-seatunnel-spark.sh -c config-path -m master -e deploy-mode -i city=beijing
```

> 使用 `-c` 或者 `--config` 来指定配置文件的路径

> 使用 `-m` 或者 `--master` 来指定集群管理器

> 使用 `-e` 或者 `--deploy-mode` 来指定部署模式

> 使用 `-i` 或者 `--variable` 来指定配置文件中的变量,可以配置多个


### 使用案例

```
# Yarn client 模式
./bin/start-seatunnel-spark.sh --master yarn --deploy-mode client --config ./config/application.conf

# Yarn cluster 模式
./bin/start-seatunnel-spark.sh --master yarn --deploy-mode cluster --config ./config/application.conf
```

--------------------------------------------------------------------------------
/en/configuration/filter-plugins/Add.md:
--------------------------------------------------------------------------------
## Filter plugin : Add

* Author: InterestingLab
* Homepage: https://interestinglab.github.io/seatunnel-docs
* Version: 1.0.0

### Description

Add a field with a fixed value to Rows.

### Options

| name | type | required | default value |
| --- | --- | --- | --- |
| [target_field](#target_field-string) | string | yes | - |
| [value](#value-string) | string | yes | - |

##### target_field [string]

New field name.

##### value [string]

New field value.

### Examples

```
add {
    target_field = "my_field"
    value = "1"
}
```

> Add a field `my_field` whose value is "1"

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Kv.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Kv
@pluginDesc "提取指定字段所有的Key-Value, 常用于解析url参数中的key和value"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string field_split="&" no "字段分隔符"
@pluginOption string value_split="=" no "字段值分隔符"
@pluginOption string field_prefix="" no "字段指定前缀"
@pluginOption string include_fields="[]" no "需要包括的字段"
@pluginOption string exclude_fields="[]" no "不需要包括的字段"
@pluginOption string source_field="raw_message" no "源字段,若不配置默认为'raw_message'"
@pluginOption string target_field="\_\_root\_\_" no "目标字段,若不配置默认为'\_\_root\_\_'"

--------------------------------------------------------------------------------
/zh-cn/v2/flink/configuration/transform-plugins/Sql.md:
--------------------------------------------------------------------------------
## Transform plugin : SQL [Flink]

* Author: InterestingLab
* Homepage: https://interestinglab.github.io/seatunnel-docs
* Version: 2.0.0

### Description

使用SQL处理数据,使用的是flink的sql语法,支持其各种udf

### Options

| name | type | required | default value |
| --- | --- | --- | --- |
| [sql](#sql-string) | string | yes | - |
| [common-options](#common-options-string)| string | no | - |


##### common options [string]

`Transform` 插件通用参数,详情参照 [Transform Plugin](/zh-cn/v2/flink/configuration/transform-plugins/)

### Examples

```
sql {
    sql = "select name, age from fake"
}
```

--------------------------------------------------------------------------------
/en/configuration/filter-plugins/Repartition.md:
--------------------------------------------------------------------------------
## Filter plugin : Repartition

* Author: InterestingLab
* Homepage: https://interestinglab.github.io/seatunnel-docs
* Version: 1.0.0

### Description

Adjusts the number of underlying Spark RDD partitions to increase or decrease the degree of parallelism. This filter only tunes data processing performance; it does not modify the data itself.


### Options

| name | type | required | default value |
| --- | --- | --- | --- |
| [num_partitions](#num_partitions-number) | number | yes | - |

##### num_partitions [number]

Target partition number.
### Examples

```
repartition {
    num_partitions = 8
}
```

--------------------------------------------------------------------------------
/zh-cn/v2/flink/configuration/sink-plugins/Console.md:
--------------------------------------------------------------------------------
## Sink plugin : Console [Flink]

* Author: InterestingLab
* Homepage: https://interestinglab.github.io/seatunnel-docs
* Version: 2.0.0

### Description

用于功能测试和debug,结果将输出在taskManager的stdout选项卡

### Options

| name | type | required | default value |
| --- | --- | --- | --- |
| [common-options](#common-options-string)| string | no | - |


##### common options [string]

`Sink` 插件通用参数,详情参照 [Sink Plugin](/zh-cn/v2/flink/configuration/sink-plugins/)

### Examples

```
ConsoleSink{}
```

### Note

flink的console输出在flink的web UI
![flink_console](../../../../images/flink/flink-console.png)

--------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Date.docs:
--------------------------------------------------------------------------------
@seatunnelPlugin
@pluginGroup filter
@pluginName Date
@pluginDesc "对指定字段进行时间格式转换"
@pluginAuthor InterestingLab
@pluginHomepage https://interestinglab.github.io/seatunnel-docs
@pluginVersion 1.0.0

@pluginOption string source_field="__ROOT__" no "源字段,若不配置将使用当前时间"
@pluginOption string target_field="datetime" no "目标字段,若不配置默认为'datetime'"
@pluginOption string source_time_format="UNIX_MS" no "源字段时间格式,当前支持UNIX、UNIX_MS以及'SimpleDateFormat'格式"
@pluginOption string target_time_format="yyyy/MM/dd HH:mm:ss" no "目标字段时间格式"
@pluginOption string time_zone="" no "时区"
@pluginOption string default_value="${now}" no "如果日期转换失败将会使用当前时间生成指定格式的值"
@pluginOption string locale="Locale.US" no "编码类型"

--------------------------------------------------------------------------------
/zh-cn/v2/internal.md:
--------------------------------------------------------------------------------
## 深入seatunnel

#### 基本原理

本质上,seatunnel不是对Spark和Flink内部的修改,而是在Spark和Flink的基础上,做了一个平台化和产品化的包装,使广大开发者使用Spark和Flink的时候更加简单和易用,主要有以下几个亮点:

* 完全可以做到开箱即用

* 开发者可以开发自己的插件,以插件(plugin)的形式接入 seatunnel 运行,而不需要写一个完整的Spark或者Flink程序

* 当然,seatunnel从v2.0开始,同时支持Spark和Flink。

如果想了解seatunnel的实现原理,建议熟练掌握一个最重要的设计模式:控制反转(或者叫依赖注入),这是seatunnel实现的基本思想。控制反转(或者叫依赖注入)是什么?我们用两句话来总结:

* 上层不依赖底层,两者依赖抽象。

* 流程代码与业务逻辑应该分离。

#### seatunnel的插件化体系是如何构建出来的?

可能有些同学听说过Google Guice,它是一个非常优秀的依赖注入框架,很多开源项目如Elasticsearch就是利用Guice实现的各个模块依赖的注入,当然也包括Elasticsearch插件。

其实对于插件体系架构来说,Java本身就带了一个可以用来实现插件化的功能,即[Java SPI](https://docs.oracle.com/javase/tutorial/sound/SPI-intro.html),开源项目Presto中也大量使用了它来加载插件。
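下面是一个演示 Java SPI 加载机制的最小示意。注意:其中的 `Plugin` 接口与加载逻辑均为假设的示例,并非 seatunnel 的真实 API,仅用来说明"上层依赖抽象、实现类在运行时被发现"的思路:

```java
import java.util.ServiceLoader;

// 假设的插件接口,仅用于演示 Java SPI,并非 seatunnel 的真实接口
interface Plugin {
    String name();
}

public class PluginLoader {
    public static void main(String[] args) {
        // ServiceLoader 会扫描 classpath 下
        // META-INF/services/<接口全限定名> 文件中登记的实现类,并逐个实例化
        ServiceLoader<Plugin> plugins = ServiceLoader.load(Plugin.class);
        for (Plugin plugin : plugins) {
            System.out.println("loaded plugin: " + plugin.name());
        }
    }
}
```

加载方只依赖 `Plugin` 这个抽象;新插件只需实现接口并在自己的 jar 中登记,即可在运行时被发现,这正是上文所说的控制反转。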
Description 8 | 9 | 删除数据中的字段 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [source_field](#source_field-array) | array | yes | - | 16 | | [common-options](#common-options-string)| string | no | - | 17 | 18 | 19 | ##### source_field [array] 20 | 21 | 需要删除的字段列表 22 | 23 | ##### common options [string] 24 | 25 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 26 | 27 | 28 | ### Examples 29 | 30 | ``` 31 | remove { 32 | source_field = ["field1", "field2"] 33 | } 34 | ``` 35 | 36 | > 删除原始数据中的`field1`和`field2`字段 37 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Repartition.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Repartition 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 调整数据处理的分区个数(即并行度)。这个filter主要是为了调节数据处理性能,不对数据本身做任何处理。 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [num_partitions](#num_partitions-number) | number | yes | - | 16 | | [common-options](#common-options-string)| string | no | - | 17 | 18 | 19 | ##### num_partitions [number] 20 | 21 | 目标分区个数 22 | 23 | ##### common options [string] 24 | 25 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 26 | 27 | 28 | ### Examples 29 | 30 | ``` 31 | repartition { 32 | num_partitions = 8 33 | } 34 | ``` 35 | -------------------------------------------------------------------------------- /en/configuration/filter-plugins/Uppercase.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Uppercase 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | Uppercase specified field. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [source_field](#source_field-string) | string | no | raw_message | 16 | | [target_field](#target_field-string) | string | no | uppercased | 17 | 18 | ##### source_field [string] 19 | 20 | Source field, default is `raw_message` 21 | 22 | ##### target_field [string] 23 | 24 | New field name, default is `uppercased` 25 | 26 | ### Example 27 | 28 | ``` 29 | uppercase { 30 | source_field = "username" 31 | target_field = "username_uppercased" 32 | } 33 | ``` 34 | -------------------------------------------------------------------------------- /en/configuration/filter-plugins/Drop.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Drop 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | Drop Rows that match the condition. 10 | 11 | 12 | ### Options 13 | 14 | | name | type | required | default value | 15 | | --- | --- | --- | --- | 16 | | [condition](#condition-string) | string | yes | - | 17 | 18 | ##### condition [string] 19 | 20 | Conditional expression. Rows that match this conditional expression will be dropped.
Any expression that is valid in a SQL WHERE clause can be used, such as `name = 'garyelephant'`, `status = 200 AND resp_time > 100` 21 | 22 | 23 | ### Examples 24 | 25 | ``` 26 | drop { 27 | condition = "status = '200'" 28 | } 29 | ``` 30 | 31 | > Rows will be dropped if status is 200 32 | -------------------------------------------------------------------------------- /en/configuration/filter-plugins/Lowercase.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Lowercase 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | Lowercase specified string field. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [source_field](#source_field-string) | string | no | raw_message | 16 | | [target_field](#target_field-string) | string | no | lowercased | 17 | 18 | ##### source_field [string] 19 | 20 | Source field, default is `raw_message` 21 | 22 | ##### target_field [string] 23 | 24 | New field name, default is `lowercased` 25 | 26 | ### Examples 27 | 28 | ``` 29 | lowercase { 30 | source_field = "address" 31 | target_field = "address_lowercased" 32 | } 33 | ``` 34 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Uuid.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Uuid 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 为原始数据集新增一个全局唯一且自增的UUID字段,使用的是spark的`monotonically_increasing_id()`函数。 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [target_field](#target_field-string) | string | no | uuid | 16 | | [common-options](#common-options-string)| string | no | - | 17 | 18 | 19 | ##### target_field [string] 20 | 21 | 存储uuid的目标字段,若不配置默认为`uuid` 22 | 23 | ##### common options [string] 24 | 25 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 26 | 27 | 28 | ### Example 29 | 30 | ``` 31 | uuid { 32 | target_field = "id" 33 | } 34 | ``` 35 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/configuration/source-plugins/Fake.md: -------------------------------------------------------------------------------- 1 | ## Source plugin : Fake [Flink] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | > Fake Source主要用于自动生成数据,数据只有两列,第一列为String类型,内容为["Gary", "Ricky Huo", "Kid Xiong"]中随机一个,第二列为Long类型,为当前的13位时间戳,以此作为输入来对seatunnel进行功能验证,测试等。 9 | 10 | ### Options 11 | | name | type | required | default value | 12 | | --- | --- | --- | --- | 13 | | [common-options](#common-options-string)| string | no | - | 14 | 15 | ##### common options [string] 16 | 17 | `Source` 插件通用参数,详情参照 [Source Plugin](/zh-cn/v2/flink/configuration/source-plugins/) 18 | 19 | 20 | ### Examples 21 | ``` 22 | source { 23 | FakeSourceStream { 24 | result_table_name = "fake" 25 | field_name = "name,age" 26 | } 27 | } 28 | ``` 29 | -------------------------------------------------------------------------------- /en/configuration/filter-plugins/Sample.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Sample 2 | 3 | * Author: InterestingLab
4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | Take samples from the events. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [fraction](#fraction-number) | number | no | 0.1 | 16 | | [limit](#limit-number) | number | no | -1 | 17 | 18 | ##### fraction [number] 19 | 20 | The fraction of sampling. For example, `fraction=0.8` means extracting `80%` of the data from the events. 21 | 22 | ##### limit [number] 23 | 24 | The number of Rows after sampling, where `-1` represents no limit. 25 | 26 | ### Examples 27 | 28 | ``` 29 | sample { 30 | fraction = 0.8 31 | } 32 | ``` 33 | 34 | > Extract 80% of events. 35 | -------------------------------------------------------------------------------- /zh-cn/case_study/README.md: -------------------------------------------------------------------------------- 1 | ### 行业应用案例 2 | 3 | * [如何快速地把HDFS中的数据导入Clickhouse](/zh-cn/case_study/1.md) 4 | * [如何快速地将Hive中的数据导入ClickHouse](/zh-cn/case_study/2.md) 5 | * [如何使用Spark快速将数据写入Elasticsearch](/zh-cn/case_study/3.md) 6 | * [优秀的数据工程师,怎么用Spark在TiDB上做OLAP分析](/zh-cn/case_study/4.md) 7 | * [seatunnel中StructuredStreaming怎么用](/zh-cn/case_study/5.md) 8 | 9 | ### 使用seatunnel的公司 10 | 11 | * [微博](https://weibo.com), 增值业务部数据平台 12 | 13 | ![微博Logo](https://img.t.sinajs.cn/t5/style/images/staticlogo/groups3.png?version=f362a1c5be520a15) 14 | 15 | * [新浪](http://www.sina.com.cn/), 大数据运维分析平台 16 | 17 | ![新浪Logo](http://n.sinaimg.cn/tech/ir/imges/logo.png) 18 | 19 | * [一下科技](https://www.yixia.com/), 一直播数据平台 20 | 21 | ![一下科技Logo](https://imgaliyuncdn.miaopai.com/static20131031/miaopai20140729/new_yixia/static/imgs/logo.png) 22 | 23 | * 其他公司 ... 期待您的加入 24 | 25 | -------------------------------------------------------------------------------- /en/configuration/output-plugins/Stdout.md: -------------------------------------------------------------------------------- 1 | ## Output plugin : Stdout 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | Output Rows to the console; it is often used for debugging. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | engine | 14 | | --- | --- | --- | --- | --- | 15 | | [limit](#limit-number) | number | no | 100 | batch/spark streaming | 16 | | [format](#format-string) | string | no | plain | batch/spark streaming | 17 | 18 | ##### limit [number] 19 | 20 | Limit the number of output Rows. `-1` means no limit.
21 | 22 | ##### format [string] 23 | 24 | The format used for output; the allowed formats are `json`, `plain` and `schema`. 25 | 26 | ### Example 27 | 28 | ``` 29 | stdout { 30 | limit = 10 31 | format = "json" 32 | } 33 | ``` 34 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Add.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Add 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 在源数据中新增一个字段 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [target_field](#target_field-string) | string | yes | - | 16 | | [value](#value-string) | string | yes | - | 17 | | [common-options](#common-options-string)| string | no | - | 18 | 19 | ##### target_field [string] 20 | 21 | 新增的字段名 22 | 23 | ##### value [string] 24 | 25 | 新增字段的值, 目前仅支持固定值,不支持变量 26 | 27 | ##### common options [string] 28 | 29 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 30 | 31 | ### Examples 32 | 33 | ``` 34 | add { 35 | target_field = "new_field" 36 | value = "1" 37 | } 38 | ``` 39 | 40 | > 新增一个`new_field`字段,其值为1(`target_field` 为必填项) 41 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Table.docs: -------------------------------------------------------------------------------- 1 | @seatunnelPlugin 2 | @pluginGroup filter 3 | @pluginName Table 4 | @pluginDesc "Table 用于将静态文件映射为一张表,可与实时处理的流进行关联,常用于用户昵称,国家省市等字典表关联" 5 | @pluginAuthor InterestingLab 6 | @pluginHomepage https://interestinglab.github.io/seatunnel-docs 7 | @pluginVersion 1.0.0 8 | 9 | @pluginOption string path yes "Hadoop支持的文件路径(默认hdfs路径, 如/path/to/file), 如本地文件:file:///path/to/file, hdfs:///path/to/file, s3:///path/to/file ..." 10 | @pluginOption string delimiter="," no "文件中列与列之间的分隔符" 11 | @pluginOption string table_name yes "将文件载入后将注册为一张表,这里指定的是表名称,可用于在SQL中直接与流处理数据关联" 12 | @pluginOption array fields yes "文件中,每行中各个列的名称,按照数据中实际列顺序提供" 13 | @pluginOption array field_types no "每个列的类型,顺序与个数必须与`fields`参数一一对应, 不指定此参数,默认所有列的类型为字符串; 支持的数据类型包括:boolean, double, long, string" 14 | @pluginOption boolean cache="true" no "是否内存中缓存文件内容,true表示缓存,false表示每次需要时重新加载" 15 | -------------------------------------------------------------------------------- /en/configuration/input-plugins/File.md: -------------------------------------------------------------------------------- 1 | ## Input plugin : File 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.1.1 6 | 7 | ### Description 8 | 9 | Read raw data from local file system. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [format](#format-string) | string | yes | json | 16 | | [path](#path-string) | string | yes | - | 17 | | [table_name](#table_name-string) | string | yes | - | 18 | 19 | ##### format [string] 20 | 21 | The format of the input data source, e.g. `text` or `json`. 22 | 23 | ##### path [string] 24 | 25 | File path. 26 | 27 | ##### table_name [string] 28 | 29 | Registered table name of input data.
30 | 31 | ### Example 32 | 33 | ``` 34 | file { 35 | path = "file:///var/log/access.log" 36 | table_name = "access" 37 | format = "text" 38 | } 39 | ``` 40 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Drop.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Drop 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 丢弃掉符合指定条件的Row 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [condition](#condition-string) | string | yes | - | 16 | | [common-options](#common-options-string)| string | no | - | 17 | 18 | 19 | ##### condition [string] 20 | 21 | 条件表达式,符合此条件表达式的Row将被丢弃。条件表达式语法即sql中where条件中的条件表达式,如 `name = 'garyelephant'`, `status = '200' and resp_time > 100` 22 | 23 | ##### common options [string] 24 | 25 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 26 | 27 | 28 | ### Examples 29 | 30 | ``` 31 | drop { 32 | condition = "status = '200'" 33 | } 34 | ``` 35 | 36 | > 状态码为200的Row将被丢弃 37 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Sample.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Sample 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 对原始数据集进行抽样 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [fraction](#fraction-number) | number | no | 0.1 | 16 | | [limit](#limit-number) | number | no | -1 | 17 | | [common-options](#common-options-string)| string | no | - | 18 | 19 | 20 | ##### fraction [number] 21 | 22 | 数据采样的比例,例如fraction=0.8,就是抽取其中80%的数据 23 | 24 | ##### limit [number] 25 | 26 | 数据采样后的条数,其中`-1`代表不限制 27 | 28 | ##### common options [string] 29 | 30 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 31 | 32 | 33 | ### Examples 34 | 35 | ``` 36 | sample { 37 | fraction = 0.8 38 | } 39 | ``` 40 | 41 | > 抽取80%的数据 42 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/deployment.md: -------------------------------------------------------------------------------- 1 | # 部署与运行 2 | 3 | > seatunnel For Flink 依赖Java运行环境和Flink,详细的seatunnel 安装步骤参考[安装seatunnel](/zh-cn/v2/flink/installation) 4 | 5 | 下面重点说明不同平台的运行方式: 6 | 7 | > 首先编辑解压后seatunnel目录下的config/seatunnel-env.sh, 指定必需的环境变量FLINK_HOME 8 | 9 | ### 在Flink Standalone集群上运行seatunnel 10 | 11 | ``` 12 | ./bin/start-seatunnel-flink.sh --config config-path 13 | # -p 2 指定flink job的并行度为2,还可以指定更多的参数,使用 flink run -h查看 14 | ./bin/start-seatunnel-flink.sh -p 2 --config config-path 15 | ``` 16 | ### 在Yarn集群上运行seatunnel 17 | ``` 18 | ./bin/start-seatunnel-flink.sh -m yarn-cluster --config config-path 19 | # -ynm seatunnel 指定在yarn webUI显示的名称为seatunnel,还可以指定更多的参数,使用 flink run -h查看 20 | ./bin/start-seatunnel-flink.sh -m yarn-cluster -ynm seatunnel --config config-path 21 | ``` 22 | 23 | 24 | --- 25 | 26 | 可参考: [Flink Yarn Setup](https://ci.apache.org/projects/flink/flink-docs-release-1.9/zh/ops/deployment/yarn_setup.html) 27 | 28 | 29 | -------------------------------------------------------------------------------- /en/configuration/filter-plugins/Convert.md:
-------------------------------------------------------------------------------- 1 | ## Filter plugin : Convert 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | Convert a field’s value to a different type, such as converting a string to an integer. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [new_type](#new_type-string) | string | yes | - | 16 | | [source_field](#source_field-string) | string | yes | - | 17 | 18 | ##### new_type [string] 19 | 20 | Conversion type, supports `string`, `integer`, `long`, `float`, `double` and `boolean` now. 21 | 22 | ##### source_field [string] 23 | 24 | Source field. 25 | 26 | 27 | ### Examples 28 | 29 | ``` 30 | convert { 31 | source_field = "age" 32 | new_type = "integer" 33 | } 34 | ``` 35 | 36 | > Convert the `age` field to `integer` type 37 | 38 | 39 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/configuration/source-plugins/Socket.md: -------------------------------------------------------------------------------- 1 | ## Source plugin : Socket [Flink] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | > Socket作为数据源 9 | 10 | ### Options 11 | | name | type | required | default value | 12 | | --- | --- | --- | --- | 13 | | [host](#host-string) | string | no | localhost | 14 | | [port](#port-int) | int | no | 9999 | 15 | | [common-options](#common-options-string)| string | no | - | 16 | 17 | ##### host [string] 18 | 19 | socket server hostname 20 | 21 | ##### port [int] 22 | 23 | socket server port 24 | 25 | ##### common options [string] 26 | 27 | `Source` 插件通用参数,详情参照 [Source Plugin](/zh-cn/v2/flink/configuration/source-plugins/) 28 | 29 | ### Examples 30 | ``` 31 | source { 32 | SocketStream{ 33 | result_table_name = "socket" 34 | field_name = "info" 35 | } 36 | } 37 | ``` 38 | -------------------------------------------------------------------------------- /en/configuration/filter-plugins/Truncate.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Truncate 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | Truncate a string field. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [max_length](#max_length-number) | number | no | 256 | 16 | | [source_field](#source_field-string) | string | no | raw_message | 17 | | [target_field](#target_field-string) | string | no | truncated | 18 | 19 | ##### max_length [number] 20 | 21 | Maximum length of the string. 22 | 23 | ##### source_field [string] 24 | 25 | Source field name, default is `raw_message`. 26 | 27 | ##### target_field [string] 28 | 29 | New field name, default is `truncated`.
30 | 31 | ### Example 32 | 33 | ``` 34 | truncate { 35 | source_field = "telephone" 36 | max_length = 10 37 | } 38 | ``` 39 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/input-plugin.md: -------------------------------------------------------------------------------- 1 | # Input 插件 2 | 3 | ### Input插件通用参数 4 | 5 | | name | type | required | default value | 6 | | --- | --- | --- | --- | 7 | | [result_table_name](#result_table_name-string) | string | yes | - | 8 | | [table_name](#table_name-string) | string | no | - | 9 | 10 | 11 | ##### result_table_name [string] 12 | 13 | 不指定 `result_table_name` 时,此插件处理后的数据,不会被注册为一个可供其他插件直接访问的数据集(dataset),或者被称为临时表(table); 14 | 15 | 指定 `result_table_name` 时,此插件处理后的数据,会被注册为一个可供其他插件直接访问的数据集(dataset),或者被称为临时表(table)。此处注册的数据集(dataset),其他插件可通过指定 `source_table_name` 来直接访问。 16 | 17 | 18 | ##### table_name [string] 19 | 20 | **\[从v1.4开始废弃\]** 功能同 `result_table_name`,后续 Release 版本中将删除此参数,建议使用 `result_table_name` 参数 21 | 22 | 23 | ### 使用样例 24 | 25 | ``` 26 | fake { 27 | result_table_name = "view_table_2" 28 | } 29 | ``` 30 | 31 | > 数据源 `fake` 的结果将注册为名为 `view_table_2` 的临时表。这个临时表,可以被任意 `Filter` 或者 `Output` 插件通过指定 `source_table_name` 使用。 32 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/source-plugins/SocketStream.md: -------------------------------------------------------------------------------- 1 | ## Source plugin : SocketStream [Spark] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 9 | `SocketStream` 主要用于接收 Socket 数据,用于快速验证 Spark 流式计算。 10 | 11 | 12 | ### Options 13 | 14 | | name | type | required | default value | 15 | | --- | --- | --- | --- | 16 | | [host](#host-string) | string | no | localhost | 17 | | [port](#port-number) | number | no | 9999 | 18 | | [common-options](#common-options-string)| string | yes | - | 19 | 20 | ##### host [string] 21 | 22 | socket server hostname 23 | 24 | ##### port [number] 25 | 26 | socket server port 27 | 28 | ##### common options [string] 29 | 30 | `Source` 插件通用参数,详情参照 [Source Plugin](/zh-cn/v2/spark/configuration/source-plugins/) 31 | 32 | 33 | 34 | ### Examples 35 | 36 | ``` 37 | socketStream { 38 | port = 9999 39 | } 40 | ``` 41 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/_sidebar.md: -------------------------------------------------------------------------------- 1 | - [介绍](/zh-cn/v2/) 2 | 3 | - [v2.x与v1.x的区别](/zh-cn/v2/diff_v1_v2) 4 | 5 | - [行业应用案例](/zh-cn/case_study/) 6 | 7 | - [seatunnel v2.x For Flink](/zh-cn/v2/flink/) 8 | 9 | - [快速开始](/zh-cn/v2/flink/quick-start) 10 | 11 | - [下载、安装](/zh-cn/v2/flink/installation) 12 | 13 | - [命令使用说明](/zh-cn/v2/flink/commands/) 14 | 15 | - [配置文件](/zh-cn/v2/flink/configuration/) 16 | - [通用配置](/zh-cn/v2/flink/configuration/) 17 | - [Source 插件配置](/zh-cn/v2/flink/configuration/source-plugins/) 18 | - [Transform 插件配置](/zh-cn/v2/flink/configuration/transform-plugins/) 19 | - [Sink 插件配置](/zh-cn/v2/flink/configuration/sink-plugins/) 20 | 21 | - [部署与运行](/zh-cn/v2/flink/deployment) 22 | 23 | - [插件开发](/zh-cn/v2/flink/developing-plugin) 24 | 25 | - [seatunnel v2.x For Spark](/zh-cn/v2/spark/) 26 | 27 | - [深入seatunnel](/zh-cn/v2/internal.md) 28 | 29 | - [Roadmap](/zh-cn/v2/roadmap.md) 30 | 31 | - [贡献代码](/zh-cn/v2/contribution.md) 32 | --------------------------------------------------------------------------------
/zh-cn/v1/configuration/filter-plugins/Rename.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Rename 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 重命名数据中的字段 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [source_field](#source_field-string) | string | yes | - | 16 | | [target_field](#target_field-string) | string | yes | - | 17 | | [common-options](#common-options-string)| string | no | - | 18 | 19 | 20 | ##### source_field [string] 21 | 22 | 需要重命名的字段 23 | 24 | ##### target_field [string] 25 | 26 | 变更之后的字段名 27 | 28 | ##### common options [string] 29 | 30 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 31 | 32 | 33 | ### Examples 34 | 35 | ``` 36 | rename { 37 | source_field = "field1" 38 | target_field = "field2" 39 | } 40 | ``` 41 | 42 | > 将原始数据中的`field1`字段重命名为`field2`字段 43 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/installation.md: -------------------------------------------------------------------------------- 1 | # 下载、安装 2 | 3 | ## 下载 4 | 5 | ### 社区版本(Community) 6 | 7 | https://github.com/InterestingLab/seatunnel/releases 8 | 9 | ## 环境准备 10 | 11 | ### 准备好JDK1.8 12 | 13 | seatunnel 依赖JDK1.8运行环境。 14 | 15 | ### 准备好Flink 16 | 17 | 请先[下载Flink](https://flink.apache.org/downloads.html), Flink版本请选择 >= 1.9.0。下载完成后进行[安装](https://ci.apache.org/projects/flink/flink-docs-release-1.9/zh/ops/deployment/cluster_setup.html) 18 | 19 | ### 安装seatunnel 20 | 21 | 下载seatunnel安装包并解压, 这里以社区版为例: 22 | 23 | ``` 24 | wget https://github.com/InterestingLab/seatunnel/releases/download/v/seatunnel-.zip -O seatunnel-.zip 25 | unzip seatunnel-.zip 26 | ln -s seatunnel- seatunnel 27 | ``` 28 | 29 | 没有任何复杂的安装配置步骤,seatunnel的使用方法请参考[Quick Start](/zh-cn/v2/flink/quick-start.md), 配置请参考[Configuration](/zh-cn/v2/flink/configuration/)。 30 | 31 | 如果想把seatunnel部署在Flink Standalone/Yarn集群上运行,请参考[seatunnel部署](/zh-cn/v2/flink/deployment) 32 | 33 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Lowercase.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Lowercase 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 将指定字段内容全部转换为小写字母 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [source_field](#source_field-string) | string | no | raw_message | 16 | | [target_field](#target_field-string) | string | no | lowercased | 17 | | [common-options](#common-options-string)| string | no | - | 18 | 19 | 20 | ##### source_field [string] 21 | 22 | 源字段,若不配置默认为`raw_message` 23 | 24 | ##### target_field [string] 25 | 26 | 目标字段,若不配置默认为`lowercased` 27 | 28 | ##### common options [string] 29 | 30 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 31 | 32 | 33 | ### Examples 34 | 35 | ``` 36 | lowercase { 37 | source_field = "address" 38 | target_field = "address_lowercased" 39 | } 40 | ``` 41 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Uppercase.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin :
Uppercase 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 将指定字段内容全部转换为大写字母 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [source_field](#source_field-string) | string | no | raw_message | 16 | | [target_field](#target_field-string) | string | no | uppercased | 17 | | [common-options](#common-options-string)| string | no | - | 18 | 19 | 20 | ##### source_field [string] 21 | 22 | 源字段,若不配置默认为`raw_message` 23 | 24 | ##### target_field [string] 25 | 26 | 目标字段,若不配置默认为`uppercased` 27 | 28 | ##### common options [string] 29 | 30 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 31 | 32 | 33 | ### Example 34 | 35 | ``` 36 | uppercase { 37 | source_field = "username" 38 | target_field = "username_uppercased" 39 | } 40 | ``` 41 | -------------------------------------------------------------------------------- /zh-cn/v2/_sidebar.md: -------------------------------------------------------------------------------- 1 | - [介绍](/zh-cn/v2/) 2 | 3 | - [v2.x与v1.x的区别](/zh-cn/v2/diff_v1_v2) 4 | 5 | - [行业应用案例](/zh-cn/case_study/) 6 | 7 | - [seatunnel v2.x For Flink](/zh-cn/v2/flink/) 8 | 9 | - [快速开始](/zh-cn/v2/flink/quick-start) 10 | 11 | - [下载、安装](/zh-cn/v2/flink/installation) 12 | 13 | - [命令使用说明](/zh-cn/v2/flink/commands/) 14 | 15 | - [配置文件](/zh-cn/v2/flink/configuration/) 16 | 17 | - [部署与运行](/zh-cn/v2/flink/deployment) 18 | 19 | - [插件开发](/zh-cn/v2/flink/developing-plugin) 20 | 21 | - [seatunnel v2.x For Spark](/zh-cn/v2/spark/) 22 | 23 | - [快速开始](/zh-cn/v2/spark/quick-start) 24 | 25 | - [下载、安装](/zh-cn/v2/spark/installation) 26 | 27 | - [命令使用说明](/zh-cn/v2/spark/commands/) 28 | 29 | - [配置文件](/zh-cn/v2/spark/configuration/) 30 | 31 | - [部署与运行](/zh-cn/v2/spark/deployment) 32 | 33 | - [插件开发](/zh-cn/v2/spark/developing-plugin) 34 | 35 | - [深入seatunnel](/zh-cn/v2/internal.md) 36 | 37 | - [Roadmap](/zh-cn/v2/roadmap.md) 38 | 39 | - [贡献代码](/zh-cn/v2/contribution.md) 40 | -------------------------------------------------------------------------------- /zh-cn/v1/installation.md: -------------------------------------------------------------------------------- 1 | # 下载、安装 2 | 3 | ## 下载 4 | 5 | ### 社区版本(Community) 6 | 7 | https://github.com/InterestingLab/seatunnel/releases 8 | 9 | ## 环境准备 10 | 11 | ### 准备好JDK1.8 12 | 13 | seatunnel 依赖JDK1.8运行环境。 14 | 15 | ### 准备好Spark 16 | 17 | seatunnel 依赖Spark,安装seatunnel前,需要先准备好Spark。 18 | 请先[下载Spark](http://spark.apache.org/downloads.html), Spark版本请选择 >= 2.x.x。下载解压后,不需要做任何配置即可提交Spark deploy-mode = local模式的任务。 19 | 如果你期望任务运行在Standalone集群或者Yarn、Mesos集群上,请参考Spark官网配置文档。 20 | 21 | ### 安装seatunnel 22 | 23 | 下载seatunnel安装包并解压, 这里以社区版为例: 24 | 25 | ``` 26 | wget https://github.com/InterestingLab/seatunnel/releases/download/v/seatunnel-.zip -O seatunnel-.zip 27 | unzip seatunnel-.zip 28 | ln -s seatunnel- seatunnel 29 | ``` 30 | 31 | 没有任何复杂的安装配置步骤,seatunnel的使用方法请参考[Quick Start](/zh-cn/v1/quick-start.md), 配置请参考[Configuration](/zh-cn/v1/configuration/base)。 32 | 33 | 如果想把seatunnel部署在Spark Standalone/Yarn/Mesos集群上运行,请参考[seatunnel部署](/zh-cn/v1/deployment) 34 | 35 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Convert.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Convert 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | 
* Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 对指定字段进行类型转换 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [new_type](#new_type-string) | string | yes | - | 16 | | [source_field](#source_field-string) | string | yes | - | 17 | | [common-options](#common-options-string)| string | no | - | 18 | 19 | 20 | ##### new_type [string] 21 | 22 | 需要转换的结果类型,当前支持的类型有`string`、`integer`、`long`、`float`、`double`和`boolean`等 23 | 24 | ##### source_field [string] 25 | 26 | 源数据字段 27 | 28 | ##### common options [string] 29 | 30 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 31 | 32 | 33 | ### Examples 34 | 35 | ``` 36 | convert { 37 | source_field = "age" 38 | new_type = "integer" 39 | } 40 | ``` 41 | 42 | > 将源数据中的`age`字段转换为`integer`类型 43 | 44 | 45 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/input-plugins/Kudu.md: -------------------------------------------------------------------------------- 1 | ## Input plugin : Kudu 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.1.2 6 | 7 | ### Description 8 | 9 | 从[Apache Kudu](https://kudu.apache.org)表中读取数据。 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [kudu_master](#kudu_master-string) | string | yes | - | 16 | | [kudu_table](#kudu_table) | string | yes | - | 17 | | [common-options](#common-options-string)| string | yes | - | 18 | 19 | 20 | ##### kudu_master [string] 21 | 22 | kudu的master,多个master以逗号隔开 23 | 24 | ##### kudu_table [string] 25 | 26 | kudu中要读取的表名 27 | 28 | ##### common options [string] 29 | 30 | `Input` 插件通用参数,详情参照 [Input Plugin](/zh-cn/v1/configuration/input-plugin) 31 | 32 | 33 | ### Example 34 | 35 | ``` 36 | kudu{ 37 | kudu_master="hadoop01:7051,hadoop02:7051,hadoop03:7051" 38 | kudu_table="my_kudu_table" 39 | result_table_name="reg_table" 40 | } 41 | ``` 42 | -------------------------------------------------------------------------------- /en/configuration/filter-plugins/Watermark.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Watermark 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.3.0 6 | 7 | ### Description 8 | 9 | Allows the user to specify the threshold of late data, and allows the engine to clean up old state accordingly. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [event_time](#event_time-string) | string | yes | | 16 | | [delay_threshold](#delay_threshold-string) | string | yes | | 17 | 18 | ##### event_time [string] 19 | 20 | The name of the column that contains the event time of the row. 21 | 22 | ##### delay_threshold [string] 23 | 24 | The minimum delay to wait for late data to arrive, relative to the latest record that has been processed, expressed as an interval (e.g. "1 minute" or "5 hours").
25 | 26 | ### Example 27 | 28 | ``` 29 | Watermark { 30 | event_time = "datetime" 31 | delay_threshold = "5 minutes" 32 | } 33 | ``` 34 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/installation.md: -------------------------------------------------------------------------------- 1 | # 下载、安装 2 | 3 | ## 下载 4 | 5 | ### 社区版本(Community) 6 | 7 | https://github.com/InterestingLab/seatunnel/releases 8 | 9 | ## 环境准备 10 | 11 | ### 准备好JDK1.8 12 | 13 | seatunnel 依赖JDK1.8运行环境。 14 | 15 | ### 准备好 Spark 16 | 17 | seatunnel 依赖 Spark,安装 seatunnel 前,需要先准备好Spark。 18 | 请先[下载Spark](http://spark.apache.org/downloads.html), Spark版本请选择 >= 2.x.x。下载解压后,不需要做任何配置即可提交Spark deploy-mode = local模式的任务。 19 | 如果你期望任务运行在Standalone集群或者Yarn、Mesos集群上,请参考Spark官网配置文档。 20 | 21 | ### 安装 seatunnel 22 | 23 | 下载seatunnel安装包并解压, 这里以社区版为例: 24 | 25 | ``` 26 | wget https://github.com/InterestingLab/seatunnel/releases/download/v/seatunnel-.zip -O seatunnel-.zip 27 | unzip seatunnel-.zip 28 | ln -s seatunnel- seatunnel 29 | ``` 30 | 31 | 没有任何复杂的安装配置步骤,seatunnel 的使用方法请参考 [Quick Start](/zh-cn/v2/spark/quick-start.md), 配置请参考 [Configuration](/zh-cn/v2/spark/configuration/)。 32 | 33 | 如果想把 seatunnel 部署在 Spark Standalone/Yarn/Mesos 集群上运行,请参考 [seatunnel部署](/zh-cn/v2/spark/deployment) 34 | 35 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Urldecode.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : UrlDecode 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.5.0 6 | 7 | ### Description 8 | 9 | UrlDecode 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [source_field](#source_field-string) | string | no | raw_message | 16 | | [target_field](#target_field-string)| string | no | - | 17 | 18 | 19 | ##### source_field [string] 20 | 21 | 需要进行 `UrlDecode` 处理的字段。 22 | 23 | 24 | ##### target_field [string] 25 | 26 | 存储 `UrlDecode` 处理结果的目标字段,若不配置则与 `source_field` 保持一致。 27 | 28 | ##### common options [string] 29 | 30 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 31 | 32 | 33 | ### Example 34 | 35 | ``` 36 | urldecode { 37 | source_field = "url" 38 | } 39 | ``` 40 | 41 | `UrlDecode` 方法已经注册为 **UDF**,可以直接在 `SQL` 插件中使用 42 | 43 | ``` 44 | sql { 45 | sql = "select urldecode(url) as url from view_1" 46 | } 47 | ``` 48 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Urlencode.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : UrlEncode 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.5.0 6 | 7 | ### Description 8 | 9 | UrlEncode 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [source_field](#source_field-string) | string | no | raw_message | 16 | | [target_field](#target_field-string)| string | no | - | 17 | 18 | 19 | ##### source_field [string] 20 | 21 | 需要进行 `UrlEncode` 处理的字段。 22 | 23 | 24 | ##### target_field [string] 25 | 26 | 存储 `UrlEncode` 处理结果的目标字段,若不配置则与 `source_field` 保持一致。 27 | 28 | ##### common options [string] 29 | 30 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 31 | 32 | 33 | ### Example 34 | 35 | ``` 36 | urlencode { 37 | 
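# 未配置 target_field 时,编码结果将写回 source_field 指定的字段(见上文 target_field 说明)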
source_field = "url" 38 | } 39 | ``` 40 | 41 | `UrlEncode` 方法已经注册为 **UDF**,可以直接在 `SQL` 插件中使用 42 | 43 | ``` 44 | sql { 45 | sql = "select urlencode(url) as url from view_1" 46 | } 47 | ``` 48 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/configuration/source-plugins/README.md: -------------------------------------------------------------------------------- 1 | ## Source(数据源) 插件的配置 2 | 3 | ### Source插件通用参数 4 | | name | type | required | default value | 5 | | --- | --- | --- | --- | 6 | | [result_table_name](#result_table_name-string) | string | no | - | 7 | | [field_name](#field_name-string) | string | no | - | 8 | 9 | 10 | ##### result_table_name [string] 11 | 12 | 不指定 `result_table_name` 时,此插件处理后的数据,不会被注册为一个可供其他插件直接访问的数据集(dataStream/dataset),或者被称为临时表(table); 13 | 14 | 指定 `result_table_name` 时,此插件处理后的数据,会被注册为一个可供其他插件直接访问的数据集(dataStream/dataset),或者被称为临时表(table)。此处注册的数据集(dataStream/dataset),其他插件可通过指定 `source_table_name` 来直接访问。 15 | 16 | 17 | ##### field_name [string] 18 | 19 | 当从上级插件获取到数据时,可以指定所获取字段的名称,方便在后续的sql插件中使用。 20 | 21 | ### 使用样例 22 | 23 | ``` 24 | source { 25 | FakeSourceStream { 26 | result_table_name = "fake" 27 | field_name = "name,age" 28 | } 29 | } 30 | ``` 31 | 32 | > 数据源 `FakeSourceStream` 的结果将注册为名为 `fake` 的临时表。这个临时表,可以被任意 `Transform` 或者 `Sink` 插件通过指定 `source_table_name` 使用。\ `field_name` 将临时表的两列分别命名为`name`和`age`。 33 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/_sidebar.md: -------------------------------------------------------------------------------- 1 | - [介绍](/zh-cn/v2/) 2 | 3 | - [v2.x与v1.x的区别](/zh-cn/v2/diff_v1_v2) 4 | 5 | - [行业应用案例](/zh-cn/case_study/) 6 | 7 | - [seatunnel v2.x For Flink](/zh-cn/v2/flink/) 8 | 9 | - [seatunnel v2.x For Spark](/zh-cn/v2/spark/) 10 | 11 | - [快速开始](/zh-cn/v2/spark/quick-start) 12 | 13 | - [下载、安装](/zh-cn/v2/spark/installation) 14 | 15 | - [命令使用说明](/zh-cn/v2/spark/commands/) 16 | 17 | - [配置文件](/zh-cn/v2/spark/configuration/) 18 | 19 | - [通用配置](/zh-cn/v2/spark/configuration/) 20 | 21 | - [Source 插件配置](/zh-cn/v2/spark/configuration/source-plugins/) 22 | 23 | - [Transform 插件配置](/zh-cn/v2/spark/configuration/transform-plugins/) 24 | 25 | - [Sink 插件配置](/zh-cn/v2/spark/configuration/sink-plugins/) 26 | 27 | - [完整配置文件案例](/zh-cn/v2/spark/configuration/ConfigExamples.md) 28 | 29 | - [部署与运行](/zh-cn/v2/spark/deployment) 30 | 31 | - [插件开发](/zh-cn/v2/spark/developing-plugin) 32 | 33 | - [深入seatunnel](/zh-cn/v2/internal.md) 34 | 35 | - [Roadmap](/zh-cn/v2/roadmap.md) 36 | 37 | - [贡献代码](/zh-cn/v2/contribution.md) 38 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/commands/_sidebar.md: -------------------------------------------------------------------------------- 1 | - [介绍](/zh-cn/v2/) 2 | 3 | - [v2.x与v1.x的区别](/zh-cn/v2/diff_v1_v2) 4 | 5 | - [行业应用案例](/zh-cn/case_study/) 6 | 7 | - [seatunnel v2.x For Flink](/zh-cn/v2/flink/) 8 | 9 | - [快速开始](/zh-cn/v2/flink/quick-start) 10 | 11 | - [下载、安装](/zh-cn/v2/flink/installation) 12 | 13 | - [命令使用说明](/zh-cn/v2/flink/commands/) 14 | 15 | - [start-seatunnel-flink.sh](/zh-cn/v2/flink/commands/start-seatunnel-flink.sh.md) 16 | 17 | - [配置文件](/zh-cn/v2/flink/configuration/) 18 | - [通用配置](/zh-cn/v2/flink/configuration/) 19 | - [Source 插件配置](/zh-cn/v2/flink/configuration/source-plugins/) 20 | - [Transform 插件配置](/zh-cn/v2/flink/configuration/transform-plugins/) 21 | - [Sink 插件配置](/zh-cn/v2/flink/configuration/sink-plugins/) 22 | 23 | - 
[部署与运行](/zh-cn/v2/flink/deployment) 24 | 25 | - [插件开发](/zh-cn/v2/flink/developing-plugin) 26 | 27 | - [seatunnel v2.x For Spark](/zh-cn/v2/spark/) 28 | 29 | - [深入seatunnel](/zh-cn/v2/internal.md) 30 | 31 | - [Roadmap](/zh-cn/v2/roadmap.md) 32 | 33 | - [贡献代码](/zh-cn/v2/contribution.md) 34 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/configuration/_sidebar.md: -------------------------------------------------------------------------------- 1 | - [介绍](/zh-cn/v2/) 2 | 3 | - [v2.x与v1.x的区别](/zh-cn/v2/diff_v1_v2) 4 | 5 | - [行业应用案例](/zh-cn/case_study/) 6 | 7 | - [seatunnel v2.x For Flink](/zh-cn/v2/flink/) 8 | 9 | - [快速开始](/zh-cn/v2/flink/quick-start) 10 | 11 | - [下载、安装](/zh-cn/v2/flink/installation) 12 | 13 | - [命令使用说明](/zh-cn/v2/flink/commands/) 14 | 15 | - [配置文件](/zh-cn/v2/flink/configuration/) 16 | 17 | - [通用配置](/zh-cn/v2/flink/configuration/) 18 | 19 | - [Source 插件配置](/zh-cn/v2/flink/configuration/source-plugins/) 20 | 21 | - [Transform 插件配置](/zh-cn/v2/flink/configuration/transform-plugins/) 22 | 23 | - [Sink 插件配置](/zh-cn/v2/flink/configuration/sink-plugins/) 24 | 25 | - [完整配置文件案例](/zh-cn/v2/flink/configuration/ConfigExamples.md) 26 | 27 | - [部署与运行](/zh-cn/v2/flink/deployment) 28 | 29 | - [插件开发](/zh-cn/v2/flink/developing-plugin) 30 | 31 | - [seatunnel v2.x For Spark](/zh-cn/v2/spark/) 32 | 33 | - [深入seatunnel](/zh-cn/v2/internal.md) 34 | 35 | - [Roadmap](/zh-cn/v2/roadmap.md) 36 | 37 | - [贡献代码](/zh-cn/v2/contribution.md) 38 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/_sidebar.md: -------------------------------------------------------------------------------- 1 | - [介绍](/zh-cn/v2/) 2 | 3 | - [v2.x与v1.x的区别](/zh-cn/v2/diff_v1_v2) 4 | 5 | - [行业应用案例](/zh-cn/case_study/) 6 | 7 | - [seatunnel v2.x For Flink](/zh-cn/v2/flink/) 8 | 9 | - [seatunnel v2.x For Spark](/zh-cn/v2/spark/) 10 | 11 | - [快速开始](/zh-cn/v2/spark/quick-start) 12 | 13 | - [下载、安装](/zh-cn/v2/spark/installation) 14 | 15 | - [命令使用说明](/zh-cn/v2/spark/commands/) 16 | 17 | - [配置文件](/zh-cn/v2/spark/configuration/) 18 | 19 | - [通用配置](/zh-cn/v2/spark/configuration/) 20 | 21 | - [Source 插件配置](/zh-cn/v2/spark/configuration/source-plugins/) 22 | 23 | - [Transform 插件配置](/zh-cn/v2/spark/configuration/transform-plugins/) 24 | 25 | - [Sink 插件配置](/zh-cn/v2/spark/configuration/sink-plugins/) 26 | 27 | - [完整配置文件案例](/zh-cn/v2/spark/configuration/ConfigExamples.md) 28 | 29 | - [部署与运行](/zh-cn/v2/spark/deployment) 30 | 31 | - [插件开发](/zh-cn/v2/spark/developing-plugin) 32 | 33 | - [深入seatunnel](/zh-cn/v2/internal.md) 34 | 35 | - [Roadmap](/zh-cn/v2/roadmap.md) 36 | 37 | - [贡献代码](/zh-cn/v2/contribution.md) 38 | -------------------------------------------------------------------------------- /zh-cn/v1/internal.md: -------------------------------------------------------------------------------- 1 | # 深入 seatunnel 2 | 3 | ## seatunnel 努力改善多处痛点 4 | 5 | 除了大大简化分布式数据处理难度外,seatunnel尽所能为您解决可能遇到的问题: 6 | 7 | * 数据丢失与重复 8 | 9 | 如 seatunnel 的 Kafka Input 是通过 Kafka Direct API 实现的,同时借助checkpoint机制或者支持幂等写入的Output,实现了exactly once语义。此外seatunnel的项目代码经过了详尽测试,尽可能减少了因数据处理异常导致的数据意外丢弃。 10 | 11 | * 任务堆积与延迟 12 | 13 | 在线上环境,存在大量的Spark任务或者包含较多task的单个stage的Spark运行环境中,我们多次遇到单个task处理时间较长,拖慢了整个batch的情况。seatunnel默认开启了Spark推测执行的功能,推测执行功能会找到慢task并启动新的task,并以先完成的task的结果作为计算结果。 14 | 15 | * 吞吐量低 16 | 17 | seatunnel 的代码实现中,直接利用了多项在实践中被证明有利于提升处理性能的Spark的高级特性,如: 18 | 19 | (1)在核心流程代码中,使用Dataset,Spark SQL 编程API,有效利用了Spark 的catalyst优化器。 20 | 21 | (2)支持插件实现中使用broadcast
variable,对于IP库解析、写数据库时的连接维护这样的应用场景,能起到优化作用。 23 | 24 | (3)在插件的实现代码中,性能始终是我们优先考虑的因素。 25 | 26 | * 应用到生产环境周期长 27 | 28 | 使用 seatunnel 可以做到开箱即用,在安装、部署、启动上做了多处简化;插件体系容易配置和部署,开发者能够很快在 seatunnel 中集成特定业务逻辑。 29 | 30 | * 缺少应用运行状态监控 31 | 32 | (1)seatunnel 自带监控工具 `Guardian`,是 seatunnel 的子项目,可监控 seatunnel 是否存活,并能够根据配置自动拉起 seatunnel 实例;可监控其运行时streaming batch是否存在堆积和延迟,并发送报警。 33 | 34 | (2)下一个release版本中将加入数据处理各阶段耗时统计,方便做性能优化。 35 | 36 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Truncate.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Truncate 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 对指定字段进行字符串截取 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [max_length](#max_length-number) | number | no | 256 | 16 | | [source_field](#source_field-string) | string | no | raw_message | 17 | | [target_field](#target_field-string) | string | no | truncated | 18 | | [common-options](#common-options-string)| string | no | - | 19 | 20 | 21 | ##### max_length [number] 22 | 23 | 截取字符串的最大长度 24 | 25 | ##### source_field [string] 26 | 27 | 源字段,若不配置默认为`raw_message` 28 | 29 | ##### target_field [string] 30 | 31 | 目标字段,若不配置默认为`truncated` 32 | 33 | ##### common options [string] 34 | 35 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 36 | 37 | 38 | ### Example 39 | 40 | ``` 41 | truncate { 42 | source_field = "telephone" 43 | max_length = 10 44 | } 45 | ``` 46 | -------------------------------------------------------------------------------- /en/configuration/filter-plugins/Checksum.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Checksum 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | Calculate checksum (default algorithm is SHA1) of a specific field and add a new field with the checksum value. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [method](#method-string) | string | no | SHA1 | 16 | | [source_field](#source_field-string) | string | no | raw_message | 17 | | [target_field](#target_field-string) | string | no | checksum | 18 | 19 | ##### method [string] 20 | 21 | Checksum algorithm, supports SHA1, MD5 and CRC32 now. 22 | 23 | ##### source_field [string] 24 | 25 | Source field 26 | 27 | ##### target_field [string] 28 | 29 | Target field 30 | 31 | ### Examples 32 | 33 | ``` 34 | checksum { 35 | source_field = "deviceId" 36 | target_field = "device_crc32" 37 | method = "CRC32" 38 | } 39 | ``` 40 | 41 | > Get CRC32 checksum from `deviceId`, and set it to `device_crc32` 42 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/output-plugins/Stdout.md: -------------------------------------------------------------------------------- 1 | ## Output plugin : Stdout 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 输出数据到标准输出/终端, 常用于debug, 能够很方便地输出数据.
10 | 11 | ### Options 12 | 13 | | name | type | required | default value | engine | 14 | | --- | --- | --- | --- | --- | 15 | | [limit](#limit-number) | number | no | 100 | batch/spark streaming | 16 | | [format](#format-string) | string | no | plain | batch/spark streaming | 17 | | [common-options](#common-options-string)| string | no | - | all streaming | 18 | 19 | ##### limit [number] 20 | 21 | 限制输出Row的条数,合法范围[-1, 2147483647], `-1`表示输出最多2147483647条Row 22 | 23 | ##### format [string] 24 | 25 | 输出到终端的格式,可用的`format`包括: `json`, `plain` 以及 `schema`,用于输出数据的 **Schema** 26 | 27 | ##### common options [string] 28 | 29 | `Output` 插件通用参数,详情参照 [Output Plugin](/zh-cn/v1/configuration/output-plugin) 30 | 31 | 32 | ### Example 33 | 34 | ``` 35 | stdout { 36 | limit = 10 37 | format = "json" 38 | } 39 | ``` 40 | 41 | > 以Json格式输出10条数据 42 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/configuration/transform-plugins/Split.md: -------------------------------------------------------------------------------- 1 | ## Transform plugin : Split [Flink] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 定义了一个字符串切割函数,用于在Sql插件中对指定字段进行分割。 9 | 10 | ### Options 11 | | name | type | required | default value | 12 | | --- | --- | --- | --- | 13 | | [separator](#separator-string) | string | no | , | 14 | | [fields](#fields-array) | array | yes | - | 15 | | [common-options](#common-options-string)| string | no | - | 16 | 17 | 18 | 19 | ##### separator [string] 20 | 21 | 指定的分隔符,默认为`,` 22 | 23 | ##### fields [array] 24 | 25 | 分割后各个字段的名称 26 | 27 | ##### common options [string] 28 | 29 | `Transform` 插件通用参数,详情参照 [Transform Plugin](/zh-cn/v2/flink/configuration/transform-plugins/) 30 | 31 | ### Examples 32 | 33 | ``` 34 | #这个只是创建了一个叫split的udf 35 | Split{ 36 | separator = "#" 37 | fields = ["name","age"] 38 | } 39 | #使用split函数(确认fake表存在) 40 | sql { 41 | sql = "select * from (select info,split(info) as info_row from fake) t1" 42 | } 43 | ``` 44 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/sink-plugins/Console.md: -------------------------------------------------------------------------------- 1 | ## Sink plugin : Console [Spark] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 9 | 输出数据到标准输出/终端, 常用于debug, 能够很方便地观察数据.
10 | 11 | ### Options 12 | 13 | | name | type | required | default value | engine | 14 | | --- | --- | --- | --- | --- | 15 | | [limit](#limit-number) | number | no | 100 | batch/spark streaming | 16 | | [serializer](#serializer-string) | string | no | plain | batch/spark streaming | 17 | | [common-options](#common-options-string)| string | no | - | all streaming | 18 | 19 | ##### limit [number] 20 | 21 | 限制输出Row的条数,合法范围[-1, 2147483647], `-1`表示输出最多2147483647条Row 22 | 23 | ##### serializer [string] 24 | 25 | 输出时序列化的格式,可用的serializer包括: `json`, `plain` 26 | 27 | ##### common options [string] 28 | 29 | `Sink` 插件通用参数,详情参照 [Sink Plugin](/zh-cn/v2/spark/configuration/sink-plugins/) 30 | 31 | 32 | ### Examples 33 | 34 | ``` 35 | console { 36 | limit = 10 37 | serializer = "json" 38 | } 39 | ``` 40 | 41 | > 以Json格式输出10条数据 42 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Checksum.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Checksum 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 获取指定字段的校验码 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [method](#method-string) | string | no | SHA1 | 16 | | [source_field](#source_field-string) | string | no | raw_message | 17 | | [target_field](#target_field-string) | string | no | checksum | 18 | | [common-options](#common-options-string)| string | no | - | 19 | 20 | 21 | ##### method [string] 22 | 23 | 校验方法,当前支持SHA1、MD5和CRC32 24 | 25 | ##### source_field [string] 26 | 27 | 源字段 28 | 29 | ##### target_field [string] 30 | 31 | 转换后的字段 32 | 33 | ##### common options [string] 34 | 35 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 36 | 37 | 38 | ### Examples 39 | 40 | ``` 41 | checksum { 42 | source_field = "deviceId" 43 | target_field = "device_crc32" 44 | method = "CRC32" 45 | } 46 | ``` 47 | 48 | > 获取`deviceId`字段CRC32校验码 49 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/configuration/sink-plugins/README.md: -------------------------------------------------------------------------------- 1 | ## Sink(数据输出) 插件的配置 2 | 3 | ### Sink插件通用参数 4 | 5 | | name | type | required | default value | 6 | | --- | --- | --- | --- | 7 | | [source_table_name](#source_table_name-string) | string | no | - | 8 | 9 | 10 | ##### source_table_name [string] 11 | 12 | 不指定 `source_table_name` 时,当前插件处理的就是配置文件中上一个插件输出的数据集(dataStream/dataset); 13 | 14 | 指定 `source_table_name` 的时候,当前插件处理的就是此参数对应的数据集。 15 | 16 | 17 | ### 使用样例 18 | 19 | ``` 20 | source { 21 | FakeSourceStream { 22 | result_table_name = "fake" 23 | field_name = "name,age" 24 | } 25 | } 26 | 27 | transform { 28 | sql { 29 | source_table_name = "fake" 30 | sql = "select name from fake" 31 | result_table_name = "fake_name" 32 | } 33 | sql { 34 | source_table_name = "fake" 35 | sql = "select age from fake" 36 | result_table_name = "fake_age" 37 | } 38 | } 39 | 40 | sink { 41 | console { 42 | source_table_name = "fake_name" 43 | } 44 | } 45 | ``` 46 | 47 | > 如果不指定`source_table_name`的话,console输出的是最后一个transform的数据,设置为`fake_name`则将输出`fake_name`的数据 48 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/commands/_sidebar.md: -------------------------------------------------------------------------------- 1 | - 
[介绍](/zh-cn/v2/) 2 | 3 | - [v2.x与v1.x的区别](/zh-cn/v2/diff_v1_v2) 4 | 5 | - [行业应用案例](/zh-cn/case_study/) 6 | 7 | - [seatunnel v2.x For Flink](/zh-cn/v2/flink/) 8 | 9 | - [seatunnel v2.x For Spark](/zh-cn/v2/spark/) 10 | 11 | - [快速开始](/zh-cn/v2/spark/quick-start) 12 | 13 | - [下载、安装](/zh-cn/v2/spark/installation) 14 | 15 | - [命令使用说明](/zh-cn/v2/spark/commands/) 16 | 17 | - [start-seatunnel-spark.sh](/zh-cn/v2/spark/commands/start-seatunnel-spark.sh.md) 18 | 19 | - [配置文件](/zh-cn/v2/spark/configuration/) 20 | 21 | - [通用配置](/zh-cn/v2/spark/configuration/) 22 | 23 | - [Source 插件配置](/zh-cn/v2/spark/configuration/source-plugins/) 24 | 25 | - [Transform 插件配置](/zh-cn/v2/spark/configuration/transform-plugins/) 26 | 27 | - [Sink 插件配置](/zh-cn/v2/spark/configuration/sink-plugins/) 28 | 29 | - [完整配置文件案例](/zh-cn/v2/spark/configuration/ConfigExamples.md) 30 | 31 | - [部署与运行](/zh-cn/v2/spark/deployment) 32 | 33 | - [插件开发](/zh-cn/v2/spark/developing-plugin) 34 | 35 | - [深入seatunnel](/zh-cn/v2/internal.md) 36 | 37 | - [Roadmap](/zh-cn/v2/roadmap.md) 38 | 39 | - [贡献代码](/zh-cn/v2/contribution.md) 40 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/input-plugins/FileStream.md: -------------------------------------------------------------------------------- 1 | ## Input plugin : FileStream [Streaming] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.1.0 6 | 7 | ### Description 8 | 9 | 从本地文件目录中读取原始数据,会监听新文件生成。 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [format](#format-string) | string | yes | text | 16 | | [path](#path-string) | string | yes | - | 17 | | [rowTag](#rowtag-string) | string | yes | - | 18 | | [common-options](#common-options-string)| string | yes | - | 19 | 20 | 21 | ##### format [string] 22 | 23 | 文件格式 24 | 25 | 26 | ##### path [string] 27 | 28 | 文件目录路径 29 | 30 | 31 | ##### rowTag [string] 32 | 33 | 仅当format为xml时使用,表示XML格式数据的Tag 34 | 35 | ##### common options [string] 36 | 37 | `Input` 插件通用参数,详情参照 [Input Plugin](/zh-cn/v1/configuration/input-plugin) 38 | 39 | 40 | ### Example 41 | 42 | ``` 43 | fileStream { 44 | path = "file:///var/log/" 45 | } 46 | ``` 47 | 48 | 或者指定`format` 49 | 50 | ``` 51 | fileStream { 52 | path = "file:///var/log/" 53 | format = "xml" 54 | rowTag = "book" 55 | } 56 | ``` 57 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/configuration/sink-plugins/File.md: -------------------------------------------------------------------------------- 1 | ## Sink plugin : File [Flink] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 写入数据到文件系统 9 | 10 | ### Options 11 | 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [format](#format-string) | string | yes | - | 16 | | [path](#path-string) | string | yes | - | 17 | | [write_mode](#write_mode-string)| string | no | - | 18 | | [common-options](#common-options-string)| string | no | - | 19 | 20 | ##### format [string] 21 | 22 | 目前支持`csv`、`json`和`text`。streaming模式目前只支持`text` 23 | 24 | ##### path [string] 25 | 26 | 需要写入的文件路径,hdfs文件以hdfs://开头,本地文件以file://开头。 27 | 28 | ##### write_mode [string] 29 | 30 | - NO_OVERWRITE 31 | - 不覆盖,路径存在报错 32 | - OVERWRITE 33 | - 覆盖,路径存在则先删除再写入 34 | 35 | ##### common options [string] 36 | 37 | `Sink` 插件通用参数,详情参照 [Sink
Plugin](/zh-cn/v2/flink/configuration/sink-plugins/) 38 | 39 | ### Examples 40 | 41 | ``` 42 | FileSink { 43 | format = "json" 44 | path = "hdfs://localhost:9000/flink/output/" 45 | write_mode = "OVERWRITE" 46 | } 47 | ``` 48 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/source-plugins/FakeStream.md: -------------------------------------------------------------------------------- 1 | ## Source plugin : FakeStream [Spark] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 9 | `FakeStream` 主要用于方便地生成用户指定的数据,作为输入来对seatunnel进行功能验证,测试,以及性能测试等。 10 | 11 | 12 | ### Options 13 | 14 | | name | type | required | default value | 15 | | --- | --- | --- | --- | 16 | | [content](#content-array) | array | no | - | 17 | | [rate](#rate-number) | number | yes | - | 18 | | [common-options](#common-options-string)| string | yes | - | 19 | 20 | 21 | ##### content [array] 22 | 23 | 测试数据字符串列表 24 | 25 | ##### rate [number] 26 | 27 | 每秒生成测试用例个数 28 | 29 | ##### common options [string] 30 | 31 | `Source` 插件通用参数,详情参照 [Source Plugin](/zh-cn/v2/spark/configuration/source-plugins/) 32 | 33 | 34 | 35 | ### Examples 36 | 37 | ``` 38 | fakeStream { 39 | content = ['name=ricky&age=23', 'name=gary&age=28'] 40 | rate = 5 41 | } 42 | ``` 43 | 生成的数据如下,从`content`列表中随机抽取其中的字符串 44 | 45 | ``` 46 | +-----------------+ 47 | |raw_message | 48 | +-----------------+ 49 | |name=gary&age=28 | 50 | |name=ricky&age=23| 51 | +-----------------+ 52 | ``` 53 | -------------------------------------------------------------------------------- /en/configuration/input-plugins/MySQL.md: -------------------------------------------------------------------------------- 1 | ## Input plugin : Mysql 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.1.2 6 | 7 | ### Description 8 | 9 | Read data from MySQL. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [password](#password-string) | string | yes | - | 16 | | [table](#table-string) | string | yes | - | 17 | | [table_name](#table_name-string) | string | yes | - | 18 | | [url](#url-string) | string | yes | - | 19 | | [user](#user-string) | string | yes | - | 20 | 21 | 22 | ##### password [string] 23 | 24 | Password. 25 | 26 | ##### table [string] 27 | 28 | Table name. 29 | 30 | 31 | ##### table_name [string] 32 | 33 | Registered table name of input data. 34 | 35 | 36 | ##### url [string] 37 | 38 | The url of JDBC. For example: `jdbc:mysql://localhost:3306/info` 39 | 40 | 41 | ##### user [string] 42 | 43 | Username. 44 | 45 | 46 | ### Example 47 | 48 | ``` 49 | mysql { 50 | url = "jdbc:mysql://localhost:3306/info" 51 | table = "access" 52 | table_name = "access_log" 53 | user = "username" 54 | password = "password" 55 | } 56 | ``` 57 | 58 | > Read data from MySQL. 59 | -------------------------------------------------------------------------------- /en/configuration/filter-plugins/Join.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Join 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.3.0 6 | 7 | ### Description 8 | 9 | Joining a streaming Dataset/DataFrame with a static Dataset/DataFrame.
10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [source_field](#source_field-string) | string | no | raw_message | 16 | | [table_name](#table_name-string) | string | yes | - | 17 | 18 | ##### source_field [string] 19 | 20 | Source field, default is `raw_message`. 21 | 22 | ##### table_name [string] 23 | 24 | Static Dataset/DataFrame name. 25 | 26 | ### Examples 27 | 28 | ``` 29 | input { 30 | fakestream { 31 | content = ["Hello World,seatunnel"] 32 | rate = 1 33 | } 34 | 35 | mysql { 36 | url = "jdbc:mysql://localhost:3306/info" 37 | table = "project_info" 38 | table_name = "project_info" 39 | user = "username" 40 | password = "password" 41 | } 42 | } 43 | 44 | filter { 45 | split { 46 | fields = ["msg", "project"] 47 | delimiter = "," 48 | } 49 | 50 | join { 51 | table_name = "project_info" 52 | source_field = "project" 53 | } 54 | } 55 | ``` 56 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/configuration/transform-plugins/_sidebar.md: -------------------------------------------------------------------------------- 1 | - [介绍](/zh-cn/v2/) 2 | 3 | - [v2.x与v1.x的区别](/zh-cn/v2/diff_v1_v2) 4 | 5 | - [行业应用案例](/zh-cn/case_study/) 6 | 7 | - [seatunnel v2.x For Flink](/zh-cn/v2/flink/) 8 | 9 | - [快速开始](/zh-cn/v2/flink/quick-start) 10 | 11 | - [下载、安装](/zh-cn/v2/flink/installation) 12 | 13 | - [命令使用说明](/zh-cn/v2/flink/commands/) 14 | 15 | - [配置文件](/zh-cn/v2/flink/configuration/) 16 | - [通用配置](/zh-cn/v2/flink/configuration/) 17 | 18 | - [Source 插件配置](/zh-cn/v2/flink/configuration/source-plugins/) 19 | 20 | - [Transform 插件配置](/zh-cn/v2/flink/configuration/transform-plugins/) 21 | 22 | - [SQL](/zh-cn/v2/flink/configuration/transform-plugins/Sql.md) 23 | 24 | - [Split](/zh-cn/v2/flink/configuration/transform-plugins/Split.md) 25 | 26 | - [Sink 插件配置](/zh-cn/v2/flink/configuration/sink-plugins/) 27 | 28 | - [完整配置文件案例](/zh-cn/v2/flink/configuration/ConfigExamples.md) 29 | 30 | - [部署与运行](/zh-cn/v2/flink/deployment) 31 | 32 | - [插件开发](/zh-cn/v2/flink/developing-plugin) 33 | 34 | - [seatunnel v2.x For Spark](/zh-cn/v2/spark/) 35 | 36 | - [深入seatunnel](/zh-cn/v2/internal.md) 37 | 38 | - [Roadmap](/zh-cn/v2/roadmap.md) 39 | 40 | - [贡献代码](/zh-cn/v2/contribution.md) 41 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/output-plugins/Kudu.md: -------------------------------------------------------------------------------- 1 | ## Output plugin : Kudu 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.1.2 6 | 7 | ### Description 8 | 9 | 写入数据到[Apache Kudu](https://kudu.apache.org)表中 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [kudu_master](#kudu_master-string) | string | yes | - | 16 | | [kudu_table](#kudu_table) | string | yes | - | 17 | | [mode](#mode-string) | string | no | insert | 18 | | [common-options](#common-options-string)| string | no | - | 19 | 20 | 21 | ##### kudu_master [string] 22 | 23 | kudu的master,多个master以逗号隔开 24 | 25 | ##### kudu_table [string] 26 | 27 | kudu中要写入的表名,表必须已经存在 28 | 29 | ##### mode [string] 30 | 31 | 写入kudu时采取的模式,支持 insert|update|upsert|insertIgnore,默认为insert。 32 | insert 和 insertIgnore:insert 在遇到主键冲突时将会报错,insertIgnore 不会报错,会舍弃这条数据。 33 | update 和 upsert:update 找不到要更新的主键时将会报错,upsert 不会,会将这条数据插入。 34 | 35 | ##### common options [string] 36 | 37 | `Output` 插件通用参数,详情参照 [Output
Plugin](/zh-cn/v1/configuration/output-plugin) 38 | 39 | 40 | ### Example 41 | 42 | ``` 43 | kudu{ 44 | kudu_master="hadoop01:7051,hadoop02:7051,hadoop03:7051" 45 | kudu_table="my_kudu_table" 46 | mode="upsert" 47 | } 48 | ``` 49 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/input-plugins/HdfsStream.md: -------------------------------------------------------------------------------- 1 | ## Input plugin : HdfsStream [Streaming] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.1.0 6 | 7 | ### Description 8 | 9 | 监听HDFS目录中的文件变化,实时加载并处理新文件,形成文件处理流。 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [format](#format-string) | string | no | text | 16 | | [path](#path-string) | string | yes | - | 17 | | [rowTag](#rowtag-string) | string | no | - | 18 | | [common-options](#common-options-string)| string | yes | - | 19 | 20 | 21 | ##### format [string] 22 | 23 | 文件格式 24 | 25 | 26 | ##### path [string] 27 | 28 | 文件目录路径 29 | 30 | 31 | ##### rowTag [string] 32 | 33 | 仅当format为xml时使用,表示XML格式数据的Tag 34 | 35 | ##### common options [string] 36 | 37 | `Input` 插件通用参数,详情参照 [Input Plugin](/zh-cn/v1/configuration/input-plugin) 38 | 39 | 40 | ### Example 41 | 42 | ``` 43 | hdfsStream { 44 | path = "hdfs:///access/log/" 45 | } 46 | ``` 47 | 48 | 或者可以指定 hdfs name service: 49 | 50 | ``` 51 | hdfsStream { 52 | path = "hdfs://m2:8022/access/log/" 53 | } 54 | ``` 55 | 56 | 或者指定`format` 57 | 58 | ``` 59 | hdfsStream { 60 | path = "hdfs://m2:8022/access/log/" 61 | format = "xml" 62 | rowTag = "book" 63 | } 64 | ``` 65 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/sink-plugins/Phoenix.md: -------------------------------------------------------------------------------- 1 | ## Sink plugin : Phoenix [Spark] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 9 | 输出数据到Phoenix,兼容Kerberos认证 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [zk-connect](#zk-connect-string) | string | yes | - | 16 | | [table](#table-string) | string | yes | - | 17 | | [tenantId](#tenantid-string) | string | no | - | 18 | | [skipNormalizingIdentifier](#skipnormalizingidentifier-boolean) | boolean | no | false | 19 | 20 | 21 | 22 | ##### zk-connect [string] 23 | 24 | 连接串,配置示例:`host1:2181,host2:2181,host3:2181[/znode]`,其中 `/znode` 部分可选 25 | 26 | ##### table [string] 27 | 28 | 目标表名 29 | 30 | ##### tenantId [string] 31 | 32 | 租户ID,非必须配置项 33 | 34 | ##### skipNormalizingIdentifier [boolean] 35 | 36 | 是否跳过标识符规范化:如果列名被双引号包围,则按原样使用,否则名称会被转为大写。非必须配置项,默认为false 37 | 38 | ##### common options [string] 39 | 40 | `Sink` 插件通用参数,详情参照 [Sink Plugin](/zh-cn/v2/spark/configuration/sink-plugins/) 41 | 42 | ### Examples 43 | 44 | ``` 45 | Phoenix { 46 | zk-connect = "host1:2181,host2:2181,host3:2181" 47 | table = "table22" 48 | } 49 | ``` 50 | -------------------------------------------------------------------------------- /en/configuration/output-plugins/MySQL.md: -------------------------------------------------------------------------------- 1 | ## Output plugin : Mysql 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 |
### Description 8 | 9 | Write Rows to MySQL. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [password](#password-string) | string | yes | - | 16 | | [save_mode](#save_mode-string) | string | no | append | 17 | | [table](#table-string) | string | yes | - | 18 | | [url](#url-string) | string | yes | - | 19 | | [user](#user-string) | string | yes | - | 20 | 21 | 22 | ##### password [string] 23 | 24 | Password. 25 | 26 | ##### save_mode [string] 27 | 28 | Save mode, supports `overwrite`, `append`, `ignore` and `error`. For details of save_mode, see [save-modes](http://spark.apache.org/docs/2.2.0/sql-programming-guide.html#save-modes). 29 | 30 | ##### table [string] 31 | 32 | Table name. 33 | 34 | ##### url [string] 35 | 36 | The url of JDBC. For example: `jdbc:mysql://localhost:3306/info` 37 | 38 | 39 | ##### user [string] 40 | 41 | Username. 42 | 43 | 44 | ### Example 45 | 46 | ``` 47 | mysql { 48 | url = "jdbc:mysql://localhost:3306/info" 49 | table = "access" 50 | user = "username" 51 | password = "password" 52 | save_mode = "append" 53 | } 54 | ``` 55 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/transform-plugins/_sidebar.md: -------------------------------------------------------------------------------- 1 | - [介绍](/zh-cn/v2/) 2 | 3 | - [v2.x与v1.x的区别](/zh-cn/v2/diff_v1_v2) 4 | 5 | - [行业应用案例](/zh-cn/case_study/) 6 | 7 | - [seatunnel v2.x For Flink](/zh-cn/v2/flink/) 8 | 9 | - [seatunnel v2.x For Spark](/zh-cn/v2/spark/) 10 | 11 | - [快速开始](/zh-cn/v2/spark/quick-start) 12 | 13 | - [下载、安装](/zh-cn/v2/spark/installation) 14 | 15 | - [命令使用说明](/zh-cn/v2/spark/commands/) 16 | 17 | - [配置文件](/zh-cn/v2/spark/configuration/) 18 | 19 | - [通用配置](/zh-cn/v2/spark/configuration/) 20 | 21 | - [Source 插件配置](/zh-cn/v2/spark/configuration/source-plugins/) 22 | 23 | - [Transform 插件配置](/zh-cn/v2/spark/configuration/transform-plugins/) 24 | 25 | - [Json](/zh-cn/v2/spark/configuration/transform-plugins/Json.md) 26 | 27 | - [Split](/zh-cn/v2/spark/configuration/transform-plugins/Split.md) 28 | 29 | - [SQL](/zh-cn/v2/spark/configuration/transform-plugins/Sql.md) 30 | 31 | - [Sink 插件配置](/zh-cn/v2/spark/configuration/sink-plugins/) 32 | 33 | - [完整配置文件案例](/zh-cn/v2/spark/configuration/ConfigExamples.md) 34 | 35 | - [部署与运行](/zh-cn/v2/spark/deployment) 36 | 37 | - [插件开发](/zh-cn/v2/spark/developing-plugin) 38 | 39 | - [深入seatunnel](/zh-cn/v2/internal.md) 40 | 41 | - [Roadmap](/zh-cn/v2/roadmap.md) 42 | 43 | - [贡献代码](/zh-cn/v2/contribution.md) 44 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/source-plugins/Phoenix.md: -------------------------------------------------------------------------------- 1 | ## Source plugin : Phoenix [Spark] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 9 | 通过Phoenix读取外部数据源数据,兼容Kerberos认证 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [zk-connect](#zk-connect-string) | string | yes | - | 16 | | [table](#table-string) | string | yes | - | 17 | | [columns](#columns-string-list) | string | no | - | 18 | | [tenantId](#tenantid-string) | string | no | - | 19 | | [predicate](#predicate-string) | string | no | - | 20 | 21 | 22 | ##### zk-connect [string] 23 | 24 | 连接串,配置示例:`host1:2181,host2:2181,host3:2181[/znode]`,其中 `/znode` 部分可选 25 | 26 | ##### table [string]
27 | 28 | 源数据表名 29 | 30 | ##### columns [string-list] 31 | 32 | 读取列名配置,读取全部列时设置为 []。非必须配置项,默认为 [] 33 | 34 | ##### tenantId [string] 35 | 36 | 租户ID,非必须配置项 37 | 38 | ##### predicate [string] 39 | 40 | 条件过滤串配置,非必须配置项 41 | 42 | ##### common options [string] 43 | 44 | `Source` 插件通用参数,详情参照 [Source Plugin](/zh-cn/v2/spark/configuration/source-plugins/) 45 | 46 | 47 | ### Example 48 | 49 | ``` 50 | Phoenix { 51 | zk-connect = "host1:2181,host2:2181,host3:2181" 52 | table = "table22" 53 | result_table_name = "tmp1" 54 | } 55 | ``` 56 | 57 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugin.md: -------------------------------------------------------------------------------- 1 | # Filter 插件 2 | 3 | ### Filter插件通用参数 4 | 5 | | name | type | required | default value | 6 | | --- | --- | --- | --- | 7 | | [source_table_name](#source_table_name-string) | string | no | - | 8 | | [result_table_name](#result_table_name-string) | string | no | - | 9 | 10 | 11 | ##### source_table_name [string] 12 | 13 | 不指定 `source_table_name` 时,当前插件处理的就是配置文件中上一个插件输出的数据集(dataset); 14 | 15 | 指定 `source_table_name` 的时候,当前插件处理的就是此参数对应的数据集。 16 | 17 | ##### result_table_name [string] 18 | 19 | 不指定 `result_table_name` 时,此插件处理后的数据,不会被注册为一个可供其他插件直接访问的数据集(dataset),或者被称为临时表(table); 20 | 21 | 指定 `result_table_name` 时,此插件处理后的数据,会被注册为一个可供其他插件直接访问的数据集(dataset),或者被称为临时表(table)。此处注册的数据集(dataset),其他插件可通过指定 `source_table_name` 来直接访问。 22 | 23 | ### 使用样例 24 | 25 | ``` 26 | split { 27 | source_table_name = "view_table_1" 28 | source_field = "message" 29 | delimiter = "&" 30 | fields = ["field1", "field2"] 31 | result_table_name = "view_table_2" 32 | } 33 | ``` 34 | 35 | > `Split` 插件将会处理临时表 `view_table_1` 中的数据,并将处理结果注册为名为 `view_table_2` 的临时表, 这张临时表可以被后续任意 `Filter` 或 `Output` 插件通过指定 `source_table_name` 使用。 36 | 37 | ``` 38 | split { 39 | source_field = "message" 40 | delimiter = "&" 41 | fields = ["field1", "field2"] 42 | } 43 | ``` 44 | 45 | > 没有配置 `source_table_name`,`Split` 插件会读取上一个插件传递过来的数据集,并且传递给下一个插件。 46 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/configuration/sink-plugins/Kafka.md: -------------------------------------------------------------------------------- 1 | ## Sink plugin : Kafka [Flink] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 写入数据到Kafka 9 | 10 | ### Options 11 | 12 | | name | type | required | default value | 13 | | --- | --- | --- | --- | 14 | | [producer.bootstrap.servers](#producerbootstrapservers-string) | string | yes | - | 15 | | [topic](#topic-string) | string | yes | - | 16 | | [producer.*](#producer-string) | string | no | - | 17 | | [common-options](#common-options-string)| string | no | - | 18 | 19 | 20 | ##### producer.bootstrap.servers [string] 21 | 22 | Kafka Brokers List 23 | 24 | ##### topic [string] 25 | 26 | Kafka Topic 27 | 28 | ##### producer [string] 29 | 30 | 除了以上必备的kafka producer客户端必须指定的参数外,用户还可以指定多个producer客户端非必须参数,覆盖了[kafka官方文档指定的所有producer参数](http://kafka.apache.org/documentation.html#producerconfigs).
31 | 32 | 指定参数的方式是在原参数名称上加上前缀"producer.",如指定`request.timeout.ms`的方式是: `producer.request.timeout.ms = 60000`。如果不指定这些非必须参数,它们将使用Kafka官方文档给出的默认值。 33 | 34 | ##### common options [string] 35 | 36 | `Sink` 插件通用参数,详情参照 [Sink Plugin](/zh-cn/v2/flink/configuration/sink-plugins/) 37 | 38 | ### Examples 39 | 40 | ``` 41 | KafkaTable { 42 | producer.bootstrap.servers = "127.0.0.1:9092" 43 | topic = "test_sink" 44 | } 45 | ``` 46 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/commands/start-waterdrop-flink.sh.md: -------------------------------------------------------------------------------- 1 | ## start-seatunnel-flink.sh 使用方法 2 | 3 | 4 | ```bash 5 | bin/start-seatunnel-flink.sh -c config-path -i key=value [other params] 6 | ``` 7 | > 使用 `-c`或者`--config`来指定配置文件的路径。 8 | 9 | > 使用 `-i` 或者 `--variable` 来指定配置文件中的变量,可以配置多个 10 | 11 | ``` 12 | env { 13 | execution.parallelism = 1 14 | } 15 | 16 | source { 17 | FakeSourceStream { 18 | result_table_name = "fake" 19 | field_name = "name,age" 20 | } 21 | } 22 | 23 | transform { 24 | sql { 25 | sql = "select name,age from fake where name='"${my_name}"'" 26 | } 27 | } 28 | 29 | sink { 30 | ConsoleSink {} 31 | } 32 | ``` 33 | 34 | ```bash 35 | bin/start-seatunnel-flink.sh -c config-path -i my_name=kid-xiong 36 | ``` 37 | 这样指定将会把配置文件中的`"${my_name}"`替换为`kid-xiong` 38 | 39 | > 其余参数参考flink原始参数,查看flink参数方法:`flink run -h`,参数可以根据需求任意添加,如`-m yarn-cluster`则指定为on yarn模式。 40 | 41 | ```bash 42 | flink run -h 43 | ``` 44 | * flink standalone 可配置的参数 45 | ![standalone](../../../images/flink/standalone.jpg) 46 | 例如:-p 2 指定作业并行度为2 47 | ```bash 48 | bin/start-seatunnel-flink.sh -p 2 -c config-path 49 | ``` 50 | 51 | * flink yarn-cluster 可配置参数 52 | ![yarn-cluster](../../../images/flink/yarn.jpg) 53 | 例如:-m yarn-cluster -ynm seatunnel 指定作业运行在yarn上,并且yarn webUI的名称为seatunnel 54 | ```bash 55 | bin/start-seatunnel-flink.sh -m yarn-cluster -ynm seatunnel -c config-path 56 | ``` 57 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/transform-plugins/README.md: -------------------------------------------------------------------------------- 1 | # Transform Plugin 2 | 3 | ### Transform Common Options 4 | 5 | | name | type | required | default value | 6 | | --- | --- | --- | --- | 7 | | [source_table_name](#source_table_name-string) | string | no | - | 8 | | [result_table_name](#result_table_name-string) | string | no | - | 9 | 10 | 11 | ##### source_table_name [string] 12 | 13 | 不指定 `source_table_name` 时,当前插件处理的就是配置文件中上一个插件输出的数据集(dataset); 14 | 15 | 指定 `source_table_name` 的时候,当前插件处理的就是此参数对应的数据集。 16 | 17 | ##### result_table_name [string] 18 | 19 | 不指定 `result_table_name` 时,此插件处理后的数据,不会被注册为一个可供其他插件直接访问的数据集(dataset),或者被称为临时表(table); 20 | 21 | 指定 `result_table_name` 时,此插件处理后的数据,会被注册为一个可供其他插件直接访问的数据集(dataset),或者被称为临时表(table)。此处注册的数据集(dataset),其他插件可通过指定 `source_table_name` 来直接访问。 22 | 23 | 24 | ### Examples 25 | 26 | ``` 27 | split { 28 | source_table_name = "view_table_1" 29 | source_field = "message" 30 | delimiter = "&" 31 | fields = ["field1", "field2"] 32 | result_table_name = "view_table_2" 33 | } 34 | ``` 35 | 36 | > `Split` 插件将会处理临时表 `view_table_1` 中的数据,并将处理结果注册为名为 `view_table_2` 的临时表, 这张临时表可以被后续任意 `Filter` 或 `Output` 插件通过指定 `source_table_name` 使用。 37 | 38 | ``` 39 | split { 40 | source_field = "message" 41 | delimiter = "&" 42 | fields = ["field1", "field2"] 43 | } 44 | ``` 45 | 46 | > 若不配置 `source_table_name`,`Split` 插件将处理配置文件中上一个插件输出的数据集,并将处理结果传递给下一个插件。下文给出一个多级串联的示意配置。 47 |
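下面是一个多级 Transform 串联的示意配置(仅为演示,其中的字段名与临时表名均为假设值):`split` 先将处理结果注册为临时表 `view_table_2`,`sql` 再通过 `source_table_name` 读取该表并做聚合:

```
split {
    source_table_name = "view_table_1"
    source_field = "message"
    delimiter = "&"
    fields = ["field1", "field2"]
    result_table_name = "view_table_2"
}

sql {
    source_table_name = "view_table_2"
    sql = "select field1, count(*) as cnt from view_table_2 group by field1"
    result_table_name = "view_table_3"
}
```

> 后续插件可通过指定 `source_table_name = "view_table_3"` 直接消费聚合结果。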
-------------------------------------------------------------------------------- /zh-cn/v1/configuration/input-plugins/Hive.md: -------------------------------------------------------------------------------- 1 | ## Input plugin : Hive 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.1.2 6 | 7 | ### Description 8 | 9 | 从hive中获取数据。 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [pre_sql](#pre_sql-string) | string | yes | - | 16 | | [common-options](#common-options-string)| string | yes | - | 17 | 18 | 19 | ##### pre_sql [string] 20 | 21 | 进行预处理的sql,如果不需要预处理,可以使用 select * from hive_db.hive_table 22 | 23 | ##### common options [string] 24 | 25 | `Input` 插件通用参数,详情参照 [Input Plugin](/zh-cn/v1/configuration/input-plugin) 26 | 27 | 28 | **注意:从seatunnel v1.3.4 开始,使用hive input必须做如下配置:** 29 | 30 | ``` 31 | # seatunnel 配置文件中的spark section中: 32 | 33 | spark { 34 | ... 35 | spark.sql.catalogImplementation = "hive" 36 | ... 37 | } 38 | 39 | ``` 40 | 41 | 42 | ### Example 43 | 44 | ``` 45 | spark { 46 | ... 47 | spark.sql.catalogImplementation = "hive" 48 | ... 49 | } 50 | 51 | input { 52 | hive { 53 | pre_sql = "select * from mydb.mytb" 54 | result_table_name = "myTable" 55 | } 56 | } 57 | 58 | ... 59 | ``` 60 | 61 | ### Notes 62 | 必须保证hive的metastore处于服务状态,启动命令为 `hive --service metastore`,服务的默认端口为`9083`。 63 | cluster、client、local模式下必须把hive-site.xml置于提交任务节点的$HADOOP_CONF目录下(或者放在$SPARK_HOME/conf下面),IDE本地调试时将其放在resources目录。 64 | 65 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Join.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Join 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.3.0 6 | 7 | ### Description 8 | 9 | 和指定的临时表进行Join操作, 目前仅支持Stream-static Inner Joins 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [source_field](#source_field-string) | string | no | raw_message | 16 | | [table_name](#table_name-string) | string | yes | - | 17 | | [common-options](#common-options-string)| string | no | - | 18 | 19 | 20 | ##### source_field [string] 21 | 22 | 源字段,若不配置默认为`raw_message` 23 | 24 | ##### table_name [string] 25 | 26 | 临时表表名 27 | 28 | ##### common options [string] 29 | 30 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 31 | 32 | 33 | ### Examples 34 | 35 | ``` 36 | input { 37 | fakestream { 38 | content = ["Hello World,seatunnel"] 39 | rate = 1 40 | } 41 | 42 | mysql { 43 | url = "jdbc:mysql://localhost:3306/info" 44 | table = "project_info" 45 | table_name = "spark_project_info" 46 | user = "username" 47 | password = "password" 48 | } 49 | } 50 | 51 | filter { 52 | split { 53 | fields = ["msg", "project"] 54 | delimiter = "," 55 | } 56 | 57 | join { 58 | table_name = "spark_project_info" 59 | source_field = "project" 60 | } 61 | } 62 | ``` 63 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/source-plugins/Hive.md: -------------------------------------------------------------------------------- 1 | ## Source plugin : Hive [Spark] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 9 | 从hive中获取数据 10 | 11 | ### Options 12 | 13 | | name
| type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [pre_sql](#pre_sql-string) | string | yes | - | 16 | | [common-options](#common-options-string)| string | yes | - | 17 | 18 | 19 | ##### pre_sql [string] 20 | 21 | 进行预处理的sql,如果不需要预处理,可以使用 select * from hive_db.hive_table 22 | 23 | ##### common options [string] 24 | 25 | `Source` 插件通用参数,详情参照 [Source Plugin](/zh-cn/v2/spark/configuration/source-plugins/) 26 | 27 | 28 | **注意:使用hive source必须做如下配置:** 29 | 30 | ``` 31 | # seatunnel 配置文件中的env section中: 32 | 33 | env { 34 | ... 35 | spark.sql.catalogImplementation = "hive" 36 | ... 37 | } 38 | 39 | ``` 40 | 41 | 42 | ### Example 43 | 44 | ``` 45 | env { 46 | ... 47 | spark.sql.catalogImplementation = "hive" 48 | ... 49 | } 50 | 51 | source { 52 | hive { 53 | pre_sql = "select * from mydb.mytb" 54 | result_table_name = "myTable" 55 | } 56 | } 57 | 58 | ... 59 | ``` 60 | 61 | ### Notes 62 | 必须保证hive的metastore处于服务状态,启动命令为 `hive --service metastore`,服务的默认端口为`9083`。 63 | cluster、client、local模式下必须把hive-site.xml置于提交任务节点的$HADOOP_CONF目录下(或者放在$SPARK_HOME/conf下面),IDE本地调试时将其放在resources目录。 64 | 65 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/configuration/source-plugins/Jdbc.md: -------------------------------------------------------------------------------- 1 | ## Source plugin : JDBC [Flink] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 通过jdbc的方式读取数据 9 | 10 | ### Options 11 | | name | type | required | default value | 12 | | --- | --- | --- | --- | 13 | | [driver](#driver-string) | string | yes | - | 14 | | [url](#url-string) | string | yes | - | 15 | | [username](#username-string) | string | yes | - | 16 | | [password](#password-string) | string | no | - | 17 | | [query](#query-string) | string | yes | - | 18 | | [fetch_size](#fetch_size-int) | int | no | - | 19 | | [common-options](#common-options-string)| string | no | - | 20 | 21 | ##### driver [string] 22 | 23 | 驱动名,如`com.mysql.jdbc.Driver` 24 | 25 | ##### url [string] 26 | 27 | JDBC连接的URL。如:`jdbc:mysql://localhost:3306/test` 28 | 29 | ##### username [string] 30 | 31 | 用户名 32 | 33 | ##### password [string] 34 | 35 | 密码 36 | 37 | ##### query [string] 38 | 查询语句 39 | 40 | ##### fetch_size [int] 41 | 单次拉取的记录条数 42 | 43 | ##### common options [string] 44 | 45 | `Source` 插件通用参数,详情参照 [Source Plugin](/zh-cn/v2/flink/configuration/source-plugins/) 46 | 47 | ### Examples 48 | 49 | ``` 50 | JdbcSource { 51 | driver = com.mysql.jdbc.Driver 52 | url = "jdbc:mysql://localhost/test" 53 | username = root 54 | query = "select * from test" 55 | } 56 | 57 | ``` 58 | -------------------------------------------------------------------------------- /en/configuration/input-plugins/Jdbc.md: -------------------------------------------------------------------------------- 1 | ## Input plugin : Jdbc 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.1.2 6 | 7 | ### Description 8 | 9 | Read data from an external data source via JDBC.
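The same option set works for any database that ships a JDBC driver; as an illustrative sketch only (it assumes the PostgreSQL driver jar is already on the classpath, and the table/credential values are placeholders), a PostgreSQL source differs from the MySQL example below only in `driver` and `url`:

```
jdbc {
    driver = "org.postgresql.Driver"
    url = "jdbc:postgresql://localhost/test"
    table = "access"
    table_name = "access_log"
    user = "username"
    password = "password"
}
```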
10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [driver](#driver-string) | string | yes | - | 16 | | [password](#password-string) | string | yes | - | 17 | | [table](#table-string) | string | yes | - | 18 | | [table_name](#table_name-string) | string | yes | - | 19 | | [url](#url-string) | string | yes | - | 20 | | [user](#user-string) | string | yes | - | 21 | 22 | ##### driver [string] 23 | 24 | Class name of jdbc driver. 25 | 26 | ##### password [string] 27 | 28 | Password. 29 | 30 | 31 | ##### table [string] 32 | 33 | Table name. 34 | 35 | 36 | ##### table_name [string] 37 | 38 | Registered table name of input data. 39 | 40 | 41 | ##### url [string] 42 | 43 | The url of JDBC. For example: `jdbc:postgresql://localhost/test` 44 | 45 | 46 | ##### user [string] 47 | 48 | Username. 49 | 50 | 51 | ### Example 52 | 53 | ``` 54 | jdbc { 55 | driver = "com.mysql.jdbc.Driver" 56 | url = "jdbc:mysql://localhost:3306/info" 57 | table = "access" 58 | table_name = "access_log" 59 | user = "username" 60 | password = "password" 61 | } 62 | ``` 63 | 64 | > Read data from MySQL with jdbc. 65 | -------------------------------------------------------------------------------- /en/configuration/output-plugins/Kafka.md: -------------------------------------------------------------------------------- 1 | ## Output plugin : Kafka 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | Write Rows to a Kafka topic. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | engine | 14 | | --- | --- | --- | --- | --- | 15 | | [producer.bootstrap.servers](#producerbootstrapservers-string) | string | yes | - | all streaming | 16 | | [topic](#topic-string) | string | yes | - | all streaming | 17 | | [producer.*](#producer-string) | string | no | - | all streaming | 18 | 19 | ##### producer.bootstrap.servers [string] 20 | 21 | Kafka Brokers List 22 | 23 | ##### topic [string] 24 | 25 | Kafka Topic 26 | 27 | ##### producer [string] 28 | 29 | In addition to the above parameters that must be specified for the producer client, you can also specify multiple kafka's producer parameters described in [producerconfigs](http://kafka.apache.org/10/documentation.html#producerconfigs) 30 | 31 | The way to specify parameters is to use the prefix "producer" before the parameter. 
For example, `request.timeout.ms` is specified as: `producer.request.timeout.ms = 60000`. If you do not specify these parameters, they will be set to the default values given in the Kafka documentation. 32 | 33 | 34 | ### Examples 35 | 36 | ``` 37 | kafka { 38 | topic = "seatunnel" 39 | producer.bootstrap.servers = "localhost:9092" 40 | } 41 | ``` 42 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Replace.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Replace 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 将指定字段内容根据正则表达式进行替换 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [pattern](#pattern-string) | string | yes | - | 16 | | [replacement](#replacement-string) | string | yes | - | 17 | | [source_field](#source_field-string) | string | no | raw_message | 18 | | [target_field](#target_field-string) | string | no | replaced | 19 | | [common-options](#common-options-string)| string | no | - | 20 | 21 | 22 | ##### pattern [string] 23 | 24 | 用于做匹配的正则表达式。常见的书写方式如 `"[a-zA-Z0-9_-]+"`, 详见[Regex Pattern](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html)。 25 | 也可以到这里测试正则表达式是否正确:[Regex 101](https://regex101.com/) 26 | 27 | ##### replacement [string] 28 | 29 | 替换的字符串 30 | 31 | ##### source_field [string] 32 | 33 | 源字段,若不配置默认为`raw_message` 34 | 35 | ##### target_field [string] 36 | 37 | 目标字段,若不配置默认为`replaced` 38 | 39 | ##### common options [string] 40 | 41 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 42 | 43 | 44 | ### Examples 45 | 46 | ``` 47 | replace { 48 | target_field = "tmp" 49 | source_field = "message" 50 | pattern = "is" 51 | replacement = "are" 52 | } 53 | ``` 54 | > 将`message`中的**is**替换为**are**,并赋值给`tmp` 55 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/output-plugins/MySQL.md: -------------------------------------------------------------------------------- 1 | ## Output plugin : Mysql 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 输出数据到MySQL 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [password](#password-string) | string | yes | - | 16 | | [save_mode](#save_mode-string) | string | no | append | 17 | | [table](#table-string) | string | yes | - | 18 | | [url](#url-string) | string | yes | - | 19 | | [user](#user-string) | string | yes | - | 20 | | [common-options](#common-options-string)| string | no | - | 21 | 22 | 23 | ##### password [string] 24 | 25 | 密码 26 | 27 | ##### save_mode [string] 28 | 29 | 存储模式,当前支持overwrite,append,ignore以及error。每个模式具体含义见[save-modes](http://spark.apache.org/docs/2.2.0/sql-programming-guide.html#save-modes) 30 | 31 | ##### table [string] 32 | 33 | 表名 34 | 35 | ##### url [string] 36 | 37 | JDBC连接的URL。参考一个案例:`jdbc:mysql://localhost:3306/info` 38 | 39 | 40 | ##### user [string] 41 | 42 | 用户名 43 | 44 | ##### common options [string] 45 | 46 | `Output` 插件通用参数,详情参照 [Output Plugin](/zh-cn/v1/configuration/output-plugin) 47 | 48 | 49 | ### Example 50 | 51 | ``` 52 | mysql { 53 | url = "jdbc:mysql://localhost:3306/info" 54 | table = "access" 55 | user = "username" 56 | password = "password" 57 | save_mode
= "append" 58 | } 59 | ``` 60 | 61 | > 将数据写入MySQL 62 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/configuration/source-plugins/_sidebar.md: -------------------------------------------------------------------------------- 1 | - [介绍](/zh-cn/v2/) 2 | 3 | - [v2.x与v1.x的区别](/zh-cn/v2/diff_v1_v2) 4 | 5 | - [行业应用案例](/zh-cn/case_study/) 6 | 7 | - [seatunnel v2.x For Flink](/zh-cn/v2/flink/) 8 | 9 | - [快速开始](/zh-cn/v2/flink/quick-start) 10 | 11 | - [下载、安装](/zh-cn/v2/flink/installation) 12 | 13 | - [命令使用说明](/zh-cn/v2/flink/commands/) 14 | 15 | - [配置文件](/zh-cn/v2/flink/configuration/) 16 | - [通用配置](/zh-cn/v2/flink/configuration/) 17 | 18 | - [Source 插件配置](/zh-cn/v2/flink/configuration/source-plugins/) 19 | 20 | - [Fake](/zh-cn/v2/flink/configuration/source-plugins/Fake.md) 21 | 22 | - [Socket](/zh-cn/v2/flink/configuration/source-plugins/Socket.md) 23 | 24 | - [File](/zh-cn/v2/flink/configuration/source-plugins/File.md) 25 | 26 | - [JDBC](/zh-cn/v2/flink/configuration/source-plugins/Jdbc.md) 27 | 28 | - [Kafka](/zh-cn/v2/flink/configuration/source-plugins/Kafka.md) 29 | 30 | - [Transform 插件配置](/zh-cn/v2/flink/configuration/transform-plugins/) 31 | 32 | - [Sink 插件配置](/zh-cn/v2/flink/configuration/sink-plugins/) 33 | 34 | - [完整配置文件案例](/zh-cn/v2/flink/configuration/ConfigExamples.md) 35 | 36 | - [部署与运行](/zh-cn/v2/flink/deployment) 37 | 38 | - [插件开发](/zh-cn/v2/flink/developing-plugin) 39 | 40 | - [seatunnel v2.x For Spark](/zh-cn/v2/spark/) 41 | 42 | - [深入seatunnel](/zh-cn/v2/internal.md) 43 | 44 | - [Roadmap](/zh-cn/v2/roadmap.md) 45 | 46 | - [贡献代码](/zh-cn/v2/contribution.md) 47 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/configuration/sink-plugins/_sidebar.md: -------------------------------------------------------------------------------- 1 | - [介绍](/zh-cn/v2/) 2 | 3 | - [v2.x与v1.x的区别](/zh-cn/v2/diff_v1_v2) 4 | 5 | - [行业应用案例](/zh-cn/case_study/) 6 | 7 | - [seatunnel v2.x For Flink](/zh-cn/v2/flink/) 8 | 9 | - [快速开始](/zh-cn/v2/flink/quick-start) 10 | 11 | - [下载、安装](/zh-cn/v2/flink/installation) 12 | 13 | - [命令使用说明](/zh-cn/v2/flink/commands/) 14 | 15 | - [配置文件](/zh-cn/v2/flink/configuration/) 16 | - [通用配置](/zh-cn/v2/flink/configuration/) 17 | 18 | - [Source 插件配置](/zh-cn/v2/flink/configuration/source-plugins/) 19 | 20 | - [Transform 插件配置](/zh-cn/v2/flink/configuration/transform-plugins/) 21 | 22 | - [Sink 插件配置](/zh-cn/v2/flink/configuration/sink-plugins/) 23 | 24 | - [Console](/zh-cn/v2/flink/configuration/sink-plugins/Console.md) 25 | 26 | - [Elasticsearch](/zh-cn/v2/flink/configuration/sink-plugins/Elasticsearch.md) 27 | 28 | - [File](/zh-cn/v2/flink/configuration/sink-plugins/File.md) 29 | 30 | - [JDBC](/zh-cn/v2/flink/configuration/sink-plugins/Jdbc.md) 31 | 32 | - [Kafka](/zh-cn/v2/flink/configuration/sink-plugins/Kafka.md) 33 | 34 | - [完整配置文件案例](/zh-cn/v2/flink/configuration/ConfigExamples.md) 35 | 36 | - [部署与运行](/zh-cn/v2/flink/deployment) 37 | 38 | - [插件开发](/zh-cn/v2/flink/developing-plugin) 39 | 40 | - [seatunnel v2.x For Spark](/zh-cn/v2/spark/) 41 | 42 | - [深入seatunnel](/zh-cn/v2/internal.md) 43 | 44 | - [Roadmap](/zh-cn/v2/roadmap.md) 45 | 46 | - [贡献代码](/zh-cn/v2/contribution.md) 47 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/transform-plugins/Sql.md: -------------------------------------------------------------------------------- 1 | ## Transform plugin : SQL [Spark] 2 | 3 | * Author: InterestingLab 4 | * Homepage: 
https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 9 | 使用SQL处理数据,支持Spark丰富的[UDF函数](http://spark.apache.org/docs/latest/api/sql/) 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [sql](#sql-string) | string | yes | - | 16 | | [common-options](#common-options-string)| string | no | - | 17 | 18 | ##### sql [string] 19 | 20 | SQL语句,SQL中使用的表名为 `Source` 或 `Transform` 插件中配置的 `result_table_name` 21 | 22 | ##### common options [string] 23 | 24 | `Transform` 插件通用参数,详情参照 [Transform Plugin](/zh-cn/v2/spark/configuration/transform-plugins/) 25 | 26 | 27 | ### Examples 28 | 29 | ``` 30 | sql { 31 | sql = "select username, address from user_info" 32 | } 33 | ``` 34 | 35 | > 使用SQL插件用于字段删减,仅保留 `username` 和 `address` 字段,将丢弃其余字段。`user_info` 为之前插件配置的 `result_table_name` 36 | 37 | ``` 38 | sql { 39 | sql = "select substring(telephone, 0, 10) from user_info" 40 | } 41 | ``` 42 | 43 | > 使用SQL插件用于数据处理,使用[substring functions](http://spark.apache.org/docs/latest/api/sql/#substring)对 `telephone` 字段进行截取操作 44 | 45 | ``` 46 | sql { 47 | sql = "select avg(age) from user_info" 48 | } 49 | ``` 50 | 51 | 52 | > 使用SQL插件用于数据聚合,使用[avg functions](http://spark.apache.org/docs/latest/api/sql/#avg)对原始数据集进行聚合操作,取出 `age` 字段平均值 53 | 54 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/input-plugins/Tidb.md: -------------------------------------------------------------------------------- 1 | ## Input plugin : TiDB 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.1.5 6 | 7 | ### Description 8 | 9 | 通过[TiSpark](https://github.com/pingcap/tispark)从[TiDB](https://github.com/pingcap/tidb)数据库中读取数据,当前仅仅支持Spark 2.1 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [database](#database-string) | string | yes | - | 16 | | [pre_sql](#pre_sql-string) | string | yes | - | 17 | | [common-options](#common-options-string)| string | yes | - | 18 | 19 | ##### database [string] 20 | 21 | TiDB库名 22 | 23 | ##### pre_sql [string] 24 | 25 | 进行预处理的sql,如果不需要预处理,可以使用 select * from tidb_db.tidb_table 26 | 27 | ##### common options [string] 28 | 29 | `Input` 插件通用参数,详情参照 [Input Plugin](/zh-cn/v1/configuration/input-plugin) 30 | 31 | 32 | ### Example 33 | 34 | 35 | 使用TiDB Input必须在`spark-defaults.conf`或者seatunnel配置文件中配置`spark.tispark.pd.addresses`和`spark.sql.extensions`。 36 | 37 | 一个seatunnel读取TiDB数据的配置文件如下: 38 | 39 | ``` 40 | spark { 41 | ... 42 | spark.tispark.pd.addresses = "localhost:2379" 43 | spark.sql.extensions = "org.apache.spark.sql.TiExtensions" 44 | } 45 | 46 | input { 47 | tidb { 48 | database = "test" 49 | pre_sql = "select * from test.my_table" 50 | result_table_name = "myTable" 51 | } 52 | } 53 | 54 | filter { 55 | ... 56 | } 57 | 58 | output { 59 | ...
60 | } 61 | ``` 62 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/configuration/ConfigExamples.md: -------------------------------------------------------------------------------- 1 | ## 完整配置文件案例 [Flink] 2 | 3 | 4 | 一个示例如下: 5 | 6 | > 配置中,以 # 开头的行为注释。 7 | 8 | ``` 9 | ###### 10 | ###### This config file is a demonstration of streaming processing in seatunnel config 11 | ###### 12 | 13 | env { 14 | # You can set flink configuration here 15 | execution.parallelism = 1 16 | #execution.checkpoint.interval = 10000 17 | #execution.checkpoint.data-uri = "hdfs://localhost:9000/checkpoint" 18 | } 19 | 20 | source { 21 | # This is an example source plugin **only for test and demonstrate the feature source plugin** 22 | FakeSourceStream { 23 | result_table_name = "fake" 24 | field_name = "name,age" 25 | } 26 | 27 | # If you would like to get more information about how to configure seatunnel and see full list of source plugins, 28 | # please go to https://interestinglab.github.io/seatunnel-docs/#/zh-cn/configuration/base 29 | } 30 | 31 | transform { 32 | sql { 33 | sql = "select name,age from fake" 34 | } 35 | 36 | # If you would like to get more information about how to configure seatunnel and see full list of transform plugins, 37 | # please go to https://interestinglab.github.io/seatunnel-docs/#/zh-cn/configuration/base 38 | } 39 | 40 | sink { 41 | ConsoleSink {} 42 | 43 | 44 | # If you would like to get more information about how to configure seatunnel and see full list of sink plugins, 45 | # please go to https://interestinglab.github.io/seatunnel-docs/#/zh-cn/configuration/base 46 | } 47 | ``` 48 | -------------------------------------------------------------------------------- /en/configuration/filter-plugins/Replace.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Replace 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | Replaces field contents based on a regular expression. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [pattern](#pattern-string) | string | yes | - | 16 | | [replacement](#replacement-string) | string | yes | - | 17 | | [source_field](#source_field-string) | string | no | raw_message | 18 | | [target_field](#target_field-string) | string | no | replaced | 19 | 20 | ##### pattern [string] 21 | 22 | 23 | 24 | Regular expression used for matching, such as `"[a-zA-Z0-9_-]+"`, `\w` or `\d`. Please see [Regex Pattern](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html) for details. 25 | 26 | You can also go to [Regex 101](https://regex101.com/) to test your regex interactively. 27 | 28 | ##### replacement [string] 29 | 30 | The replacement string. 31 | 32 | ##### source_field [string] 33 | 34 | Source field, default is `raw_message`. 35 | 36 | ##### target_field [string] 37 | 38 | New field name, default is `replaced`. 39 | 40 | ### Examples 41 | 42 | ``` 43 | replace { 44 | target_field = "tmp" 45 | source_field = "message" 46 | pattern = "\w+" 47 | replacement = "are" 48 | } 49 | ``` 50 | 51 | > Replace **\w+** in `message` with **are** and set it to `tmp` column.
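As a further sketch (the field names and pattern here are illustrative), the same options can mask every digit of a field with a fixed character:

```
replace {
    source_field = "telephone"
    target_field = "masked"
    pattern = "\d"
    replacement = "*"
}
```

> Each digit in `telephone` is replaced with **\*** and the result is written to `masked`.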
52 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/configuration/transform-plugins/README.md: -------------------------------------------------------------------------------- 1 | ## Transform(数据转换) 插件的配置 2 | 3 | ### Transform插件通用参数 4 | | name | type | required | default value | 5 | | --- | --- | --- | --- | 6 | | [source_table_name](#source_table_name-string) | string | no | - | 7 | | [result_table_name](#result_table_name-string) | string | no | - | 8 | | [field_name](#field_name-string) | string | no | - | 9 | 10 | 11 | ##### source_table_name [string] 12 | 13 | 不指定 `source_table_name` 时,当前插件处理的就是配置文件中上一个插件输出的数据集(dataStream/dataset); 14 | 15 | 指定 `source_table_name` 的时候,当前插件处理的就是此参数对应的数据集。 16 | 17 | ##### result_table_name [string] 18 | 19 | 不指定 `result_table_name` 时,此插件处理后的数据,不会被注册为一个可供其他插件直接访问的数据集(dataStream/dataset),或者被称为临时表(table); 20 | 21 | 指定 `result_table_name` 时,此插件处理后的数据,会被注册为一个可供其他插件直接访问的数据集(dataStream/dataset),或者被称为临时表(table)。此处注册的数据集(dataStream/dataset),其他插件可通过指定 `source_table_name` 来直接访问。 22 | 23 | 24 | ##### field_name [string] 25 | 26 | 当从上级插件获取到数据时,可以指定所获取字段的名称,方便在后续的sql插件中使用。 27 | 28 | ### 使用样例 29 | 30 | ``` 31 | source { 32 | FakeSourceStream { 33 | result_table_name = "fake_1" 34 | field_name = "name,age" 35 | } 36 | FakeSourceStream { 37 | result_table_name = "fake_2" 38 | field_name = "name,age" 39 | } 40 | } 41 | 42 | transform { 43 | sql { 44 | source_table_name = "fake_1" 45 | sql = "select name from fake_1" 46 | result_table_name = "fake_name" 47 | } 48 | } 49 | ``` 50 | 51 | > 如果不指定 `source_table_name`,sql插件处理的就是 `fake_2` 的数据;设置为 `fake_1` 时,则处理 `fake_1` 的数据。 52 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Sql.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Sql 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 使用SQL处理数据,支持Spark丰富的[UDF函数](http://spark.apache.org/docs/latest/api/sql/) 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [sql](#sql-string) | string | yes | - | 16 | | [table_name](#table_name-string) | string | no | - | 17 | | [common-options](#common-options-string)| string | no | - | 18 | 19 | 20 | ##### sql [string] 21 | 22 | SQL语句,SQL中使用的表名为 `Input` 或 `Filter` 插件中配置的 `result_table_name` 23 | 24 | ##### table_name [string] 25 | 26 | **\[从v1.4开始废弃\]**,后续 Release 版本中将删除此参数 27 | 28 | ##### common options [string] 29 | 30 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 31 | 32 | 33 | ### Examples 34 | 35 | ``` 36 | sql { 37 | sql = "select username, address from user_info" 38 | } 39 | ``` 40 | 41 | > 仅保留`username`和`address`字段,将丢弃其余字段。`user_info` 为之前插件配置的 `result_table_name` 42 | 43 | ``` 44 | sql { 45 | sql = "select substring(telephone, 0, 10) from user_info" 46 | } 47 | ``` 48 | 49 | > 使用[substring functions](http://spark.apache.org/docs/latest/api/sql/#substring)对`telephone`字段进行截取操作 50 | 51 | ``` 52 | sql { 53 | sql = "select avg(age) from user_info", 54 | table_name = "user_info" 55 | } 56 | ``` 57 | 58 | > 使用[avg functions](http://spark.apache.org/docs/latest/api/sql/#avg)对原始数据集进行聚合操作,取出`age`平均值 59 | -------------------------------------------------------------------------------- /en/configuration/filter-plugins/Sql.md:
-------------------------------------------------------------------------------- 1 | ## Filter plugin : Sql 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | Processing Rows using SQL; feel free to use [Spark UDF](http://spark.apache.org/docs/latest/api/sql/). 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [sql](#sql-string) | string | yes | - | 16 | | [table_name](#table_name-string) | string | yes | - | 17 | 18 | ##### sql [string] 19 | 20 | SQL content. 21 | 22 | ##### table_name [string] 23 | 24 | When `table_name` is set, the current batch of events will be registered as a table, named by this `table_name` setting, on which you can execute SQL. 25 | 26 | ### Examples 27 | 28 | ``` 29 | sql { 30 | sql = "select username, address from user_info", 31 | table_name = "user_info" 32 | } 33 | ``` 34 | 35 | > Select the `username` and `address` fields; the remaining fields will be removed. 36 | 37 | ``` 38 | sql { 39 | sql = "select substring(telephone, 0, 10) from user_info", 40 | table_name = "user_info" 41 | } 42 | ``` 43 | 44 | > Use the [substring function](http://spark.apache.org/docs/latest/api/sql/#substring) to retrieve a substring on the `telephone` field. 45 | 46 | 47 | ``` 48 | sql { 49 | sql = "select avg(age) from user_info", 50 | table_name = "user_info" 51 | } 52 | ``` 53 | 54 | > Get the average of `age` using the [avg function](http://spark.apache.org/docs/latest/api/sql/#avg). 55 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/configuration/sink-plugins/Jdbc.md: -------------------------------------------------------------------------------- 1 | ## Sink plugin : JDBC [Flink] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 通过jdbc方式写入数据 9 | 10 | ### Options 11 | | name | type | required | default value | 12 | | --- | --- | --- | --- | 13 | | [driver](#driver-string) | string | yes | - | 14 | | [url](#url-string) | string | yes | - | 15 | | [username](#username-string) | string | yes | - | 16 | | [password](#password-string) | string | no | - | 17 | | [query](#query-string) | string | yes | - | 18 | | [batch_size](#batch_size-int) | int | no | - | 19 | | [source_table_name](#source_table_name-string) | string | yes | - | 20 | | [common-options](#common-options-string)| string | no | - | 21 | 22 | ##### driver [string] 23 | 24 | 驱动名,如`com.mysql.jdbc.Driver` 25 | 26 | ##### url [string] 27 | 28 | JDBC连接的URL。如:`jdbc:mysql://localhost:3306/test` 29 | 30 | ##### username [string] 31 | 32 | 用户名 33 | 34 | ##### password [string] 35 | 36 | 密码 37 | 38 | ##### query [string] 39 | 插入语句 40 | 41 | ##### batch_size [int] 42 | 每批写入数量 43 | 44 | ##### common options [string] 45 | 46 | `Sink` 插件通用参数,详情参照 [Sink Plugin](/zh-cn/v2/flink/configuration/sink-plugins/) 47 | 48 | ### Examples 49 | ``` 50 | JdbcSink { 51 | source_table_name = fake 52 | driver = com.mysql.jdbc.Driver 53 | url = "jdbc:mysql://localhost/test" 54 | username = root 55 | query = "insert into test(name,age) values(?,?)" 56 | batch_size = 2 57 | } 58 | ``` 59 | -------------------------------------------------------------------------------- /en/configuration/output-plugins/Jdbc.md: -------------------------------------------------------------------------------- 1 | ## Output plugin : Jdbc 2 | 3 | * Author:
InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | Write Rows to an external data source via JDBC. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [driver](#driver-string) | string | yes | - | 16 | | [password](#password-string) | string | yes | - | 17 | | [save_mode](#save_mode-string) | string | no | append | 18 | | [table](#table-string) | string | yes | - | 19 | | [url](#url-string) | string | yes | - | 20 | | [user](#user-string) | string | yes | - | 21 | 22 | ##### driver [string] 23 | 24 | Class name of jdbc driver. 25 | 26 | ##### password [string] 27 | 28 | Password. 29 | 30 | ##### save_mode [string] 31 | 32 | Save mode, supports `overwrite`, `append`, `ignore` and `error`. The detail of save_mode see [save-modes](http://spark.apache.org/docs/2.2.0/sql-programming-guide.html#save-modes). 33 | 34 | ##### table [string] 35 | 36 | Table name. 37 | 38 | ##### url [string] 39 | 40 | The url of JDBC. For example: `jdbc:postgresql://localhost/test` 41 | 42 | 43 | ##### user [string] 44 | 45 | Username. 46 | 47 | 48 | ### Example 49 | 50 | ``` 51 | jdbc { 52 | driver = "com.mysql.jdbc.Driver" 53 | url = "jdbc:mysql://localhost:3306/info" 54 | table = "access" 55 | user = "username" 56 | password = "password" 57 | save_mode = "append" 58 | } 59 | ``` 60 | 61 | > write data to mysql with jdbc output. 62 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/input-plugins/Redis.md: -------------------------------------------------------------------------------- 1 | ## Input plugin : Redis 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://github.com/InterestingLab/seatunnel 5 | * Version: 1.1.0 6 | 7 | ### Description 8 | 9 | 从Redis中读取数据. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [host](#host-string) | string | yes | - | 16 | | [port](#port-int) | int | no | 6379 | 17 | | [key_pattern](#key_pattern-string) | string | yes | - | 18 | | [partition](#partition-int) | int | no | 3 | 19 | | [db_num](#db_num-int) | int | no | 0 | 20 | | [auth](#auth-string) | string | no | - | 21 | | [common-options](#common-options-string)| string | yes | - | 22 | 23 | 24 | ##### host [string] 25 | 26 | Redis服务器地址 27 | 28 | ##### port [int] 29 | 30 | Redis服务端口, 默认6379 31 | 32 | ##### key_pattern [string] 33 | 34 | Redis Key, 支持模糊匹配 35 | 36 | ##### partition [int] 37 | 38 | Redis分片数量. 默认为3 39 | 40 | ##### db_num [int] 41 | 42 | Redis数据库索引标识. 默认连接到db0. 
43 | 44 | ##### auth [string] 45 | 46 | redis 鉴权密码 47 | 48 | ##### common options [string] 49 | 50 | `Input` 插件通用参数,详情参照 [Input Plugin](/zh-cn/v1/configuration/input-plugin) 51 | 52 | 53 | ### Example 54 | 55 | ``` 56 | Redis { 57 | host = "192.168.1.100" 58 | port = 6379 59 | key_pattern = "*keys*" 60 | partition = 20 61 | db_num = 2 62 | result_table_name = "reids_result_table" 63 | } 64 | ``` 65 | 66 | > 返回的table中为一个两个字段均为string的数据表 67 | 68 | | raw_key | raw_message | 69 | | --- | --- | 70 | | [keys](#keys) | xxx | 71 | | [my_keys](#my_keys) | xxx | 72 | | [keys_mine](#keys_mine) | xxx | 73 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Watermark.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Watermark 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.3.0 6 | 7 | ### Description 8 | 9 | Spark Structured Streaming Watermark 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [time_field](#time_field-string) | string | yes | - | 16 | | [time_type](#time_type-string) | string | no | UNIX | 17 | | [time_pattern](#time_pattern-string) | string | no | yyyy-MM-dd HH:mm:ss | 18 | | [delay_threshold](#delay_threshold-string) | string | yes | - | 19 | | [watermark_field](#watermark_field-string) | string | yes | - | 20 | | [common-options](#common-options-string)| string | no | - | 21 | 22 | 23 | ##### time_field [string] 24 | 25 | 日志中的事件时间字段 26 | 27 | ##### time_type [string] 28 | 29 | 日志中的事件时间字段的类型,支持三种类型 `UNIX_MS|UNIX|string`,UNIX_MS为13位的时间戳,UNIX为10位的时间戳,string为字符串类型的时间,如2019-04-08 22:10:23 30 | 31 | ##### time_pattern [string] 32 | 33 | 当你的`time_type`选择为string时,你可以指定这个参数来进行时间字符串的匹配,默认匹配格式为yyyy-MM-dd HH:mm:ss 34 | 35 | ##### delay_threshold [string] 36 | 37 | 等待数据到达的最小延迟。 38 | 39 | ##### watermark_field [string] 40 | 41 | 经过这个filter处理之后将会增加一个timestamp类型的字段,这个字段用于添加watermark 42 | 43 | ##### common options [string] 44 | 45 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 46 | 47 | 48 | ### Example 49 | 50 | ``` 51 | Watermark { 52 | delay_threshold = "5 minutes" 53 | time_field = "tf" 54 | time_type = "UNIX" 55 | watermark_field = "wm" 56 | } 57 | ``` 58 | -------------------------------------------------------------------------------- /en/configuration/input-plugins/Alluxio.md: -------------------------------------------------------------------------------- 1 | ## Input plugin : Alluxio 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.5.0 6 | 7 | ### Description 8 | 9 | Read raw data from Alluxio. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [path](#path-string) | string | yes | - | 16 | 17 | ##### path [string] 18 | 19 | File path on Alluxio cluster. 
20 | 21 | ### Note 22 | If you use Alluxio with ZooKeeper, please add the following to start-seatunnel.sh: 23 | 24 | ``` 25 | driverJavaOpts="-Dalluxio.user.file.writetype.default=CACHE_THROUGH -Dalluxio.zookeeper.address=your.zookeeper.address:zookeeper.port -Dalluxio.zookeeper.enabled=true" 26 | executorJavaOpts="-Dalluxio.user.file.writetype.default=CACHE_THROUGH -Dalluxio.zookeeper.address=your.zookeeper.address:zookeeper.port -Dalluxio.zookeeper.enabled=true" 27 | ``` 28 | 29 | Alternatively, since seatunnel 1.5.0 you can add the following to the spark{} section of the seatunnel configuration: 30 | 31 | ``` 32 | spark.driverJavaOpts="-Dalluxio.user.file.writetype.default=CACHE_THROUGH -Dalluxio.zookeeper.address=your.zookeeper.address:zookeeper.port -Dalluxio.zookeeper.enabled=true" 33 | spark.executorJavaOpts="-Dalluxio.user.file.writetype.default=CACHE_THROUGH -Dalluxio.zookeeper.address=your.zookeeper.address:zookeeper.port -Dalluxio.zookeeper.enabled=true" 34 | ``` 35 | 36 | ### Example 37 | 38 | ``` 39 | alluxio { 40 | path = "alluxio:///access.log" 41 | } 42 | ``` 43 | 44 | or you can specify the alluxio name service: 45 | 46 | ``` 47 | alluxio { 48 | path = "alluxio://m2:8022/access.log" 49 | } 50 | ``` 51 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/source-plugins/KafkaStream.md: -------------------------------------------------------------------------------- 1 | ## Source plugin : Kafka [Spark] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 9 | 从Kafka消费数据,支持的Kafka版本 >= 0.10.0. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [topics](#topics-string) | string | yes | - | 16 | | [consumer.group.id](#consumergroupid-string) | string | yes | - | 17 | | [consumer.bootstrap.servers](#consumerbootstrapservers-string) | string | yes | - | 18 | | [consumer.*](#consumer-string) | string | no | - | 19 | | [common-options](#common-options-string)| string | yes | - | 20 | 21 | 22 | ##### topics [string] 23 | 24 | Kafka topic名称。如果有多个topic,用","分割,例如: "tpc1,tpc2" 25 | 26 | ##### consumer.group.id [string] 27 | 28 | Kafka consumer group id,用于区分不同的消费组 29 | 30 | ##### consumer.bootstrap.servers [string] 31 | 32 | Kafka集群地址,多个用","隔开 33 | 34 | ##### consumer.* [string] 35 | 36 | 除了以上必备的kafka consumer客户端必须指定的参数外,用户还可以指定多个consumer客户端非必须参数,覆盖了[kafka官方文档指定的所有consumer参数](http://kafka.apache.org/documentation.html#oldconsumerconfigs).
37 | 38 | 指定参数的方式是在原参数名称上加上前缀"consumer.",如指定`auto.offset.reset`的方式是: `consumer.auto.offset.reset = latest`。如果不指定这些非必须参数,它们将使用Kafka官方文档给出的默认值。 39 | 40 | ##### common options [string] 41 | 42 | `Source` 插件通用参数,详情参照 [Source Plugin](/zh-cn/v2/spark/configuration/source-plugins/) 43 | 44 | 45 | ### Examples 46 | 47 | ``` 48 | kafkaStream { 49 | topics = "seatunnel" 50 | consumer.bootstrap.servers = "localhost:9092" 51 | consumer.group.id = "seatunnel_group" 52 | } 53 | ``` 54 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/source-plugins/Elasticsearch.md: -------------------------------------------------------------------------------- 1 | ## Source plugin : Elasticsearch [Spark] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 9 | 从 Elasticsearch 中读取数据 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [hosts](#hosts-array) | array | yes | - | 16 | | [index](#index-string) | string | yes | - | 17 | | [es.*](#es-string) | string | no | - | 18 | | [common-options](#common-options-string)| string | yes | - | 19 | 20 | 21 | ##### hosts [array] 22 | 23 | ElasticSearch 集群地址,格式为host:port,允许指定多个host。如 \["host1:9200", "host2:9200"]。 24 | 25 | 26 | ##### index [string] 27 | 28 | ElasticSearch index名称,支持 `*` 模糊匹配 29 | 30 | 31 | ##### es.* [string] 32 | 33 | 用户还可以指定多个非必须参数,详细的参数列表见[Elasticsearch支持的参数](https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html#cfg-mapping)。 34 | 35 | 如指定 `es.read.metadata` 的方式是: `es.read.metadata = true`。如果不指定这些非必须参数,它们将使用官方文档给出的默认值。 36 | 37 | ##### common options [string] 38 | 39 | `Source` 插件通用参数,详情参照 [Source Plugin](/zh-cn/v2/spark/configuration/source-plugins/) 40 | 41 | 42 | ### Examples 43 | 44 | ``` 45 | elasticsearch { 46 | hosts = ["localhost:9200"] 47 | index = "seatunnel-20190424" 48 | result_table_name = "my_dataset" 49 | } 50 | ``` 51 | 52 | 53 | ``` 54 | elasticsearch { 55 | hosts = ["localhost:9200"] 56 | index = "seatunnel-*" 57 | es.read.field.include = "name, age" 58 | result_table_name = "my_dataset" 59 | } 60 | ``` 61 | 62 | > 匹配所有以 `seatunnel-` 开头的索引,并且仅读取 `name` 和 `age` 两个字段。 63 | -------------------------------------------------------------------------------- /zh-cn/v2/contribution.md: -------------------------------------------------------------------------------- 1 | # 为 seatunnel v2.x 贡献代码 2 | 3 | ## Coding Style 4 | 5 | seatunnel v2.x的主要编程语言为Java,包括流程代码和Flink插件;部分Spark插件仍然沿用v1.x的scala代码。 6 | 7 | * Java Coding Style 参考: 8 | 9 | Google Java Coding Style: https://google.github.io/styleguide/javaguide.html 10 | 11 | * Scala Coding Style 参考: 12 | 13 | http://docs.scala-lang.org/style/ 14 | 15 | https://github.com/databricks/scala-style-guide 16 | 17 | 使用sbt插件[scalastyle](http://www.scalastyle.org/)作为coding style检查工具;无法通过coding style检查的代码无法提交。 18 | 19 | 通过scalafmt利用[Cli或者IntelliJ Idea](http://scalameta.org/scalafmt/#IntelliJ)自动完成scala代码的格式化。 20 | 如果使用scalafmt的Idea插件,请在插件安装完后设置`文件保存时自动更正代码格式`,方法 "Preferences" -> "Tools" -> "Scalafmt", 勾选"format on file save" 21 | 22 | ## 项目代码编译运行 23 | 24 | seatunnel v2.x 放弃了sbt,改为社区用户期待已久的maven来做项目的管理,国内用户再也不用痛苦于sbt依赖下载很慢的问题了。 25 | 26 | ## 代码/文档贡献流程 27 | 28 | * Interesting Lab成员(内部协作流程): 29 | 30 | (1) 从 master上 checkout 出新分支,分支名称要求新功能: .fea.,修复bug: .fixbug., 文档:.doc.
32 | 33 | (2) 开发, 提交commit 34 | 35 | (3) 在github的项目主页,选中你的分支,点"new pull request",提交pull request 36 | 37 | (4) 经至少1个其他成员审核通过,并且travis-ci的build全部通过后,由审核人merge到master分支中. 38 | 39 | (5) 删除你的分支 40 | 41 | * 非Interesting Lab 成员(常见的github协作流程): 42 | 43 | (1) 在seatunnel主页 fork 这个项目 https://github.com/InterestingLab/seatunnel 44 | 45 | (2) 开发 46 | 47 | (3) 提交commit 48 | 49 | (4) 在你自己的项目主页上,点"new pull request",提交pull request 50 | 51 | (5) Interesting Lab 审核通过后,你的贡献将被纳入项目代码中。 52 | 53 | ## 自动化Build与Test 54 | 55 | 此项目使用 [travis-ci](https://travis-ci.org/) 作为自动化Build工具. 56 | 57 | 所有分支每次commit有更新,都会触发自动化Build,新的pull request也会触发。 58 | 59 | -------------------------------------------------------------------------------- /zh-cn/v1/deployment.md: -------------------------------------------------------------------------------- 1 | # 部署与运行 2 | 3 | > seatunnel 依赖Java运行环境和Spark,详细的seatunnel 安装步骤参考[安装seatunnel](/zh-cn/v1/installation) 4 | 5 | 下面重点说明不同平台的运行方式: 6 | 7 | ### 在本地以local方式运行seatunnel 8 | 9 | ``` 10 | ./bin/start-seatunnel.sh --master local[4] --deploy-mode client --config ./config/application.conf 11 | ``` 12 | 13 | ### 在Spark Standalone集群上运行seatunnel 14 | 15 | ``` 16 | # client 模式 17 | ./bin/start-seatunnel.sh --master spark://207.184.161.138:7077 --deploy-mode client --config ./config/application.conf 18 | 19 | # cluster 模式 20 | ./bin/start-seatunnel.sh --master spark://207.184.161.138:7077 --deploy-mode cluster --config ./config/application.conf 21 | ``` 22 | 23 | ### 在Yarn集群上运行seatunnel 24 | 25 | ``` 26 | # client 模式 27 | ./bin/start-seatunnel.sh --master yarn --deploy-mode client --config ./config/application.conf 28 | 29 | # cluster 模式 30 | ./bin/start-seatunnel.sh --master yarn --deploy-mode cluster --config ./config/application.conf 31 | ``` 32 | 33 | ### 在Mesos上运行seatunnel 34 | 35 | ``` 36 | # cluster 模式 37 | ./bin/start-seatunnel.sh --master mesos://207.184.161.138:7077 --deploy-mode cluster --config ./config/application.conf 38 | ``` 39 | 40 | --- 41 | 42 | start-seatunnel.sh 的`master`, `deploy-mode`参数的含义与Spark `master`, `deploy-mode`相同, 43 | 可参考: [Spark Submitting Applications](http://spark.apache.org/docs/latest/submitting-applications.html) 44 | 45 | 如果要指定seatunnel运行时占用的资源大小,或者其他Spark参数,可以在`--config`指定的配置文件里面指定: 46 | 47 | ``` 48 | spark { 49 | spark.executor.instances = 2 50 | spark.executor.cores = 1 51 | spark.executor.memory = "1g" 52 | ... 53 | } 54 | ...
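# 其他 Spark 参数可按同样方式写在上面的 spark {} 中(以下两行仅为示例,参数名为 Spark 标准配置项,请按实际需求取舍):
# spark.driver.memory = "1g"
# spark.yarn.queue = "default"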
55 | 56 | ``` 57 | 58 | 关于如何配置seatunnel, 请见[seatunnel 配置](/zh-cn/v1/configuration/base) 59 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/deployment.md: -------------------------------------------------------------------------------- 1 | # 部署与运行 2 | 3 | > seatunnel v2 For Spark 依赖Java运行环境和Spark,详细的seatunnel 安装步骤参考[安装seatunnel](/zh-cn/v2/spark/installation) 4 | 5 | 下面重点说明不同平台的运行方式: 6 | 7 | ### 在本地以 local 方式运行 seatunnel 8 | 9 | ``` 10 | ./bin/start-seatunnel-spark.sh --master local[4] --deploy-mode client --config ./config/application.conf 11 | ``` 12 | 13 | ### 在 Spark Standalone 集群上运行 seatunnel 14 | 15 | ``` 16 | # client 模式 17 | ./bin/start-seatunnel-spark.sh --master spark://207.184.161.138:7077 --deploy-mode client --config ./config/application.conf 18 | 19 | # cluster 模式 20 | ./bin/start-seatunnel-spark.sh --master spark://207.184.161.138:7077 --deploy-mode cluster --config ./config/application.conf 21 | ``` 22 | 23 | ### 在 Yarn 集群上运行 seatunnel 24 | 25 | ``` 26 | # client 模式 27 | ./bin/start-seatunnel-spark.sh --master yarn --deploy-mode client --config ./config/application.conf 28 | 29 | # cluster 模式 30 | ./bin/start-seatunnel-spark.sh --master yarn --deploy-mode cluster --config ./config/application.conf 31 | ``` 32 | 33 | ### 在 Mesos 上运行 seatunnel 34 | 35 | ``` 36 | # cluster 模式 37 | ./bin/start-seatunnel-spark.sh --master mesos://207.184.161.138:7077 --deploy-mode cluster --config ./config/application.conf 38 | ``` 39 | 40 | --- 41 | 42 | `start-seatunnel-spark.sh` 的 `master`, `deploy-mode` 参数的含义请参考: [命令使用说明](/zh-cn/v2/spark/commands/start-seatunnel-spark.sh) 43 | 44 | 如果要指定 seatunnel 运行时占用的资源大小,或者其他 Spark 参数,可以在 `--config` 指定的配置文件里面指定: 45 | 46 | ``` 47 | env { 48 | spark.executor.instances = 2 49 | spark.executor.cores = 1 50 | spark.executor.memory = "1g" 51 | ... 52 | } 53 | ... 
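# 其他 Spark 参数同样可以写在上面的 env {} 中(以下仅为示例,请按实际需求取舍):
# spark.driver.memory = "1g"
# spark.app.name = "seatunnel-example"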
54 | 55 | ``` 56 | 57 | 关于如何配置 seatunnel, 请见 [seatunnel 通用配置](/zh-cn/v2/spark/configuration/) 58 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Split.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Split 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 根据delimiter分割字符串。 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [delimiter](#delimiter-string) | string | no | " "(空格) | 16 | | [fields](#fields-array) | array | yes | - | 17 | | [source_field](#source_field-string) | string | no | raw_message | 18 | | [target_field](#target_field-string) | string | no | _root_ | 19 | | [common-options](#common-options-string)| string | no | - | 20 | 21 | 22 | ##### delimiter [string] 23 | 24 | 分隔符,根据分隔符对输入字符串进行分隔操作,默认分隔符为一个空格(" ")。 25 | 26 | ##### fields [list] 27 | 28 | 分割后的字段名称列表,按照顺序指定被分割后的各个字符串的字段名称。 29 | 若`fields`长度大于分隔结果长度,则多余字段赋值为空字符串。 30 | 31 | ##### source_field [string] 32 | 33 | 被分割前的字符串来源字段,若不配置默认为`raw_message` 34 | 35 | ##### target_field [string] 36 | 37 | `target_field` 可以指定被分割后的多个字段被添加到Event的位置,若不配置默认为`_root_`,即将所有分割后的字段,添加到Event最顶级。 38 | 如果指定了特定的字段,则被分割后的字段将被添加到这个字段的下面一级。 39 | 40 | ##### common options [string] 41 | 42 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 43 | 44 | 45 | ### Examples 46 | 47 | ``` 48 | split { 49 | source_field = "message" 50 | delimiter = "&" 51 | fields = ["field1", "field2"] 52 | } 53 | ``` 54 | 55 | > 将源数据中的`message`字段根据**&**进行分割,可以以`field1`或`field2`为key获取相应value 56 | 57 | ``` 58 | split { 59 | source_field = "message" 60 | target_field = "info" 61 | delimiter = "," 62 | fields = ["field1", "field2"] 63 | } 64 | ``` 65 | 66 | > 将源数据中的`message`字段根据**,**进行分割,分割后的字段为`info`,可以以`info.field1`或`info.field2`为key获取相应value 67 | -------------------------------------------------------------------------------- /zh-cn/v2/diff_v1_v2.md: -------------------------------------------------------------------------------- 1 | ## seatunnel v2.x 与 v1.x 的区别是什么? 2 | 3 | #### v1.x VS v2.x 4 | 5 | | - | v1.x | v2.x | 6 | |---|---|---| 7 | | 支持Spark | Yes | Yes | 8 | | 开发Spark插件 | Yes | Yes | 9 | | 支持Flink | No | Yes | 10 | | 开发Flink插件 | No | Yes | 11 | | 支持的seatunnel运行模式 | local, Spark Standalone Cluster, on Yarn, on k8s | local, Spark/Flink Standalone Cluster, on Yarn, on k8s | 12 | | 支持SQL计算 | Yes | Yes | 13 | | 配置文件动态变量替换 | Yes | Yes | 14 | | 项目代码编译方式 | sbt(下载依赖很困难,我们正式放弃sbt) | maven | 15 | | 主要编程语言 | scala | java | 16 | 17 | 18 | 备注: 19 | 1. seatunnel v1.x 与 v2.x 还有一个很大的区别,就是配置文件中,input改名为source, filter改名为transform, output改名为sink,如下: 20 | 21 | ``` 22 | # v1.x 的配置文件: 23 | input {} 24 | filter {} 25 | output {} 26 | ``` 27 | 28 | ``` 29 | # v2.x 的配置文件: 30 | source {} # input -> source 31 | transform {} # filter -> transform 32 | sink {} # output -> sink 33 | ``` 34 | 35 | 36 | #### 为什么InterestingLab团队要研发seatunnel v2.x ?
37 | 38 | 在2017年的夏天,InterestingLab 团队为了大幅提升海量、分布式数据计算程序的开发效率和运行稳定性,开源了支持Spark流式和离线批计算的seatunnel v1.x。 39 | 直到2019年的冬天,这两年的时间里,seatunnel逐渐被国内多个一二线互联网公司以及众多的规模较小的创业公司应用到生产环境,持续为其产生价值和收益。 40 | 在Github上,目前此项目的Star + Fork 数也超过了[1000+](https://github.com/InterestingLab/seatunnel/),它的能力和价值得到了充分的认可。 41 | 42 | InterestingLab 坚信,只有真正为用户产生价值的开源项目,才是好的开源项目,这与那些为了彰显自身技术实力、疯狂堆砌功能和高端技术的开源项目不同,它们很少考虑用户真正需要的是什么。 43 | 然而,时代是在进步的,InterestingLab也有深深的危机感,不能停留在当前的成绩上止步不前。 44 | 45 | 在2019年的夏天,InterestingLab 做出了一个重要的决策 —— 在seatunnel上尽快支持Flink,让Flink的用户也能够用上seatunnel,感受到它带来的实实在在的便利。 46 | 终于,在2020年的春节前夕,InterestingLab 正式对外开放了seatunnel v2.x,一个同时支持Spark(Spark >= 2.2)和Flink(Flink >=1.9)的版本,希望它能帮助到国内庞大的Flink社区用户。 47 | 48 | 在此特别感谢 Facebook Presto 项目,Presto项目是一个非常优秀的开源OLAP查询引擎,提供了丰富的插件化能力。 49 | seatunnel项目正是在学习了它的插件化体系架构之后,在Spark和Flink上研发出了一套插件化体系架构,为Spark和Flink计算程序的插件化开发插上了翅膀。 50 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/sink-plugins/_sidebar.md: -------------------------------------------------------------------------------- 1 | - [介绍](/zh-cn/v2/) 2 | 3 | - [v2.x与v1.x的区别](/zh-cn/v2/diff_v1_v2) 4 | 5 | - [行业应用案例](/zh-cn/case_study/) 6 | 7 | - [seatunnel v2.x For Flink](/zh-cn/v2/flink/) 8 | 9 | - [seatunnel v2.x For Spark](/zh-cn/v2/spark/) 10 | 11 | - [快速开始](/zh-cn/v2/spark/quick-start) 12 | 13 | - [下载、安装](/zh-cn/v2/spark/installation) 14 | 15 | - [命令使用说明](/zh-cn/v2/spark/commands/) 16 | 17 | - [配置文件](/zh-cn/v2/spark/configuration/) 18 | 19 | - [通用配置](/zh-cn/v2/spark/configuration/) 20 | 21 | - [Source 插件配置](/zh-cn/v2/spark/configuration/source-plugins/) 22 | 23 | - [Transform 插件配置](/zh-cn/v2/spark/configuration/transform-plugins/) 24 | 25 | - [Sink 插件配置](/zh-cn/v2/spark/configuration/sink-plugins/) 26 | 27 | - [Clickhouse](/zh-cn/v2/spark/configuration/sink-plugins/Clickhouse.md) 28 | 29 | - [Console](/zh-cn/v2/spark/configuration/sink-plugins/Console.md) 30 | 31 | - [Elasticsearch](/zh-cn/v2/spark/configuration/sink-plugins/Elasticsearch.md) 32 | 33 | - [File](/zh-cn/v2/spark/configuration/sink-plugins/File.md) 34 | 35 | - [HBase](/zh-cn/v2/spark/configuration/sink-plugins/Hbase.md) 36 | 37 | - [Hdfs](/zh-cn/v2/spark/configuration/sink-plugins/Hdfs.md) 38 | 39 | - [MySQL](/zh-cn/v2/spark/configuration/sink-plugins/Mysql.md) 40 | 41 | - [Phoenix](/zh-cn/v2/spark/configuration/sink-plugins/Phoenix.md) 42 | 43 | - [完整配置文件案例](/zh-cn/v2/spark/configuration/ConfigExamples.md) 44 | 45 | - [部署与运行](/zh-cn/v2/spark/deployment) 46 | 47 | - [插件开发](/zh-cn/v2/spark/developing-plugin) 48 | 49 | - [深入seatunnel](/zh-cn/v2/internal.md) 50 | 51 | - [Roadmap](/zh-cn/v2/roadmap.md) 52 | 53 | - [贡献代码](/zh-cn/v2/contribution.md) 54 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/source-plugins/_sidebar.md: -------------------------------------------------------------------------------- 1 | - [介绍](/zh-cn/v2/) 2 | 3 | - [v2.x与v1.x的区别](/zh-cn/v2/diff_v1_v2) 4 | 5 | - [行业应用案例](/zh-cn/case_study/) 6 | 7 | - [seatunnel v2.x For Flink](/zh-cn/v2/flink/) 8 | 9 | - [seatunnel v2.x For Spark](/zh-cn/v2/spark/) 10 | 11 | - [快速开始](/zh-cn/v2/spark/quick-start) 12 | 13 | - [下载、安装](/zh-cn/v2/spark/installation) 14 | 15 | - [命令使用说明](/zh-cn/v2/spark/commands/) 16 | 17 | - [配置文件](/zh-cn/v2/spark/configuration/) 18 | 19 | - [通用配置](/zh-cn/v2/spark/configuration/) 20 | 21 | - [Source 插件配置](/zh-cn/v2/spark/configuration/source-plugins/) 22 | 23 | - [Elasticsearch](/zh-cn/v2/spark/configuration/source-plugins/Elasticsearch.md) 24
| 25 | - [Fake](/zh-cn/v2/spark/configuration/source-plugins/Fake.md) 26 | 27 | - [FakeStream](/zh-cn/v2/spark/configuration/source-plugins/FakeStream.md) 28 | 29 | - [Hive](/zh-cn/v2/spark/configuration/source-plugins/Hive.md) 30 | 31 | - [JDBC](/zh-cn/v2/spark/configuration/source-plugins/Jdbc.md) 32 | 33 | - [KafkaStream](/zh-cn/v2/spark/configuration/source-plugins/KafkaStream.md) 34 | 35 | - [SocketStream](/zh-cn/v2/spark/configuration/source-plugins/SocketStream.md) 36 | 37 | - [Phoenix](/zh-cn/v2/spark/configuration/source-plugins/Phoenix.md) 38 | 39 | - [Transform 插件配置](/zh-cn/v2/spark/configuration/transform-plugins/) 40 | 41 | - [Sink 插件配置](/zh-cn/v2/spark/configuration/sink-plugins/) 42 | 43 | - [完整配置文件案例](/zh-cn/v2/spark/configuration/ConfigExamples.md) 44 | 45 | - [部署与运行](/zh-cn/v2/spark/deployment) 46 | 47 | - [插件开发](/zh-cn/v2/spark/developing-plugin) 48 | 49 | - [深入seatunnel](/zh-cn/v2/internal.md) 50 | 51 | - [Roadmap](/zh-cn/v2/roadmap.md) 52 | 53 | - [贡献代码](/zh-cn/v2/contribution.md) 54 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/transform-plugins/Split.md: -------------------------------------------------------------------------------- 1 | ## Transform plugin : Split [Spark] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 9 | 根据delimiter分割字符串 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [delimiter](#delimiter-string) | string | no | " "(空格) | 16 | | [fields](#fields-array) | array | yes | - | 17 | | [source_field](#source_field-string) | string | no | raw_message | 18 | | [target_field](#target_field-string) | string | no | _root_ | 19 | | [common-options](#common-options-string)| string | no | - | 20 | 21 | ##### delimiter [string] 22 | 23 | 分隔符,根据分隔符对输入字符串进行分隔操作,默认分隔符为一个空格(" ")。 24 | 25 | ##### fields [list] 26 | 27 | 分割后的字段名称列表,按照顺序指定被分割后的各个字符串的字段名称。 28 | 若`fields`长度大于分隔结果长度,则多余字段赋值为空字符串。 29 | 30 | ##### source_field [string] 31 | 32 | 被分割前的字符串来源字段,若不配置默认为`raw_message` 33 | 34 | ##### target_field [string] 35 | 36 | `target_field` 可以指定被分割后的多个字段被添加到Event的位置,若不配置默认为`_root_`,即将所有分割后的字段,添加到Event最顶级。 37 | 如果指定了特定的字段,则被分割后的字段将被添加到这个字段的下面一级。 38 | 39 | ##### common options [string] 40 | 41 | `Transform` 插件通用参数,详情参照 [Transform Plugin](/zh-cn/v2/spark/configuration/transform-plugins/) 42 | 43 | 44 | ### Examples 45 | 46 | ``` 47 | split { 48 | source_field = "message" 49 | delimiter = "&" 50 | fields = ["field1", "field2"] 51 | } 52 | ``` 53 | 54 | > 将源数据中的 `message` 字段根据**&**进行分割,可以以 `field1` 或 `field2` 为key获取相应value 55 | 56 | ``` 57 | split { 58 | source_field = "message" 59 | target_field = "info" 60 | delimiter = "," 61 | fields = ["field1", "field2"] 62 | } 63 | ``` 64 | 65 | > 将源数据中的 `message` 字段根据**,**进行分割,分割后的字段为 `info`,可以以 `info.field1` 或 `info.field2` 为key获取相应value 66 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/input-plugins/File.md: -------------------------------------------------------------------------------- 1 | ## Input plugin : File [Static] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.1.1 6 | 7 | ### Description 8 | 9 | 从本地文件中读取原始数据。 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [format](#format-string) | string | no | json | 16 | |
[options.*](#options-object) | object | no | - | 17 | | [options.rowTag](#optionsrowTag-string) | string | no | - | 18 | | [path](#path-string) | string | yes | - | 19 | | [common-options](#common-options-string)| string | yes | - | 20 | 21 | ##### format [string] 22 | 23 | 文件的格式,目前支持`csv`、`json`、`parquet` 、`xml`、`orc`和 `text`. 24 | 25 | 26 | ##### options.* [object] 27 | 28 | 自定义参数,当 `format` 为 **xml** 时必须设置 `options.rowTag`,配置XML格式数据的Tag,其他参数不是必填参数。 29 | 30 | 31 | ##### options.rowTag [string] 32 | 33 | 当 `format` 为 **xml** 时必须设置 `options.rowTag`,配置XML格式数据的Tag 34 | 35 | 36 | ##### path [string] 37 | 38 | 文件路径,以file://开头 39 | 40 | 41 | ##### common options [string] 42 | 43 | `Input` 插件通用参数,详情参照 [Input Plugin](/zh-cn/v1/configuration/input-plugin) 44 | 45 | 46 | ### Example 47 | 48 | ``` 49 | file { 50 | path = "file:///var/log/access.log" 51 | result_table_name = "accesslog" 52 | format = "text" 53 | } 54 | ``` 55 | 56 | 读取XML格式文件 57 | 58 | ``` 59 | file { 60 | path = "file:///data0/src/books.xml" 61 | options.rowTag = "book" 62 | format = "xml" 63 | result_table_name = "books" 64 | } 65 | ``` 66 | 67 | 读取CSV格式文件 68 | 69 | ``` 70 | file { 71 | path = "file:///data0/src/books.csv" 72 | format = "csv" 73 | # 将第一列的header作为列名 74 | # 否则将以 _c0,_c1,_c2...依次命名 75 | options.header = "true" 76 | result_table_name = "books" 77 | } 78 | ``` 79 | 80 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/configuration/source-plugins/File.md: -------------------------------------------------------------------------------- 1 | ## Source plugin : File [Flink] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 从文件系统中读取数据 9 | 10 | ### Options 11 | | name | type | required | default value | 12 | | --- | --- | --- | --- | 13 | | [format.type](#format-string) | string | yes | - | 14 | | [path](#path-string) | string | yes | - | 15 | | [schema](#schema-string)| string | yes | - | 16 | | [common-options](#common-options-string)| string | no | - | 17 | 18 | ##### format.type [string] 19 | 20 | 从文件系统中读取文件的格式,目前支持`csv`、`json`、`parquet` 、`orc`和 `text`。 21 | 22 | ##### path [string] 23 | 24 | 文件路径,hdfs文件以hdfs://开头,本地文件以file://开头。 25 | 26 | ##### schema [string] 27 | 28 | - csv 29 | - csv的schema是一个jsonArray的字符串,如`"[{\"type\":\"long\"},{\"type\":\"string\"}]"`,这个只能指定字段的类型,不能指定字段名,一般还要配合公共配置参数`field_name`。 30 | - json 31 | - json的schema参数是提供一条源数据的json字符串,可以自动生成schema,但是需要提供内容最全的源数据,否则会有字段丢失。 32 | - parquet 33 | - parquet的schema是一个Avro schema的字符串,如`{\"type\":\"record\",\"name\":\"test\",\"fields\":[{\"name\":\"a\",\"type\":\"int\"},{\"name\":\"b\",\"type\":\"string\"}]}`。 34 | - orc 35 | - orc的schema是orc schema的字符串,如`"struct<a:int,b:string>"`。 36 | - text 37 | - text的schema填为string即可。 38 | 39 | 40 | ##### common options [string] 41 | 42 | `Source` 插件通用参数,详情参照 [Source Plugin](/zh-cn/v2/flink/configuration/source-plugins/) 43 | 44 | ### Examples 45 | 46 | ``` 47 | FileSource{ 48 | path = "hdfs://localhost:9000/input/" 49 | format.type = "json" 50 | schema = "{\"data\":[{\"a\":1,\"b\":2},{\"a\":3,\"b\":4}],\"db\":\"string\",\"q\":{\"s\":\"string\"}}" 51 | result_table_name = "test" 52 | } 53 | ``` 54 | -------------------------------------------------------------------------------- /zh-cn/v2/flink/configuration/sink-plugins/Elasticsearch.md: -------------------------------------------------------------------------------- 1 | ## Sink plugin : Elasticsearch [Flink] 2 | 3 | * Author: InterestingLab 4 | *
Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 9 | 输出数据到 ElasticSearch 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [hosts](#hosts-array) | array | yes | - | 16 | | [index_type](#index_type-string) | string | no | log | 17 | | [index_time_format](#index_time_format-string) | string | no | yyyy.MM.dd | 18 | | [index](#index-string) | string | no | seatunnel | 19 | | [common-options](#common-options-string)| string | no | - | 20 | 21 | 22 | ##### hosts [array] 23 | 24 | Elasticsearch集群地址,格式为host:port,允许指定多个host。如["host1:9200", "host2:9200"]。 25 | 26 | ##### index_type [string] 27 | 28 | Elasticsearch index type 29 | 30 | ##### index_time_format [string] 31 | 32 | 当`index`参数中的格式为`xxxx-${now}`时,`index_time_format`可以指定index名称的时间格式,默认值为 `yyyy.MM.dd`。常用的时间格式列举如下: 33 | 34 | | Symbol | Description | 35 | | --- | --- | 36 | | y | Year | 37 | | M | Month | 38 | | d | Day of month | 39 | | H | Hour in day (0-23) | 40 | | m | Minute in hour | 41 | | s | Second in minute | 42 | 43 | 详细的时间格式语法见[Java SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html)。 44 | 45 | 46 | ##### index [string] 47 | 48 | Elasticsearch index名称,如果需要根据时间生成index,可以指定时间变量,如:`seatunnel-${now}`。`now`代表当前数据处理的时间。 49 | 50 | ##### common options [string] 51 | 52 | `Sink` 插件通用参数,详情参照 [Sink Plugin](/zh-cn/v2/flink/configuration/sink-plugins/) 53 | 54 | 55 | ### Examples 56 | 57 | ``` 58 | elasticsearch { 59 | hosts = ["localhost:9200"] 60 | index = "seatunnel" 61 | } 62 | ``` 63 | 64 | > 将结果写入Elasticsearch集群的名称为 seatunnel 的索引中 65 | -------------------------------------------------------------------------------- /en/configuration/filter-plugins/Split.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Split 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | Splits a string using the delimiter. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [delimiter](#delimiter-string) | string | no | " "(Space) | 16 | | [fields](#fields-list) | list | yes | - | 17 | | [source_field](#source_field-string) | string | no | raw_message | 18 | | [target_field](#target_field-string) | string | no | _root_ | 19 | 20 | ##### delimiter [string] 21 | 22 | The string to split on. Default is a whitespace. 23 | 24 | 25 | ##### fields [list] 26 | 27 | The corresponding field names of the split fields. The order of the field names is important. 28 | 29 | If the length of `fields` is greater than the length of the split fields, the extra fields will be set to an empty string. 30 | 31 | ##### source_field [string] 32 | 33 | Source field, default is `raw_message`. 34 | 35 | ##### target_field [string] 36 | 37 | New field name, default is `_root_`, and the result of `Split` will be added on the top level of Rows. 38 | 39 | If you specify `target_field`, the result of `Split` will be nested one level below that field. 40 | 41 | ### Examples 42 | 43 | ``` 44 | split { 45 | source_field = "message" 46 | delimiter = "&" 47 | fields = ["field1", "field2"] 48 | } 49 | ``` 50 | 51 | > The string of `message` is split by **&**, and the results are set to `field1` and `field2`.
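A minimal configuration can rely entirely on the documented defaults (whitespace delimiter, `raw_message` as the source field). The sketch below assumes the input rows carry a `raw_message` field:

```
split {
    fields = ["field1", "field2"]
}
```

> The string of `raw_message` is split on whitespace, and the results are set to `field1` and `field2`.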
52 | 53 | ``` 54 | split { 55 | source_field = "message" 56 | target_field = "info" 57 | delimiter = "," 58 | fields = ["field1", "field2"] 59 | } 60 | ``` 61 | 62 | > The string of `message` is split by **,**, and the results are set to `info.field1` and `info.field2` 63 | 64 | -------------------------------------------------------------------------------- /en/configuration/input-plugins/RedisStream.md: -------------------------------------------------------------------------------- 1 | ## Input plugin : RedisStream [Streaming] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.1.0 6 | 7 | ### Description 8 | 9 | Read data from Redis. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [host](#host-string) | string | yes | - | 16 | | [prefKey](#prefKey-string) | string | yes | - | 17 | | [queue](#queue-string) | string | yes | - | 18 | | [password](#password-string) | string | no | - | 19 | | [maxTotal](#maxTotal-number) | number | no | 200 | 20 | | [maxIdle](#maxIdle-number) | number | no | 200 | 21 | | [maxWaitMillis](#maxWaitMillis-number) | number | no | 2000 | 22 | | [connectionTimeout](#connectionTimeout-number) | number | no | 5000 | 23 | | [soTimeout](#soTimeout-number) | number | no | 5000 | 24 | | [maxAttempts](#maxAttempts-number) | number | no | 5 | 25 | 26 | ##### host [string] 27 | 28 | Redis cluster server hosts, multiple hosts separated by commas 29 | 30 | ##### prefKey [string] 31 | 32 | Redis key prefix; the actual queue key is formed as prefKey + ':' + queue 33 | 34 | ##### queue [string] 35 | 36 | Redis queue name; the queue that data is stored in 37 | 38 | ##### password [string] 39 | 40 | Redis password 41 | 42 | ##### maxTotal [number] 43 | 44 | Redis maxTotal config (maximum number of connections in the pool) 45 | 46 | ##### maxIdle [number] 47 | 48 | Redis maxIdle config (maximum number of idle connections) 49 | 50 | ##### maxWaitMillis [number] 51 | 52 | Redis maxWaitMillis config (maximum wait time in milliseconds to obtain a connection) 53 | 54 | ##### connectionTimeout [number] 55 | 56 | Redis connectionTimeout config 57 | 58 | ##### soTimeout [number] 59 | 60 | Redis soTimeout config (socket read timeout) 61 | 62 | ##### maxAttempts [number] 63 | 64 | Redis maxAttempts config (maximum number of attempts) 65 | 66 | ### Example 67 | 68 | ``` 69 | RedisStream { 70 | host = "127.0.0.1:7000,127.0.0.1:7001,127.0.0.1:7002" 71 | prefKey = "api" 72 | queue = "test" 73 | password = "root" 74 | } 75 | ``` 76 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/output-plugins/File.md: -------------------------------------------------------------------------------- 1 | ## Output plugin : File 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 输出数据到文件 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [options](#options-object) | object | no | - | 16 | | [partition_by](#partition_by-array) | array | no | - | 17 | | [path](#path-string) | string | yes | - | 18 | | [path_time_format](#path_time_format-string) | string | no | yyyyMMddHHmmss | 19 | | [save_mode](#save_mode-string) | string | no | error | 20 | | [format](#format-string) | string | no | json | 21 | | [common-options](#common-options-string)| string | no | - | 22 | 23 | 24 | ##### options [object] 25 | 26 | 自定义参数 27 | 28 | ##### partition_by [array] 29 | 30 | 根据所选字段对数据进行分区 31 | 32 | ##### path [string] 33 | 34 | 输出文件路径,以file://开头 35 | 36 | ##### path_time_format [string] 37 | 38 |
当`path`参数中的格式为`xxxx-${now}`时,`path_time_format`可以指定路径的时间格式,默认值为 `yyyy.MM.dd`。常用的时间格式列举如下: 39 | 40 | | Symbol | Description | 41 | | --- | --- | 42 | | y | Year | 43 | | M | Month | 44 | | d | Day of month | 45 | | H | Hour in day (0-23) | 46 | | m | Minute in hour | 47 | | s | Second in minute | 48 | 49 | 详细的时间格式语法见[Java SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html)。 50 | 51 | ##### save_mode [string] 52 | 53 | 存储模式,当前支持overwrite,append,ignore以及error。每个模式具体含义见[save-modes](http://spark.apache.org/docs/2.2.0/sql-programming-guide.html#save-modes) 54 | 55 | ##### format [string] 56 | 57 | 序列化方法,当前支持csv、json、parquet、orc和text 58 | 59 | ##### common options [string] 60 | 61 | `Output` 插件通用参数,详情参照 [Output Plugin](/zh-cn/v1/configuration/output-plugin) 62 | 63 | 64 | ### Example 65 | 66 | ``` 67 | file { 68 | path = "file:///var/logs" 69 | format = "text" 70 | } 71 | ``` 72 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/output-plugins/S3.md: -------------------------------------------------------------------------------- 1 | ## Output plugin : S3 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 输出数据到S3文件 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [options](#options-object) | object | no | - | 16 | | [partition_by](#partition_by-array) | array | no | - | 17 | | [path](#path-string) | string | yes | - | 18 | | [path_time_format](#path_time_format-string) | string | no | yyyyMMddHHmmss | 19 | | [save_mode](#save_mode-string) | string | no | error | 20 | | [format](#format-string) | string | no | json | 21 | | [common-options](#common-options-string)| string | no | - | 22 | 23 | ##### options [object] 24 | 25 | 自定义参数 26 | 27 | ##### partition_by [array] 28 | 29 | 根据所选字段对数据进行分区 30 | 31 | ##### path [string] 32 | 33 | AWS S3文件路径,以s3://,s3a://或s3n://开头 34 | 35 | ##### path_time_format [string] 36 | 37 | 当`path`参数中的格式为`xxxx-${now}`时,`path_time_format`可以指定路径的时间格式,默认值为 `yyyy.MM.dd`。常用的时间格式列举如下: 38 | 39 | | Symbol | Description | 40 | | --- | --- | 41 | | y | Year | 42 | | M | Month | 43 | | d | Day of month | 44 | | H | Hour in day (0-23) | 45 | | m | Minute in hour | 46 | | s | Second in minute | 47 | 48 | 详细的时间格式语法见[Java SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html)。 49 | 50 | ##### save_mode [string] 51 | 52 | 存储模式,当前支持overwrite,append,ignore以及error。每个模式具体含义见[save-modes](http://spark.apache.org/docs/2.2.0/sql-programming-guide.html#save-modes) 53 | 54 | ##### format [string] 55 | 56 | 序列化方法,当前支持csv、json、parquet和text 57 | 58 | ##### common options [string] 59 | 60 | `Output` 插件通用参数,详情参照 [Output Plugin](/zh-cn/v1/configuration/output-plugin) 61 | 62 | ### Example 63 | 64 | ``` 65 | s3 { 66 | path = "s3a://var/logs" 67 | format = "parquet" 68 | } 69 | ``` 70 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Table.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Table 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | Table 用于将静态文件映射为一张表,可与实时处理的流进行关联,常用于用户昵称、国家省市等字典表关联 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- |
--- | --- | 15 | | [cache](#cache-boolean) | boolean | no | true | 16 | | [delimiter](#delimiter-string) | string | no | , | 17 | | [field_types](#field_types-array) | array | no | - | 18 | | [fields](#fields-array) | array | yes | - | 19 | | [path](#path-string) | string | yes | - | 20 | | [table_name](#table_name-string) | string | yes | - | 21 | | [common-options](#common-options-string)| string | no | - | 22 | 23 | 24 | ##### cache [boolean] 25 | 26 | 是否在内存中缓存文件内容,true表示缓存,false表示每次需要时重新加载 27 | 28 | ##### delimiter [string] 29 | 30 | 文件中列与列之间的分隔符 31 | 32 | ##### field_types [array] 33 | 34 | 每个列的类型,顺序与个数必须与`fields`参数一一对应; 若不指定此参数,默认所有列的类型为字符串; 支持的数据类型包括:boolean, double, long, string 35 | 36 | ##### fields [array] 37 | 38 | 文件中,每行中各个列的名称,按照数据中实际列顺序提供 39 | 40 | ##### path [string] 41 | 42 | Hadoop支持的文件路径(默认hdfs路径, 如/path/to/file), 如本地文件:file:///path/to/file, hdfs:///path/to/file, s3:///path/to/file ... 43 | 44 | ##### table_name [string] 45 | 46 | 将文件载入后将注册为一张表,这里指定的是表名称,可用于在SQL中直接与流处理数据关联 47 | 48 | ##### common options [string] 49 | 50 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 51 | 52 | 53 | ### Example 54 | 55 | > 不指定列的类型,默认为string 56 | 57 | ``` 58 | table { 59 | table_name = "mydict" 60 | path = "/user/seatunnel/mylog/a.txt" 61 | fields = ['city', 'population'] 62 | } 63 | ``` 64 | 65 | > 指定列的类型 66 | 67 | ``` 68 | table { 69 | table_name = "mydict" 70 | path = "/user/seatunnel/mylog/a.txt" 71 | fields = ['city', 'population'] 72 | field_types = ['string', 'long'] 73 | } 74 | ``` 75 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/sink-plugins/File.md: -------------------------------------------------------------------------------- 1 | ## Sink plugin : File [Spark] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 9 | 输出数据到文件 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [options](#options-object) | object | no | - | 16 | | [partition_by](#partition_by-array) | array | no | - | 17 | | [path](#path-string) | string | yes | - | 18 | | [path_time_format](#path_time_format-string) | string | no | yyyyMMddHHmmss | 19 | | [save_mode](#save_mode-string) | string | no | error | 20 | | [serializer](#serializer-string) | string | no | json | 21 | | [common-options](#common-options-string)| string | no | - | 22 | 23 | ##### options [object] 24 | 25 | 自定义参数 26 | 27 | ##### partition_by [array] 28 | 29 | 根据所选字段对数据进行分区 30 | 31 | ##### path [string] 32 | 33 | 输出文件路径,以 **file://** 开头 34 | 35 | ##### path_time_format [string] 36 | 37 | 当`path`参数中的格式为`xxxx-${now}`时,`path_time_format`可以指定路径的时间格式,默认值为 `yyyy.MM.dd`。常用的时间格式列举如下: 38 | 39 | | Symbol | Description | 40 | | --- | --- | 41 | | y | Year | 42 | | M | Month | 43 | | d | Day of month | 44 | | H | Hour in day (0-23) | 45 | | m | Minute in hour | 46 | | s | Second in minute | 47 | 48 | 详细的时间格式语法见[Java SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html)。 49 | 50 | ##### save_mode [string] 51 | 52 | 存储模式,当前支持overwrite,append,ignore以及error。每个模式具体含义见[save-modes](http://spark.apache.org/docs/2.2.0/sql-programming-guide.html#save-modes) 53 | 54 | ##### serializer [string] 55 | 56 | 序列化方法,当前支持csv、json、parquet、orc和text 57 | 58 | ##### common options [string] 59 | 60 | `Sink` 插件通用参数,详情参照 [Sink Plugin](/zh-cn/v2/spark/configuration/sink-plugins/) 61 | 62 | 63 | ###
Example 64 | 65 | ``` 66 | file { 67 | path = "file:///var/logs" 68 | serializer = "text" 69 | } 70 | ``` 71 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/output-plugins/Tidb.md: -------------------------------------------------------------------------------- 1 | ## Output plugin : TiDB 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.1.5 6 | 7 | ### Description 8 | 9 | 通过JDBC将数据写入[TiDB](https://github.com/pingcap/tidb) 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [batchsize](#batchsize-number) | number | no | 150 | 16 | | [isolationLevel](#isolationLevel-string) | string | no | NONE | 17 | | [password](#password-string) | string | yes | - | 18 | | [save_mode](#save_mode-string) | string | no | append | 19 | | [table](#table-string) | string | yes | - | 20 | | [url](#url-string) | string | yes | - | 21 | | [user](#user-string) | string | yes | - | 22 | | [useSSL](#useSSL-boolean) | boolean | no | false | 23 | | [common-options](#common-options-string)| string | no | - | 24 | 25 | ##### batchsize [number] 26 | 27 | 写入批次大小 28 | 29 | ##### isolationLevel [string] 30 | 31 | Isolation level, which determines whether to resolve locks on the underlying TiDB clusters. 32 | 33 | ##### password [string] 34 | 35 | 密码 36 | 37 | ##### save_mode [string] 38 | 39 | 存储模式,当前支持overwrite,append,ignore以及error。每个模式具体含义见[save-modes](http://spark.apache.org/docs/2.2.0/sql-programming-guide.html#save-modes) 40 | 41 | ##### table [string] 42 | 43 | 表名 44 | 45 | ##### url [string] 46 | 47 | JDBC连接的URL。参考一个案例: `jdbc:mysql://127.0.0.1:4000/test?rewriteBatchedStatements=true` 48 | 49 | 50 | ##### user [string] 51 | 52 | 用户名 53 | 54 | ##### useSSL [boolean] 55 | 56 | useSSL 57 | 58 | ##### common options [string] 59 | 60 | `Output` 插件通用参数,详情参照 [Output Plugin](/zh-cn/v1/configuration/output-plugin) 61 | 62 | 63 | ### Example 64 | 65 | ``` 66 | tidb { 67 | url = "jdbc:mysql://127.0.0.1:4000/test?useUnicode=true&characterEncoding=utf8" 68 | table = "access" 69 | user = "username" 70 | password = "password" 71 | save_mode = "append" 72 | } 73 | ``` 74 | -------------------------------------------------------------------------------- /en/configuration/output-plugins/File.md: -------------------------------------------------------------------------------- 1 | ## Output plugin : File 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | Write Rows to the local file system. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [options](#options-object) | object | no | - | 16 | | [partition_by](#partition_by-array) | array | no | - | 17 | | [path](#path-string) | string | yes | - | 18 | | [path_time_format](#path_time_format-string) | string | no | yyyyMMddHHmmss | 19 | | [save_mode](#save_mode-string) | string | no | error | 20 | | [format](#format-string) | string | no | json | 21 | 22 | ##### options [object] 23 | 24 | Custom parameters. 25 | 26 | ##### partition_by [array] 27 | 28 | Partition the data based on the fields. 29 | 30 | ##### path [string] 31 | 32 | Output File path. Start with `file://`. 33 | 34 | ##### path_time_format [string] 35 | 36 | If `path` contains time variables, such as `xxxx-${now}`, `path_time_format` can be used to specify the time format of the path, default is `yyyy.MM.dd`.
The commonly used time formats are listed below: 37 | 38 | 39 | | Symbol | Description | 40 | | --- | --- | 41 | | y | Year | 42 | | M | Month | 43 | | d | Day of month | 44 | | H | Hour in day (0-23) | 45 | | m | Minute in hour | 46 | | s | Second in minute | 47 | 48 | The detailed time format syntax: [Java SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html). 49 | 50 | ##### save_mode [string] 51 | 52 | Save mode, supports `overwrite`, `append`, `ignore` and `error`. The detail of save_mode see [save-modes](http://spark.apache.org/docs/2.2.0/sql-programming-guide.html#save-modes). 53 | 54 | ##### format [string] 55 | 56 | Output format, supports `csv`, `json`, `parquet` and `text`. 57 | 58 | 59 | ### Example 60 | 61 | ``` 62 | file { 63 | path = "file:///var/logs" 64 | format = "text" 65 | } 66 | ``` 67 | -------------------------------------------------------------------------------- /en/configuration/output-plugins/S3.md: -------------------------------------------------------------------------------- 1 | ## Output plugin : S3 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | Write Rows to AWS S3 storage. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [options](#options-object) | object | no | - | 16 | | [partition_by](#partition_by-array) | array | no | - | 17 | | [path](#path-string) | string | yes | - | 18 | | [path_time_format](#path_time_format-string) | string | no | yyyyMMddHHmmss | 19 | | [save_mode](#save_mode-string) | string | no | error | 20 | | [format](#format-string) | string | no | json | 21 | 22 | ##### options [object] 23 | 24 | Custom parameters. 25 | 26 | ##### partition_by [array] 27 | 28 | Partition the data based on the fields. 29 | 30 | ##### path [string] 31 | 32 | File path on AWS S3 storage. Start with `s3://`, `s3a://` or `s3n://`. 33 | 34 | ##### path_time_format [string] 35 | 36 | If `path` contains time variables, such as `xxxx-${now}`, `path_time_format` can be used to specify the format of s3 path, default is `yyyy.MM.dd`. The commonly used time formats are listed below: 37 | 38 | 39 | | Symbol | Description | 40 | | --- | --- | 41 | | y | Year | 42 | | M | Month | 43 | | d | Day of month | 44 | | H | Hour in day (0-23) | 45 | | m | Minute in hour | 46 | | s | Second in minute | 47 | 48 | The detailed time format syntax: [Java SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html). 49 | 50 | ##### save_mode [string] 51 | 52 | Save mode, supports `overwrite`, `append`, `ignore` and `error`. The detail of save_mode see [save-modes](http://spark.apache.org/docs/2.2.0/sql-programming-guide.html#save-modes). 53 | 54 | ##### format [string] 55 | 56 | Output format, supports `csv`, `json`, `parquet` and `text`. 57 | 58 | 59 | ### Example 60 | 61 | ``` 62 | s3 { 63 | path = "s3a://var/logs" 64 | format = "parquet" 65 | } 66 | ``` 67 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/filter-plugins/Script.md: -------------------------------------------------------------------------------- 1 | ## Filter plugin : Script 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.1.1 6 | 7 | ### Description 8 | 9 | 解析并执行自定义脚本中的逻辑, 即接受`object_name`(默认是event) 指定的JSONObject, 10 | 完成自定义的处理逻辑,再返回一个新的event.
11 | 12 | 脚本解析引擎的实现,采用的是[QLExpress](https://github.com/alibaba/QLExpress), 13 | 具体语法可参考[QLExpress 语法](https://github.com/alibaba/QLExpress#%E4%B8%89%E8%AF%AD%E6%B3%95%E4%BB%8B%E7%BB%8D). 14 | 15 | ### Options 16 | 17 | | name | type | required | default value | 18 | | --- | --- | --- | --- | 19 | | [object_name](#object_name-string) | string | no | event | 20 | | [script_name](#script_name-string) | string | yes | - | 21 | | [errorList](#errorList-boolean) | boolean | no | false | 22 | | [isCache](#isCache-boolean) | boolean | no | false | 23 | | [isTrace](#isTrace-boolean) | boolean | no | false | 24 | | [isPrecise](#isPrecise-boolean) | boolean | no | false | 25 | | [common-options](#common-options-string)| string | no | - | 26 | 27 | 28 | ##### object_name [string] 29 | 30 | 脚本内置JSONObject的引用名, 不设置默认为'event' 31 | 32 | ##### script_name [string] 33 | 34 | 需要执行脚本的文件名称, 注意脚本文件必须放到`plugins/script/files`目录下面. 35 | 36 | ##### errorList [boolean] 37 | 38 | 是否输出错误信息List 39 | 40 | ##### isCache [boolean] 41 | 42 | 是否使用Cache中的指令集 43 | 44 | ##### isTrace [boolean] 45 | 46 | 是否输出所有的跟踪信息,同时还需要log级别是DEBUG级 47 | 48 | ##### isPrecise [boolean] 49 | 50 | 是否需要高精度的计算 51 | 52 | ##### common options [string] 53 | 54 | `Filter` 插件通用参数,详情参照 [Filter Plugin](/zh-cn/v1/configuration/filter-plugin) 55 | 56 | 57 | ### Examples 58 | 59 | * conf文件插件配置 60 | 61 | ``` 62 | script { 63 | script_name = "my_script.ql" 64 | } 65 | ``` 66 | 67 | * 自定义脚本(my_script.ql) 68 | 69 | ``` 70 | newEvent = new java.util.HashMap(); 71 | you = event.getString("name"); 72 | age = event.getLong("age"); 73 | if(age > 10){ 74 | newEvent.put("name",you); 75 | } 76 | return newEvent; 77 | ``` 78 | 79 | > 如果age大于10,则获取name放入map中并返回 80 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/input-plugins/Elasticsearch.md: -------------------------------------------------------------------------------- 1 | ## Input plugin : Elasticsearch [Static] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.3.2 6 | 7 | ### Description 8 | 9 | 从 Elasticsearch 中读取数据 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [hosts](#hosts-array) | array | yes | - | 16 | | [index](#index-string) | string | yes | | 17 | | [es.*](#es-string) | string | no | | 18 | | [common-options](#common-options-string)| string | yes | - | 19 | 20 | 21 | ##### hosts [array] 22 | 23 | ElasticSearch 集群地址,格式为host:port,允许指定多个host。如 \["host1:9200", "host2:9200"]。 24 | 25 | 26 | ##### index [string] 27 | 28 | ElasticSearch index名称,支持 `*` 模糊匹配 29 | 30 | 31 | ##### es.* [string] 32 | 33 | 用户还可以指定多个非必须参数,详细的参数列表见[Elasticsearch支持的参数](https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html#cfg-mapping).
34 | 35 | 如指定 `es.read.metadata` 的方式是: `es.read.metadata = true`。如果不指定这些非必须参数,它们将使用官方文档给出的默认值。 36 | 37 | ### Tips 38 | 39 | 在使用 ElasticSearch插件时,可以配置参数 `es.input.max.docs.per.partition`,用以最大化 seatunnel 读取 es 的效率,该参数用于决定任务的分区个数: 40 | 41 | > 分区数 = 总数据条数 / es.input.max.docs.per.partition 42 | 43 | 通过增大任务分区数以支持更高的并发能力,根据实践优化这个参数的设置,读取ElasticSearch的效率可以提升3-10倍。 44 | 45 | 46 | 如上所述,`es.input.max.docs.per.partition` 支持用户根据实际的数据量自行调整;若不设置此参数,分区数默认为 ElasticSearch 索引 Shard 的个数。 47 | 48 | ##### common options [string] 49 | 50 | `Input` 插件通用参数,详情参照 [Input Plugin](/zh-cn/v1/configuration/input-plugin) 51 | 52 | 53 | ### Examples 54 | 55 | ``` 56 | elasticsearch { 57 | hosts = ["localhost:9200"] 58 | index = "seatunnel-20190424" 59 | result_table_name = "my_dataset" 60 | } 61 | ``` 62 | 63 | 64 | ``` 65 | elasticsearch { 66 | hosts = ["localhost:9200"] 67 | index = "seatunnel-*" 68 | es.read.field.include = "name, age" 69 | result_table_name = "my_dataset" 70 | } 71 | ``` 72 | 73 | > 匹配所有以 `seatunnel-` 开头的索引, 并且仅仅读取 `name`和 `age` 两个字段。 74 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/output-plugins/Hdfs.md: -------------------------------------------------------------------------------- 1 | ## Output plugin : Hdfs 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | 输出数据到HDFS文件 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [options](#options-object) | object | no | - | 16 | | [partition_by](#partition_by-array) | array | no | - | 17 | | [path](#path-string) | string | yes | - | 18 | | [path_time_format](#path_time_format-string) | string | no | yyyyMMddHHmmss | 19 | | [save_mode](#save_mode-string) | string | no | error | 20 | | [format](#format-string) | string | no | json | 21 | | [common-options](#common-options-string)| string | no | - | 22 | 23 | 24 | ##### options [object] 25 | 26 | 自定义参数 27 | 28 | ##### partition_by [array] 29 | 30 | 根据所选字段对数据进行分区 31 | 32 | ##### path [string] 33 | 34 | Hadoop集群文件路径,以hdfs://开头 35 | 36 | ##### path_time_format [string] 37 | 38 | 当`path`参数中的格式为`xxxx-${now}`时,`path_time_format`可以指定HDFS路径的时间格式,默认值为 `yyyy.MM.dd`。常用的时间格式列举如下: 39 | 40 | | Symbol | Description | 41 | | --- | --- | 42 | | y | Year | 43 | | M | Month | 44 | | d | Day of month | 45 | | H | Hour in day (0-23) | 46 | | m | Minute in hour | 47 | | s | Second in minute | 48 | 49 | 详细的时间格式语法见[Java SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html)。 50 | 51 | ##### save_mode [string] 52 | 53 | 存储模式,当前支持overwrite,append,ignore以及error。每个模式具体含义见[save-modes](http://spark.apache.org/docs/2.2.0/sql-programming-guide.html#save-modes) 54 | 55 | ##### format [string] 56 | 57 | 序列化方法,当前支持csv、json、parquet、orc和text 58 | 59 | ##### common options [string] 60 | 61 | `Output` 插件通用参数,详情参照 [Output Plugin](/zh-cn/v1/configuration/output-plugin) 62 | 63 | 64 | ### Example 65 | 66 | ``` 67 | hdfs { 68 | path = "hdfs:///var/logs-${now}" 69 | format = "json" 70 | path_time_format = "yyyy.MM.dd" 71 | } 72 | ``` 73 | 74 | > 按天生成HDFS文件,例如**logs-2018.02.12** 75 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/sink-plugins/Hdfs.md: -------------------------------------------------------------------------------- 1 | ## Sink plugin : Hdfs [Spark] 2 | 3 | * Author: InterestingLab 4 | * Homepage:
https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 9 | 输出数据到HDFS 10 | ### Options 11 | 12 | | name | type | required | default value | 13 | | --- | --- | --- | --- | 14 | | [options](#options-object) | object | no | - | 15 | | [partition_by](#partition_by-array) | array | no | - | 16 | | [path](#path-string) | string | yes | - | 17 | | [path_time_format](#path_time_format-string) | string | no | yyyyMMddHHmmss | 18 | | [save_mode](#save_mode-string) | string | no | error | 19 | | [serializer](#serializer-string) | string | no | json | 20 | | [common-options](#common-options-string)| string | no | - | 21 | 22 | ##### options [object] 23 | 24 | 自定义参数 25 | 26 | ##### partition_by [array] 27 | 28 | 根据所选字段对数据进行分区 29 | 30 | ##### path [string] 31 | 32 | 输出文件路径,以 **hdfs://** 开头 33 | 34 | ##### path_time_format [string] 35 | 36 | 当`path`参数中的格式为`xxxx-${now}`时,`path_time_format`可以指定路径的时间格式,默认值为 `yyyy.MM.dd`。常用的时间格式列举如下: 37 | 38 | | Symbol | Description | 39 | | --- | --- | 40 | | y | Year | 41 | | M | Month | 42 | | d | Day of month | 43 | | H | Hour in day (0-23) | 44 | | m | Minute in hour | 45 | | s | Second in minute | 46 | 47 | 详细的时间格式语法见[Java SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html)。 48 | 49 | ##### save_mode [string] 50 | 51 | 存储模式,当前支持overwrite,append,ignore以及error。每个模式具体含义见[save-modes](http://spark.apache.org/docs/2.2.0/sql-programming-guide.html#save-modes) 52 | 53 | ##### serializer [string] 54 | 55 | 序列化方法,当前支持csv、json、parquet、orc和text 56 | 57 | ##### common options [string] 58 | 59 | `Sink` 插件通用参数,详情参照 [Sink Plugin](/zh-cn/v2/spark/configuration/sink-plugins/) 60 | 61 | 62 | ### Examples 63 | 64 | ``` 65 | hdfs { 66 | path = "hdfs:///var/logs-${now}" 67 | serializer = "json" 68 | path_time_format = "yyyy.MM.dd" 69 | } 70 | ``` 71 | 72 | > 按天生成HDFS文件,例如**logs-2018.02.12** 73 | -------------------------------------------------------------------------------- /en/configuration/output-plugins/Hdfs.md: -------------------------------------------------------------------------------- 1 | ## Output plugin : Hdfs 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.0.0 6 | 7 | ### Description 8 | 9 | Write Rows to HDFS. 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [options](#options-object) | object | no | - | 16 | | [partition_by](#partition_by-array) | array | no | - | 17 | | [path](#path-string) | string | yes | - | 18 | | [path_time_format](#path_time_format-string) | string | no | yyyyMMddHHmmss | 19 | | [save_mode](#save_mode-string) | string | no | error | 20 | | [format](#format-string) | string | no | json | 21 | 22 | ##### options [object] 23 | 24 | Custom parameters. 25 | 26 | ##### partition_by [array] 27 | 28 | Partition the data based on the fields. 29 | 30 | ##### path [string] 31 | 32 | File path on HDFS. Start with `hdfs://`. 33 | 34 | ##### path_time_format [string] 35 | 36 | If `path` contains time variables, such as `xxxx-${now}`, `path_time_format` can be used to specify the format of HDFS path, default is `yyyy.MM.dd`. 
The commonly used time formats are listed below: 37 | 38 | 39 | | Symbol | Description | 40 | | --- | --- | 41 | | y | Year | 42 | | M | Month | 43 | | d | Day of month | 44 | | H | Hour in day (0-23) | 45 | | m | Minute in hour | 46 | | s | Second in minute | 47 | 48 | The detailed time format syntax: [Java SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html). 49 | 50 | ##### save_mode [string] 51 | 52 | Save mode, supports `overwrite`, `append`, `ignore` and `error`. The detail of save_mode see [save-modes](http://spark.apache.org/docs/2.2.0/sql-programming-guide.html#save-modes). 53 | 54 | ##### format [string] 55 | 56 | Output format, supports `csv`, `json`, `parquet` and `text`. 57 | 58 | 59 | ### Example 60 | 61 | ``` 62 | hdfs { 63 | path = "hdfs:///var/logs-${now}" 64 | format = "json" 65 | path_time_format = "yyyy.MM.dd" 66 | } 67 | ``` 68 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/output-plugins/Hive.md: -------------------------------------------------------------------------------- 1 | ## Output plugin : Hive 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.5.1 6 | 7 | ### Description 8 | 9 | 写入数据到[Apache Hive](https://hive.apache.org)表中 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --------------------------------------- | ------------- | -------- | ------------- | 15 | | [sql](#sql) | string | no | - | 16 | | [source_table_name](#source_table_name) | string | no | - | 17 | | [result_table_name](#result_table_name) | string | no | - | 18 | | [sink_columns](#sink_columns) | string | no | - | 19 | | [save_mode](#save_mode) | string | no | - | 20 | | [partition_by](#partition_by) | Array[string] | no | - | 21 | 22 | ##### sql [string] 23 | 24 | 标准的hql语句:insert into/overwrite $table select * from xxx_table 25 | 26 | 如果有这个option,会忽略其他的option 27 | 28 | ##### source_table_name [string] 29 | 30 | 准备输出到hive的表名 31 | 32 | ##### result_table_name [string] 33 | 34 | 结果在hive中的存储表名 35 | 36 | ##### save_mode [string] 37 | 38 | 写入时采取的存储模式,与 Spark 的 save mode 语义相同 39 | 40 | ##### sink_columns [string] 41 | 42 | 选择source_table_name中的需要的字段,存储在result_table_name中,字段间逗号分隔 43 | 44 | ##### partition_by [Array[string]] 45 | 46 | hive分区 47 | 48 | ### Example 49 | 50 | ```conf 51 | output { 52 | Hive { 53 | sql = "insert overwrite table seatunnel.test1 partition(province) select name,age,province from myTable2" 54 | } 55 | } 56 | ``` 57 | 58 | 59 | 60 | 61 | 62 | ```conf 63 | output { 64 | Hive { 65 | source_table_name = "myTable2" 66 | result_table_name = "seatunnel.test1" 67 | save_mode = "overwrite" 68 | sink_columns = "name,age,province" 69 | partition_by = ["province"] 70 | } 71 | } 72 | ``` 73 | 74 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/input-plugins/Hdfs.md: -------------------------------------------------------------------------------- 1 | ## Input plugin : Hdfs [Static] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.1.0 6 | 7 | ### Description 8 | 9 | 从HDFS文件中读取数据。注意此插件与`HdfsStream`不同,它不是流式的。 10 | 11 | 12 | ### Options 13 | 14 | | name | type | required | default value | 15 | | --- | --- | --- | --- | 16 | | [format](#format-string) | string | no | json | 17 | | [options.*](#options-object) | object | no | - | 18 | | [options.rowTag](#optionsrowTag-string) | string |
no | - | 19 | | [path](#path-string) | string | yes | - | 20 | | [common-options](#common-options-string)| string | yes | - | 21 | 22 | ##### format [string] 23 | 24 | 从HDFS中读取文件的格式,目前支持`csv`、`json`、`parquet` 、`xml`、`orc`和 `text`. 25 | 26 | 27 | ##### options [object] 28 | 29 | 自定义参数,当`format = "xml"`时必须设置`options.rowTag`,配置XML格式数据的Tag,其他参数不是必填参数。 30 | 31 | 32 | ##### options.rowTag [string] 33 | 34 | 当 `format` 为 **xml** 时必须设置 `options.rowTag`,配置XML格式数据的Tag 35 | 36 | 37 | ##### path [string] 38 | 39 | Hadoop集群文件路径,以hdfs://开头 40 | 41 | ##### common options [string] 42 | 43 | `Input` 插件通用参数,详情参照 [Input Plugin](/zh-cn/v1/configuration/input-plugin) 44 | 45 | 46 | 47 | ### Example 48 | 49 | ``` 50 | hdfs { 51 | path = "hdfs:///var/seatunnel-logs" 52 | result_table_name = "access_log" 53 | format = "json" 54 | } 55 | ``` 56 | 57 | > 从HDFS中读取json文件,加载到seatunnel中待后续处理. 58 | 59 | 60 | 或者可以指定 hdfs name service: 61 | 62 | ``` 63 | hdfs { 64 | result_table_name = "access_log" 65 | path = "hdfs://m2:8022/seatunnel-logs/access.log" 66 | } 67 | ``` 68 | 69 | 读取XML格式的文件: 70 | 71 | ``` 72 | hdfs { 73 | result_table_name = "books" 74 | path = "hdfs://m2:8022/seatunnel-logs/books.xml" 75 | options.rowTag = "book" 76 | format = "xml" 77 | } 78 | ``` 79 | 80 | 读取CSV格式文件 81 | 82 | ``` 83 | hdfs { 84 | path = "hdfs://m2:8022/seatunnel-logs/books.csv" 85 | format = "csv" 86 | # 将第一列的header作为列名 87 | # 否则将以 _c0,_c1,_c2...依次命名 88 | options.header = "true" 89 | result_table_name = "books" 90 | } 91 | ``` 92 | -------------------------------------------------------------------------------- /zh-cn/v1/configuration/input-plugins/RedisStream.md: -------------------------------------------------------------------------------- 1 | ## Input plugin : RedisStream [Streaming] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.1.0 6 | 7 | ### Description 8 | 9 | Redis集群作为数据源,以队列作为数据输入源 10 | > 例如:logstash 支持 Redis 集群方法的资源页 -> https://github.com/elastic/logstash/issues/12099 11 | 12 | ### Options 13 | 14 | | name | type | required | default value | 15 | | --- | --- | --- | --- | 16 | | [host](#host-string) | string | yes | - | 17 | | [prefKey](#prefKey-string) | string | yes | - | 18 | | [queue](#queue-string) | string | yes | - | 19 | | [password](#password-string) | string | no | - | 20 | | [maxTotal](#maxTotal-number) | number | no | 200 | 21 | | [maxIdle](#maxIdle-number) | number | no | 200 | 22 | | [maxWaitMillis](#maxWaitMillis-number) | number | no | 2000 | 23 | | [connectionTimeout](#connectionTimeout-number) | number | no | 5000 | 24 | | [soTimeout](#soTimeout-number) | number | no | 5000 | 25 | | [maxAttempts](#maxAttempts-number) | number | no | 5 | 26 | 27 | ##### host [string] 28 | 29 | redis集群地址:多个以逗号分隔 30 | > 例子:127.0.0.1:7000,127.0.0.1:7001,127.0.0.1:7002 31 | 32 | ##### prefKey [string] 33 | 34 | redis-queue业务前缀, 前缀规则: prefKey + ':' + queue 35 | > prefKey为空字符串,则实际队列名称为 queue 36 | 37 | ##### queue [string] 38 | 39 | redis队列名称 , 数据存储队列 40 | > 例子:队列实际名称为 prefKey:queue 41 | 42 | ##### password [string] 43 | 44 | redis密码,空字符串为无密码 45 | 46 | ##### maxTotal [number] 47 | 48 | redis连接池的最大数据库连接数 49 | 50 | ##### maxIdle [number] 51 | 52 | redis最大空闲数 53 | 54 | ##### maxWaitMillis [number] 55 | 56 | redis最大建立连接等待时间 57 | 58 | ##### connectionTimeout [number] 59 | 60 | redis连接超时时间 61 | 62 | ##### soTimeout [number] 63 | 64 | redis读取数据超时时间 65 | 66 | ##### maxAttempts [number] 67 | 68 | redis最大尝试次数 69 | 70 | ##### common options [string] 71 | 72 | `Input` 插件通用参数,详情参照
[Input Plugin](/zh-cn/v1/configuration/input-plugin) 73 | 74 | ### Example 75 | 76 | ``` 77 | RedisStream { 78 | host = "127.0.0.1:7000,127.0.0.1:7001,127.0.0.1:7002" 79 | prefKey = "" 80 | queue = "test" 81 | password = "root" 82 | } 83 | ``` 84 | -------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- (HTML markup lost in extraction; only the page title "Document" survives.)
-------------------------------------------------------------------------------- /zh-cn/v1/configuration/input-plugins/MongoDB.md: -------------------------------------------------------------------------------- 1 | ## Input plugin : MongoDB 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 1.1.2 6 | 7 | ### Description 8 | 9 | 从[MongoDB](https://www.mongodb.com/)读取数据 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [readconfig.uri](#readconfig.uri-string) | string | yes | - | 16 | | [readconfig.database](#readconfig.database-string) | string | yes | - | 17 | | [readconfig.collection](#readconfig.collection-string) | string | yes | - | 18 | | [readconfig.*](#readconfig.*-string) | string | no | - | 19 | | [schema](#schema-string) | string | no | - | 20 | | [common-options](#common-options-string)| string | yes | - | 21 | 22 | 23 | ##### readconfig.uri [string] 24 | 25 | 要读取mongoDB的uri 26 | 27 | ##### readconfig.database [string] 28 | 29 | 要读取mongoDB的database 30 | 31 | ##### readconfig.collection [string] 32 | 33 | 要读取mongoDB的collection 34 | 35 | ##### readconfig.* [string] 36 | 37 | 这里还可以配置更多其他参数,详见 https://docs.mongodb.com/spark-connector/v1.1/configuration/ 中的`Input Configuration`部分 38 | 指定参数的方式是在原参数名称上加上前缀"readconfig.",如设置`spark.mongodb.input.partitioner`的方式是 `readconfig.spark.mongodb.input.partitioner="MongoPaginateBySizePartitioner"`。如果不指定这些非必须参数,将使用MongoDB官方文档的默认值 39 | 40 | ##### schema [string] 41 | 42 | 因为mongoDB不存在schema的概念,在spark读取mongo的时候,会去对mongo的数据进行抽样并推断schema, 43 | 实际上这个过程会比较慢并且可能不准确,此参数可以手动指定schema避免这些问题。schema为一个json字符串,如`{\"name\":\"string\",\"age\":\"integer\",\"addrs\":{\"country\":\"string\",\"city\":\"string\"}}` 44 | 45 | ##### common options [string] 46 | 47 | `Input` 插件通用参数,详情参照 [Input Plugin](/zh-cn/v1/configuration/input-plugin) 48 | 49 | 50 | 51 | ### Example 52 | 53 | ``` 54 | mongodb{ 55 | readconfig.uri="mongodb://myhost:mypost" 56 | readconfig.database="mydatabase" 57 | readconfig.collection="mycollection" 58 | readconfig.spark.mongodb.input.partitioner = "MongoPaginateBySizePartitioner" 59 | schema="{\"name\":\"string\",\"age\":\"integer\",\"addrs\":{\"country\":\"string\",\"city\":\"string\"}}" 60 | result_table_name = "test" 61 | } 62 | ``` 63 | -------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/source-plugins/Jdbc.md: -------------------------------------------------------------------------------- 1 | ## Source plugin : JDBC [Spark] 2 | 3 | * Author: InterestingLab 4 | * Homepage: https://interestinglab.github.io/seatunnel-docs 5 | * Version: 2.0.0 6 | 7 | ### Description 8 | 9 | 通过JDBC读取外部数据源数据 10 | 11 | ### Options 12 | 13 | | name | type | required | default value | 14 | | --- | --- | --- | --- | 15 | | [driver](#driver-string) | string | yes | - | 16 | | [jdbc.*](#jdbc-string) | string| no | | 17 | | [password](#password-string) | string | yes | - | 18 | | [table](#table-string) | string | yes | - | 19 | | [url](#url-string) | string | yes | - | 20 | | [user](#user-string) | string | yes | - | 21 | | [common-options](#common-options-string)| string | yes | - | 22 | 23 | 24 | ##### driver [string] 25 | 26 | 用来连接远端数据源的JDBC驱动类名 27 | 28 | 29 | ##### jdbc [string] 30 | 31 | 除了以上必须指定的参数外,用户还可以指定多个非必须参数,覆盖了Spark JDBC提供的所有[参数](https://spark.apache.org/docs/2.4.0/sql-programming-guide.html#jdbc-to-other-databases).
-------------------------------------------------------------------------------- /zh-cn/v2/spark/configuration/source-plugins/Jdbc.md: --------------------------------------------------------------------------------
## Source plugin : JDBC [Spark]

* Author: InterestingLab
* Homepage: https://interestinglab.github.io/seatunnel-docs
* Version: 2.0.0

### Description

Reads data from an external data source via JDBC

### Options

| name | type | required | default value |
| --- | --- | --- | --- |
| [driver](#driver-string) | string | yes | - |
| [jdbc.*](#jdbc-string) | string | no | - |
| [password](#password-string) | string | yes | - |
| [table](#table-string) | string | yes | - |
| [url](#url-string) | string | yes | - |
| [user](#user-string) | string | yes | - |
| [common-options](#common-options-string) | string | yes | - |

##### driver [string]

The JDBC class name used to connect to the remote data source

##### jdbc [string]

Beyond the required parameters above, users can specify any number of optional parameters, covering all the [options](https://spark.apache.org/docs/2.4.0/sql-programming-guide.html#jdbc-to-other-databases) that Spark JDBC provides.

To set one, prefix the original parameter name with `jdbc.`; for example, fetchsize is specified as `jdbc.fetchsize = 50000`. Optional parameters left unspecified use the defaults given by Spark JDBC.

##### password [string]

Password

##### table [string]

Table name

##### url [string]

The URL of the JDBC connection, e.g. `jdbc:postgresql://localhost/test`

##### user [string]

Username

##### common options [string]

Common parameters of `Source` plugins; see [Source Plugin](/zh-cn/v2/spark/configuration/source-plugins/) for details

### Example

```
jdbc {
    driver = "com.mysql.jdbc.Driver"
    url = "jdbc:mysql://localhost:3306/info"
    table = "access"
    result_table_name = "access_log"
    user = "username"
    password = "password"
}
```

> Reads MySQL data via JDBC

```
jdbc {
    driver = "com.mysql.jdbc.Driver"
    url = "jdbc:mysql://localhost:3306/info"
    table = "access"
    result_table_name = "access_log"
    user = "username"
    password = "password"
    jdbc.partitionColumn = "item_id"
    jdbc.numPartitions = "10"
    jdbc.lowerBound = 0
    jdbc.upperBound = 100
}
```

> Partitions the read by the specified column
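
> The same `jdbc.*` passthrough covers single tuning knobs as well; for instance, the `jdbc.fetchsize = 50000` setting quoted in the option description drops straight into the example block. A sketch; 50000 is just the figure used above:

```
jdbc {
    driver = "com.mysql.jdbc.Driver"
    url = "jdbc:mysql://localhost:3306/info"
    table = "access"
    result_table_name = "access_log"
    user = "username"
    password = "password"
    # passthrough to Spark JDBC: rows fetched per round trip
    jdbc.fetchsize = 50000
}
```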
-------------------------------------------------------------------------------- /zh-cn/v1/configuration/input-plugins/MySQL.md: --------------------------------------------------------------------------------
## Input plugin : MySQL

* Author: InterestingLab
* Homepage: https://interestinglab.github.io/seatunnel-docs
* Version: 1.0.0

### Description

Reads data from MySQL

### Options

| name | type | required | default value |
| --- | --- | --- | --- |
| [password](#password-string) | string | yes | - |
| [jdbc.*](#jdbc-string) | string | no | - |
| [table](#table-string) | string | yes | - |
| [url](#url-string) | string | yes | - |
| [user](#user-string) | string | yes | - |
| [common-options](#common-options-string) | string | yes | - |

##### password [string]

Password

##### jdbc [string]

Beyond the required parameters above, users can specify any number of optional parameters, covering all the [options](https://spark.apache.org/docs/2.4.0/sql-programming-guide.html#jdbc-to-other-databases) that Spark JDBC provides.

To set one, prefix the original parameter name with `jdbc.`; for example, fetchsize is specified as `jdbc.fetchsize = 50000`. Optional parameters left unspecified use the defaults given by Spark JDBC.

##### table [string]

Table name, or a SQL statement used to filter the rows to read

##### url [string]

The URL of the JDBC connection, e.g. `jdbc:mysql://localhost:3306/info`

##### user [string]

Username

##### common options [string]

Common parameters of `Input` plugins; see [Input Plugin](/zh-cn/v1/configuration/input-plugin) for details

### Example

```
mysql {
    url = "jdbc:mysql://localhost:3306/info"
    table = "access"
    result_table_name = "access_log"
    user = "username"
    password = "password"
}
```

```
mysql {
    url = "jdbc:mysql://localhost:3306/info"
    table = "(select * from access) AS a"
    result_table_name = "access_log"
    user = "username"
    password = "password"
}
```

> Reads data from MySQL, by table name or by subquery

```
mysql {
    url = "jdbc:mysql://localhost:3306/info"
    table = "access"
    result_table_name = "access_log"
    user = "username"
    password = "password"
    jdbc.partitionColumn = "item_id"
    jdbc.numPartitions = "10"
    jdbc.lowerBound = 0
    jdbc.upperBound = 100
}
```

> Partitions the read by the specified column
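
> Because `table` accepts a SQL statement, the subquery form above can also filter rows inside MySQL before they reach Spark. A sketch; the `status = 200` predicate is purely illustrative:

```
mysql {
    url = "jdbc:mysql://localhost:3306/info"
    # hypothetical predicate: pushes the filter down to MySQL via a subquery
    table = "(select * from access where status = 200) AS a"
    result_table_name = "access_log"
    user = "username"
    password = "password"
}
```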
-------------------------------------------------------------------------------- /zh-cn/v1/contribution.md: --------------------------------------------------------------------------------
# Contributing code to seatunnel

## Coding Style

Scala coding style references:

http://docs.scala-lang.org/style/

https://github.com/databricks/scala-style-guide

The sbt plugin [scalastyle](http://www.scalastyle.org/) is used as the coding style checker; code that fails the style check cannot be committed.

Scala code is formatted automatically with scalafmt, via the [CLI or the IntelliJ IDEA plugin](http://scalameta.org/scalafmt/#IntelliJ). If you use the IDEA plugin, enable automatic formatting on save after installing it: "Preferences" -> "Tools" -> "Scalafmt", then check "format on file save".

## Code/documentation contribution workflow

* Interesting Lab members:

(1) Check out a new branch from master. Branch names must follow the pattern: new feature: `.fea.`, bugfix: `.fixbug.`, documentation: `.doc.`

(2) Develop and commit.

(3) On the project's GitHub page, select your branch and click "new pull request" to open a pull request.

(4) After at least one other member approves the review and all travis-ci builds pass, the reviewer merges it into master.

(5) Delete your branch.

* Non-members of Interesting Lab (the usual GitHub collaboration flow):

(1) Fork the project at https://github.com/InterestingLab/seatunnel

(2) Develop

(3) Commit

(4) On your own fork's page, click "new pull request" to open a pull request

(5) Once Interesting Lab approves the review, your contribution is merged into the project.

## Automated build and test

This project uses [travis-ci](https://travis-ci.org/) as its automated build tool.

Every new commit on any branch triggers an automated build, as does every new pull request.

## Speeding up sbt in China

```
# Add a global repositories configuration to speed up dependency downloads
vim ~/.sbt/repositories

[repositories]
  local
  aliyun-ivy: http://maven.aliyun.com/nexus/content/groups/public, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
  aliyun-maven: http://maven.aliyun.com/nexus/content/groups/public
  typesafe: http://repo.typesafe.com/typesafe/ivy-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly
  typesafe2: http://repo.typesafe.com/typesafe/releases/
  sbt-plugin: http://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/
  sonatype: http://oss.sonatype.org/content/repositories/snapshots
  uk_maven: http://uk.maven.org/maven2/
  repo2: http://repo2.maven.org/maven2/
```
--------------------------------------------------------------------------------