├── README.md
├── iris_estimator
│   ├── data
│   │   ├── iris_test.csv
│   │   └── iris_training.csv
│   ├── export_raw
│   │   └── 1560841190
│   │       ├── saved_model.pb
│   │       └── variables
│   │           ├── variables.data-00000-of-00002
│   │           ├── variables.data-00001-of-00002
│   │           └── variables.index
│   ├── grpc_client_for_parsing_receiver.py
│   ├── grpc_client_for_raw_receiver.py
│   ├── iris_dnn.py
│   ├── rest_client_for_raw_receiver.py
│   └── shell.sh
├── mnist
│   ├── make_request.py
│   ├── readme.md
│   ├── running_service.py
│   └── train_saved_model.py
├── model_des.png
├── res.png
├── serving.png
├── serving2.png
└── serving_nginx
    ├── dockerfile
    ├── make_request.py
    ├── nginx.conf
    └── readme.md

/README.md:
--------------------------------------------------------------------------------
# TF Serving: Introduction, Deployment, and Demo
---

tf serving:
- supports hot model updates
- supports model version management
- good scalability
- good stability and performance

### Typical workflow:

1. Analyze and preprocess the data on HDFS with Spark/MapReduce/Hive.

2. Subsample part of the data, choose a model, pretrain initial parameters, and cross-validate.

3. On the full dataset, convert the Spark output to TFRecord; train on a single machine reading from HDFS, or train distributed across multiple machines and GPUs.

4. Serve the model.

# Some solutions:
---

## Option 1: YARN 3.1+

Supports docker_image, [but does not yet offer stability guarantees](https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/DockerContainers.html)

![image](serving.png)

[Docker + GPU support + TF Serving + Hadoop 3.1](https://community.hortonworks.com/articles/231660/tensorflow-serving-function-as-a-service-faas-with.html)


## Option 2: Model serving & synchronization, from the Meituan blog
[Reference](https://gitbook.cn/books/5b3adc411166b9562e9af3f6/index.html)

### Training: TFRecords stored on HDFS (pulled to local disk at training time)
### Prediction: online serving scheme

- Model synchronization

We developed a highly available synchronization component: the user only needs to provide the HDFS path of the offline-trained model, and the component automatically syncs it to the online serving machines. The component is built on HTTPFS, the HTTP access interface to HDFS provided by Meituan's offline computing team. The sync process is as follows:

Before syncing, check the model's md5 file; a sync is needed only when that file has been updated.
When syncing, connect to a randomly chosen HTTPFS machine and throttle the download speed.
After syncing, verify the model file's md5 and back up the old model.

Any error or timeout during the sync triggers an alert and a retry. With this component, we reliably sync model files online within 2 minutes. (A minimal sketch of this flow appears at the end of this section.)

- Model computation

The main problems are network I/O and compute performance.

Concurrent requests. A single request recalls many qualifying ads; having the client query TF Serving for multiple ads concurrently effectively lowers the overall prediction latency.
Feature IDs. Hashing string feature names into a 64-bit integer space effectively reduces the amount of data transferred and the bandwidth used. (See the second sketch at the end of this section.)
Customized model computation, with targeted optimizations.
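The blog does not publish the sync component's code, but the flow above is easy to sketch. The following is a minimal illustration only: the HTTPFS endpoint, the paths, and the sidecar `.md5` file layout are all assumptions made for the example, and the alerting/retry and download-throttling pieces are omitted.

```
# Minimal sketch of the md5-check / download / verify / backup flow described
# above. Endpoint, paths, and the sidecar .md5 layout are assumptions.
import hashlib
import os
import shutil

import requests

HTTPFS = 'http://httpfs-host:14000/webhdfs/v1'    # assumed HTTPFS endpoint
HDFS_MODEL = '/models/iris/saved_model.pb'        # assumed HDFS layout
LOCAL_MODEL = '/data/serving/iris/saved_model.pb'


def remote_md5():
    # Assumes training writes a sidecar .md5 file next to the model file.
    r = requests.get(HTTPFS + HDFS_MODEL + '.md5', params={'op': 'OPEN'})
    r.raise_for_status()
    return r.text.strip()


def local_md5(path):
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return h.hexdigest()


def sync():
    md5 = remote_md5()
    if os.path.exists(LOCAL_MODEL) and local_md5(LOCAL_MODEL) == md5:
        return  # md5 file unchanged: nothing to sync
    tmp = LOCAL_MODEL + '.tmp'
    with requests.get(HTTPFS + HDFS_MODEL, params={'op': 'OPEN'}, stream=True) as r:
        r.raise_for_status()
        with open(tmp, 'wb') as f:
            for chunk in r.iter_content(1 << 20):   # throttling omitted here
                f.write(chunk)
    if local_md5(tmp) != md5:
        raise IOError('md5 mismatch after download')
    if os.path.exists(LOCAL_MODEL):
        shutil.copy(LOCAL_MODEL, LOCAL_MODEL + '.bak')  # back up the old model
    os.replace(tmp, LOCAL_MODEL)
```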
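Feature-name ID'ing is likewise simple to illustrate. The blog does not say which 64-bit hash is used; md5 truncated to 64 bits is an assumption made here for the sketch.

```
# Illustration of hashing string feature names into a 64-bit integer space.
# The actual hash function is not specified in the blog; truncated md5 is an
# assumption made for this sketch.
import hashlib

def feature_id(name):
    digest = hashlib.md5(name.encode('utf-8')).digest()
    return int.from_bytes(digest[:8], 'little')   # 64-bit feature ID

print(feature_id('SepalLength'))  # the wire carries an int64 instead of a string
```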
## Option 3: CentOS 7 + docker + tfserving (the solution currently in use)

### Training: implementation details are in my other repo, [here](https://github.com/wangruichens/distributed_training)

### Prediction: online serving scheme

#### 1. Prerequisite: install docker

Use the Docker build of tfserving, to avoid the pitfalls of compiling it yourself and extending it with GPU support.
```
# 1: Install the required packages
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
# 2: Add the repository info (Aliyun mirror)
sudo yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# 3: Refresh the cache and install Docker CE
sudo yum makecache fast
sudo yum -y install docker-ce
# 4: Start the Docker service
sudo service docker start
# 5: Stop the Docker service
sudo service docker stop
```

#### 2. Use the trained model: a handwritten-digit model trained on TFRecord data on HDFS. See my earlier [setup](https://github.com/wangruichens/samples/tree/master/distribute/tf/spark_tfrecord)

The model is simple, with roughly 1.38M parameters. It is trained from TFRecords on HDFS, and the model files are saved back to HDFS.

![image](model_des.png)

#### 3. Start tf serving with docker, pull the HDFS model to local disk, and load it

[Reference](https://www.tensorflow.org/tfx/serving/docker#serving_with_docker)

File layout after the model is saved:

```
test_serving
└── mnist_model_for_serving
    └── 1
        ├── variables
        └── saved_model.pb
```
You can inspect the model's inputs and outputs under the model directory /1/:
```
saved_model_cli show --dir . --all

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_image'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 28, 28, 1)
        name: Conv1_input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['Softmax/Softmax:0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: Softmax/Softmax:0
  Method name is: tensorflow/serving/predict

```
Start the service with Docker:

```
# For gRPC, default port 8500
docker run -p 8500:8500 --mount type=bind,source=/home/wangrc/test_serving/mnist_model_for_serving,target=/models/mnist -e MODEL_NAME=mnist -t tensorflow/serving

# For REST, default port 8501
docker run -p 8501:8501 --mount type=bind,source=/home/wangrc/test_serving/mnist_model_for_serving,target=/models/mnist -e MODEL_NAME=mnist -t tensorflow/serving
```
You can of course expose both ports at once (-p 8500:8500 -p 8501:8501), and you can also supply your own config; see the official docs for details, and the sketch below for a minimal example.
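A minimal model config might look like the following; the name and base_path here just mirror the mnist example above, and the file location is an assumption.

```
model_config_list {
  config {
    name: 'mnist'
    base_path: '/models/mnist'
    model_platform: 'tensorflow'
  }
}
```

Mount it into the container and point the server at it by appending --model_config_file=/models/models.config after the image name in the docker run command.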
The command actually executed inside the container is:
```
tensorflow_model_server --port=8500 --rest_api_port=8501 \
    --model_name=my_model --model_base_path=/models/my_model
```

The service starts and the model loads successfully:

![image](serving2.png)

Check port usage and service status:
```
sudo netstat -nap | grep 8501
curl http://localhost:8501/v1/models/mnist
```

#### 4. Test the model with REST (production uses gRPC):
```
python ./mnist/make_request.py
```


![image](res.png)


## Option 4: CentOS 7 + tf serving + GPU without Docker

If you are willing to brave the pitfalls, you can compile it yourself with bazel: [reference](https://www.dearcodes.com/index.php/archives/25/)

# Load balancing tf serving with nginx
[Here](https://github.com/wangruichens/tfserving/blob/master/serving_nginx)

# Main issues:

With nginx, a single tensorflow serving service is still manageable. Once services multiply, or services start calling each other, managing everything through nginx configs is no longer realistic; you need a microservice framework along the lines of Spring Cloud. But tf serving only has a Python API, so there is no way to register it with Spring Cloud for management.

Although tensorflow serving is meant for deploying trained models in production, you have to implement clustering and health checks yourself, and there is still the overhead of a network hop between it and the Java application. So it is often best for the Java application to call the model directly: building and training the model in Python plus online prediction in Java is the more reasonable scheme.

For tensorflow, models usually go to production either through tensorflow serving or through the client API libraries; the former suits larger models and scenarios, the latter small and mid-sized ones. Algorithm engineers should evaluate and choose before shipping to production.

The best-fitting solution is still k8s + docker: containers as a service.
--------------------------------------------------------------------------------
/iris_estimator/data/iris_test.csv:
--------------------------------------------------------------------------------
30,4,setosa,versicolor,virginica
5.9,3.0,4.2,1.5,1
6.9,3.1,5.4,2.1,2
5.1,3.3,1.7,0.5,0
6.0,3.4,4.5,1.6,1
5.5,2.5,4.0,1.3,1
6.2,2.9,4.3,1.3,1
5.5,4.2,1.4,0.2,0
6.3,2.8,5.1,1.5,2
5.6,3.0,4.1,1.3,1
6.7,2.5,5.8,1.8,2
7.1,3.0,5.9,2.1,2
4.3,3.0,1.1,0.1,0
5.6,2.8,4.9,2.0,2
5.5,2.3,4.0,1.3,1
6.0,2.2,4.0,1.0,1
5.1,3.5,1.4,0.2,0
5.7,2.6,3.5,1.0,1
4.8,3.4,1.9,0.2,0
5.1,3.4,1.5,0.2,0
5.7,2.5,5.0,2.0,2
5.4,3.4,1.7,0.2,0
5.6,3.0,4.5,1.5,1
6.3,2.9,5.6,1.8,2
6.3,2.5,4.9,1.5,1
5.8,2.7,3.9,1.2,1
6.1,3.0,4.6,1.4,1
5.2,4.1,1.5,0.1,0
6.7,3.1,4.7,1.5,1
6.7,3.3,5.7,2.5,2
6.4,2.9,4.3,1.3,1
--------------------------------------------------------------------------------
/iris_estimator/data/iris_training.csv:
--------------------------------------------------------------------------------
120,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,2.5,4.5,1.7,2
4.9,3.1,1.5,0.1,0
5.7,3.8,1.7,0.3,0
4.4,3.2,1.3,0.2,0
5.4,3.4,1.5,0.4,0
6.9,3.1,5.1,2.3,2
6.7,3.1,4.4,1.4,1
5.1,3.7,1.5,0.4,0
5.2,2.7,3.9,1.4,1
6.9,3.1,4.9,1.5,1
5.8,4.0,1.2,0.2,0
5.4,3.9,1.7,0.4,0
7.7,3.8,6.7,2.2,2
6.3,3.3,4.7,1.6,1
6.8,3.2,5.9,2.3,2
7.6,3.0,6.6,2.1,2
6.4,3.2,5.3,2.3,2
5.7,4.4,1.5,0.4,0
6.7,3.3,5.7,2.1,2
6.4,2.8,5.6,2.1,2
5.4,3.9,1.3,0.4,0
6.1,2.6,5.6,1.4,2
7.2,3.0,5.8,1.6,2
5.2,3.5,1.5,0.2,0
5.8,2.6,4.0,1.2,1
5.9,3.0,5.1,1.8,2
5.4,3.0,4.5,1.5,1
6.7,3.0,5.0,1.7,1
6.3,2.3,4.4,1.3,1
5.1,2.5,3.0,1.1,1
6.4,3.2,4.5,1.5,1
6.8,3.0,5.5,2.1,2
6.2,2.8,4.8,1.8,2
6.9,3.2,5.7,2.3,2
6.5,3.2,5.1,2.0,2
5.8,2.8,5.1,2.4,2
5.1,3.8,1.5,0.3,0
4.8,3.0,1.4,0.3,0
7.9,3.8,6.4,2.0,2
5.8,2.7,5.1,1.9,2
6.7,3.0,5.2,2.3,2
5.1,3.8,1.9,0.4,0
4.7,3.2,1.6,0.2,0
6.0,2.2,5.0,1.5,2
4.8,3.4,1.6,0.2,0
7.7,2.6,6.9,2.3,2
4.6,3.6,1.0,0.2,0
7.2,3.2,6.0,1.8,2
5.0,3.3,1.4,0.2,0
6.6,3.0,4.4,1.4,1
6.1,2.8,4.0,1.3,1
5.0,3.2,1.2,0.2,0
7.0,3.2,4.7,1.4,1
6.0,3.0,4.8,1.8,2
7.4,2.8,6.1,1.9,2
5.8,2.7,5.1,1.9,2
6.2,3.4,5.4,2.3,2
5.0,2.0,3.5,1.0,1
5.6,2.5,3.9,1.1,1
6.7,3.1,5.6,2.4,2
6.3,2.5,5.0,1.9,2
6.4,3.1,5.5,1.8,2
6.2,2.2,4.5,1.5,1
7.3,2.9,6.3,1.8,2
4.4,3.0,1.3,0.2,0
7.2,3.6,6.1,2.5,2
6.5,3.0,5.5,1.8,2
5.0,3.4,1.5,0.2,0
4.7,3.2,1.3,0.2,0
6.6,2.9,4.6,1.3,1
5.5,3.5,1.3,0.2,0
7.7,3.0,6.1,2.3,2
6.1,3.0,4.9,1.8,2
4.9,3.1,1.5,0.1,0
5.5,2.4,3.8,1.1,1
5.7,2.9,4.2,1.3,1
6.0,2.9,4.5,1.5,1
6.4,2.7,5.3,1.9,2
5.4,3.7,1.5,0.2,0
6.1,2.9,4.7,1.4,1
6.5,2.8,4.6,1.5,1
5.6,2.7,4.2,1.3,1
6.3,3.4,5.6,2.4,2
4.9,3.1,1.5,0.1,0
6.8,2.8,4.8,1.4,1
5.7,2.8,4.5,1.3,1
6.0,2.7,5.1,1.6,1
5.0,3.5,1.3,0.3,0
6.5,3.0,5.2,2.0,2
6.1,2.8,4.7,1.2,1
5.1,3.5,1.4,0.3,0
4.6,3.1,1.5,0.2,0
6.5,3.0,5.8,2.2,2
4.6,3.4,1.4,0.3,0
4.6,3.2,1.4,0.2,0
7.7,2.8,6.7,2.0,2
5.9,3.2,4.8,1.8,1
5.1,3.8,1.6,0.2,0
4.9,3.0,1.4,0.2,0
4.9,2.4,3.3,1.0,1
4.5,2.3,1.3,0.3,0
5.8,2.7,4.1,1.0,1
5.0,3.4,1.6,0.4,0
5.2,3.4,1.4,0.2,0
5.3,3.7,1.5,0.2,0
5.0,3.6,1.4,0.2,0
5.6,2.9,3.6,1.3,1
4.8,3.1,1.6,0.2,0
6.3,2.7,4.9,1.8,2
5.7,2.8,4.1,1.3,1
5.0,3.0,1.6,0.2,0
6.3,3.3,6.0,2.5,2
5.0,3.5,1.6,0.6,0
5.5,2.6,4.4,1.2,1
5.7,3.0,4.2,1.2,1
4.4,2.9,1.4,0.2,0
4.8,3.0,1.4,0.1,0
5.5,2.4,3.7,1.0,1
--------------------------------------------------------------------------------
/iris_estimator/export_raw/1560841190/saved_model.pb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangruichens/tfserving/5107b07c0849ca16226f94943e029f3f53b3a19f/iris_estimator/export_raw/1560841190/saved_model.pb
--------------------------------------------------------------------------------
/iris_estimator/export_raw/1560841190/variables/variables.data-00000-of-00002:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangruichens/tfserving/5107b07c0849ca16226f94943e029f3f53b3a19f/iris_estimator/export_raw/1560841190/variables/variables.data-00000-of-00002
--------------------------------------------------------------------------------
/iris_estimator/export_raw/1560841190/variables/variables.data-00001-of-00002:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangruichens/tfserving/5107b07c0849ca16226f94943e029f3f53b3a19f/iris_estimator/export_raw/1560841190/variables/variables.data-00001-of-00002
--------------------------------------------------------------------------------
/iris_estimator/export_raw/1560841190/variables/variables.index:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangruichens/tfserving/5107b07c0849ca16226f94943e029f3f53b3a19f/iris_estimator/export_raw/1560841190/variables/variables.index
--------------------------------------------------------------------------------
/iris_estimator/grpc_client_for_parsing_receiver.py:
--------------------------------------------------------------------------------
# gRPC remote call using an estimator model exported with
# build_parsing_serving_input_receiver_fn


import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
import grpc
from time import time
import numpy as np

tf.app.flags.DEFINE_string('server', 'ha05:8556',
                           'Server host:port.')
tf.app.flags.DEFINE_string('model', 'iris',
                           'Model name.')
FLAGS = tf.app.flags.FLAGS


def _float_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))


def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def main(_):
    channel = grpc.insecure_channel(FLAGS.server)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = FLAGS.model
    request.model_spec.signature_name = 'predict'
    batching = []

    # Build a batch of 1000 serialized tf.Example protos with random features.
    for i in range(1000):
        feature_dict = {'SepalLength': _float_feature(value=np.random.random()),
                        'SepalWidth': _float_feature(value=np.random.random()),
                        'PetalLength': _float_feature(value=np.random.random()),
                        'PetalWidth': _float_feature(value=np.random.random())}

        example = tf.train.Example(features=tf.train.Features(feature=feature_dict))
        serialized = example.SerializeToString()
        batching.append(serialized)

    request.inputs['examples'].CopyFrom(
        tf.make_tensor_proto(batching, shape=[len(batching)]))

    start = time()
    result_future = stub.Predict.future(request, 5.0)  # 5-second timeout
    prediction = result_future.result().outputs['probabilities']
    # Stop the clock only after the future has resolved; timing just the
    # dispatch of the request would report a near-zero latency.
    elapsed = (time() - start)
    print(prediction)
    print("Time used:{0}ms".format(round(elapsed * 1000, 2)))


if __name__ == '__main__':
    tf.app.run()
--------------------------------------------------------------------------------
/iris_estimator/grpc_client_for_raw_receiver.py:
--------------------------------------------------------------------------------
# gRPC remote call using an estimator model exported with
# build_raw_serving_input_receiver_fn


import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
import grpc
from time import time
import numpy as np

tf.app.flags.DEFINE_string('server', 'ha05:8555',
                           'Server host:port.')
tf.app.flags.DEFINE_string('model', 'iris',
                           'Model name.')
FLAGS = tf.app.flags.FLAGS


def _float_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))


def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def main(_):
    channel = grpc.insecure_channel(FLAGS.server)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = FLAGS.model
    request.model_spec.signature_name = 'predict'

    # Reuse one batch of 1000 random values for all four feature tensors.
    p = []
    for i in range(1000):
        p.append(np.random.random())

    request.inputs['SepalLength'].CopyFrom(
        tf.make_tensor_proto(p, shape=[len(p)]))
    request.inputs['SepalWidth'].CopyFrom(
        tf.make_tensor_proto(p, shape=[len(p)]))
    request.inputs['PetalLength'].CopyFrom(
        tf.make_tensor_proto(p, shape=[len(p)]))
    request.inputs['PetalWidth'].CopyFrom(
        tf.make_tensor_proto(p, shape=[len(p)]))
    start = time()
    result_future = stub.Predict.future(request, 5.0)  # 5-second timeout
    prediction = result_future.result().outputs['probabilities']
    # Stop the clock only after the future has resolved; otherwise only the
    # dispatch time is measured.
    elapsed = (time() - start)
    print(prediction)
    print("Time used:{0}ms".format(round(elapsed * 1000, 2)))


if __name__ == '__main__':
    tf.app.run()
--------------------------------------------------------------------------------
/iris_estimator/iris_dnn.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-

import pandas as pd
import tensorflow as tf
from tensorflow_estimator import estimator

COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']
BATCH_SIZE = 100
STEPS = 10000

# load data
y_name = 'Species'

train = pd.read_csv('data/iris_training.csv', names=COLUMN_NAMES, header=0)
train_x, train_y = train, train.pop(y_name)

test = pd.read_csv('data/iris_test.csv', names=COLUMN_NAMES, header=0)
test_x, test_y = test, test.pop(y_name)


# prepare input / eval fn
def train_input_fn(features, labels, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    return dataset


def eval_input_fn(features, labels, batch_size):
    features = dict(features)
    inputs = (features, labels) if labels is not None else features
    dataset = tf.data.Dataset.from_tensor_slices(inputs)
    dataset = dataset.batch(batch_size)
    return dataset


hook = estimator.ProfilerHook(save_steps=300, output_dir='./time/', show_memory=True, show_dataflow=True)
feature_columns = [tf.feature_column.numeric_column(key=key)
                   for key in train_x.keys()]

# Demo of a bucketized + embedded column on the first numeric feature.
# (Renamed from `test`, which shadowed the test DataFrame loaded above.)
bucket_source = tf.feature_column.numeric_column(train_x.keys()[0], default_value=0.0)
bucketized = tf.feature_column.bucketized_column(bucket_source, [0.1, 1, 100])
test_emb = tf.feature_column.embedding_column(bucketized, 10)
feature_columns.append(test_emb)

session_config = tf.ConfigProto()

mirrored_strategy = tf.distribute.MirroredStrategy()

config = estimator.RunConfig(
    train_distribute=mirrored_strategy,
    eval_distribute=mirrored_strategy,
    session_config=session_config,  # was created above but never passed in
)

classifier = estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 10],
    n_classes=3,
    config=config)

classifier.train(
    input_fn=lambda: train_input_fn(train_x, train_y, batch_size=BATCH_SIZE), hooks=[hook],
    steps=STEPS)

# evaluate
eval_result = classifier.evaluate(
    input_fn=lambda: eval_input_fn(test_x, test_y, batch_size=BATCH_SIZE))

print('Test set accuracy: {accuracy:0.3f}'.format(**eval_result))

# predict
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

predictions = classifier.predict(
    input_fn=lambda: eval_input_fn(predict_x, labels=None, batch_size=BATCH_SIZE))

for prediction, expect in zip(predictions, expected):
    class_id = prediction['class_ids'][0]
    probability = prediction['probabilities'][class_id]
    print('Prediction is "{}" ({:.1f}%), expected "{}"'.format(
        SPECIES[class_id], 100 * probability, expect))

# export model

from tensorflow import FixedLenFeature

feature_specification = {
    'SepalLength': FixedLenFeature(shape=(1,), dtype=tf.float32, default_value=None),
    'SepalWidth': FixedLenFeature(shape=(1,), dtype=tf.float32, default_value=None),
    'PetalLength': FixedLenFeature(shape=(1,), dtype=tf.float32, default_value=None),
    'PetalWidth': FixedLenFeature(shape=(1,), dtype=tf.float32, default_value=None)
}
# feature_spec = tf.feature_column.make_parse_example_spec(feature_columns)

features = {
    'SepalLength': tf.placeholder(dtype=tf.float32, shape=(1), name='SepalLength'),
    'SepalWidth': tf.placeholder(dtype=tf.float32, shape=(1), name='SepalWidth'),
    'PetalLength': tf.placeholder(dtype=tf.float32, shape=(1), name='PetalLength'),
    'PetalWidth': tf.placeholder(dtype=tf.float32, shape=(1), name='PetalWidth')
}

# Can pass the key-value format to the http REST api directly:
# curl -d '{"signature_name": "predict","instances": [{"SepalLength":[5.1],"SepalWidth":[3.3],"PetalLength":[1.7],"PetalWidth":[0.5]}]}' -X POST http://localhost:8501/v1/models/iris:predict

# If using shape=(None,1), the input shape should be (?,1), i.e. two dimensions:
# saved_model_cli run --dir /home/wangrc/github/summaries/serving/estimator/export_raw/1560322385 \
#     --tag_set serve --signature_def predict \
#     --input_exprs 'SepalLength=[[5.1],[5.1]];SepalWidth=[[3.3],[3]];PetalLength=[[1.7],[3]];PetalWidth=[[0.5],[2]]'

serving_input_receiver_fn = tf.estimator.export.build_raw_serving_input_receiver_fn(features)
export_dir = classifier.export_savedmodel('export_raw', serving_input_receiver_fn)

# The parsing receiver below only works this way:
# if using the http REST api, the json cannot carry the serialized tf.Example string.
# saved_model_cli run --dir /home/wangrc/Downloads/tf-serve-master/export_parsing/1560302102 \
#     --tag_set serve --signature_def predict \
#     --input_examples 'examples=[{"SepalLength":[5.1],"SepalWidth":[3.3],"PetalLength":[1.7],"PetalWidth":[0.5]}]'

# serving_input_receiver_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_specification)
# export_dir = classifier.export_savedmodel('export_parsing', serving_input_receiver_fn)


print('Exported to {}'.format(export_dir))
--------------------------------------------------------------------------------
/iris_estimator/rest_client_for_raw_receiver.py:
--------------------------------------------------------------------------------
# REST remote call using an estimator model exported with
# build_raw_serving_input_receiver_fn

import json
import requests
import numpy as np
from time import time

# tensorflow_model_server \
#     --rest_api_port=8501 \
#     --model_name=deepfm \
#     --model_base_path="/home/wangrc/Desktop/"

batching = []
for i in range(1000):
    p = {
        "SepalLength": np.random.random(),
        "SepalWidth": np.random.random(),
        "PetalLength": np.random.random(),
        "PetalWidth": np.random.random()
    }
    batching.append(p)

data = json.dumps({"signature_name": "predict", "instances": batching})
# print(data)

headers = {"content-type": "application/json"}

start = time()
json_response = requests.post('http://ha05:8555/v1/models/iris:predict', data=data, headers=headers)
elapsed = (time() - start)
# print(json_response.text)
predictions = json.loads(json_response.text)['predictions']
print(predictions)
print("Time used:{0}ms".format(round(elapsed * 1000, 2)))
--------------------------------------------------------------------------------
/iris_estimator/shell.sh:
--------------------------------------------------------------------------------
#!/usr/bin/env bash
tensorflow_model_server \
    --port=8500 \
    --rest_api_port=8501 \
    --model_name=iris \
    --model_base_path="/home/wangrc/github/summaries/serving/estimator/export_raw"


docker run -p 8555:8555 --mount type=bind,source=/home/wangrc/export_parsing,target=/models/iris -e MODEL_NAME=iris -t tensorflow/serving

docker run -p 8500:8500 --mount type=bind,source=/home/wangrc/export/,target=/models/deepfm -e MODEL_NAME=deepfm -t tensorflow/serving

curl -o - http://ha05:8500/v1/models/deepfm


curl -o - http://algorithmsdeepfm.2345.cn:38609/v1/models/deepfm
--------------------------------------------------------------------------------
/mnist/make_request.py:
--------------------------------------------------------------------------------
import json
import random
from time import time

import numpy as np
import requests
from matplotlib import pyplot as plt
from tensorflow import keras

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# scale the values to 0.0 to 1.0
train_images = train_images / 255.0
test_images = test_images / 255.0

# reshape for feeding into the model
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1)
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1)

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']


def show(idx, title):
    plt.figure()
    plt.imshow(test_images[idx].reshape(28, 28))
    plt.axis('off')
    plt.title('\n\n{}'.format(title), fontdict={'size': 16})
    plt.show()


rando = random.randint(0, len(test_images) - 1)
# show(rando, 'An Example Image: {}'.format(class_names[test_labels[rando]]))


data = json.dumps({"signature_name": "serving_default", "instances": [test_images[rando].tolist()]})
# print('Data: {} ... {}'.format(data[:50], data[len(data)-52:]))


headers = {"content-type": "application/json"}
start = time()
json_response = requests.post('http://localhost:8502/v1/models/fashion_model/versions/1:predict', data=data,
                              headers=headers)
elapsed = (time() - start)
print(json_response.text)
predictions = json.loads(json_response.text)['predictions']

show(rando, 'predict: {} , actually: {} '.format(
    np.argmax(predictions[0]), test_labels[rando]))

print('predict: {} , actually: {}'.format(
    np.argmax(predictions[0]), test_labels[rando]),
    ", Time used:{0}ms".format(round(elapsed * 1000, 2)))
--------------------------------------------------------------------------------
/mnist/readme.md:
--------------------------------------------------------------------------------
train model and save -> running service -> make request

Run it in Docker:
Reference: https://www.tensorflow.org/tfx/serving/docker#serving_with_docker

Start the CPU version of the service:
docker run -p 8500:8500 --mount type=bind,source=/home/wangrc/test_serving/mnist_model_for_serving,target=/models/mnist -e MODEL_NAME=mnist -t tensorflow/serving

Start the GPU version of the service:
docker run --runtime=nvidia -p 8501:8501 --mount type=bind,source=/home/wangrc/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,target=/models/half_plus_two -e MODEL_NAME=half_plus_two -t tensorflow/serving:latest-gpu


Directory layout:
mnist_model_for_serving/1/  saved_model.pb  variables

saved_model_cli show --dir . --all

curl http://localhost:8501/v1/models/mnist
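For a quick REST smoke test against the container, a minimal sketch follows. It assumes the REST port was published with -p 8501:8501 and MODEL_NAME=mnist as above; the all-zeros 28x28 image is only a placeholder input for the input_image signature.

```
# Minimal REST smoke test. Assumes TF Serving runs locally with REST on 8501
# and MODEL_NAME=mnist; the blank 28x28 image is only a placeholder input.
import json

import numpy as np
import requests

instances = np.zeros((1, 28, 28, 1)).tolist()   # one blank image
payload = json.dumps({"signature_name": "serving_default", "instances": instances})
resp = requests.post('http://localhost:8501/v1/models/mnist:predict',
                     data=payload, headers={"content-type": "application/json"})
print(resp.json()['predictions'])   # ten softmax probabilities for the image
```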
--------------------------------------------------------------------------------
/mnist/running_service.py:
--------------------------------------------------------------------------------
import tempfile
import os
import subprocess

MODEL_DIR = tempfile.gettempdir()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))

print(MODEL_DIR)

os.environ["MODEL_DIR"] = MODEL_DIR
# Trailing '&' so the server runs in the background; without it,
# subprocess.call would block here until the server exits.
cmd = 'nohup tensorflow_model_server \
    --rest_api_port=8502 \
    --model_name=fashion_model \
    --model_base_path="${MODEL_DIR}" >server.log 2>&1 &'
subprocess.call(cmd, shell=True)
--------------------------------------------------------------------------------
/mnist/train_saved_model.py:
--------------------------------------------------------------------------------
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras

# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
import os
import shutil
import subprocess

tf.logging.set_verbosity(tf.logging.ERROR)
print(tf.__version__)


os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
# config.log_device_placement = True

sess = tf.Session(config=config)
keras.backend.set_session(sess)  # set this TensorFlow session as the default session for Keras


fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# scale the values to 0.0 to 1.0
train_images = train_images / 255.0
test_images = test_images / 255.0

# reshape for feeding into the model
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1)
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1)

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

print('\ntrain_images.shape: {}, of {}'.format(train_images.shape, train_images.dtype))
print('test_images.shape: {}, of {}'.format(test_images.shape, test_images.dtype))


model = keras.Sequential([
    keras.layers.Conv2D(input_shape=(28, 28, 1), filters=8, kernel_size=3,
                        strides=2, activation='relu', name='Conv1'),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation=tf.nn.softmax, name='Softmax')
])

model.summary()

testing = False
epochs = 5

# model = tf.keras.utils.multi_gpu_model(model, gpus=2)

model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=epochs)

test_loss, test_acc = model.evaluate(test_images, test_labels)
print('\nTest accuracy: {}'.format(test_acc))

# To load our trained model into TensorFlow Serving we first need to save it in SavedModel format.
import tempfile

MODEL_DIR = tempfile.gettempdir()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))
print('export_path = {}\n'.format(export_path))
if os.path.isdir(export_path):
    print('\nAlready saved a model, cleaning up\n')
    shutil.rmtree(export_path)  # actually remove it; simple_save refuses to overwrite

tf.saved_model.simple_save(
    keras.backend.get_session(),
    export_path,
    inputs={'input_image': model.input},
    outputs={t.name: t for t in model.outputs})

print('\nSaved model:')

# saved_model_cli show --dir . --all
--------------------------------------------------------------------------------
/model_des.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangruichens/tfserving/5107b07c0849ca16226f94943e029f3f53b3a19f/model_des.png
--------------------------------------------------------------------------------
/res.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangruichens/tfserving/5107b07c0849ca16226f94943e029f3f53b3a19f/res.png
--------------------------------------------------------------------------------
/serving.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangruichens/tfserving/5107b07c0849ca16226f94943e029f3f53b3a19f/serving.png
--------------------------------------------------------------------------------
/serving2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangruichens/tfserving/5107b07c0849ca16226f94943e029f3f53b3a19f/serving2.png
--------------------------------------------------------------------------------
/serving_nginx/dockerfile:
--------------------------------------------------------------------------------
FROM nginx
COPY nginx.conf /etc/nginx
--------------------------------------------------------------------------------
/serving_nginx/make_request.py:
--------------------------------------------------------------------------------
##############################
#
# call tfserving using REST API demo
#
##############################

from matplotlib import pyplot as plt
from tensorflow import keras
import numpy as np
from time import time

mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# reshape for feeding into the model
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1)
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1)


def show(idx, title):
    plt.figure()
    plt.imshow(test_images[idx].reshape(28, 28))
    plt.axis('off')
    plt.title('\n\n{}'.format(title), fontdict={'size': 16})
    plt.show()


import random
import requests
import json


def test_post():
    # POST a batch of 400 images and return the request latency.
    # `rando` is only used by the commented-out single-image checks below.
    rando = random.randint(0, len(test_images) - 1)
    data = json.dumps({"signature_name": "serving_default", "instances": test_images[0:400].tolist()})
    # print('Data: {} ... {}'.format(data[:50], data[len(data)-52:]))
    headers = {"content-type": "application/json"}
    start = time()
    json_response = requests.post('http://algorithmsdemo.2345.cn/v1/models/mnist/versions/1:predict', data=data, headers=headers)
    elapsed = (time() - start)
    # print(json_response.text)
    # predictions = json.loads(json_response.text)['predictions']

    # print("Time used:{0}ms".format(round(elapsed * 1000, 2)))
    # print('predict: {} , actually: {} '.format(
    #     np.argmax(predictions[0]), test_labels[rando]))
    # show(rando, 'predict: {} , actually: {} '.format(
    #     np.argmax(predictions[0]), test_labels[rando]))
    return elapsed


sec = []
req_count = 100
for i in range(req_count):
    sec.append(test_post())
print('average cost time {0} ms'.format(np.sum(sec) / req_count * 1000))
--------------------------------------------------------------------------------
/serving_nginx/nginx.conf:
--------------------------------------------------------------------------------

user  nginx;
worker_processes  1;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;


events {
    worker_connections  1024;
}


http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$upstream_addr $remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    # Named upstream group for the TF Serving backends. (The original used the
    # literal IP 172.0.0.1 as the group name, which is legal but confusing.)
    upstream tf_servers
    {
        server 172.0.0.1:8501;
        server 172.0.0.2:8501;
        server 172.0.0.3:8501;
        server 172.0.0.4:8501;
        server 172.0.0.5:8501;
    }

    server
    {
        listen 8255;
        server_name 172.0.0.1;

        location / {
            proxy_pass http://tf_servers;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}
--------------------------------------------------------------------------------
/serving_nginx/readme.md:
--------------------------------------------------------------------------------
Nginx + TF Serving using Docker

Set up load balancing for tf serving.

1.
First `docker pull` the latest nginx image and copy out its nginx.conf; adapt it to the format of ./nginx.conf here, configuring the IPs and ports of the nodes.
* $upstream_addr: a log_format field; when nginx is load balancing, it shows which backend actually served each request.

2.
Start the tfserving service on each machine:
```
docker run -d -p 8501:8501 --mount type=bind,source=/home/wangrc/mnist_model_for_serving,target=/models/mnist -e MODEL_NAME=mnist -t tensorflow/serving
```

3.
On the master node ha05, run:
```
docker run -d -p 8256:80 -p 8255:8255 -v '/home/wangrc/mnist_log:/var/log/nginx' --name nginx_server mynginx
```
This maps the nginx logs to local disk.

4. Check that the port works: submit POST requests with make_request.py, then check the access log to confirm that load balancing is working.
```
curl http://ha05:8255/v1/models/mnist
```
--------------------------------------------------------------------------------