├── big-data ├── ppml │ ├── AZURE.md │ └── AZURE_OCCULUM.md ├── chronos │ ├── Makefile │ ├── docker-compose.yml │ ├── Dockerfile.chronos │ └── README.md ├── friesian │ ├── training │ │ ├── Makefile │ │ ├── Dockerfile.friesian-training │ │ ├── docker-compose.yml │ │ └── README.md │ └── DEVCATALOG.md └── aiok-ray │ ├── training │ ├── Makefile │ └── docker-compose.yml │ └── inference │ ├── Makefile │ ├── docker-compose.yml │ └── README.md ├── analytics ├── classical-ml │ ├── synthetic │ │ └── inference │ │ │ ├── Makefile │ │ │ ├── docker-compose.yml │ │ │ ├── Dockerfile.wafer-insights │ │ │ ├── README.md │ │ │ └── DEVCATALOG.md │ └── recsys │ │ └── training │ │ ├── Makefile │ │ ├── docker-compose.yml │ │ └── README.md └── tensorflow │ └── ssd_resnet34 │ └── inference │ ├── Makefile │ ├── docker-compose.yml │ └── Dockerfile.video-streamer ├── language_modeling └── pytorch │ ├── bert_base │ ├── training │ │ ├── Makefile │ │ ├── docker-compose.yml │ │ └── DEVCATALOG.md │ └── inference │ │ ├── Makefile │ │ └── docker-compose.yml │ └── bert_large │ └── training │ ├── chart │ ├── .helmignore │ ├── values.yaml │ ├── Chart.yaml │ ├── README.md │ └── templates │ │ ├── _helpers.tpl │ │ └── workflowTemplate.yaml │ ├── docker-compose.yml │ ├── Dockerfile.hugging-face-dlsa │ └── Makefile ├── .github ├── CODEOWNERS ├── linters │ └── .markdownlint.json └── workflows │ └── linter.yml ├── template ├── Makefile ├── docker-compose.yml ├── Dockerfile.PIPELINE_NAME ├── TEMPLATE.md └── DEVCATALOG_TEMPLATE.md ├── SECURITY.md ├── transfer_learning └── tensorflow │ └── resnet50 │ ├── training │ ├── chart │ │ ├── .helmignore │ │ ├── values.yaml │ │ ├── Chart.yaml │ │ ├── README.md │ │ └── templates │ │ │ ├── _helpers.tpl │ │ │ └── workflowTemplate.yaml │ ├── docker-compose.yml │ ├── Makefile │ └── Dockerfile.vision-transfer-learning │ └── inference │ ├── Makefile │ ├── docker-compose.yml │ ├── Dockerfile.vision-transfer-learning │ └── README.md ├── .gitignore ├── protein-folding └── pytorch │ └── alphafold2 │ └── inference │ ├── Makefile │ ├── Dockerfile.protein-prediction │ └── docker-compose.yml ├── classification └── tensorflow │ └── bert_base │ └── inference │ ├── docker-compose.yml │ ├── Makefile │ ├── README.md │ └── DEVCATALOG.md ├── CONTRIBUTING.md ├── .gitmodules ├── README.md ├── CODE_OF_CONDUCT.md └── LICENSE /big-data/ppml/AZURE.md: -------------------------------------------------------------------------------- 1 | bigDL-ppml/docs/readthedocs/source/doc/PPML/Overview/azure_ppml.md -------------------------------------------------------------------------------- /big-data/ppml/AZURE_OCCULUM.md: -------------------------------------------------------------------------------- 1 | bigDL-ppml/docs/readthedocs/source/doc/PPML/Overview/azure_ppml_occlum.md -------------------------------------------------------------------------------- /big-data/chronos/Makefile: -------------------------------------------------------------------------------- 1 | FINAL_IMAGE_NAME ?= chronos 2 | 3 | chronos: 4 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 5 | docker compose up chronos --build 6 | 7 | clean: 8 | docker compose down 9 | -------------------------------------------------------------------------------- /analytics/classical-ml/synthetic/inference/Makefile: -------------------------------------------------------------------------------- 1 | OUTPUT_DIR ?= /output 2 | FINAL_IMAGE_NAME ?= wafer-insights 3 | 4 | wafer-insight: 5 | @OUTPUT_DIR=${OUTPUT_DIR} \ 6 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 7 | docker compose up 
wafer-insight --build 8 | 9 | clean: 10 | docker compose down 11 | -------------------------------------------------------------------------------- /language_modeling/pytorch/bert_base/training/Makefile: -------------------------------------------------------------------------------- 1 | AZURE_CONFIG_FILE ?= $$(pwd)/config.json 2 | FINAL_IMAGE_NAME ?= nlp-azure 3 | 4 | nlp-azure: 5 | AZURE_CONFIG_FILE=${AZURE_CONFIG_FILE} \ 6 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 7 | docker compose up nlp-azure --build 8 | 9 | clean: 10 | docker compose down 11 | -------------------------------------------------------------------------------- /.github/CODEOWNERS: -------------------------------------------------------------------------------- 1 | # Adding Default CODEOWNERS 2 | * jitendra.patil@intel.com srikanth.ramakrishna@intel.com sharvil.shah@intel.com tyler.titsworth@intel.com 3 | 4 | # Add code owners for DevCatalog READMEs anywhere 5 | DEVCATALOG.md louie.tsai@intel.com marina.zubova@intel.com isha.ghosh@intel.com david.b.kinder@intel.com luis.real.novo@intel.com 6 | 7 | -------------------------------------------------------------------------------- /template/Makefile: -------------------------------------------------------------------------------- 1 | ?= 2 | ?= 3 | ?= 4 | FINAL_IMAGE_NAME ?= 5 | OUTPUT_DIR ?= /output 6 | 7 | : 8 | @=${} \ 9 | =${} \ 10 | =${} \ 11 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 12 | OUTPUT_DIR=${OUTPUT_DIR} \ 13 | docker compose up --build 14 | 15 | clean: 16 | docker compose down 17 | -------------------------------------------------------------------------------- /language_modeling/pytorch/bert_large/training/chart/.helmignore: -------------------------------------------------------------------------------- 1 | # Patterns to ignore when building packages. 2 | # This supports shell glob matching, relative path matching, and 3 | # negation (prefixed with !). Only one pattern per line. 4 | .DS_Store 5 | # Common VCS dirs 6 | .git/ 7 | .gitignore 8 | .bzr/ 9 | .bzrignore 10 | .hg/ 11 | .hgignore 12 | .svn/ 13 | # Common backup files 14 | *.swp 15 | *.bak 16 | *.tmp 17 | *.orig 18 | *~ 19 | # Various IDEs 20 | .project 21 | .idea/ 22 | *.tmproj 23 | .vscode/ 24 | -------------------------------------------------------------------------------- /SECURITY.md: -------------------------------------------------------------------------------- 1 | # Security Policy 2 | 3 | ## Report a Vulnerability 4 | 5 | Please report security issues or vulnerabilities to the [Intel® Security Center]. 6 | 7 | For more information on how Intel® works to resolve security issues, see 8 | [Vulnerability Handling Guidelines]. 9 | 10 | [Intel® Security Center]:https://www.intel.com/content/www/us/en/security-center/default.html 11 | 12 | [Vulnerability Handling Guidelines]:https://www.intel.com/content/www/us/en/security-center/vulnerability-handling-guidelines.html -------------------------------------------------------------------------------- /transfer_learning/tensorflow/resnet50/training/chart/.helmignore: -------------------------------------------------------------------------------- 1 | # Patterns to ignore when building packages. 2 | # This supports shell glob matching, relative path matching, and 3 | # negation (prefixed with !). Only one pattern per line. 
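# Illustrative (hypothetical) patterns: "*.log" would exclude all log files from the
# packaged chart, and a negated entry such as "!keep.log" would re-include one file
# that an earlier pattern had excluded.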
4 | .DS_Store 5 | # Common VCS dirs 6 | .git/ 7 | .gitignore 8 | .bzr/ 9 | .bzrignore 10 | .hg/ 11 | .hgignore 12 | .svn/ 13 | # Common backup files 14 | *.swp 15 | *.bak 16 | *.tmp 17 | *.orig 18 | *~ 19 | # Various IDEs 20 | .project 21 | .idea/ 22 | *.tmproj 23 | .vscode/ 24 | -------------------------------------------------------------------------------- /language_modeling/pytorch/bert_base/inference/Makefile: -------------------------------------------------------------------------------- 1 | AZURE_CONFIG_FILE ?= $$(pwd)/config.json 2 | FINAL_IMAGE_NAME ?= nlp-azure 3 | FP32_TRAINED_MODEL ?= $$(pwd)/../training/azureml/notebooks/fp32_model_output 4 | 5 | nlp-azure: 6 | mkdir -p ./azureml/notebooks/fp32_model_output && cp -r ${FP32_TRAINED_MODEL} ./azureml/notebooks/ 7 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 8 | AZURE_CONFIG_FILE=${AZURE_CONFIG_FILE} \ 9 | docker compose up nlp-azure --build 10 | 11 | clean: 12 | docker compose down 13 | rm -rf ./azureml/notebooks/fp32_model_output 14 | -------------------------------------------------------------------------------- /transfer_learning/tensorflow/resnet50/inference/Makefile: -------------------------------------------------------------------------------- 1 | CHECKPOINT_DIR ?= /output/colorectal 2 | DATASET_DIR ?= /data 3 | FINAL_IMAGE_NAME ?= vision-transfer-learning 4 | OUTPUT_DIR ?= /output 5 | PLATFORM ?= None 6 | PRECISION ?= FP32 7 | SCRIPT ?= colorectal 8 | 9 | vision-transfer-learning: 10 | @CHECKPOINT_DIR=${CHECKPOINT_DIR} \ 11 | DATASET_DIR=${DATASET_DIR} \ 12 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 13 | OUTPUT_DIR=${OUTPUT_DIR} \ 14 | PLATFORM=${PLATFORM} \ 15 | PRECISION=${PRECISION} \ 16 | SCRIPT=${SCRIPT} \ 17 | docker compose up vision-transfer-learning --build 18 | 19 | clean: 20 | docker compose down 21 | -------------------------------------------------------------------------------- /analytics/classical-ml/recsys/training/Makefile: -------------------------------------------------------------------------------- 1 | DATASET_DIR ?= /data/recsys2021 2 | FINAL_IMAGE_NAME ?= recsys-challenge 3 | OUTPUT_DIR ?= /output 4 | 5 | recsys-challenge: 6 | ./analytics-with-python/hadoop-folder-prep.sh . 7 | if ! 
docker network inspect hadoop ; then \ 8 | docker network create --driver=bridge hadoop; \ 9 | fi 10 | @DATASET_DIR=${DATASET_DIR} \ 11 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 12 | OUTPUT_DIR=${OUTPUT_DIR} \ 13 | docker compose up recsys-challenge --build 14 | 15 | clean: 16 | sudo rm -rf tmp 17 | docker network rm hadoop 18 | DATASET_DIR=${DATASET_DIR} CONFIG_DIR=${CONFIG_DIR} docker compose down 19 | -------------------------------------------------------------------------------- /.github/linters/.markdownlint.json: -------------------------------------------------------------------------------- 1 | { 2 | "MD001": false, 3 | "MD004": false, 4 | "MD007": false, 5 | "MD009": false, 6 | "MD010": false, 7 | "MD012": false, 8 | "MD013": false, 9 | "MD014": false, 10 | "MD022": false, 11 | "MD023": false, 12 | "MD024": false, 13 | "MD025": false, 14 | "MD026": false, 15 | "MD028": false, 16 | "MD030": false, 17 | "MD031": false, 18 | "MD032": false, 19 | "MD033": false, 20 | "MD034": false, 21 | "MD036": false, 22 | "MD038": false, 23 | "MD039": false, 24 | "MD040": false, 25 | "MD046": false, 26 | "MD047": false, 27 | "MD050": false, 28 | "MD051": false, 29 | "MD052": false 30 | } 31 | -------------------------------------------------------------------------------- /big-data/friesian/training/Makefile: -------------------------------------------------------------------------------- 1 | DATASET_DIR ?= /dataset 2 | FINAL_IMAGE_NAME ?= friesian-training 3 | MODEL_OUTPUT ?= /model_output 4 | 5 | friesian-training: 6 | wget https://labs.criteo.com/wp-content/uploads/2015/04/dac_sample.tar.gz 7 | tar -xvzf dac_sample.tar.gz 8 | mkdir -p ${DATASET_DIR}/data-csv 9 | mv dac_sample.txt ${DATASET_DIR}/data-csv/day_0.csv 10 | rm dac_sample.tar.gz 11 | @DATASET_DIR=${DATASET_DIR} \ 12 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 13 | MODEL_OUTPUT=${MODEL_OUTPUT} \ 14 | docker compose up friesian-training --build 15 | 16 | clean: 17 | @DATASET_DIR=${DATASET_DIR} \ 18 | OUTPUT_DIR=${MODEL_OUTPUT} \ 19 | docker compose down 20 | -------------------------------------------------------------------------------- /analytics/tensorflow/ssd_resnet34/inference/Makefile: -------------------------------------------------------------------------------- 1 | FINAL_IMAGE_NAME ?= vdms-video-streamer 2 | OUTPUT_DIR ?= /output 3 | VIDEO_PATH ?= $$(pwd)/classroom.mp4 4 | MODEL_DIR ?= $$(pwd)/models 5 | VIDEO = $(shell basename ${VIDEO_PATH}) 6 | 7 | vdms: 8 | numactl --physcpubind=51-55 --membind=1 docker compose up -d vdms 9 | 10 | video-streamer: vdms 11 | mkdir -p ./video-streamer/models && cp -r ${MODEL_DIR}/* ./video-streamer/models 12 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 13 | OUTPUT_DIR=${OUTPUT_DIR} \ 14 | VIDEO=${VIDEO} \ 15 | VIDEO_PATH=${VIDEO_PATH} \ 16 | docker compose up video-streamer --build 17 | 18 | clean: 19 | docker compose down 20 | rm -rf ./video-streamer/models ${VIDEO} 21 | -------------------------------------------------------------------------------- /language_modeling/pytorch/bert_large/training/chart/values.yaml: -------------------------------------------------------------------------------- 1 | metadata: 2 | name: document-level-sentiment-analysis 3 | 4 | proxy: nil 5 | 6 | dataset: 7 | type: nfs 8 | logsKey: nil 9 | datasetKey: sst 10 | nfs: 11 | server: nil 12 | path: nil 13 | subPath: nil 14 | readOnly: true 15 | 16 | workflow: 17 | dataset: sst2 18 | model: bert-large-uncased 19 | num_nodes: 2 20 | process_per_node: 2 21 | ref: v1.0.0 22 | repo: 
https://github.com/intel/document-level-sentiment-analysis 23 | 24 | volumeClaimTemplates: 25 | workspace: 26 | resources: 27 | requests: 28 | storage: 2Gi 29 | output_dir: 30 | resources: 31 | requests: 32 | storage: 1Gi 33 | 34 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | analytics/classical-ml/recsys/training/analytics-with-python 2 | analytics/classical-ml/synthetic/inference/wafer-insights 3 | analytics/tensorflow/ssd_resnet34/inference/video-streamer 4 | big-data/aiok-ray/training/AIOK_Ray/ 5 | big-data/aiok-ray/inference/AIOK_Ray/ 6 | big-data/ppml/chronos 7 | big-data/friesian/training/BigDL 8 | big-data/ppml/bigDL-ppml 9 | classification/tensorflow/bert_base/inference/aws_sagemaker 10 | language_modeling/pytorch/bert_base/inference/azureml 11 | language_modeling/pytorch/bert_base/training/azureml 12 | language_modeling/pytorch/bert_large/training/dlsa 13 | transfer_learning/tensorflow/resnet50/inference/transfer-learning-inference 14 | transfer_learning/tensorflow/resnet50/training/transfer-learning-training 15 | -------------------------------------------------------------------------------- /transfer_learning/tensorflow/resnet50/training/chart/values.yaml: -------------------------------------------------------------------------------- 1 | metadata: 2 | name: vision-transfer-learning 3 | 4 | proxy: nil 5 | workflow: 6 | ref: v1.0.1 7 | repo: https://github.com/intel/vision-based-transfer-learning-and-inference 8 | dataset_dir: /data 9 | platform: None 10 | precision: FP32 11 | script: colorectal 12 | batch_size: 32 13 | num_epochs: 100 14 | 15 | dataset: 16 | type: s3 17 | logsKey: nil 18 | s3: 19 | datasetKey: resisc45 20 | nfs: 21 | readOnly: true 22 | server: nil 23 | path: nil 24 | 25 | sidecars: 26 | image: ubuntu:20.04 27 | 28 | volumeClaimTemplates: 29 | workspace: 30 | resources: 31 | requests: 32 | storage: 2Gi 33 | output_dir: 34 | resources: 35 | requests: 36 | storage: 4Gi 37 | 38 | -------------------------------------------------------------------------------- /template/docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | : 3 | build: 4 | args: 5 | : ${} 6 | : ${} 7 | http_proxy: ${http_proxy} 8 | https_proxy: ${https_proxy} 9 | no_proxy: ${no_proxy} 10 | dockerfile: Dockerfile. 
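      # The blank keys above and below are per-pipeline placeholders. For a filled-in
      # instance of this template, compare transfer_learning/tensorflow/resnet50/training/docker-compose.yml,
      # which names the service vision-transfer-learning, builds from Dockerfile.vision-transfer-learning,
      # passes settings such as PLATFORM and PRECISION through the environment block, and selects
      # the entry script via ${SCRIPT}.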
11 | command: /workspace//.sh ${} 12 | environment: 13 | - ${}=${} 14 | - ${}=${} 15 | - http_proxy=${http_proxy} 16 | - https_proxy=${https_proxy} 17 | - no_proxy=${no_proxy} 18 | image: ${FINAL_IMAGE_NAME}:- 19 | privileged: true 20 | volumes: 21 | - ${OUTPUT_DIR}:${OUTPUT_DIR} 22 | - ./:/workspace/ 23 | working_dir: /workspace/ 24 | -------------------------------------------------------------------------------- /language_modeling/pytorch/bert_base/training/docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | nlp-azure: 3 | build: 4 | args: 5 | http_proxy: ${http_proxy} 6 | https_proxy: ${https_proxy} 7 | no_proxy: ${no_proxy} 8 | dockerfile: ./azureml/Dockerfile 9 | command: sh -c "jupyter nbconvert --to python 1.0-intel-azureml-training.ipynb && python3 1.0-intel-azureml-training.py" 10 | environment: 11 | - http_proxy=${http_proxy} 12 | - https_proxy=${https_proxy} 13 | - no_proxy=${no_proxy} 14 | image: ${FINAL_IMAGE_NAME}:training-ubuntu-20.04 15 | network_mode: "host" 16 | privileged: true 17 | volumes: 18 | - ./azureml/notebooks:/root/notebooks 19 | - ./azureml/src:/root/src 20 | - /${AZURE_CONFIG_FILE}:/root/config.json 21 | working_dir: /root/notebooks 22 | -------------------------------------------------------------------------------- /language_modeling/pytorch/bert_base/inference/docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | nlp-azure: 3 | build: 4 | args: 5 | http_proxy: ${http_proxy} 6 | https_proxy: ${https_proxy} 7 | no_proxy: ${no_proxy} 8 | dockerfile: ./azureml/Dockerfile 9 | command: sh -c "jupyter nbconvert --to python 1.0-intel-azureml-inference.ipynb && python3 1.0-intel-azureml-inference.py" 10 | environment: 11 | - http_proxy=${http_proxy} 12 | - https_proxy=${https_proxy} 13 | - no_proxy=${no_proxy} 14 | image: ${FINAL_IMAGE_NAME}:inference-ubuntu-20.04 15 | network_mode: "host" 16 | privileged: true 17 | volumes: 18 | - ./azureml/notebooks:/root/notebooks 19 | - ./azureml/src:/root/src 20 | - /${AZURE_CONFIG_FILE}:/root/notebooks/config.json 21 | working_dir: /root/notebooks 22 | -------------------------------------------------------------------------------- /protein-folding/pytorch/alphafold2/inference/Makefile: -------------------------------------------------------------------------------- 1 | .PHONY: protein-prediction 2 | DATASET_DIR ?= /dataset 3 | EXPERIMENT_NAME ?= testing 4 | FINAL_IMAGE_NAME ?= protein-structure-prediction 5 | MODEL ?= model_1 6 | OUTPUT_DIR ?= /output 7 | 8 | protein-prediction: 9 | mkdir -p '${OUTPUT_DIR}/weights/extracted' '${OUTPUT_DIR}/logs' '${OUTPUT_DIR}/samples' '${OUTPUT_DIR}/experiments/${EXPERIMENT_NAME}' 10 | curl -o ${OUTPUT_DIR}/samples/sample.fa https://rest.uniprot.org/uniprotkb/Q6UWK7.fasta 11 | @EXPERIMENT_NAME=${EXPERIMENT_NAME} \ 12 | DATASET_DIR=${DATASET_DIR} \ 13 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 14 | MODEL=${MODEL} \ 15 | OUTPUT_DIR=${OUTPUT_DIR} \ 16 | docker compose up protein-prediction-inference --build 17 | 18 | clean: 19 | @DATASET_DIR=${DATASET_DIR} \ 20 | OUTPUT_DIR=${OUTPUT_DIR} \ 21 | docker compose down 22 | -------------------------------------------------------------------------------- /big-data/chronos/docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | chronos: 3 | build: 4 | args: 5 | http_proxy: ${http_proxy} 6 | https_proxy: ${https_proxy} 7 | no_proxy: ${no_proxy} 8 | dockerfile: 
Dockerfile.chronos 9 | command: sh -c "jupyter nbconvert --to python chronos_nyc_taxi_tsdataset_forecaster.ipynb && \ 10 | sed '26,40d' chronos_nyc_taxi_tsdataset_forecaster.py > chronos_taxi_forecaster.py && \ 11 | python chronos_taxi_forecaster.py" 12 | environment: 13 | - http_proxy=${http_proxy} 14 | - https_proxy=${https_proxy} 15 | - no_proxy=${no_proxy} 16 | image: ${FINAL_IMAGE_NAME}:training-ubuntu-20.04 17 | network_mode: "host" 18 | privileged: true 19 | volumes: 20 | - ./BigDL:/workspace/BigDL 21 | working_dir: /workspace/BigDL/python/chronos/colab-notebook 22 | -------------------------------------------------------------------------------- /language_modeling/pytorch/bert_large/training/docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | hugging-face-dlsa: 3 | build: 4 | args: 5 | http_proxy: ${http_proxy} 6 | https_proxy: ${https_proxy} 7 | no_proxy: ${no_proxy} 8 | dockerfile: Dockerfile.hugging-face-dlsa 9 | command: fine-tuning/run_dist.sh -np ${NUM_NODES} -ppn ${PROCESS_PER_NODE} fine-tuning/run_ipex_native.sh 10 | environment: 11 | - DATASET=${DATASET} 12 | - MODEL_NAME_OR_PATH=${MODEL} 13 | - OUTPUT_DIR=${OUTPUT_DIR}/fine_tuned 14 | - http_proxy=${http_proxy} 15 | - https_proxy=${https_proxy} 16 | - no_proxy=${no_proxy} 17 | image: ${FINAL_IMAGE_NAME}:training-intel-optimized-pytorch-1.12.100-oneccl-inc 18 | privileged: true 19 | volumes: 20 | - ${OUTPUT_DIR}:${OUTPUT_DIR} 21 | - ./dlsa:/workspace/dlsa 22 | working_dir: /workspace/dlsa/profiling-transformers 23 | -------------------------------------------------------------------------------- /analytics/classical-ml/synthetic/inference/docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | wafer-insight: 3 | build: 4 | args: 5 | http_proxy: ${http_proxy} 6 | https_proxy: ${https_proxy} 7 | no_proxy: ${no_proxy} 8 | dockerfile: Dockerfile.wafer-insights 9 | command: 10 | - | 11 | conda run -n WI python src/loaders/synthetic_loader/loader.py 12 | conda run --no-capture-output -n WI python src/dashboard/app.py 13 | entrypoint: ["/bin/bash", "-c"] 14 | environment: 15 | - PYTHONPATH=$PYTHONPATH:$PWD 16 | - http_proxy=${http_proxy} 17 | - https_proxy=${https_proxy} 18 | - no_proxy=${no_proxy} 19 | image: ${FINAL_IMAGE_NAME}:inference-ubuntu-20.04 20 | ports: 21 | - 8050:8050 22 | privileged: true 23 | volumes: 24 | - ${OUTPUT_DIR}:/data 25 | - ./wafer-insights:/workspace/wafer-insights 26 | working_dir: /workspace/wafer-insights 27 | -------------------------------------------------------------------------------- /classification/tensorflow/bert_base/inference/docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | nlp-sagemaker: 3 | build: 4 | context: ./aws_sagemaker/ 5 | args: 6 | http_proxy: ${http_proxy} 7 | https_proxy: ${https_proxy} 8 | no_proxy: ${no_proxy} 9 | dockerfile: ./Dockerfile 10 | command: sh -c "jupyter nbconvert --to python 1.0-intel-sagemaker-inference.ipynb && python3 1.0-intel-sagemaker-inference.py" 11 | environment: 12 | - http_proxy=${http_proxy} 13 | - https_proxy=${https_proxy} 14 | - no_proxy=${no_proxy} 15 | - AWS_PROFILE=${AWS_PROFILE} 16 | image: ${FINAL_IMAGE_NAME}:inference-ubuntu-20.04 17 | network_mode: "host" 18 | privileged: true 19 | volumes: 20 | - ${OUTPUT_DIR}:${OUTPUT_DIR} 21 | - ./aws_sagemaker/notebooks:/root/notebooks 22 | - ./aws_sagemaker/src:/root/src 23 | - ./aws_data/.aws:/root/.aws 
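      # The ./aws_data/.aws mount provides AWS credentials to the container: the nlp-sagemaker
      # Makefile target in this directory copies ${HOME}/.aws into ./aws_data before bringing the
      # service up, and AWS_PROFILE (parsed from the credentials CSV) selects the profile to use.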
24 | working_dir: /root/notebooks 25 | -------------------------------------------------------------------------------- /classification/tensorflow/bert_base/inference/Makefile: -------------------------------------------------------------------------------- 1 | AWS_CSV_FILE ?= credentials.csv 2 | AWS_DATA=$$(pwd)/aws_data 3 | FINAL_IMAGE_NAME ?= nlp-sagemaker 4 | OUTPUT_DIR ?= /output 5 | ROLE ?= role 6 | S3_MODEL_URI ?= link 7 | 8 | export AWS_PROFILE := $(shell cat ${AWS_CSV_FILE} | awk -F',' 'NR==2{print $$1}') 9 | export REGION ?= us-west-2 10 | 11 | nlp-sagemaker: 12 | ./aws_sagemaker/scripts/setup.sh aws_sagemaker/ 13 | mkdir -p ${AWS_DATA} && cp -r ${HOME}/.aws ${AWS_DATA}/.aws/ 14 | @AWS_PROFILE=${AWS_PROFILE} \ 15 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 16 | OUTPUT_DIR=${OUTPUT_DIR} \ 17 | docker compose up --build nlp-sagemaker 18 | clean: 19 | if [ -d ${AWS_DATA} ]; then \ 20 | rm -rf ${AWS_DATA}; \ 21 | fi; \ 22 | if [ -d aws/ ]; then \ 23 | rm -rf aws/; \ 24 | fi; \ 25 | if [ -d aws-cli/ ]; then \ 26 | rm -rf aws-cli/; \ 27 | fi; \ 28 | if [ -f awscliv2.zip ]; then \ 29 | rm -f awscliv2.zip; \ 30 | fi 31 | docker compose down 32 | -------------------------------------------------------------------------------- /transfer_learning/tensorflow/resnet50/training/docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | vision-transfer-learning: 3 | build: 4 | args: 5 | http_proxy: ${http_proxy} 6 | https_proxy: ${https_proxy} 7 | no_proxy: ${no_proxy} 8 | dockerfile: Dockerfile.vision-transfer-learning 9 | command: conda run --no-capture-output -n transfer_learning ./${SCRIPT}.sh 10 | environment: 11 | - BATCH_SIZE=${BATCH_SIZE} 12 | - DATASET_DIR=/workspace/data 13 | - OUTPUT_DIR=${OUTPUT_DIR}/${SCRIPT} 14 | - NUM_EPOCHS=${NUM_EPOCHS} 15 | - PLATFORM=${PLATFORM} 16 | - PRECISION=${PRECISION} 17 | - http_proxy=${http_proxy} 18 | - https_proxy=${https_proxy} 19 | - no_proxy=${no_proxy} 20 | image: ${FINAL_IMAGE_NAME}:training-ubuntu-20.04 21 | privileged: true 22 | volumes: 23 | - /${DATASET_DIR}:/workspace/data 24 | - ${OUTPUT_DIR}:${OUTPUT_DIR} 25 | - ./transfer-learning-training:/workspace/transfer-learning 26 | working_dir: /workspace/transfer-learning 27 | -------------------------------------------------------------------------------- /transfer_learning/tensorflow/resnet50/inference/docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | vision-transfer-learning: 3 | build: 4 | args: 5 | http_proxy: ${http_proxy} 6 | https_proxy: ${https_proxy} 7 | no_proxy: ${no_proxy} 8 | dockerfile: Dockerfile.vision-transfer-learning 9 | command: conda run --no-capture-output -n transfer_learning ./${SCRIPT}.sh --inference -cp "/workspace/checkpoint" 10 | environment: 11 | - DATASET_DIR=/workspace/data 12 | - OUTPUT_DIR=${OUTPUT_DIR}/${SCRIPT} 13 | - PLATFORM=${PLATFORM} 14 | - PRECISION=${PRECISION} 15 | - http_proxy=${http_proxy} 16 | - https_proxy=${https_proxy} 17 | - no_proxy=${no_proxy} 18 | image: ${FINAL_IMAGE_NAME}:inference-ubuntu-20.04 19 | privileged: true 20 | volumes: 21 | - /${CHECKPOINT_DIR}:/workspace/checkpoint 22 | - /${DATASET_DIR}:/workspace/data 23 | - ${OUTPUT_DIR}:${OUTPUT_DIR} 24 | - ./transfer-learning-inference:/workspace/transfer-learning 25 | working_dir: /workspace/transfer-learning 26 | -------------------------------------------------------------------------------- /big-data/aiok-ray/training/Makefile: 
-------------------------------------------------------------------------------- 1 | DATASET_DIR ?= ./data 2 | FINAL_IMAGE_NAME ?= recommendation-ray 3 | OUTPUT_DIR ?= /output 4 | RUN_MODE ?= kaggle 5 | DOCKER_NETWORK_NAME = ray-training 6 | 7 | recommendation-ray: 8 | if [ ! -d "AIOK_Ray/dlrm_all/dlrm/dlrm" ]; then \ 9 | CWD=${PWD}; \ 10 | cd AIOK_Ray/; \ 11 | sh dlrm_all/dlrm/patch_dlrm.sh; \ 12 | cd ${CWD}; \ 13 | fi 14 | @wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.12.0-Linux-x86_64.sh \ 15 | -P AIOK_Ray/Dockerfile-ubuntu18.04/ \ 16 | -O AIOK_Ray/Dockerfile-ubuntu18.04/miniconda.sh 17 | if [ ! "$(shell docker network ls | grep ${DOCKER_NETWORK_NAME})" ]; then \ 18 | docker network create --driver=bridge ${DOCKER_NETWORK_NAME}; \ 19 | fi 20 | @DATASET_DIR=${DATASET_DIR} \ 21 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 22 | OUTPUT_DIR=${OUTPUT_DIR} \ 23 | RUN_MODE=${RUN_MODE} \ 24 | docker compose up recommendation-ray --build 25 | 26 | clean: 27 | docker network rm ${DOCKER_NETWORK_NAME} 28 | OUTPUT_DIR=${OUTPUT_DIR} DATASET_DIR=${DATASET_DIR} docker compose down 29 | sudo rm -rf ${OUTPUT_DIR} 30 | -------------------------------------------------------------------------------- /big-data/aiok-ray/inference/Makefile: -------------------------------------------------------------------------------- 1 | DATASET_DIR ?= ./data 2 | FINAL_IMAGE_NAME ?= recommendation-ray 3 | CHECKPOINT_DIR ?= /output 4 | RUN_MODE ?= kaggle 5 | DOCKER_NETWORK_NAME = ray-inference 6 | 7 | recommendation-ray: 8 | if [ ! -d "AIOK_Ray/dlrm_all/dlrm/dlrm" ]; then \ 9 | CWD=${PWD}; \ 10 | cd AIOK_Ray/; \ 11 | sh dlrm_all/dlrm/patch_dlrm.sh; \ 12 | cd ${CWD}; \ 13 | fi 14 | @wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.12.0-Linux-x86_64.sh \ 15 | -P AIOK_Ray/Dockerfile-ubuntu18.04/ \ 16 | -O AIOK_Ray/Dockerfile-ubuntu18.04/miniconda.sh 17 | if [ ! 
"$(shell docker network ls | grep ${DOCKER_NETWORK_NAME})" ]; then \ 18 | docker network create --driver=bridge ${DOCKER_NETWORK_NAME} ; \ 19 | fi 20 | @DATASET_DIR=${DATASET_DIR} \ 21 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 22 | CHECKPOINT_DIR=${CHECKPOINT_DIR} \ 23 | RUN_MODE=${RUN_MODE} \ 24 | docker compose up recommendation-ray --build 25 | 26 | clean: 27 | docker network rm ${DOCKER_NETWORK_NAME} 28 | DATASET_DIR=${DATASET_DIR} OUTPUT_DIR=${OUTPUT_DIR} docker compose down 29 | sudo rm -rf ${OUTPUT_DIR} 30 | -------------------------------------------------------------------------------- /big-data/aiok-ray/training/docker-compose.yml: -------------------------------------------------------------------------------- 1 | networks: 2 | ray-training: 3 | external: true 4 | services: 5 | recommendation-ray: 6 | build: 7 | args: 8 | http_proxy: ${http_proxy} 9 | https_proxy: ${https_proxy} 10 | no_proxy: ${no_proxy} 11 | dockerfile: DockerfilePytorch 12 | context: AIOK_Ray/Dockerfile-ubuntu18.04 13 | command: 14 | - /bin/bash 15 | - -c 16 | - | 17 | bash $$APP_DIR/scripts/run_train_docker.sh $RUN_MODE 18 | container_name: ray-training 19 | hostname: ray 20 | networks: 21 | - ray-training 22 | environment: 23 | - http_proxy=${http_proxy} 24 | - https_proxy=${https_proxy} 25 | - no_proxy=${no_proxy} 26 | - RUN_MODE=${RUN_MODE} 27 | - APP_DIR=/home/vmagent/app/e2eaiok 28 | - OUTPUT_DIR=/output 29 | image: ${FINAL_IMAGE_NAME}:training-ubuntu-18.04 30 | privileged: true 31 | devices: 32 | - /dev/dri 33 | volumes: 34 | - ${DATASET_DIR}:/home/vmagent/app/dataset/criteo 35 | - ./AIOK_Ray:/home/vmagent/app/e2eaiok 36 | - ${OUTPUT_DIR}:/output 37 | working_dir: /home/vmagent/app/e2eaiok/dlrm_all/dlrm/ 38 | shm_size: 300g 39 | -------------------------------------------------------------------------------- /big-data/aiok-ray/inference/docker-compose.yml: -------------------------------------------------------------------------------- 1 | networks: 2 | ray-inference: 3 | external: true 4 | services: 5 | recommendation-ray: 6 | build: 7 | args: 8 | http_proxy: ${http_proxy} 9 | https_proxy: ${https_proxy} 10 | no_proxy: ${no_proxy} 11 | dockerfile: DockerfilePytorch 12 | context: AIOK_Ray/Dockerfile-ubuntu18.04 13 | command: 14 | - /bin/bash 15 | - -c 16 | - | 17 | bash $$APP_DIR/scripts/run_inference_docker.sh $RUN_MODE 18 | container_name: ray-inference 19 | hostname: ray 20 | networks: 21 | - ray-inference 22 | environment: 23 | - http_proxy=${http_proxy} 24 | - https_proxy=${https_proxy} 25 | - no_proxy=${no_proxy} 26 | - RUN_MODE=${RUN_MODE} 27 | - APP_DIR=/home/vmagent/app/e2eaiok 28 | - OUTPUT_DIR=/output 29 | image: ${FINAL_IMAGE_NAME}:inference-ubuntu-18.04 30 | privileged: true 31 | devices: 32 | - /dev/dri 33 | volumes: 34 | - ${DATASET_DIR}:/home/vmagent/app/dataset/criteo 35 | - ./AIOK_Ray:/home/vmagent/app/e2eaiok 36 | - ${CHECKPOINT_DIR}:/output 37 | working_dir: /home/vmagent/app/e2eaiok/dlrm_all/dlrm/ 38 | shm_size: 300g 39 | -------------------------------------------------------------------------------- /analytics/classical-ml/recsys/training/docker-compose.yml: -------------------------------------------------------------------------------- 1 | networks: 2 | hadoop: 3 | external: true 4 | services: 5 | recsys-challenge: 6 | build: 7 | args: 8 | http_proxy: ${http_proxy} 9 | https_proxy: ${https_proxy} 10 | no_proxy: ${no_proxy} 11 | dockerfile: analytics-with-python/Dockerfile 12 | command: 13 | - | 14 | service ssh start 15 | /mnt/code/run-all.sh 16 | container_name: hadoop-master 
17 | environment: 18 | - http_proxy=${http_proxy} 19 | - https_proxy=${https_proxy} 20 | - no_proxy=${no_proxy} 21 | entrypoint: ["/bin/bash", "-c"] 22 | hostname: hadoop-master 23 | image: ${FINAL_IMAGE_NAME}:training-python-3.7-buster 24 | networks: 25 | - hadoop 26 | ports: 27 | - 8088:8088 28 | - 8888:8888 29 | - 8080:8080 30 | - 9870:9870 31 | - 9864:9864 32 | - 4040:4040 33 | - 18081:18081 34 | - 12345:12345 35 | privileged: true 36 | volumes: 37 | - ${OUTPUT_DIR}:${OUTPUT_DIR} 38 | - /${DATASET_DIR}:/mnt/data 39 | - ./tmp:/mnt 40 | - ./analytics-with-python/config:/mnt/config 41 | - ./analytics-with-python:/mnt/code 42 | working_dir: /mnt/code 43 | -------------------------------------------------------------------------------- /language_modeling/pytorch/bert_large/training/Dockerfile.hugging-face-dlsa: -------------------------------------------------------------------------------- 1 | ARG BASE_IMAGE_NAME="intel/intel-optimized-pytorch" 2 | ARG BASE_IMAGE_TAG="1.12.100-oneccl-inc" 3 | # Inherit IPEX 4 | FROM ${BASE_IMAGE_NAME}:${BASE_IMAGE_TAG} 5 | 6 | ENV DEBIAN_FRONTEND=noninteractive 7 | 8 | RUN apt-get update && apt-get install --no-install-recommends --fix-missing -y \ 9 | ca-certificates \ 10 | git \ 11 | libgomp1 \ 12 | numactl \ 13 | patch \ 14 | wget \ 15 | mpich 16 | # Default Workspace 17 | RUN mkdir -p /workspace 18 | 19 | ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib 20 | # Install reqs via pip for transformers 21 | RUN pip install --upgrade pip && \ 22 | pip install astunparse \ 23 | cffi \ 24 | cmake \ 25 | dataclasses \ 26 | datasets==2.3.2 \ 27 | future \ 28 | impi-rt \ 29 | mkl \ 30 | mkl-include \ 31 | ninja \ 32 | numpy \ 33 | pyyaml \ 34 | requests \ 35 | setuptools \ 36 | six \ 37 | transformers==4.20.1 \ 38 | typing_extensions 39 | 40 | ENV CCL_ATL_TRANSPORT=ofi 41 | -------------------------------------------------------------------------------- /transfer_learning/tensorflow/resnet50/training/chart/Chart.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v2 2 | name: vision-transfer-learning 3 | description: A Helm chart for Kubernetes 4 | # A chart can be either an 'application' or a 'library' chart. 5 | # 6 | # Application charts are a collection of templates that can be packaged into versioned archives 7 | # to be deployed. 8 | # 9 | # Library charts provide useful utilities or functions for the chart developer. They're included as 10 | # a dependency of application charts to inject those utilities and functions into the rendering 11 | # pipeline. Library charts do not define any templates and therefore cannot be deployed. 12 | type: application 13 | # This is the chart version. This version number should be incremented each time you make changes 14 | # to the chart and its templates, including the app version. 15 | # Versions are expected to follow Semantic Versioning (https://semver.org/) 16 | version: 0.1.0 17 | # This is the version number of the application being deployed. This version number should be 18 | # incremented each time you make changes to the application. Versions are not expected to 19 | # follow Semantic Versioning. They should reflect the version the application is using. 20 | # It is recommended to use it with quotes. 
21 | appVersion: "0.1.0" 22 | -------------------------------------------------------------------------------- /language_modeling/pytorch/bert_large/training/chart/Chart.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v2 2 | name: document-level-sentiment-analysis 3 | description: A Helm chart for Kubernetes 4 | # A chart can be either an 'application' or a 'library' chart. 5 | # 6 | # Application charts are a collection of templates that can be packaged into versioned archives 7 | # to be deployed. 8 | # 9 | # Library charts provide useful utilities or functions for the chart developer. They're included as 10 | # a dependency of application charts to inject those utilities and functions into the rendering 11 | # pipeline. Library charts do not define any templates and therefore cannot be deployed. 12 | type: application 13 | 14 | # This is the chart version. This version number should be incremented each time you make changes 15 | # to the chart and its templates, including the app version. 16 | # Versions are expected to follow Semantic Versioning (https://semver.org/) 17 | version: 0.1.0 18 | 19 | # This is the version number of the application being deployed. This version number should be 20 | # incremented each time you make changes to the application. Versions are not expected to 21 | # follow Semantic Versioning. They should reflect the version the application is using. 22 | # It is recommended to use it with quotes. 23 | appVersion: "1.16.0" 24 | -------------------------------------------------------------------------------- /analytics/tensorflow/ssd_resnet34/inference/docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | vdms: 3 | image: vuiseng9/intellabs-vdms:demo-191220 4 | network_mode: "host" 5 | ports: 6 | - "55555:55555" 7 | ## Base 8 | video-streamer: 9 | build: 10 | args: 11 | http_proxy: ${http_proxy} 12 | https_proxy: ${https_proxy} 13 | no_proxy: ${no_proxy} 14 | dockerfile: Dockerfile.video-streamer 15 | command: sh -c "./benchmark.sh && cp -r ../*.txt ${OUTPUT_DIR}" 16 | depends_on: 17 | - vdms 18 | environment: 19 | - OUTPUT_DIR=${OUTPUT_DIR} 20 | - VIDEO_FILE=/workspace/video-streamer/${VIDEO} 21 | - VIDEO_PATH=${VIDEO_PATH} 22 | - http_proxy=${http_proxy} 23 | - https_proxy=${https_proxy} 24 | - no_proxy=${no_proxy} 25 | healthcheck: 26 | test: netstat -lnpt | grep 55555 || exit 1 27 | interval: 10s 28 | timeout: 5s 29 | retries: 5 30 | image: ${FINAL_IMAGE_NAME}:inference-centos-8 31 | network_mode: "host" 32 | ports: 33 | - "55555:55555" 34 | privileged: true 35 | volumes: 36 | - ./video-streamer:/workspace/video-streamer 37 | - /${OUTPUT_DIR}:${OUTPUT_DIR} 38 | - /${VIDEO_PATH}:/workspace/video-streamer/${VIDEO} 39 | working_dir: /workspace/video-streamer 40 | -------------------------------------------------------------------------------- /big-data/chronos/Dockerfile.chronos: -------------------------------------------------------------------------------- 1 | FROM ubuntu:20.04 2 | ARG DEBIAN_FRONTEND=noninteractive 3 | 4 | SHELL ["/bin/bash", "-c"] 5 | ENV LANG C.UTF-8 6 | 7 | RUN apt-get update --fix-missing && \ 8 | apt-get install -y apt-utils vim curl nano wget unzip git && \ 9 | apt-get install -y gcc g++ make && \ 10 | apt-get install -y libsm6 libxext6 libxrender-dev && \ 11 | apt-get install -y openjdk-8-jre && \ 12 | rm /bin/sh && \ 13 | ln -sv /bin/bash /bin/sh && \ 14 | echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && 
\ 15 | chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \ 16 | # Install Miniconda 17 | wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.12.0-Linux-x86_64.sh && \ 18 | chmod +x Miniconda3-py37_4.12.0-Linux-x86_64.sh && \ 19 | ./Miniconda3-py37_4.12.0-Linux-x86_64.sh -b -f -p /usr/local && \ 20 | rm Miniconda3-py37_4.12.0-Linux-x86_64.sh 21 | 22 | ENV PATH /usr/local/envs/chronos/bin:$PATH 23 | 24 | RUN conda create -y -n chronos python=3.7 setuptools=58.0.4 && source activate chronos && \ 25 | pip install --no-cache-dir --pre --upgrade bigdl-chronos[pytorch,automl] matplotlib notebook==6.4.12 && \ 26 | pip uninstall -y torchtext 27 | 28 | RUN echo "source activate chronos" > ~/.bashrc 29 | RUN echo "export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/" >> ~/.bashrc 30 | -------------------------------------------------------------------------------- /analytics/classical-ml/synthetic/inference/Dockerfile.wafer-insights: -------------------------------------------------------------------------------- 1 | ARG BASE_IMAGE_NAME="ubuntu" 2 | ARG BASE_IMAGE_TAG="20.04" 3 | # Inherit <___> 4 | FROM ${BASE_IMAGE_NAME}:${BASE_IMAGE_TAG} 5 | 6 | ENV DEBIAN_FRONTEND=noninteractive 7 | 8 | RUN apt-get update && apt-get install --no-install-recommends --fix-missing -y \ 9 | ca-certificates \ 10 | git \ 11 | wget 12 | # Default Workspace 13 | RUN mkdir -p /workspace 14 | 15 | SHELL ["/bin/bash", "-c"] 16 | ARG CONDA_INSTALL_PATH=/opt/conda 17 | ARG MINICONDA_VERSION="latest" 18 | # Miniconda Installation 19 | RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh -O miniconda.sh && \ 20 | bash miniconda.sh -b -p ${CONDA_INSTALL_PATH} && \ 21 | $CONDA_INSTALL_PATH/bin/conda clean -ya && \ 22 | rm miniconda.sh && \ 23 | ln -s ${CONDA_INSTALL_PATH}/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \ 24 | echo ". 
${CONDA_INSTALL_PATH}/etc/profile.d/conda.sh" >> ~/.bashrc && \ 25 | echo "conda activate WI" >> ~/.bashrc 26 | 27 | ARG PYTHON_VERSION="3.9" 28 | ENV PATH="${CONDA_INSTALL_PATH}/bin:${PATH}" 29 | # Create Conda Environment + Install reqs via conda 30 | RUN conda create -yn WI python=${PYTHON_VERSION} && \ 31 | source activate WI && \ 32 | conda install -y scikit-learn pandas pyarrow && \ 33 | pip install dash colorlover && \ 34 | conda clean -ya 35 | -------------------------------------------------------------------------------- /transfer_learning/tensorflow/resnet50/training/Makefile: -------------------------------------------------------------------------------- 1 | BATCH_SIZE ?= 32 2 | DATASET_DIR ?= /workspace/data 3 | FINAL_IMAGE_NAME ?= vision-transfer-learning 4 | NAMESPACE ?= argo 5 | NUM_EPOCHS ?= 100 6 | OUTPUT_DIR ?= /output 7 | PLATFORM ?= None 8 | PRECISION ?= FP32 9 | SCRIPT ?= colorectal 10 | 11 | vision-transfer-learning: 12 | @BATCH_SIZE=${BATCH_SIZE} \ 13 | DATASET_DIR=${DATASET_DIR} \ 14 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 15 | NUM_EPOCHS=${NUM_EPOCHS} \ 16 | OUTPUT_DIR=${OUTPUT_DIR} \ 17 | PLATFORM=${PLATFORM} \ 18 | PRECISION=${PRECISION} \ 19 | SCRIPT=${SCRIPT} \ 20 | docker compose up vision-transfer-learning --build 21 | 22 | argo-single-node: 23 | helm install \ 24 | --namespace ${NAMESPACE} \ 25 | --set proxy=${http_proxy} \ 26 | --set workflow.batch_size=${BATCH_SIZE} \ 27 | --set workflow.num_epochs=${NUM_EPOCHS} \ 28 | --set workflow.platform=${PLATFORM} \ 29 | --set workflow.precision=${PRECISION} \ 30 | --set workflow.dataset_dir=${DATASET_DIR} \ 31 | --set workflow.script=${SCRIPT} \ 32 | ${FINAL_IMAGE_NAME} ./chart 33 | argo submit --from wftmpl/${FINAL_IMAGE_NAME} --namespace=${NAMESPACE} 34 | 35 | workflow-log: 36 | argo logs @latest -f -c output-log 37 | 38 | clean: 39 | docker compose down 40 | 41 | helm-clean: 42 | kubectl delete wftmpl ${FINAL_IMAGE_NAME} --namespace=${NAMESPACE} 43 | helm uninstall ${FINAL_IMAGE_NAME} --namespace=${NAMESPACE} 44 | -------------------------------------------------------------------------------- /language_modeling/pytorch/bert_large/training/Makefile: -------------------------------------------------------------------------------- 1 | DATASET ?= sst2 2 | DATASET_DIR ?= /data 3 | FINAL_IMAGE_NAME ?= document-level-sentiment-analysis 4 | MODEL ?= bert-large-uncased 5 | NAMESPACE ?= argo 6 | NUM_NODES ?= 2 7 | OUTPUT_DIR ?= /output 8 | PROCESS_PER_NODE ?= 2 9 | 10 | hugging-face-dlsa: 11 | mkdir ./dlsa/profiling-transformers/datasets && cp -r ${DATASET_DIR} ./dlsa/profiling-transformers/datasets 12 | @DATASET=${DATASET} \ 13 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 14 | MODEL=${MODEL} \ 15 | NUM_NODES=${NUM_NODES} \ 16 | OUTPUT_DIR=${OUTPUT_DIR} \ 17 | PROCESS_PER_NODE=${PROCESS_PER_NODE} \ 18 | docker compose up hugging-face-dlsa --build 19 | rm -rf ./dlsa/profiling-transformers/datasets 20 | 21 | argo-single-node: 22 | helm install \ 23 | --namespace ${NAMESPACE} \ 24 | --set proxy=${http_proxy} \ 25 | --set workflow.dataset=${DATASET} \ 26 | --set workflow.num_nodes=${NUM_NODES} \ 27 | --set workflow.model=${MODEL} \ 28 | --set workflow.process_per_node=${PROCESS_PER_NODE} \ 29 | ${FINAL_IMAGE_NAME} ./chart 30 | argo submit --from wftmpl/${FINAL_IMAGE_NAME} --namespace=${NAMESPACE} 31 | 32 | workflow-log: 33 | argo logs @latest -f 34 | 35 | clean: 36 | rm -rf ./dlsa/profiling-transformers/datasets 37 | docker compose down 38 | 39 | helm-clean: 40 | kubectl delete wftmpl ${FINAL_IMAGE_NAME} 
--namespace=${NAMESPACE} 41 | helm uninstall ${FINAL_IMAGE_NAME} --namespace=${NAMESPACE} 42 | -------------------------------------------------------------------------------- /.github/workflows/linter.yml: -------------------------------------------------------------------------------- 1 | --- 2 | ################################# 3 | ################################# 4 | ## Super Linter GitHub Actions ## 5 | ################################# 6 | ################################# 7 | name: Markdown Linter 8 | 9 | ############################# 10 | # Start the job on all push # 11 | ############################# 12 | on: 13 | pull_request 14 | 15 | ############### 16 | # Set the Job # 17 | ############### 18 | jobs: 19 | build: 20 | # Name the Job 21 | name: Markdown Linter 22 | # Set the agent to run on 23 | runs-on: [Linux, k8-runners] 24 | 25 | ################## 26 | # Load all steps # 27 | ################## 28 | steps: 29 | ########################## 30 | # Checkout the code base # 31 | ########################## 32 | - name: Checkout Code 33 | uses: actions/checkout@v3 34 | with: 35 | # Full git history is needed to get a proper 36 | # list of changed files within `super-linter` 37 | fetch-depth: 0 38 | 39 | ################################ 40 | # Run Linter against code base # 41 | ################################ 42 | - name: Lint Code 43 | uses: github/super-linter@v4 44 | env: 45 | VALIDATE_ALL_CODEBASE: false 46 | VALIDATE_MARKDOWN: true 47 | DEFAULT_BRANCH: develop 48 | LINTER_RULES_PATH: .github/linters 49 | MARKDOWN_CONFIG_FILE: .markdownlint.json 50 | GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} 51 | -------------------------------------------------------------------------------- /template/Dockerfile.PIPELINE_NAME: -------------------------------------------------------------------------------- 1 | ARG BASE_IMAGE_NAME="ubuntu" 2 | ARG BASE_IMAGE_TAG="20.04" 3 | # Inherit <___> 4 | FROM ${BASE_IMAGE_NAME}:${BASE_IMAGE_TAG} 5 | 6 | ENV DEBIAN_FRONTEND=noninteractive 7 | 8 | RUN apt-get update && apt-get install --no-install-recommends --fix-missing -y \ 9 | ca-certificates \ 10 | git \ 11 | \ 12 | 13 | # Default Workspace 14 | RUN mkdir -p /workspace 15 | 16 | SHELL ["/bin/bash", "-c"] 17 | # Install reqs via pip 18 | RUN pip install --upgrade pip && \ 19 | pip install \ 20 | \ 21 | 22 | ### OR 23 | ARG CONDA_INSTALL_PATH=/opt/conda 24 | ARG MINICONDA_VERSION="latest" 25 | # Miniconda Installation 26 | RUN apt-get update && \ 27 | wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh -O miniconda.sh && \ 28 | bash miniconda.sh -b -p ${CONDA_INSTALL_PATH} && \ 29 | rm miniconda.sh && \ 30 | ln -s ${CONDA_INSTALL_PATH}/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \ 31 | echo ". 
${CONDA_INSTALL_PATH}/etc/profile.d/conda.sh" >> ~/.bashrc && \ 32 | echo "conda activate " >> ~/.bashrc 33 | 34 | ARG PYTHON_VERSION="3.9" 35 | ENV PATH="${CONDA_INSTALL_PATH}/bin:${PATH}" 36 | # Create Conda Environment + Install reqs via conda 37 | RUN conda create -yn python=${PYTHON_VERSION} && \ 38 | source activate && \ 39 | conda install -y -c conda-forge && \ 40 | conda install -y && \ 41 | conda clean -ya && \ 42 | -------------------------------------------------------------------------------- /big-data/friesian/training/Dockerfile.friesian-training: -------------------------------------------------------------------------------- 1 | ARG BASE_IMAGE_NAME="ubuntu" 2 | ARG BASE_IMAGE_TAG="20.04" 3 | 4 | FROM ${BASE_IMAGE_NAME}:${BASE_IMAGE_TAG} 5 | 6 | ENV DEBIAN_FRONTEND=noninteractive 7 | 8 | RUN apt-get update && apt-get install --no-install-recommends --fix-missing -y \ 9 | ca-certificates \ 10 | vim \ 11 | wget 12 | 13 | RUN wget --no-check-certificate -q https://repo.huaweicloud.com/java/jdk/8u201-b09/jdk-8u201-linux-x64.tar.gz && \ 14 | tar -zxvf jdk-8u201-linux-x64.tar.gz && \ 15 | mv jdk1.8.0_201 /opt/jdk1.8.0_201 && \ 16 | rm jdk-8u201-linux-x64.tar.gz 17 | 18 | SHELL ["/bin/bash", "-c"] 19 | 20 | ARG CONDA_INSTALL_PATH=/opt/conda 21 | ARG MINICONDA_VERSION="latest" 22 | 23 | RUN apt-get update && \ 24 | wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh -O miniconda.sh && \ 25 | bash miniconda.sh -b -p ${CONDA_INSTALL_PATH} && \ 26 | $CONDA_INSTALL_PATH/bin/conda clean -ya && \ 27 | rm miniconda.sh && \ 28 | ln -s ${CONDA_INSTALL_PATH}/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \ 29 | echo ". ${CONDA_INSTALL_PATH}/etc/profile.d/conda.sh" >> ~/.bashrc && \ 30 | echo "conda activate bigdl" >> ~/.bashrc 31 | 32 | ARG PYTHON_VERSION="3.7.5" 33 | ENV JAVA_HOME=/opt/jdk1.8.0_201 34 | ENV JRE_HOME=$JAVA_HOME/jre 35 | ENV PATH="${CONDA_INSTALL_PATH}/bin:${PATH}:/opt/conda/envs/bigdl/bin:$JAVA_HOME/bin:$JRE_HOME" 36 | 37 | RUN conda create -yn bigdl python=${PYTHON_VERSION} && \ 38 | source activate bigdl && \ 39 | conda update -y -n base -c defaults conda && \ 40 | conda clean -ya && \ 41 | pip install tensorflow==2.9.0 \ 42 | --pre --upgrade bigdl-friesian[train] 43 | 44 | RUN mkdir -p /workspace 45 | -------------------------------------------------------------------------------- /transfer_learning/tensorflow/resnet50/training/chart/README.md: -------------------------------------------------------------------------------- 1 | # Vision Transfer Learning 2 | 3 | ![Version: 0.1.0](https://img.shields.io/badge/Version-0.1.0-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 0.1.0](https://img.shields.io/badge/AppVersion-0.1.0-informational?style=flat-square) 4 | 5 | A Helm chart for Kubernetes 6 | 7 | ## Values 8 | 9 | | Key | Type | Default | Description | 10 | |-----|------|---------|-------------| 11 | | dataset.nfs.path | string | `"nil"` | | 12 | | dataset.nfs.readOnly | bool | `true` | | 13 | | dataset.nfs.server | string | `"nil"` | | 14 | | dataset.s3.datasetKey | string | `"resisc45"` | path in bucket to dataset | 15 | | dataset.logsKey | string | `"nil"` | path to store log outputs | 16 | | dataset.type | string | `"s3"` | either `s3` or `nfs` | 17 | | kubernetesClusterDomain | string | `"cluster.local"` | | 18 | | metadata.name | string | `"vision-transfer-learning"` | | 19 | | proxy | string | `"nil"` | | 20 | | 
sidecars.image | string | `"ubuntu:20.04"` | | 21 | | volumeClaimTemplates.output_dir.resources.requests.storage | string | `"4Gi"` | | 22 | | volumeClaimTemplates.workspace.resources.requests.storage | string | `"2Gi"` | | 23 | | workflow.batch_size | int | `32` | | 24 | | workflow.dataset_dir | string | `"/data"` | Placeholder for local dataset directory supplied by s3 or nfs | 25 | | workflow.num_epochs | int | `100` | | 26 | | workflow.platform | string | `"None"` | | 27 | | workflow.precision | string | `"FP32"` | | 28 | | workflow.ref | string | `"v1.0.1"` | | 29 | | workflow.repo | string | `"https://github.com/intel/vision-based-transfer-learning-and-inference"` | | 30 | | workflow.script | string | `"colorectal"` | | 31 | 32 | -------------------------------------------------------------------------------- /language_modeling/pytorch/bert_large/training/chart/README.md: -------------------------------------------------------------------------------- 1 | # Document Level Sentiment Analysis 2 | 3 | ![Version: 0.1.0](https://img.shields.io/badge/Version-0.1.0-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.16.0](https://img.shields.io/badge/AppVersion-1.16.0-informational?style=flat-square) 4 | 5 | A Helm chart for Kubernetes 6 | 7 | ## Values 8 | 9 | | Key | Type | Default | Description | 10 | |-----|------|---------|-------------| 11 | | dataset.datasetKey | string | `"sst"` | path to dataset; either `sst` or `aclImdb` | 12 | | dataset.logsKey | string | `"sst"` | path to save output logs | 13 | | dataset.nfs.path | string | `"nil"` | path to nfs share | 14 | | dataset.nfs.readOnly | bool | `true` | | 15 | | dataset.nfs.server | string | `"nil"` | | 16 | | dataset.nfs.subPath | string | `"nil"` | subpath to dataset directory in nfs share | 17 | | dataset.type | string | `"nfs"` | either `nfs` or `s3` | 18 | | metadata.name | string | `"document-level-sentiment-analysis"` | | 19 | | proxy | string | `"nil"` | | 20 | | volumeClaimTemplates.output_dir.resources.requests.storage | string | `"1Gi"` | | 21 | | volumeClaimTemplates.workspace.resources.requests.storage | string | `"2Gi"` | | 22 | | workflow.dataset | string | `"sst2"` | `sst2` for the `sst` dataset or `imdb` for the `aclImdb` dataset | 23 | | workflow.model | string | `"bert-large-uncased"` | Model name on Hugging Face | 24 | | workflow.num_nodes | int | `2` | \# of Nodes | 25 | | workflow.process_per_node | int | `2` | \# of Instances Per Node | 26 | | workflow.ref | string | `"v1.0.0"` | | 27 | | workflow.repo | string | `"https://github.com/intel/document-level-sentiment-analysis"` | | 28 | 29 | ---------------------------------------------- 30 | Autogenerated from chart metadata using [helm-docs v1.11.0](https://github.com/norwoodj/helm-docs/releases/v1.11.0) 31 | -------------------------------------------------------------------------------- /transfer_learning/tensorflow/resnet50/training/Dockerfile.vision-transfer-learning: -------------------------------------------------------------------------------- 1 | ARG BASE_IMAGE_TAG="20.04" 2 | # Inherit Python3 3 | FROM ubuntu:${BASE_IMAGE_TAG} 4 | 5 | ENV DEBIAN_FRONTEND=noninteractive 6 | 7 | RUN apt-get update && apt-get install --no-install-recommends --fix-missing -y \ 8 | build-essential \ 9 | ca-certificates \ 10 | git \ 11 | gcc \ 12 | numactl \ 13 | wget 14 | # Set Conda PATHs 15 | ARG CONDA_INSTALL_PATH=/opt/conda 16 | ARG 
CONDA_PREFIX=/opt/conda/envs/transfer_learning 17 | ARG MINICONDA_VERSION="latest" 18 | # Miniconda Installation 19 | RUN apt-get update && \ 20 | wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh -O miniconda.sh && \ 21 | bash miniconda.sh -b -p ${CONDA_INSTALL_PATH} && \ 22 | rm miniconda.sh && \ 23 | ln -s ${CONDA_INSTALL_PATH}/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \ 24 | ${CONDA_INSTALL_PATH}/bin/conda clean --all && \ 25 | echo ". ${CONDA_INSTALL_PATH}/etc/profile.d/conda.sh" >> ~/.bashrc && \ 26 | echo "conda activate transfer_learning" >> ~/.bashrc 27 | # PATH prefers env over default conda 28 | ENV PATH="${CONDA_PREFIX}/bin:${CONDA_INSTALL_PATH}/bin:${PATH}" 29 | 30 | ARG PYTHON_VERSION="3.8" 31 | 32 | SHELL ["/bin/bash", "-c"] 33 | # Create env and install requirements via conda + pip 34 | RUN conda create -y -n transfer_learning python=${PYTHON_VERSION} && \ 35 | source activate transfer_learning && \ 36 | conda install -y -c conda-forge gperftools && \ 37 | conda install -y intel-openmp pip && \ 38 | conda clean -ya && \ 39 | pip install intel-tensorflow \ 40 | matplotlib \ 41 | Pillow \ 42 | scikit-learn \ 43 | tensorflow_datasets \ 44 | tensorflow_hub 45 | # Overwrite for newest GLIBCXX version 46 | ENV LD_LIBRARY_PATH="/opt/conda/envs/transfer_learning/lib/:${LD_LIBRARY_PATH}" 47 | # Default Workspace 48 | RUN mkdir -p /workspace/transfer-learning 49 | -------------------------------------------------------------------------------- /transfer_learning/tensorflow/resnet50/inference/Dockerfile.vision-transfer-learning: -------------------------------------------------------------------------------- 1 | ARG BASE_IMAGE_TAG="20.04" 2 | # Inherit Python3 3 | FROM ubuntu:${BASE_IMAGE_TAG} 4 | 5 | ENV DEBIAN_FRONTEND=noninteractive 6 | 7 | RUN apt-get update && apt-get install --no-install-recommends --fix-missing -y \ 8 | build-essential \ 9 | ca-certificates \ 10 | git \ 11 | gcc \ 12 | numactl \ 13 | wget 14 | # Set Conda PATHs 15 | ARG CONDA_INSTALL_PATH=/opt/conda 16 | ARG CONDA_PREFIX=/opt/conda/envs/transfer_learning 17 | ARG MINICONDA_VERSION="latest" 18 | # Miniconda Installation 19 | RUN apt-get update && \ 20 | wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh -O miniconda.sh && \ 21 | bash miniconda.sh -b -p ${CONDA_INSTALL_PATH} && \ 22 | rm miniconda.sh && \ 23 | ln -s ${CONDA_INSTALL_PATH}/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \ 24 | ${CONDA_INSTALL_PATH}/bin/conda clean --all && \ 25 | echo ". 
${CONDA_INSTALL_PATH}/etc/profile.d/conda.sh" >> ~/.bashrc && \ 26 | echo "conda activate transfer_learning" >> ~/.bashrc 27 | # PATH prefers env over default conda 28 | ENV PATH="${CONDA_PREFIX}/bin:${CONDA_INSTALL_PATH}/bin:${PATH}" 29 | 30 | ARG PYTHON_VERSION="3.8" 31 | 32 | SHELL ["/bin/bash", "-c"] 33 | # Create env and install requirements via conda + pip 34 | RUN conda create -y -n transfer_learning python=${PYTHON_VERSION} && \ 35 | source activate transfer_learning && \ 36 | conda install -y -c conda-forge gperftools && \ 37 | conda install -y intel-openmp pip && \ 38 | conda clean -ya && \ 39 | pip install intel-tensorflow \ 40 | matplotlib \ 41 | Pillow \ 42 | scikit-learn \ 43 | tensorflow_datasets \ 44 | tensorflow_hub 45 | # Overwrite for newest GLIBCXX version 46 | ENV LD_LIBRARY_PATH="/opt/conda/envs/transfer_learning/lib/:${LD_LIBRARY_PATH}" 47 | # Default Workspace 48 | RUN mkdir -p /workspace/transfer-learning 49 | -------------------------------------------------------------------------------- /transfer_learning/tensorflow/resnet50/training/chart/templates/_helpers.tpl: -------------------------------------------------------------------------------- 1 | {{/* 2 | Expand the name of the chart. 3 | */}} 4 | {{- define "demo.name" -}} 5 | {{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} 6 | {{- end }} 7 | 8 | {{/* 9 | Create a default fully qualified app name. 10 | We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec). 11 | If release name contains chart name it will be used as a full name. 12 | */}} 13 | {{- define "demo.fullname" -}} 14 | {{- if .Values.fullnameOverride }} 15 | {{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }} 16 | {{- else }} 17 | {{- $name := default .Chart.Name .Values.nameOverride }} 18 | {{- if contains $name .Release.Name }} 19 | {{- .Release.Name | trunc 63 | trimSuffix "-" }} 20 | {{- else }} 21 | {{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }} 22 | {{- end }} 23 | {{- end }} 24 | {{- end }} 25 | 26 | {{/* 27 | Create chart name and version as used by the chart label. 28 | */}} 29 | {{- define "demo.chart" -}} 30 | {{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }} 31 | {{- end }} 32 | 33 | {{/* 34 | Common labels 35 | */}} 36 | {{- define "demo.labels" -}} 37 | helm.sh/chart: {{ include "demo.chart" . }} 38 | {{ include "demo.selectorLabels" . }} 39 | {{- if .Chart.AppVersion }} 40 | app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} 41 | {{- end }} 42 | app.kubernetes.io/managed-by: {{ .Release.Service }} 43 | {{- end }} 44 | 45 | {{/* 46 | Selector labels 47 | */}} 48 | {{- define "demo.selectorLabels" -}} 49 | app.kubernetes.io/name: {{ include "demo.name" . }} 50 | app.kubernetes.io/instance: {{ .Release.Name }} 51 | {{- end }} 52 | 53 | {{/* 54 | Create the name of the service account to use 55 | */}} 56 | {{- define "demo.serviceAccountName" -}} 57 | {{- if .Values.serviceAccount.create }} 58 | {{- default (include "demo.fullname" .) .Values.serviceAccount.name }} 59 | {{- else }} 60 | {{- default "default" .Values.serviceAccount.name }} 61 | {{- end }} 62 | {{- end }} 63 | -------------------------------------------------------------------------------- /language_modeling/pytorch/bert_large/training/chart/templates/_helpers.tpl: -------------------------------------------------------------------------------- 1 | {{/* 2 | Expand the name of the chart. 
3 | */}} 4 | {{- define "chart.name" -}} 5 | {{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} 6 | {{- end }} 7 | 8 | {{/* 9 | Create a default fully qualified app name. 10 | We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec). 11 | If release name contains chart name it will be used as a full name. 12 | */}} 13 | {{- define "chart.fullname" -}} 14 | {{- if .Values.fullnameOverride }} 15 | {{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }} 16 | {{- else }} 17 | {{- $name := default .Chart.Name .Values.nameOverride }} 18 | {{- if contains $name .Release.Name }} 19 | {{- .Release.Name | trunc 63 | trimSuffix "-" }} 20 | {{- else }} 21 | {{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }} 22 | {{- end }} 23 | {{- end }} 24 | {{- end }} 25 | 26 | {{/* 27 | Create chart name and version as used by the chart label. 28 | */}} 29 | {{- define "chart.chart" -}} 30 | {{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }} 31 | {{- end }} 32 | 33 | {{/* 34 | Common labels 35 | */}} 36 | {{- define "chart.labels" -}} 37 | helm.sh/chart: {{ include "chart.chart" . }} 38 | {{ include "chart.selectorLabels" . }} 39 | {{- if .Chart.AppVersion }} 40 | app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} 41 | {{- end }} 42 | app.kubernetes.io/managed-by: {{ .Release.Service }} 43 | {{- end }} 44 | 45 | {{/* 46 | Selector labels 47 | */}} 48 | {{- define "chart.selectorLabels" -}} 49 | app.kubernetes.io/name: {{ include "chart.name" . }} 50 | app.kubernetes.io/instance: {{ .Release.Name }} 51 | {{- end }} 52 | 53 | {{/* 54 | Create the name of the service account to use 55 | */}} 56 | {{- define "chart.serviceAccountName" -}} 57 | {{- if .Values.serviceAccount.create }} 58 | {{- default (include "chart.fullname" .) .Values.serviceAccount.name }} 59 | {{- else }} 60 | {{- default "default" .Values.serviceAccount.name }} 61 | {{- end }} 62 | {{- end }} 63 | -------------------------------------------------------------------------------- /protein-folding/pytorch/alphafold2/inference/Dockerfile.protein-prediction: -------------------------------------------------------------------------------- 1 | ARG BASE_IMAGE_NAME="ubuntu" 2 | ARG BASE_IMAGE_TAG="20.04" 3 | 4 | FROM ${BASE_IMAGE_NAME}:${BASE_IMAGE_TAG} 5 | 6 | ENV DEBIAN_FRONTEND=noninteractive 7 | 8 | RUN apt-get update && apt-get install --no-install-recommends --fix-missing -y \ 9 | ca-certificates \ 10 | git \ 11 | numactl \ 12 | vim \ 13 | wget 14 | 15 | SHELL ["/bin/bash", "-c"] 16 | 17 | ARG CONDA_INSTALL_PATH=/opt/conda 18 | ARG MINICONDA_VERSION="latest" 19 | 20 | RUN apt-get update && \ 21 | wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh -O miniconda.sh && \ 22 | bash miniconda.sh -b -p ${CONDA_INSTALL_PATH} && \ 23 | $CONDA_INSTALL_PATH/bin/conda clean -ya && \ 24 | rm miniconda.sh && \ 25 | ln -s ${CONDA_INSTALL_PATH}/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \ 26 | echo ". 
${CONDA_INSTALL_PATH}/etc/profile.d/conda.sh" >> ~/.bashrc && \ 27 | echo "conda activate alphafold2" >> ~/.bashrc 28 | 29 | ARG PYTHON_VERSION="3.9.7" 30 | ENV PATH="${CONDA_INSTALL_PATH}/bin:${PATH}:/opt/conda/envs/alphafold2/bin" 31 | 32 | RUN conda create -yn alphafold2 python=${PYTHON_VERSION} && \ 33 | source activate alphafold2 && \ 34 | conda update -y -n base -c defaults conda && \ 35 | conda install -y -c intel python intelpython && \ 36 | conda install -y -c conda-forge openmm pdbfixer aria2 && \ 37 | conda install -y -c conda-forge -c bioconda hmmer kalign2 hhsuite && \ 38 | conda install -y -c pytorch pytorch cpuonly && \ 39 | conda install -y jemalloc && \ 40 | conda clean -ya && \ 41 | pip install absl-py \ 42 | biopython \ 43 | chex \ 44 | dm-haiku \ 45 | dm-tree \ 46 | immutabledict \ 47 | intel_extension_for_pytorch \ 48 | jax \ 49 | jaxlib \ 50 | joblib \ 51 | ml-collections \ 52 | numpy \ 53 | scipy \ 54 | tensorflow \ 55 | pandas \ 56 | psutil \ 57 | tqdm \ 58 | -f https://storage.googleapis.com/jax-releases/jax_releases.html 59 | 60 | RUN mkdir -p /workspace 61 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing 2 | 3 | ### License 4 | 5 | is licensed under the terms in [LICENSE]. By contributing to the project, you agree to the license and copyright terms therein and release your contribution under these terms. 6 | 7 | ### Sign your work 8 | 9 | Please use the sign-off line at the end of the patch. Your signature certifies that you wrote the patch or otherwise have the right to pass it on as an open-source patch. The rules are pretty simple: if you can certify 10 | the below (from [developercertificate.org](http://developercertificate.org/)): 11 | 12 | ``` 13 | Developer Certificate of Origin 14 | Version 1.1 15 | 16 | Copyright (C) 2004, 2006 The Linux Foundation and its contributors. 17 | 660 York Street, Suite 102, 18 | San Francisco, CA 94110 USA 19 | 20 | Everyone is permitted to copy and distribute verbatim copies of this 21 | license document, but changing it is not allowed. 22 | 23 | Developer's Certificate of Origin 1.1 24 | 25 | By making a contribution to this project, I certify that: 26 | 27 | (a) The contribution was created in whole or in part by me and I 28 | have the right to submit it under the open source license 29 | indicated in the file; or 30 | 31 | (b) The contribution is based upon previous work that, to the best 32 | of my knowledge, is covered under an appropriate open source 33 | license and I have the right under that license to submit that 34 | work with modifications, whether created in whole or in part 35 | by me, under the same open source license (unless I am 36 | permitted to submit under a different license), as indicated 37 | in the file; or 38 | 39 | (c) The contribution was provided directly to me by some other 40 | person who certified (a), (b) or (c) and I have not modified 41 | it. 42 | 43 | (d) I understand and agree that this project and the contribution 44 | are public and that a record of the contribution (including all 45 | personal information I submit with it, including my sign-off) is 46 | maintained indefinitely and may be redistributed consistent with 47 | this project or the open source license(s) involved. 
48 | ``` 49 | 50 | Then you just add a line to every git commit message: 51 | 52 | Signed-off-by: Joe Smith 53 | 54 | Use your real name (sorry, no pseudonyms or anonymous contributions.) 55 | 56 | If you set your `user.name` and `user.email` git configs, you can sign your 57 | commit automatically with `git commit -s`. -------------------------------------------------------------------------------- /analytics/tensorflow/ssd_resnet34/inference/Dockerfile.video-streamer: -------------------------------------------------------------------------------- 1 | FROM centos:8 AS centos-intel-base 2 | SHELL ["/bin/bash", "-c"] 3 | 4 | # Fixe for “Error: Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: No URLs in mirrorlist" 5 | RUN sed -i.bak '/^mirrorlist=/s/mirrorlist=/#mirrorlist=/g' /etc/yum.repos.d/CentOS-Linux-* && \ 6 | sed -i.bak 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-Linux-* && \ 7 | yum distro-sync -y && \ 8 | yum --disablerepo '*' --enablerepo=extras swap centos-linux-repos centos-stream-repos -y && \ 9 | yum distro-sync -y && \ 10 | yum clean all 11 | 12 | # See http://bugs.python.org/issue19846 13 | ENV LANG C.UTF-8 14 | ARG PY_VER="38" 15 | ARG MINICONDA_VER="py38_4.12.0-Linux-x86_64" 16 | 17 | RUN yum update -y && yum install -y \ 18 | git \ 19 | mesa-libGL \ 20 | net-tools \ 21 | numactl \ 22 | python${PY_VER} \ 23 | python${PY_VER}-pip \ 24 | wget \ 25 | which && \ 26 | yum clean all 27 | 28 | # Some TF tools expect a "python" binary 29 | RUN ln -sf $(which python3) /usr/local/bin/python && \ 30 | ln -sf $(which python3) /usr/local/bin/python3 && \ 31 | ln -sf $(which python3) /usr/bin/python 32 | 33 | RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-${MINICONDA_VER}.sh -O miniconda.sh && \ 34 | chmod +x miniconda.sh && \ 35 | ./miniconda.sh -b -p ~/conda && \ 36 | rm ./miniconda.sh && \ 37 | ln -s ~/conda ~/miniconda3 && \ 38 | ~/conda/bin/conda create -yn vdms-test python=3.8 && \ 39 | export PATH=~/conda/bin/:${PATH} && \ 40 | source activate vdms-test && \ 41 | python -m pip --no-cache-dir install --upgrade pip \ 42 | opencv-python==4.5.5.64 \ 43 | protobuf==3.20.1 \ 44 | pyyaml \ 45 | setuptools \ 46 | vdms \ 47 | wheel && \ 48 | conda clean --all 49 | 50 | ENV PATH ~/conda/bin/:${PATH} 51 | ENV LD_LIBRARY_PATH /lib64/:/usr/lib64/:/usr/local/lib64:/root/conda/envs/vdms-test/lib:${LD_LIBRARY_PATH} 52 | ENV BASH_ENV=/root/.bash_profile 53 | 54 | RUN source activate vdms-test && \ 55 | pip install tensorflow-cpu && \ 56 | conda install -y -c conda-forge gst-libav==1.18.4 gst-plugins-good=1.18.4 gst-plugins-bad=1.18.4 gst-plugins-ugly=1.18.4 gst-python=1.18.4 pygobject=3.40.1 && \ 57 | conda clean --all 58 | 59 | RUN echo "source ~/conda/etc/profile.d/conda.sh" >> /root/.bash_profile && \ 60 | echo "conda activate vdms-test" >> /root/.bash_profile 61 | 62 | RUN mkdir -p /workspace 63 | #WORKDIR /workspace 64 | 65 | RUN yum clean all 66 | -------------------------------------------------------------------------------- /template/TEMPLATE.md: -------------------------------------------------------------------------------- 1 | # - 2 | ## Description 3 | This document contains instructions on how to run pipelines with make and docker compose. 4 | ## Project Structure 5 | ``` 6 | ├── @ 7 | ├── DEVCATALOG.md 8 | ├── docker-compose.yml 9 | ├── Dockerfile. 
10 | ├── Makefile 11 | └── README.md 12 | ``` 13 | [_Makefile_](Makefile) 14 | ``` 15 | ?= 16 | ?= 17 | ?= 18 | FINAL_IMAGE_NAME ?= 19 | OUTPUT_DIR ?= /output 20 | 21 | : 22 | @=${} \ 23 | =${} \ 24 | =${} \ 25 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 26 | OUTPUT_DIR=${OUTPUT_DIR} \ 27 | docker compose up --build 28 | 29 | clean: 30 | docker compose down 31 | ``` 32 | [_docker-compose.yml_](docker-compose.yml) 33 | ``` 34 | services: 35 | : 36 | build: 37 | args: 38 | : ${} 39 | : ${} 40 | http_proxy: ${http_proxy} 41 | https_proxy: ${https_proxy} 42 | no_proxy: ${no_proxy} 43 | dockerfile: Dockerfile. 44 | command: /workspace//.sh ${} 45 | environment: 46 | - ${}=${} 47 | - ${}=${} 48 | - http_proxy=${http_proxy} 49 | - https_proxy=${https_proxy} 50 | - no_proxy=${no_proxy} 51 | image: ${FINAL_IMAGE_NAME}:-- 52 | privileged: true 53 | volumes: 54 | - ${OUTPUT_DIR}:${OUTPUT_DIR} 55 | - ./:/workspace/ 56 | working_dir: /workspace/ 57 | ``` 58 | 59 | # 60 | End2End AI Workflow utilizing . More information [here]() 61 | 62 | ## Quick Start 63 | * Pull and configure the dependent repo submodule `git submodule update --init --recursive`. 64 | 65 | * Install [Pipeline Repository Dependencies](https://github.com/intel/ai-workflows/blob/main/pipelines/README.md) 66 | 67 | * Other variables: 68 | 69 | | Variable Name | Default | Notes | 70 | | --- | --- | --- | 71 | | FINAL_IMAGE_NAME | `` | Final Docker image name | 72 | | OUTPUT_DIR | `/output` | Output directory | 73 | | | `` | | 74 | | | `` | | 75 | | | `` | | 76 | ## Build and Run 77 | Build and Run with defaults: 78 | ``` 79 | make 80 | ``` 81 | ## Build and Run Example 82 | ``` 83 | 84 | ``` 85 | ... 86 | ``` 87 | 88 | ``` 89 | -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "analytics/classical-ml/recsys/training/analytics-with-python"] 2 | path = analytics/classical-ml/recsys/training/analytics-with-python 3 | url = https://github.com/intel/recommender-system-with-classical-ml/ 4 | [submodule "analytics/classical-ml/synthetic/inference/wafer-insights"] 5 | path = analytics/classical-ml/synthetic/inference/wafer-insights 6 | url = https://github.com/intel/wafer-insights-with-classical-ml 7 | [submodule "analytics/tensorflow/ssd_resnet34/inference/video-streamer"] 8 | path = analytics/tensorflow/ssd_resnet34/inference/video-streamer 9 | url = https://github.com/intel/video-streamer 10 | [submodule "big-data/chronos/BigDL"] 11 | path = big-data/chronos/BigDL 12 | url = https://github.com/intel-analytics/BigDL.git 13 | [submodule "big-data/friesian/training/BigDL"] 14 | path = big-data/friesian/training/BigDL 15 | url = https://github.com/intel-analytics/BigDL 16 | [submodule "big-data/ppml/bigDL-ppml"] 17 | path = big-data/ppml/bigDL-ppml 18 | url = https://github.com/intel-analytics/BigDL 19 | [submodule "classification/tensorflow/bert_base/inference/aws_sagemaker"] 20 | path = classification/tensorflow/bert_base/inference/aws_sagemaker 21 | url = https://github.com/intel/NLP-Workflow-with-AWS 22 | [submodule "language_modeling/pytorch/bert_large/training/dlsa"] 23 | path = language_modeling/pytorch/bert_large/training/dlsa 24 | url = https://github.com/intel/document-level-sentiment-analysis 25 | [submodule "language_modeling/pytorch/bert_base/inference/azureml"] 26 | path = language_modeling/pytorch/bert_base/inference/azureml 27 | url = https://github.com/intel/Intel-NLP-workflow-for-Azure-ML.git 
28 | [submodule "language_modeling/pytorch/bert_base/training/azureml"] 29 | path = language_modeling/pytorch/bert_base/training/azureml 30 | url = https://github.com/intel/Intel-NLP-workflow-for-Azure-ML.git 31 | [submodule "protein-folding/pytorch/alphafold2/inference/protein-prediction"] 32 | path = protein-folding/pytorch/alphafold2/inference/protein-prediction 33 | url = https://github.com/IntelAI/models 34 | [submodule "transfer_learning/tensorflow/resnet50/inference/transfer-learning-inference"] 35 | path = transfer_learning/tensorflow/resnet50/inference/transfer-learning-inference 36 | url = https://github.com/intel/vision-based-transfer-learning-and-inference 37 | [submodule "transfer_learning/tensorflow/resnet50/training/transfer-learning-training"] 38 | path = transfer_learning/tensorflow/resnet50/training/transfer-learning-training 39 | url = https://github.com/intel/vision-based-transfer-learning-and-inference 40 | [submodule "big-data/aiok-ray/inference/AIOK_Ray"] 41 | path = big-data/aiok-ray/inference/AIOK_Ray 42 | url = https://github.com/intel/e2eAIOK/ 43 | [submodule "big-data/aiok-ray/training/AIOK_Ray"] 44 | path = big-data/aiok-ray/training/AIOK_Ray 45 | url = https://github.com/intel/e2eAIOK/ 46 | -------------------------------------------------------------------------------- /big-data/friesian/training/docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | csv-to-parquet: 3 | build: 4 | args: 5 | http_proxy: ${http_proxy} 6 | https_proxy: ${https_proxy} 7 | no_proxy: ${no_proxy} 8 | dockerfile: Dockerfile.friesian-training 9 | command: conda run -n bigdl --no-capture-output conda run -n bigdl --no-capture-output python3 csv_to_parquet.py --input /dataset/data-csv/day_0.csv --output /dataset/data-parquet/day_0.parquet 10 | environment: 11 | - DATASET_DIR=${DATASET_DIR} 12 | - MODEL_OUTPUT=${MODEL_OUTPUT} 13 | - http_proxy=${http_proxy} 14 | - https_proxy=${https_proxy} 15 | - no_proxy=${no_proxy} 16 | image: ${FINAL_IMAGE_NAME}:training-ubuntu-20.04 17 | privileged: true 18 | volumes: 19 | - ${DATASET_DIR}:/dataset 20 | - ${MODEL_OUTPUT}:/model 21 | - $PWD:/workspace 22 | working_dir: /workspace/BigDL/python/friesian/example/wnd 23 | preprocessing: 24 | build: 25 | args: 26 | http_proxy: ${http_proxy} 27 | https_proxy: ${https_proxy} 28 | no_proxy: ${no_proxy} 29 | dockerfile: Dockerfile.friesian-training 30 | command: conda run -n bigdl --no-capture-output python wnd_preprocessing.py --executor_cores 36 --executor_memory 50g --days 0-0 --input_folder /dataset/data-parquet --output_folder /dataset/data-processed --frequency_limit 15 --cross_sizes 10000,10000 31 | depends_on: 32 | csv-to-parquet: 33 | condition: service_completed_successfully 34 | environment: 35 | - DATASET_DIR=${DATASET_DIR} 36 | - MODEL_OUTPUT=${MODEL_OUTPUT} 37 | - http_proxy=${http_proxy} 38 | - https_proxy=${https_proxy} 39 | - no_proxy=${no_proxy} 40 | image: ${FINAL_IMAGE_NAME}:training-ubuntu-20.04 41 | privileged: true 42 | volumes: 43 | - ${DATASET_DIR}:/dataset 44 | - ${MODEL_OUTPUT}:/model 45 | - $PWD:/workspace 46 | working_dir: /workspace/BigDL/python/friesian/example/wnd 47 | friesian-training: 48 | build: 49 | args: 50 | http_proxy: ${http_proxy} 51 | https_proxy: ${https_proxy} 52 | no_proxy: ${no_proxy} 53 | dockerfile: Dockerfile.friesian-training 54 | command: conda run -n bigdl --no-capture-output python wnd_train.py --executor_cores 36 --executor_memory 50g --data_dir /dataset/data-processed --model_dir /model 55 | 
depends_on: 56 | preprocessing: 57 | condition: service_completed_successfully 58 | environment: 59 | - DATASET_DIR=${DATASET_DIR} 60 | - MODEL_OUTPUT=${MODEL_OUTPUT} 61 | - http_proxy=${http_proxy} 62 | - https_proxy=${https_proxy} 63 | - no_proxy=${no_proxy} 64 | image: ${FINAL_IMAGE_NAME}:training-ubuntu-20.04 65 | privileged: true 66 | volumes: 67 | - ${DATASET_DIR}:/dataset 68 | - ${MODEL_OUTPUT}:/model 69 | - $PWD:/workspace 70 | working_dir: /workspace/BigDL/python/friesian/example/wnd 71 | -------------------------------------------------------------------------------- /protein-folding/pytorch/alphafold2/inference/docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | param: 3 | build: 4 | args: 5 | http_proxy: ${http_proxy} 6 | https_proxy: ${https_proxy} 7 | no_proxy: ${no_proxy} 8 | dockerfile: Dockerfile.protein-prediction 9 | command: conda run -n alphafold2 --no-capture-output python extract_params.py --input /dataset/params/params_${MODEL}.npz --output_dir /output/weights/extracted/${MODEL} 10 | environment: 11 | - DATASET_DIR=${DATASET_DIR} 12 | - EXPERIMENT_NAME=${EXPERIMENT_NAME} 13 | - MODEL=${MODEL} 14 | - OUTPUT_DIR=${OUTPUT_DIR} 15 | - http_proxy=${http_proxy} 16 | - https_proxy=${https_proxy} 17 | - no_proxy=${no_proxy} 18 | image: ${FINAL_IMAGE_NAME}:inference-ubuntu-20.04 19 | privileged: true 20 | volumes: 21 | - ${DATASET_DIR}:/dataset 22 | - $PWD:/workspace 23 | - ${OUTPUT_DIR}:/output 24 | working_dir: /workspace/protein-prediction/models/aidd/pytorch/alphafold2/inference 25 | protein-prediction-preproc: 26 | build: 27 | args: 28 | http_proxy: ${http_proxy} 29 | https_proxy: ${https_proxy} 30 | no_proxy: ${no_proxy} 31 | dockerfile: Dockerfile.protein-prediction 32 | command: conda run -n alphafold2 --no-capture-output bash online_preproc_baremetal.sh /output /dataset /output/samples /output/experiments/${EXPERIMENT_NAME} 33 | depends_on: 34 | param: 35 | condition: service_completed_successfully 36 | environment: 37 | - DATASET_DIR=${DATASET_DIR} 38 | - EXPERIMENT_NAME=${EXPERIMENT_NAME} 39 | - MODEL=${MODEL} 40 | - OUTPUT_DIR=${OUTPUT_DIR} 41 | - http_proxy=${http_proxy} 42 | - https_proxy=${https_proxy} 43 | - no_proxy=${no_proxy} 44 | image: ${FINAL_IMAGE_NAME}:inference-ubuntu-20.04 45 | privileged: true 46 | volumes: 47 | - ${DATASET_DIR}:/dataset 48 | - $PWD:/workspace 49 | - ${OUTPUT_DIR}:/output 50 | working_dir: /workspace/protein-prediction/quickstart/aidd/pytorch/alphafold2/inference 51 | protein-prediction-inference: 52 | build: 53 | args: 54 | http_proxy: ${http_proxy} 55 | https_proxy: ${https_proxy} 56 | no_proxy: ${no_proxy} 57 | dockerfile: Dockerfile.protein-prediction 58 | command: conda run -n alphafold2 --no-capture-output bash online_inference_baremetal.sh /opt/conda/envs/alphafold2 /output /dataset /output/samples /output/experiments/${EXPERIMENT_NAME} ${MODEL} 59 | depends_on: 60 | protein-prediction-preproc: 61 | condition: service_completed_successfully 62 | environment: 63 | - DATASET_DIR=${DATASET_DIR} 64 | - EXPERIMENT_NAME=${EXPERIMENT_NAME} 65 | - MODEL=${MODEL} 66 | - OUTPUT_DIR=${OUTPUT_DIR} 67 | - http_proxy=${http_proxy} 68 | - https_proxy=${https_proxy} 69 | - no_proxy=${no_proxy} 70 | image: ${FINAL_IMAGE_NAME}:inference-ubuntu-20.04 71 | privileged: true 72 | volumes: 73 | - ${DATASET_DIR}:/dataset 74 | - $PWD:/workspace 75 | - ${OUTPUT_DIR}:/output 76 | working_dir: /workspace/protein-prediction/quickstart/aidd/pytorch/alphafold2/inference 77 | 
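# Note (descriptive comment, not part of the original file): the param, protein-prediction-preproc, and
# protein-prediction-inference services above are chained through depends_on with
# condition: service_completed_successfully, so bringing up protein-prediction-inference runs
# parameter extraction, preprocessing, and inference in that order.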
-------------------------------------------------------------------------------- /big-data/friesian/DEVCATALOG.md: -------------------------------------------------------------------------------- 1 | # Building Large-Scale End-to-End Recommendation Systems with BigDL Friesian 2 | 3 | ## Overview 4 | 5 | [BigDL Friesian](https://bigdl.readthedocs.io/en/latest/doc/Friesian/index.html) is an application framework for building large-scale recommender solutions optimized for Intel Xeon. This workflow demonstrates how to use Friesian to easily build an end-to-end [Wide & Deep Learning](https://arxiv.org/abs/1606.07792) recommender system on a real-world large dataset provided by Twitter. 6 | 7 | ## How it Works 8 | 9 | - Friesian provides various built-in distributed feature engineering operations and the distributed training of popular recommendation algorithms based on [BigDL Orca](https://bigdl.readthedocs.io/en/latest/doc/Orca/index.html) and Spark. 10 | - Friesian provides a complete, highly available and scalable pipeline for online serving (including recall and ranking) as well as nearline updates based on gRPC services. 11 | 12 | 13 | The overall architecture of Friesian is shown in the following diagram: 14 | 15 | 16 | 17 | 18 | ## Get Started 19 | 20 | ### Dataset Preparation 21 | You can download the Twitter Recsys Challenge 2021 dataset from [here](https://recsys-twitter.com/data/show-downloads#). Or you can run the script [`generate_dummy_data.py`](https://github.com/intel-analytics/BigDL/blob/main/apps/wide-deep-recommendation/generate_dummy_data.py) to generate a dummy dataset. 22 | 23 | To run on a Kubernetes cluster, you may need to put the downloaded data on a shared volume. Please refer to [here](https://bigdl.readthedocs.io/en/latest/doc/Orca/Tutorial/k8s.html#load-data-from-network-file-systems-nfs) for more details. 24 | 25 | ### Docker 26 | 27 | - Please refer to [here](https://bigdl.readthedocs.io/en/latest/doc/Orca/Tutorial/k8s.html#pull-docker-image) for the docker image for BigDL on K8s. 28 | - Please refer to [here](https://bigdl.readthedocs.io/en/latest/doc/Orca/Tutorial/k8s.html#create-a-k8s-client-container) to create a client container for the Kubernetes cluster. 29 | 30 | ### Environment Preparation 31 | Please follow the steps [here](https://bigdl.readthedocs.io/en/latest/doc/Orca/Tutorial/k8s.html#prepare-environment) to prepare the Python environment on the client container. 32 | 33 | ### How to run 34 | 35 | - Please refer to [here](https://bigdl.readthedocs.io/en/latest/doc/Orca/Tutorial/k8s.html#run-jobs-on-k8s) to run the distributed feature engineering and training workload on a Kubernetes cluster. The scripts are [here](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/wnd/recsys2021). 36 | - Please refer to [here](https://github.com/intel-analytics/BigDL/tree/main/scala/friesian) to run the online serving workload. 37 | 38 | ## Recommended Hardware 39 | The hardware below is recommended for use with this reference implementation. 40 | 41 | - Intel® 4th Gen Xeon® Scalable Performance processors 42 | 43 | ## Learn More 44 | 45 | - Please check the notebooks [here](https://github.com/intel-analytics/BigDL/tree/main/apps/wide-deep-recommendation) for more detailed descriptions of distributed feature engineering and training. 46 | - Please check [here](https://bigdl.readthedocs.io/en/latest/doc/Friesian/examples.html) for more reference use cases. 
47 | - Please check [here](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Friesian/index.html) for more detailed API documentations. 48 | 49 | ## Known Issues 50 | NA 51 | 52 | ## Troubleshooting 53 | NA 54 | 55 | ## Support Forum 56 | Please submit issues [here](https://github.com/intel-analytics/BigDL/issues) and we will track and respond to them daily. 57 | -------------------------------------------------------------------------------- /language_modeling/pytorch/bert_large/training/chart/templates/workflowTemplate.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: argoproj.io/v1alpha1 2 | kind: WorkflowTemplate 3 | metadata: 4 | name: {{ .Values.metadata.name }} 5 | labels: 6 | {{- include "chart.labels" . | nindent 4 }} 7 | spec: 8 | arguments: 9 | parameters: 10 | - name: ref 11 | value: {{ .Values.workflow.ref }} 12 | - name: repo 13 | value: {{ .Values.workflow.repo }} 14 | - name: http_proxy 15 | value: {{ .Values.proxy }} 16 | - enum: 17 | - sst2 18 | - imdb 19 | name: dataset 20 | value: {{ .Values.workflow.dataset }} 21 | - name: model 22 | value: {{ .Values.workflow.model }} 23 | - name: num_nodes 24 | value: {{ .Values.workflow.num_nodes }} 25 | - name: process_per_node 26 | value: {{ .Values.workflow.process_per_node }} 27 | entrypoint: main 28 | templates: 29 | - steps: 30 | - - name: git-clone 31 | template: git-clone 32 | - - name: hugging-face-dlsa-training 33 | template: hugging-face-dlsa-training 34 | name: main 35 | - container: 36 | args: 37 | - clone 38 | - -b 39 | - '{{"{{workflow.parameters.ref}}"}}' 40 | - '{{"{{workflow.parameters.repo}}"}}' 41 | - workspace 42 | command: 43 | - git 44 | env: 45 | - name: http_proxy 46 | value: '{{"{{workflow.parameters.http_proxy}}"}}' 47 | - name: https_proxy 48 | value: '{{"{{workflow.parameters.http_proxy}}"}}' 49 | image: intel/ai-workflows:document-level-sentiment-analysis 50 | volumeMounts: 51 | - mountPath: /workspace 52 | name: workspace 53 | workingDir: / 54 | name: git-clone 55 | - container: 56 | args: 57 | - '-c' 58 | - >- 59 | fine-tuning/run_dist.sh fine-tuning/run_ipex_native.sh 60 | # Keeping until multi-node support is added 61 | #-np '{{"{{workflow.parameters.num_nodes}}"}}' \ 62 | #-ppn '{{"{{workflow.parameters.process_per_node}}"}}' \ 63 | #fine-tuning/run_ipex_native.sh 64 | command: 65 | - sh 66 | env: 67 | - name: DATASET 68 | value: '{{"{{workflow.parameters.dataset}}"}}' 69 | - name: MODEL 70 | value: '{{"{{workflow.parameters.model}}"}}' 71 | - name: OUTPUT_DIR 72 | value: /output 73 | - name: http_proxy 74 | value: '{{"{{workflow.parameters.http_proxy}}"}}' 75 | - name: https_proxy 76 | value: '{{"{{workflow.parameters.http_proxy}}"}}' 77 | image: intel/ai-workflows:document-level-sentiment-analysis 78 | volumeMounts: 79 | - mountPath: /workspace 80 | name: workspace 81 | - mountPath: /output 82 | name: output-dir 83 | {{- if eq .Values.dataset.type "nfs" }} 84 | - mountPath: /workspace/profiling-transformers/datasets/{{ .Values.dataset.key }} 85 | name: dataset 86 | subPath: {{ .Values.dataset.nfs.subPath }} 87 | {{ end }} 88 | workingDir: /workspace/profiling-transformers 89 | {{- if eq .Values.dataset.type "s3" }} 90 | inputs: 91 | artifacts: 92 | - name: dataset 93 | path: /workspace/profiling-transformers/datasets/{{ .Values.dataset.key }} 94 | s3: 95 | key: {{ .Values.dataset.datasetKey }} 96 | {{ end }} 97 | name: hugging-face-dlsa-training 98 | outputs: 99 | artifacts: 100 | - name: checkpoint 101 | path: /output 102 | s3: 103 | key: 
{{ .Values.dataset.logsKey }} 104 | {{- if eq .Values.dataset.type "nfs" }} 105 | volumes: 106 | - name: dataset 107 | nfs: 108 | server: {{ .Values.dataset.nfs.server }} 109 | path: {{ .Values.dataset.nfs.path }} 110 | readOnly: {{ .Values.dataset.nfs.readOnly }} 111 | {{ end }} 112 | volumeClaimTemplates: 113 | - metadata: 114 | name: workspace 115 | name: workspace 116 | spec: 117 | accessModes: 118 | - ReadWriteOnce 119 | resources: 120 | requests: 121 | storage: {{ .Values.volumeClaimTemplates.workspace.resources.requests.storage }} 122 | - metadata: 123 | name: output-dir 124 | name: output-dir 125 | spec: 126 | accessModes: 127 | - ReadWriteOnce 128 | resources: 129 | requests: 130 | storage: {{ .Values.volumeClaimTemplates.output_dir.resources.requests.storage }} 131 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | PROJECT NOT UNDER ACTIVE MANAGEMENT 2 | 3 | This project will no longer be maintained by Intel. 4 | 5 | Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project. 6 | 7 | Intel no longer accepts patches to this project. 8 | 9 | If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project. 10 | 11 | Contact: webadmin@linux.intel.com 12 | # AI Workflows Infrastructure for Intel® Architecture 13 | 14 | ## Description 15 | On this page you will find details and instructions on how to set up an environment that supports Intel's AI Pipelines container build and test infrastructure. 16 | 17 | ## Dependency Requirements 18 | Only Linux systems are currently supported. Please make sure the following are installed in your package manager of choice: 19 | - `make` 20 | - `docker.io` 21 | 22 | A full installation of [docker engine](https://docs.docker.com/engine/install/) with docker CLI is required. The recommended docker engine version is `19.03.0+`. 23 | 24 | - `docker-compose` 25 | 26 | The Docker Compose CLI can be [installed](https://docs.docker.com/compose/install/compose-plugin/#installing-compose-on-linux-systems) both manually and via package manager. 27 | 28 | ``` 29 | $ DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker} 30 | $ mkdir -p $DOCKER_CONFIG/cli-plugins 31 | $ curl -SL https://github.com/docker/compose/releases/download/v2.7.0/docker-compose-linux-x86_64 -o $DOCKER_CONFIG/cli-plugins/docker-compose 32 | $ chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose 33 | 34 | $ docker compose version 35 | Docker Compose version v2.7.0 36 | ``` 37 | 38 | ## Build and Run Workflows 39 | Each pipeline will contain specific requirements and instructions for how to provide its specific dependencies and what customization options are possible. Generally, pipelines are run with the following format: 40 | 41 | ```git submodule update --init --recursive``` 42 | 43 | This will pull the dependent repo containing the scripts to run the end2end pipeline's inference and/or training. 44 | 45 | ```= ... = make ``` 46 | 47 | Where `KEY` and `VALUE` pairs are environment variables that can be used to customize both the pipeline's script options and the resulting container. 
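For example, a fully specified invocation for the Wafer Insights pipeline (described later in this document) might look like the following; the output path shown here is illustrative, and any variable can be omitted to fall back to its documented default:

```
cd analytics/classical-ml/synthetic/inference
git submodule update --init --recursive
FINAL_IMAGE_NAME=wafer-insights OUTPUT_DIR=$PWD/output make wafer-insight
```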
For more information about the valid `KEY` and `VALUE` pairs, see the README.md file in the folder for each workflow container: 48 | 49 | |AI Workflow|Framework/Tool|Mode| 50 | |-|-|-| 51 | |Chronos Time Series Forecasting|Chronos and PyTorch*|[Training](./big-data/chronos/DEVCATALOG.md) 52 | |Document-Level Sentiment Analysis|PyTorch*|[Training](./language_modeling/pytorch/bert_large/training/)| 53 | |Friesian Recommendation System|Spark with TensorFlow|[Training](./big-data/friesian/training/) \| [Inference](./big-data/friesian/DEVCATALOG.md)| 54 | |Habana® Gaudi® Processor Training and Inference using OpenVINO™ Toolkit for U-Net 2D Model|OpenVINO™|[Training and Inference](https://github.com/intel/cv-training-and-inference-openvino/tree/v1.0.0/gaudi-segmentation-unet-ptq)| 55 | |Privacy Preservation|Spark with TensorFlow and PyTorch*|[Training and Inference](./big-data/ppml/DEVCATALOG.md)| 56 | |NLP workflow for AWS Sagemaker|TensorFlow and Jupyter|[Inference](./classification/tensorflow/bert_base/inference/)| 57 | |NLP workflow for Azure ML|PyTorch* and Jupyter|[Training](./language_modeling/pytorch/bert_base/training/) \| [Inference](./language_modeling/pytorch/bert_base/inference/)| 58 | |Protein Structure Prediction|PyTorch*|[Inference](./protein-folding/pytorch/alphafold2/inference/) 59 | |Quantization Aware Training and Inference|OpenVINO™|[Quantization Aware Training(QAT)](https://github.com/intel/nlp-training-and-inference-openvino/tree/v1.0/question-answering-bert-qat)| 60 | |Ray Recommendation System|Ray with PyTorch*|[Training](./big-data/aiok-ray/training/) \| [Inference](./big-data/aiok-ray/inference)| 61 | |RecSys Challenge Analytics With Python|Hadoop and Spark|[Training](./analytics/classical-ml/recsys/training/)| 62 | |Video Streamer|TensorFlow|[Inference](./analytics/tensorflow/ssd_resnet34/inference/)| 63 | |Vision Based Transfer Learning|TensorFlow|[Training](./transfer_learning/tensorflow/resnet50/training/) \| [Inference](./transfer_learning/tensorflow/resnet50/inference/)| 64 | |Wafer Insights|SKLearn|[Inference](./analytics/classical-ml/synthetic/inference/)| 65 | 66 | 67 | ### Cleanup 68 | Each pipeline can remove all resources allocated by executing `make clean`. 69 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | We as members, contributors, and leaders pledge to make participation in our 6 | community a harassment-free experience for everyone, regardless of age, body 7 | size, visible or invisible disability, ethnicity, sex characteristics, gender 8 | identity and expression, level of experience, education, socio-economic status, 9 | nationality, personal appearance, race, caste, color, religion, or sexual 10 | identity and orientation. 11 | 12 | We pledge to act and interact in ways that contribute to an open, welcoming, 13 | diverse, inclusive, and healthy community. 
14 | 15 | ## Our Standards 16 | 17 | Examples of behavior that contributes to a positive environment for our 18 | community include: 19 | 20 | * Demonstrating empathy and kindness toward other people 21 | * Being respectful of differing opinions, viewpoints, and experiences 22 | * Giving and gracefully accepting constructive feedback 23 | * Accepting responsibility and apologizing to those affected by our mistakes, 24 | and learning from the experience 25 | * Focusing on what is best not just for us as individuals, but for the overall 26 | community 27 | 28 | Examples of unacceptable behavior include: 29 | 30 | * The use of sexualized language or imagery, and sexual attention or advances of 31 | any kind 32 | * Trolling, insulting or derogatory comments, and personal or political attacks 33 | * Public or private harassment 34 | * Publishing others' private information, such as a physical or email address, 35 | without their explicit permission 36 | * Other conduct which could reasonably be considered inappropriate in a 37 | professional setting 38 | 39 | ## Enforcement Responsibilities 40 | 41 | Community leaders are responsible for clarifying and enforcing our standards of 42 | acceptable behavior and will take appropriate and fair corrective action in 43 | response to any behavior that they deem inappropriate, threatening, offensive, 44 | or harmful. 45 | 46 | Community leaders have the right and responsibility to remove, edit, or reject 47 | comments, commits, code, wiki edits, issues, and other contributions that are 48 | not aligned to this Code of Conduct, and will communicate reasons for moderation 49 | decisions when appropriate. 50 | 51 | ## Scope 52 | 53 | This Code of Conduct applies within all community spaces, and also applies when 54 | an individual is officially representing the community in public spaces. 55 | Examples of representing our community include using an official e-mail address, 56 | posting via an official social media account, or acting as an appointed 57 | representative at an online or offline event. 58 | 59 | ## Enforcement 60 | 61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 62 | reported to the community leaders responsible for enforcement at 63 | CommunityCodeOfConduct AT intel DOT com. 64 | All complaints will be reviewed and investigated promptly and fairly. 65 | 66 | All community leaders are obligated to respect the privacy and security of the 67 | reporter of any incident. 68 | 69 | ## Enforcement Guidelines 70 | 71 | Community leaders will follow these Community Impact Guidelines in determining 72 | the consequences for any action they deem in violation of this Code of Conduct: 73 | 74 | ### 1. Correction 75 | 76 | **Community Impact**: Use of inappropriate language or other behavior deemed 77 | unprofessional or unwelcome in the community. 78 | 79 | **Consequence**: A private, written warning from community leaders, providing 80 | clarity around the nature of the violation and an explanation of why the 81 | behavior was inappropriate. A public apology may be requested. 82 | 83 | ### 2. Warning 84 | 85 | **Community Impact**: A violation through a single incident or series of 86 | actions. 87 | 88 | **Consequence**: A warning with consequences for continued behavior. No 89 | interaction with the people involved, including unsolicited interaction with 90 | those enforcing the Code of Conduct, for a specified period of time. 
This 91 | includes avoiding interactions in community spaces as well as external channels 92 | like social media. Violating these terms may lead to a temporary or permanent 93 | ban. 94 | 95 | ### 3. Temporary Ban 96 | 97 | **Community Impact**: A serious violation of community standards, including 98 | sustained inappropriate behavior. 99 | 100 | **Consequence**: A temporary ban from any sort of interaction or public 101 | communication with the community for a specified period of time. No public or 102 | private interaction with the people involved, including unsolicited interaction 103 | with those enforcing the Code of Conduct, is allowed during this period. 104 | Violating these terms may lead to a permanent ban. 105 | 106 | ### 4. Permanent Ban 107 | 108 | **Community Impact**: Demonstrating a pattern of violation of community 109 | standards, including sustained inappropriate behavior, harassment of an 110 | individual, or aggression toward or disparagement of classes of individuals. 111 | 112 | **Consequence**: A permanent ban from any sort of public interaction within the 113 | community. 114 | 115 | ## Attribution 116 | 117 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], 118 | version 2.1, available at 119 | [https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1]. 120 | 121 | Community Impact Guidelines were inspired by 122 | [Mozilla's code of conduct enforcement ladder][Mozilla CoC]. 123 | 124 | For answers to common questions about this code of conduct, see the FAQ at 125 | [https://www.contributor-covenant.org/faq][FAQ]. Translations are available at 126 | [https://www.contributor-covenant.org/translations][translations]. 127 | 128 | [homepage]: https://www.contributor-covenant.org 129 | [v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html 130 | [Mozilla CoC]: https://github.com/mozilla/diversity 131 | [FAQ]: https://www.contributor-covenant.org/faq -------------------------------------------------------------------------------- /transfer_learning/tensorflow/resnet50/training/chart/templates/workflowTemplate.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: argoproj.io/v1alpha1 2 | kind: WorkflowTemplate 3 | metadata: 4 | name: {{ .Values.metadata.name }} 5 | labels: 6 | {{- include "demo.labels" . 
| nindent 4 }} 7 | spec: 8 | arguments: 9 | parameters: 10 | - name: ref 11 | value: {{ .Values.workflow.ref }} 12 | - name: repo 13 | value: {{ .Values.workflow.repo }} 14 | - name: dataset-dir 15 | value: {{ .Values.workflow.dataset_dir }} 16 | - enum: 17 | - SPR 18 | - None 19 | name: platform 20 | value: {{ .Values.workflow.platform }} 21 | - enum: 22 | - FP32 23 | - bf16 24 | name: precision 25 | value: {{ .Values.workflow.precision }} 26 | - enum: 27 | - colorectal 28 | - resisc 29 | - sports 30 | name: script 31 | value: {{ .Values.workflow.script }} 32 | - name: http_proxy 33 | value: {{ .Values.proxy }} 34 | - name: num-epochs 35 | value: {{ .Values.workflow.num_epochs }} 36 | - name: batch-size 37 | value: {{ .Values.workflow.batch_size }} 38 | entrypoint: main 39 | templates: 40 | - steps: 41 | - - name: git-clone 42 | template: git-clone 43 | - - name: vision-transfer-learning-training 44 | template: vision-transfer-learning-training 45 | - - name: vision-transfer-learning-inference 46 | template: vision-transfer-learning-inference 47 | name: main 48 | - container: 49 | args: 50 | - clone 51 | - -b 52 | - '{{"{{workflow.parameters.ref}}"}}' 53 | - '{{"{{workflow.parameters.repo}}"}}' 54 | - workspace 55 | command: 56 | - git 57 | env: 58 | - name: http_proxy 59 | value: '{{"{{workflow.parameters.http_proxy}}"}}' 60 | - name: https_proxy 61 | value: '{{"{{workflow.parameters.http_proxy}}"}}' 62 | image: intel/ai-workflows:vision-transfer-learning-training 63 | volumeMounts: 64 | - mountPath: /workspace 65 | name: workspace 66 | workingDir: / 67 | name: git-clone 68 | - container: 69 | args: 70 | - run 71 | - --no-capture-output 72 | - -n 73 | - transfer_learning 74 | - '{{"./{{workflow.parameters.script}}.sh"}}' 75 | command: 76 | - conda 77 | env: 78 | - name: DATASET_DIR 79 | value: /dataset 80 | - name: PRECISION 81 | value: '{{"{{workflow.parameters.precision}}"}}' 82 | - name: PLATFORM 83 | value: '{{"{{workflow.parameters.platform}}"}}' 84 | - name: OUTPUT_DIR 85 | value: /output 86 | - name: NUM_EPOCHS 87 | value: '{{"{{workflow.parameters.num-epochs}}"}}' 88 | - name: BATCH_SIZE 89 | value: '{{"{{workflow.parameters.batch-size}}"}}' 90 | - name: http_proxy 91 | value: '{{"{{workflow.parameters.http_proxy}}"}}' 92 | - name: https_proxy 93 | value: '{{"{{workflow.parameters.http_proxy}}"}}' 94 | image: intel/ai-workflows:vision-transfer-learning-training 95 | volumeMounts: 96 | - mountPath: /workspace 97 | name: workspace 98 | - mountPath: /output 99 | name: output-dir 100 | {{- if eq .Values.dataset.type "nfs" }} 101 | - mountPath: /dataset 102 | name: dataset-dir 103 | {{ end }} 104 | workingDir: /workspace 105 | {{- if eq .Values.dataset.type "s3" }} 106 | inputs: 107 | artifacts: 108 | - name: dataset 109 | path: /dataset 110 | s3: 111 | key: {{ .Values.dataset.s3.datasetKey }} 112 | {{ end }} 113 | name: vision-transfer-learning-training 114 | outputs: 115 | artifacts: 116 | - name: checkpoint 117 | path: /output 118 | s3: 119 | key: {{ .Values.dataset.logsKey }} 120 | sidecars: 121 | - args: 122 | - -c 123 | - while ! 
tail -f /output/result.txt ; do sleep 5 ; done 124 | command: 125 | - sh 126 | container: null 127 | image: {{ .Values.sidecars.image }} 128 | mirrorVolumeMounts: true 129 | name: output-log 130 | workingDir: /output 131 | {{- if eq .Values.dataset.type "nfs" }} 132 | volumes: 133 | - name: dataset-dir 134 | nfs: 135 | server: {{ .Values.dataset.nfs.server }} 136 | path: {{ .Values.dataset.nfs.path }} 137 | readOnly: {{ .Values.dataset.nfs.readOnly }} 138 | {{ end }} 139 | - container: 140 | args: 141 | - run 142 | - --no-capture-output 143 | - -n 144 | - transfer_learning 145 | - '{{"./{{workflow.parameters.script}}.sh"}}' 146 | - --inference 147 | - -cp 148 | - /output 149 | command: 150 | - conda 151 | env: 152 | - name: DATASET_DIR 153 | value: /dataset 154 | - name: PRECISION 155 | value: '{{"{{workflow.parameters.precision}}"}}' 156 | - name: PLATFORM 157 | value: '{{"{{workflow.parameters.platform}}"}}' 158 | - name: OUTPUT_DIR 159 | value: /output/inference 160 | - name: http_proxy 161 | value: '{{"{{workflow.parameters.http_proxy}}"}}' 162 | - name: https_proxy 163 | value: '{{"{{workflow.parameters.http_proxy}}"}}' 164 | image: intel/ai-workflows:vision-transfer-learning-inference 165 | volumeMounts: 166 | - mountPath: /workspace 167 | name: workspace 168 | - mountPath: /output 169 | name: output-dir 170 | {{- if eq .Values.dataset.type "nfs" }} 171 | - mountPath: /dataset 172 | name: dataset-dir 173 | {{ end }} 174 | workingDir: /workspace 175 | {{- if eq .Values.dataset.type "s3" }} 176 | inputs: 177 | artifacts: 178 | - name: dataset 179 | path: /dataset 180 | s3: 181 | key: {{ .Values.dataset.s3.datasetKey }} 182 | {{ end }} 183 | name: vision-transfer-learning-inference 184 | outputs: 185 | artifacts: 186 | - name: logs 187 | path: /output/inference 188 | s3: 189 | key: {{ .Values.dataset.logsKey }} 190 | sidecars: 191 | - args: 192 | - -c 193 | - while ! tail -f /output/result.txt ; do sleep 5 ; done 194 | command: 195 | - sh 196 | container: null 197 | image: {{ .Values.sidecars.image }} 198 | mirrorVolumeMounts: true 199 | name: output-log 200 | workingDir: /output 201 | {{- if eq .Values.dataset.type "nfs" }} 202 | volumes: 203 | - name: dataset-dir 204 | nfs: 205 | server: {{ .Values.dataset.nfs.server }} 206 | path: {{ .Values.dataset.nfs.path }} 207 | readOnly: {{ .Values.dataset.nfs.readOnly }} 208 | {{ end }} 209 | volumeClaimTemplates: 210 | - metadata: 211 | name: workspace 212 | name: workspace 213 | spec: 214 | accessModes: 215 | - ReadWriteOnce 216 | resources: 217 | requests: 218 | storage: {{ .Values.volumeClaimTemplates.workspace.resources.requests.storage }} 219 | - metadata: 220 | name: output-dir 221 | name: output-dir 222 | spec: 223 | accessModes: 224 | - ReadWriteOnce 225 | resources: 226 | requests: 227 | storage: {{ .Values.volumeClaimTemplates.output_dir.resources.requests.storage }} 228 | -------------------------------------------------------------------------------- /big-data/chronos/README.md: -------------------------------------------------------------------------------- 1 | # **BigDL Chronos TRAINING - Time Series Forecasting** 2 | 3 | ## **Description** 4 | 5 | This pipeline provides instructions on how to train a Temporal Convolution Neural Network on BigDL Chronos framework using time series dataset with make and docker compose. For more information on the workload visit [BigDL](https://github.com/intel-analytics/BigDL/tree/main) repository. 
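As a condensed preview of the Quick Start and Build and Run sections below, the whole training pipeline can be launched from this directory once the BigDL submodule has been pulled; the image name override shown here is optional and simply mirrors the default:

```
git submodule update --init --recursive
FINAL_IMAGE_NAME=chronos make chronos
```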
6 | 7 | ## **Project Structure** 8 | ``` 9 | ├── BigDL @ ai-workflow 10 | ├── DEVCATALOG.md 11 | ├── Dockerfile.chronos 12 | ├── Makefile 13 | └── docker-compose.yml 14 | ``` 15 | [*Makefile*](Makefile) 16 | 17 | ``` 18 | FINAL_IMAGE_NAME ?= chronos 19 | 20 | chronos: 21 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 22 | docker compose up chronos --build 23 | 24 | clean: 25 | docker compose down 26 | 27 | ``` 28 | [*docker-compose.yml*](docker-compose.yml) 29 | 30 | ``` 31 | services: 32 | chronos: 33 | build: 34 | args: 35 | http_proxy: ${http_proxy} 36 | https_proxy: ${https_proxy} 37 | no_proxy: ${no_proxy} 38 | dockerfile: Dockerfile.chronos 39 | command: sh -c "jupyter nbconvert --to python chronos_nyc_taxi_tsdataset_forecaster.ipynb && \ 40 | sed '26,40d' chronos_nyc_taxi_tsdataset_forecaster.py > chronos_taxi_forecaster.py && \ 41 | python chronos_taxi_forecaster.py" 42 | environment: 43 | - http_proxy=${http_proxy} 44 | - https_proxy=${https_proxy} 45 | - no_proxy=${no_proxy} 46 | image: ${FINAL_IMAGE_NAME}:training-ubuntu-20.04 47 | network_mode: "host" 48 | privileged: true 49 | volumes: 50 | - ./BigDL:/workspace/BigDL 51 | working_dir: /workspace/BigDL/python/chronos/colab-notebook 52 | 53 | ``` 54 | # **Time Series Forecasting** 55 | 56 | Training pipeline that uses the BigDL Chronos framework for time series forecasting using a Temporal Convolutional Neural Network. More information [here](https://github.com/intel-analytics/BigDL/tree/main). 57 | 58 | ## **Quick Start** 59 | 60 | * Make sure that the environment setup prerequisites are satisfied per the document [here](../../README.md) 61 | 62 | * Pull and configure the dependent repo submodule ```git submodule update --init --recursive ``` 63 | 64 | * Install [Pipeline Repository Dependencies](../../README.md) 65 | 66 | * Other Variables: 67 | 68 | Variable Name | Default | Notes | 69 | :---------------:|:-------------------: | :------------------------------------: | 70 | FINAL_IMAGE_NAME | chronos | Final Docker Image Name | 71 | 72 | ## **Build and Run** 73 | 74 | Build and run with defaults: 75 | 76 | ```$ make chronos``` 77 | 78 | ## **Build and Run Example** 79 | 80 | ``` 81 | #1 [internal] load build definition from Dockerfile.chronos 82 | #1 transferring dockerfile: 55B done 83 | #1 DONE 0.0s 84 | 85 | #2 [internal] load .dockerignore 86 | #2 transferring context: 2B done 87 | #2 DONE 0.0s 88 | 89 | #3 [internal] load metadata for docker.io/library/ubuntu:20.04 90 | #3 DONE 0.0s 91 | 92 | #4 [1/5] FROM docker.io/library/ubuntu:20.04 93 | #4 DONE 0.0s 94 | 95 | #5 [2/5] RUN apt-get update --fix-missing && apt-get install -y apt-utils vim curl nano wget unzip git && apt-get install -y gcc g++ make && apt-get install -y libsm6 libxext6 libxrender-dev && apt-get install -y openjdk-8-jre && rm /bin/sh && ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.12.0-Linux-x86_64.sh && chmod +x Miniconda3-py37_4.12.0-Linux-x86_64.sh && ./Miniconda3-py37_4.12.0-Linux-x86_64.sh -b -f -p /usr/local && rm Miniconda3-py37_4.12.0-Linux-x86_64.sh 96 | #5 CACHED 97 | 98 | #6 [4/5] RUN echo "source activate chronos" > ~/.bashrc 99 | #6 CACHED 100 | 101 | #7 [3/5] RUN conda create -y -n chronos python=3.7 setuptools=58.0.4 && source activate chronos && pip install --no-cache-dir --pre --upgrade bigdl-chronos[pytorch,automl] matplotlib notebook==6.4.12 && pip uninstall -y torchtext 102 | #7 CACHED 103 
| 104 | #8 [5/5] RUN echo "export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/" >> ~/.bashrc 105 | #8 CACHED 106 | 107 | #9 exporting to image 108 | #9 exporting layers done 109 | #9 writing image sha256:329995e99da4001c6d57e243085145acfce61f5bddabd9459aa598b846eae331 done 110 | #9 naming to docker.io/library/time-series-chronos:training-ubuntu-20.04 done 111 | #9 DONE 0.0s 112 | Attaching to training-time-series-chronos-1 113 | training-time-series-chronos-1 | [NbConvertApp] Converting notebook chronos_nyc_taxi_tsdataset_chronos.ipynb to python 114 | training-time-series-chronos-1 | [NbConvertApp] Writing 10692 bytes to chronos_nyc_taxi_tsdataset_chronos.py 115 | training-time-series-chronos-1 | Global seed set to 1 116 | training-time-series-chronos-1 | Global seed set to 1 117 | training-time-series-chronos-1 | /usr/local/envs/chronos/lib/python3.7/site-packages/bigdl/chronos/forecaster/utils.py:157: UserWarning: 'batch_size' cannot be divided with no remainder by 'self.num_processes'. We got 'batch_size' = 32 and 'self.num_processes' = 7 118 | training-time-series-chronos-1 | format(batch_size, num_processes)) 119 | training-time-series-chronos-1 | GPU available: False, used: False 120 | training-time-series-chronos-1 | TPU available: False, using: 0 TPU cores 121 | training-time-series-chronos-1 | IPU available: False, using: 0 IPUs 122 | training-time-series-chronos-1 | HPU available: False, using: 0 HPUs 123 | training-time-series-chronos-1 | Global seed set to 1 124 | training-time-series-chronos-1 | Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/7 125 | ``` 126 | ... 127 | 128 | ``` 129 | 98%|█████████▊| 287/294 [00:09<00:00, 31.55it/s, loss=0.0155] 130 | Epoch 2: 98%|█████████▊| 288/294 [00:09<00:00, 31.55it/s, loss=0.0155] 131 | Epoch 2: 98%|█████████▊| 288/294 [00:09<00:00, 31.55it/s, loss=0.0155] 132 | Epoch 2: 98%|█████████▊| 288/294 [00:09<00:00, 31.55it/s, loss=0.0157] 133 | Epoch 2: 98%|█████████▊| 289/294 [00:09<00:00, 31.56it/s, loss=0.0157] 134 | Epoch 2: 98%|█████████▊| 289/294 [00:09<00:00, 31.56it/s, loss=0.0162] 135 | Epoch 2: 99%|█████████▊| 290/294 [00:09<00:00, 31.57it/s, loss=0.0162] 136 | Epoch 2: 99%|█████████▊| 290/294 [00:09<00:00, 31.57it/s, loss=0.0165] 137 | Epoch 2: 99%|█████████▉| 291/294 [00:09<00:00, 31.58it/s, loss=0.0165] 138 | Epoch 2: 99%|█████████▉| 291/294 [00:09<00:00, 31.58it/s, loss=0.0164] 139 | Epoch 2: 99%|█████████▉| 292/294 [00:09<00:00, 31.59it/s, loss=0.0164] 140 | Epoch 2: 99%|█████████▉| 292/294 [00:09<00:00, 31.59it/s, loss=0.0164] 141 | Epoch 2: 99%|█████████▉| 292/294 [00:09<00:00, 31.59it/s, loss=0.0164] 142 | Epoch 2: 100%|█████████▉| 293/294 [00:09<00:00, 31.58it/s, loss=0.0164] 143 | Epoch 2: 100%|█████████▉| 293/294 [00:09<00:00, 31.58it/s, loss=0.0175] 144 | Epoch 2: 100%|██████████| 294/294 [00:09<00:00, 31.64it/s, loss=0.0175] 145 | Epoch 2: 100%|██████████| 294/294 [00:09<00:00, 31.64it/s, loss=0.017] 146 | Epoch 2: 100%|██████████| 294/294 [00:09<00:00, 31.64it/s, loss=0.017] 147 | Epoch 2: 100%|██████████| 294/294 [00:09<00:00, 31.60it/s, loss=0.017] 148 | training-time-series-forecaster-1 | Global seed set to 1 149 | training-time-series-forecaster-1 | GPU available: False, used: False 150 | training-time-series-forecaster-1 | TPU available: False, using: 0 TPU cores 151 | training-time-series-forecaster-1 | IPU available: False, using: 0 IPUs 152 | training-time-series-forecaster-1 | HPU available: False, using: 0 HPUs 153 | training-time-series-forecaster-1 exited with code 0 154 | ``` 155 | 
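When the run has finished, the containers started by the example above can be removed with the `clean` target from the same Makefile:

```
make clean
```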
-------------------------------------------------------------------------------- /analytics/classical-ml/synthetic/inference/README.md: -------------------------------------------------------------------------------- 1 | # Classical ML Synthetic INFERENCE - Wafer Insights 2 | ## Description 3 | This document contains instructions on how to run Wafer Insights pipelines with make and docker compose. 4 | ## Project Structure 5 | ``` 6 | ├── wafer-insights @ v1.0.0 7 | ├── DEVCATALOG.md 8 | ├── docker-compose.yml 9 | ├── Dockerfile.wafer-insights 10 | ├── Makefile 11 | └── README.md 12 | ``` 13 | [_Makefile_](Makefile) 14 | ``` 15 | OUTPUT_DIR ?= /output 16 | FINAL_IMAGE_NAME ?= wafer-insights 17 | 18 | wafer-insight: 19 | @OUTPUT_DIR=${OUTPUT_DIR} \ 20 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 21 | docker compose up wafer-insight --build 22 | 23 | clean: 24 | docker compose down 25 | ``` 26 | [_docker-compose.yml_](docker-compose.yml) 27 | ``` 28 | services: 29 | wafer-insight: 30 | build: 31 | args: 32 | http_proxy: ${http_proxy} 33 | https_proxy: ${https_proxy} 34 | no_proxy: ${no_proxy} 35 | dockerfile: Dockerfile.wafer-insights 36 | command: 37 | - | 38 | conda run -n WI python src/loaders/synthetic_loader/loader.py 39 | conda run --no-capture-output -n WI python src/dashboard/app.py 40 | entrypoint: ["/bin/bash", "-c"] 41 | environment: 42 | - PYTHONPATH=$PYTHONPATH:$PWD 43 | - http_proxy=${http_proxy} 44 | - https_proxy=${https_proxy} 45 | - no_proxy=${no_proxy} 46 | image: ${FINAL_IMAGE_NAME}:inference-ubuntu-20.04 47 | ports: 48 | - 8050:8050 49 | privileged: true 50 | volumes: 51 | - ${OUTPUT_DIR}:/data 52 | - ./wafer-insights:/workspace/wafer-insights 53 | working_dir: /workspace/wafer-insights 54 | ``` 55 | 56 | # Wafer Insights 57 | End2End AI Workflow utilizing a Flask dashboard to allow users to predict FMAX/IDV tokens based on synthetically generated FAB data sources. More information [here](https://github.com/intel/wafer-insights-with-classical-ml) 58 | 59 | ## Quick Start 60 | * Pull and configure the dependent repo submodule `git submodule update --init --recursive`. 61 | 62 | * Install [Pipeline Repository Dependencies](../../../../README.md) 63 | 64 | * Get the public IP address of your machine with either `ip a` or https://whatismyipaddress.com/ 65 | 66 | * Other variables: 67 | 68 | | Variable Name | Default | Notes | 69 | | --- | --- | --- | 70 | | FINAL_IMAGE_NAME | `wafer-insights` | Final Docker image name | 71 | | OUTPUT_DIR | `/output` | Output directory | 72 | 73 | ## Build and Run 74 | Build and Run with defaults: 75 | ``` 76 | make wafer-insight 77 | ``` 78 | 79 | * Visit the dashboard at your public ip address and port `8050`. 80 | 81 | ## Build and Run Example 82 | ``` 83 | $ make wafer-insight 84 | WARN[0000] The "PYTHONPATH" variable is not set. Defaulting to a blank string. 
85 | [+] Building 0.1s (9/9) FINISHED 86 | => [internal] load build definition from Dockerfile.wafer-insights 0.0s 87 | => => transferring dockerfile: 47B 0.0s 88 | => [internal] load .dockerignore 0.0s 89 | => => transferring context: 2B 0.0s 90 | => [internal] load metadata for docker.io/library/ubuntu:20.04 0.0s 91 | => [1/5] FROM docker.io/library/ubuntu:20.04 0.0s 92 | => CACHED [2/5] RUN apt-get update && apt-get install --no-install-recommends --fix-missing -y ca-cer 0.0s 93 | => CACHED [3/5] RUN mkdir -p /workspace 0.0s 94 | => CACHED [4/5] RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O 0.0s 95 | => CACHED [5/5] RUN conda create -yn WI python=3.9 && source activate WI && conda install -y scik 0.0s 96 | => exporting to image 0.0s 97 | => => exporting layers 0.0s 98 | => => writing image sha256:dfa7411736694db4d3c8d0032f424fc88f0af98fabd163a659a90d0cc2dfe587 0.0s 99 | => => naming to docker.io/library/wafer-insights:inference-ubuntu-20.04 0.0s 100 | WARN[0000] Found orphan containers ([inference-wafer-analytics-1]) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up. 101 | [+] Running 1/1 102 | ⠿ Container inference-wafer-insight-1 Recreated 0.1s 103 | Attaching to inference-wafer-insight-1 104 | inference-wafer-insight-1 | [[-5.39543860e-04 2.39971569e-03 -3.42210731e-04 ... -2.35041980e-03 105 | inference-wafer-insight-1 | -1.81397056e-04 -2.09303234e-03] 106 | inference-wafer-insight-1 | [-1.00075542e-04 -5.41824409e-04 -2.38435358e-04 ... 3.39901582e-03 107 | inference-wafer-insight-1 | 3.35075678e-04 2.04678475e-03] 108 | inference-wafer-insight-1 | [-5.14076633e-04 -2.28770984e-03 3.52836617e-04 ... -3.59841471e-03 109 | inference-wafer-insight-1 | -2.57484490e-03 5.23169035e-04] 110 | inference-wafer-insight-1 | ... 111 | inference-wafer-insight-1 | [-3.13805323e-03 -3.16870576e-03 1.28447995e-03 ... -8.94258047e-05 112 | inference-wafer-insight-1 | 8.13668371e-04 -5.02239567e-04] 113 | inference-wafer-insight-1 | [-7.28863425e-04 2.32030465e-03 1.57134892e-03 ... 2.64884040e-04 114 | inference-wafer-insight-1 | -2.12739801e-03 -1.98500740e-04] 115 | inference-wafer-insight-1 | [-1.79534321e-03 6.97006847e-04 4.70415219e-04 ... -4.21349858e-04 116 | inference-wafer-insight-1 | 2.88895727e-03 4.20368128e-04]] 117 | inference-wafer-insight-1 | fcol`feature_0 fcol`feature_1 ... fcol`feature_1999 TEST_END_DATE 118 | inference-wafer-insight-1 | 0 -0.000540 0.002400 ... -0.002093 2022-06-24 17:57:44.060832 119 | inference-wafer-insight-1 | 1 -0.000100 -0.000542 ... 0.002047 2022-06-24 18:02:55.100832 120 | inference-wafer-insight-1 | 2 -0.000514 -0.002288 ... 0.000523 2022-06-24 18:08:06.140832 121 | inference-wafer-insight-1 | 3 -0.000020 -0.003073 ... 0.001036 2022-06-24 18:13:17.180832 122 | inference-wafer-insight-1 | 4 -0.001280 0.001955 ... -0.000343 2022-06-24 18:18:28.220832 123 | inference-wafer-insight-1 | 124 | inference-wafer-insight-1 | [5 rows x 2001 columns] 125 | inference-wafer-insight-1 | started_stacking 126 | inference-wafer-insight-1 | LOT7 WAFER3 PROCESS ... MEDIAN DEVREVSTEP TESTNAME`STRUCTURE_NAME 127 | inference-wafer-insight-1 | 0 DG0000000001 0 1234 ... -0.000540 DPMLD fcol`feature_0 128 | inference-wafer-insight-1 | 1 DG0000000001 1 1234 ... -0.000100 DPMLD fcol`feature_0 129 | inference-wafer-insight-1 | 2 DG0000000001 2 1234 ... 
-0.000514 DPMLD fcol`feature_0 130 | inference-wafer-insight-1 | 3 DG0000000001 3 1234 ... -0.000020 DPMLD fcol`feature_0 131 | inference-wafer-insight-1 | 4 DG0000000001 4 1234 ... -0.001280 DPMLD fcol`feature_0 132 | inference-wafer-insight-1 | 133 | inference-wafer-insight-1 | [5 rows x 10 columns] 134 | inference-wafer-insight-1 | 135 | inference-wafer-insight-1 | Dash is running on http://0.0.0.0:8050/ 136 | inference-wafer-insight-1 | 137 | inference-wafer-insight-1 | * Serving Flask app 'app' 138 | inference-wafer-insight-1 | * Debug mode: on 139 | ``` 140 | -------------------------------------------------------------------------------- /classification/tensorflow/bert_base/inference/README.md: -------------------------------------------------------------------------------- 1 | # **TensorFlow BERT Base INFERENCE - AWS SageMaker** 2 | 3 | ## **Description** 4 | 5 | This pipeline provides instructions on how to run inference using BERT Base model on infrastructure provided by AWS SageMaker with make and docker compose. 6 | 7 | ## **Project Structure** 8 | ``` 9 | ├── aws_sagemaker @ v0.2.0 10 | ├── Makefile 11 | ├── README.md 12 | └── docker-compose.yml 13 | ``` 14 | 15 | [*Makefile*](Makefile) 16 | ``` 17 | AWS_CSV_FILE ?= credentials.csv 18 | AWS_DATA=$$(pwd)/aws_data 19 | FINAL_IMAGE_NAME ?= nlp-sagemaker 20 | OUTPUT_DIR ?= /output 21 | ROLE ?= role 22 | S3_MODEL_URI ?= link 23 | 24 | export AWS_PROFILE := $(shell cat ${AWS_CSV_FILE} | awk -F',' 'NR==2{print $$1}') 25 | export REGION ?= us-west-2 26 | 27 | nlp-sagemaker: 28 | ./aws_sagemaker/scripts/setup.sh aws_sagemaker/ 29 | mkdir -p ${AWS_DATA} && cp -r ${HOME}/.aws ${AWS_DATA}/.aws/ 30 | @AWS_PROFILE=${AWS_PROFILE} \ 31 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 32 | OUTPUT_DIR=${OUTPUT_DIR} \ 33 | docker compose up --build nlp-sagemaker 34 | clean: 35 | if [ -d ${AWS_DATA} ]; then \ 36 | rm -rf ${AWS_DATA}; \ 37 | fi; \ 38 | if [ -d aws/ ]; then \ 39 | rm -rf aws/; \ 40 | fi; \ 41 | if [ -d aws-cli/ ]; then \ 42 | rm -rf aws-cli/; \ 43 | fi; \ 44 | if [ -f awscliv2.zip ]; then \ 45 | rm -f awscliv2.zip; \ 46 | fi 47 | docker compose down 48 | ``` 49 | 50 | [*docker-compose.yml*](docker-compose.yml) 51 | ``` 52 | services: 53 | nlp-sagemaker: 54 | build: 55 | context: ./aws_sagemaker/ 56 | args: 57 | http_proxy: ${http_proxy} 58 | https_proxy: ${https_proxy} 59 | no_proxy: ${no_proxy} 60 | dockerfile: ./Dockerfile 61 | command: sh -c "jupyter nbconvert --to python 1.0-intel-sagemaker-inference.ipynb && python3 1.0-intel-sagemaker-inference.py" 62 | environment: 63 | - http_proxy=${http_proxy} 64 | - https_proxy=${https_proxy} 65 | - no_proxy=${no_proxy} 66 | - AWS_PROFILE=${AWS_PROFILE} 67 | image: ${FINAL_IMAGE_NAME}:inference-ubuntu-20.04 68 | network_mode: "host" 69 | privileged: true 70 | volumes: 71 | - ${OUTPUT_DIR}:${OUTPUT_DIR} 72 | - ./aws_sagemaker/notebooks:/root/notebooks 73 | - ./aws_sagemaker/src:/root/src 74 | - ./aws_data/.aws:/root/.aws 75 | working_dir: /root/notebooks 76 | ``` 77 | 78 | # **AWS SageMaker** 79 | 80 | End-to-End AI workflow using the AWS SageMaker Cloud Infrastructure for inference of the BERT base model. More Information [here](https://github.com/intel/NLP-Workflow-with-AWS.git). The pipeline runs the `1.0-intel-sagemaker-inference.ipynb` of the [Intel's AWS SageMaker Workflow](https://github.com/intel/NLP-Workflow-with-AWS/blob/v0.2.0/notebooks/1.0-intel-sagemaker-inference.ipynb) project. 
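The Makefile shown above derives `AWS_PROFILE` from the first field of the second row of `AWS_CSV_FILE` (the header row is skipped). As a rough sketch, assuming a credentials CSV whose first column holds the profile/user name (the column layout of the file you export from the AWS console may differ), the extraction is equivalent to:

```
# Hypothetical credentials.csv (header row followed by one data row):
#   User name,Access key ID,Secret access key
#   my-profile,AKIAXXXXXXXXXXXXXXXX,wJalrXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# Take the first field of row 2 ("my-profile") as the profile name:
AWS_PROFILE=$(awk -F',' 'NR==2{print $1}' credentials.csv)
echo "Using AWS profile: ${AWS_PROFILE}"
```

If the extracted value does not match a profile configured under `~/.aws`, the SageMaker calls inside the container are likely to fail, so it can be worth echoing the value before running `make nlp-sagemaker`.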
81 | 82 | ## **Quick Start** 83 | 84 | * Pull and configure the dependent repo submodule ```git submodule update --init --recursive ```. 85 | 86 | * Install [Pipeline Repository Dependencies](../../../../README.md). 87 | 88 | * Set up your pipeline-specific variables: 89 | * Create a key pair using the instructions from this [link](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html#cli-configure-quickstart-creds-create). NOTE: Download the CSV file as described in the 7th step of the instructions. 90 | * Before you start, you need to add an execution role for AWS SageMaker. For more information on how to do this, follow the instructions in the "Create execution role" section of this [link](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html). **Note down the role name or the ARN for the role.** 91 | * You need to have the S3 URI of the quantized model generated from training. 92 | 93 | * Variables in Makefile: Use the variables from the table below when running the make command. 94 | 95 | Variable Name | Default | Notes | 96 | :----------------:|:------------------: | :--------------------------------------:| 97 | AWS_CSV_FILE | `credentials.csv` | Location of the AWS account credentials file | 98 | FINAL_IMAGE_NAME | `nlp-sagemaker` | Final Docker Image Name | 99 | OUTPUT_DIR | `/output` | Output directory | 100 | REGION | `us-west-2` | Region of your S3 bucket and profile | 101 | ROLE | `role` | Name or ARN of the role you created for SageMaker | 102 | S3_MODEL_URI | `link` | URI of the trained and quantized model checkpoint | 103 | 104 | ## **Build and Run** 105 | Build and run with defaults: 106 | 107 | ```make nlp-sagemaker``` 108 | 109 | ## **Build and Run Example** 110 | ``` 111 | $ AWS_CSV_FILE=./aws_config.csv S3_MODEL_URI="s3://model.tar.gz" ROLE="role" make nlp-sagemaker 112 | [+] Building 0.1s (9/9) FINISHED 113 | => [internal] load build definition from Dockerfile 0.0s 114 | => => transferring dockerfile: 32B 0.0s 115 | => [internal] load .dockerignore 0.0s 116 | => => transferring context: 2B 0.0s 117 | => [internal] load metadata for docker.io/library/ubuntu:20.04 0.0s 118 | => [1/5] FROM docker.io/library/ubuntu:20.04 0.0s 119 | => CACHED [2/5] RUN apt-get update && DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get -y install tzdata=2022c-0ubuntu0.20.04.0 --no-install-recommends && rm -rf /var/lib/apt/lists/* 0.0s 120 | => CACHED [3/5] RUN apt-get -y update && apt-get install -y --no-install-recommends wget=1.20.3-1ubuntu2 nginx=1.18.0-0ubuntu1.3 cmake=3.16.3-1ubuntu1 software-prope 0.0s 121 | => CACHED [4/5] RUN pip install --no-cache-dir boto3==1.24.15 && pip install --no-cache-dir sagemaker==2.96.0 && pip install --no-cache-dir tensorflow-cpu==2.9.1 && pip install --no-cache-dir 0.0s 122 | => CACHED [5/5] RUN pip install --no-cache-dir virtualenv==20.14.1 && virtualenv intel_neural_compressor_venv && . 
intel_neural_compressor_venv/bin/activate && pip install --no-cache-dir Cy 0.0s 123 | => exporting to image 0.0s 124 | => => exporting layers 0.0s 125 | => => writing image sha256:91b43c6975feab4db06cf34a9635906d2781102a05d406b93c5bf2eb87c30a94 0.0s 126 | => => naming to docker.io/library/intel_amazon_cloud_trainandinf:inference-ubuntu-20.04 0.0s 127 | [+] Running 1/0 128 | ⠿ Container bert_uncased_base-aws-sagemaker-1 Created 0.0s 129 | Attaching to bert_uncased_base-aws-sagemaker-1 130 | bert_uncased_base-aws-sagemaker-1 | [NbConvertApp] Converting notebook 1.0-intel-sagemaker-inference.ipynb to python 131 | bert_uncased_base-aws-sagemaker-1 | [NbConvertApp] Writing 3597 bytes to 1.0-intel-sagemaker-inference.py 132 | bert_uncased_base-aws-sagemaker-1 | update_endpoint is a no-op in sagemaker>=2. 133 | bert_uncased_base-aws-sagemaker-1 | See: https://sagemaker.readthedocs.io/en/stable/v2.html for details. 134 | bert_uncased_base-aws-sagemaker-1 | ---!/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (3.0.4) doesn't match a supported version! 135 | bert_uncased_base-aws-sagemaker-1 | warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported " 136 | bert_uncased_base-aws-sagemaker-1 exited with code 0 137 | ``` 138 | 139 | -------------------------------------------------------------------------------- /template/DEVCATALOG_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | # Workflow Title 2 | 3 | Start with a short (~300 characters) teaser description of what this workflow 4 | does, useful as the summary in the dev catalog. Use "active" language. For 5 | example, 6 | 7 | Learn to use Intel's XPU hardware and Intel optimized software for 8 | distributed training and inference on the Azure Machine Learning Platform 9 | with PyTorch\*, Hugging Face, and Intel® Neural Compressor. 10 | 11 | or 12 | 13 | Run a video streamer pipeline that mimics real-time video analysis. Take in 14 | real-time data, send it to an endpoint for single-shot object detection, and 15 | store the resulting metadata for further review. 16 | 17 | Provide a link back to the dev catalog, for example: 18 | 19 | Check out more workflow examples and reference implementations in the [Developer Catalog](https://developer.intel.com/aireferenceimplementations). 20 | 21 | ## Overview 22 | Provide high-level information that would help a developer understand why this 23 | workflow might be relevant to them, the **benefits** of the features or 24 | functions showcased (don't just list features), and what they'll learn by trying 25 | this workflow. A bullet list could work well here. 26 | 27 | Here are some general authoring guidelines throughout the document: 28 | - Define or explain all acronyms on first use. If new abbreviations or acronyms 29 | are used within the diagram's text, explain them in the accompanying 30 | explanation. 31 | - Make sure diagrams and explanations are consistent (e.g., they use the same 32 | terminology and flow). 33 | - Text in diagrams must be readable (contrast and size). Provide a larger image 34 | that's scaled to fit the page so it can be clicked on to see the larger 35 | version or break an image into separate readable images. 36 | - Use proper Intel legal product names and acknowledgment of third-party 37 | trademarks with an asterisk (on first use): Intel® Extension for PyTorch\* 38 | and not IPEX. 
39 | - Add comments in example script command/code blocks when they can better explain 40 | or clarify what's being done. 41 | - Try to eliminate using ```` placeholders in the template when you write 42 | the workflow instructions and code examples. If a value is used more than 43 | once, it's also a good idea to set an environment variable to the needed value 44 | and use that throughout the instructions. 45 | 46 | End the overview with a link to the workflow's main GitHub repo, for example: 47 | 48 | For more details, visit the [Cloud Training and Cloud Inference on Amazon 49 | SageMaker/Elastic Kubernetes 50 | Service](https://github.com/intel/NLP-Workflow-with-AWS) GitHub repository. 51 | 52 | ## Hardware Requirements 53 | There are workflow-specific hardware and software setup requirements depending on 54 | how the workflow is run. A bare metal development system and a Docker\* image running 55 | locally have the same system requirements. Specify those requirements, such as: 56 | 57 | | Recommended Hardware | Precision | 58 | | ---------------------------- | ---------- | 59 | | Intel® 4th Gen Xeon® Scalable Performance processors | BF16 | 60 | | Intel® 1st, 2nd, 3rd, and 4th Gen Xeon® Scalable Performance processors | FP32 | 61 | 62 | If Docker runs on a cloud service, specify cloud service requirements. 63 | 64 | ## How it Works 65 | Explain how the workflow does what it does, including its inputs, processing, and outputs. 66 | Provide a simple architecture or data flow diagram, and additional diagram(s) 67 | with more details if useful. 68 | 69 | Mention tuning opportunities and how the developer can interact with or alter 70 | the workflow. 71 | 72 | ## Get Started 73 | 74 | ### Download the Workflow Repository 75 | Create a working directory for the workflow and clone the [Main 76 | Repository]() repository into your working 77 | directory. 78 | 79 | ``` 80 | # For example... 81 | mkdir ~/work && cd ~/work 82 | git clone https://github.com/intel/workflow-repo.git 83 | cd 84 | git checkout 85 | ``` 86 | 87 | ### Download the Datasets 88 | Describe what datasets are used for input to this workflow, and how to download 89 | them if they're not included in the workflow repo or in the Docker image. If the 90 | datasets are particularly large, indicate storage space needed for download and 91 | extraction. 92 | 93 | ``` 94 | cd 95 | mkdir 96 | 97 | cd .. 98 | ``` 99 | 100 | What information can we document if a developer wants to provide their own 101 | dataset as input? Remind them to put their data into the ``datasets`` directory 102 | we created. 103 | 104 | --- 105 | 106 | ## Run Using Docker 107 | Follow these instructions to set up and run our provided Docker image. 108 | For running on bare metal, see the [bare metal 109 | instructions](#run-using-bare-metal). 110 | 111 | If possible, provide an estimate of time to set up and run the workflow 112 | using Docker on the recommended hardware. 113 | 114 | ### Set Up Docker Engine 115 | You'll need to install Docker Engine on your development system. 116 | Note that while **Docker Engine** is free to use, **Docker Desktop** may require 117 | you to purchase a license. See the [Docker Engine Server installation 118 | instructions](https://docs.docker.com/engine/install/#server) for details.
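You could also suggest a quick sanity check that Docker Engine and the Docker Compose plugin are installed before moving on; something along these lines (version numbers are only illustrative):

```
# Confirm Docker Engine and the Compose plugin are available
docker --version          # e.g. Docker version 20.10.x
docker compose version    # e.g. Docker Compose version v2.x
```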
119 | 120 | If the Docker image is run on a cloud service, mention they may also need 121 | credentials to perform training- and inference-related operations (such as these 122 | for Azure): 123 | - [Set up the Azure Machine Learning Account](https://azure.microsoft.com/en-us/free/machine-learning) 124 | - [Configure the Azure credentials using the Command-Line Interface](https://docs.microsoft.com/en-us/cli/azure/authenticate-azure-cli) 125 | - [Compute targets in Azure Machine Learning](https://learn.microsoft.com/en-us/azure/machine-learning/concept-compute-target) 126 | - [Virtual Machine Products Available in Your Region](https://azure.microsoft.com/en-us/explore/global-infrastructure/products-by-region/?products=virtual-machines&regions=us-east) 127 | 128 | ### Set Up Docker Image 129 | Pull the provided Docker image. 130 | ``` 131 | docker pull 132 | ``` 133 | 134 | If your environment requires a proxy to access the internet, export your 135 | development system's proxy settings to the Docker environment: 136 | ``` 137 | export DOCKER_RUN_ENVS="-e ftp_proxy=${ftp_proxy} \ 138 | -e FTP_PROXY=${FTP_PROXY} -e http_proxy=${http_proxy} \ 139 | -e HTTP_PROXY=${HTTP_PROXY} -e https_proxy=${https_proxy} \ 140 | -e HTTPS_PROXY=${HTTPS_PROXY} -e no_proxy=${no_proxy} \ 141 | -e NO_PROXY=${NO_PROXY} -e socks_proxy=${socks_proxy} \ 142 | -e SOCKS_PROXY=${SOCKS_PROXY}" 143 | ``` 144 | 145 | ### Run Docker Image 146 | Run the workflow using the ``docker run`` command, as shown in this example: 147 | ``` 148 | export DATASET_DIR= 149 | export OUTPUT_DIR=/output 150 | docker run -a stdout $DOCKER_RUN_ENVS \ 151 | --env DATASET_DIR=${DATASET_DIR} \ 152 | --env OUTPUT_DIR=${OUTPUT_DIR} \ 153 | --volume ${DATASET_DIR}:/workspace/data \ 154 | --volume ${OUTPUT_DIR}:/output \ 155 | --volume ${PWD}:/workspace \ 156 | --workdir /workspace \ 157 | --privileged --init -it --rm --pull always \ 158 | intel/ai-workflows: \ 159 | ./run.sh 160 | ``` 161 | 162 | --- 163 | 164 | ## Run Using Bare Metal 165 | Follow these instructions to set up and run this workflow on your own development 166 | system. For running a provided Docker image with Docker, see the [Docker 167 | instructions](#run-using-docker). 168 | 169 | If possible, provide an estimate of time to set up and run 170 | the workflow on bare metal (with recommended HW). 171 | 172 | ### Set Up System Software 173 | Our examples use the ``conda`` package and environment manager on your local computer. 174 | If you don't already have ``conda`` installed, see the [Conda Linux installation 175 | instructions](https://docs.conda.io/projects/conda/en/stable/user-guide/install/linux.html). 176 | 177 | Mention that other required software is installed by a provided installation script 178 | or if not, provide instructions for installing required software packages and 179 | libraries, along with expected versions of each. 180 | 181 | ### Set Up Workflow 182 | Run these commands to set up the workflow's conda environment and install required software: 183 | ``` 184 | cd 185 | conda create -n dlsa python=3.8 --yes 186 | conda activate dlsa 187 | sh install.sh 188 | ``` 189 | 190 | ### Run Workflow 191 | Use these commands to run the workflow: 192 | ``` 193 | 194 | ``` 195 | 196 | ## Expected Output 197 | Explain what a successful execution looks like and where you'll find artifacts 198 | created by analysis or inference from the run (if any). 199 | 200 | ## Summary and Next Steps 201 | Explain what they've successfully done and what they could try next with this workflow.
For example, are there different tuning knobs they could try that 203 | would show different results? 204 | 205 | ## Learn More 206 | For more information about or to read about other relevant workflow 207 | examples, see these guides and software resources: 208 | 209 | - Put ref links and descriptions here, for example 210 | - [Intel® AI Analytics Toolkit (AI Kit)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html) 211 | - [Azure Machine Learning Documentation](https://learn.microsoft.com/en-us/azure/machine-learning/) 212 | - links to other similar or related items from the dev catalog 213 | 214 | ## Troubleshooting 215 | Document known issues or problem spots, and if possible, workarounds. 216 | 217 | ## Support 218 | If you have questions or issues about this workflow, contact the [Support Team](support_forum_link). 219 | If there is no support forum, and we want developers to use GitHub issues to submit bugs and enhancement 220 | requests, put a link to that GitHub repo's issues, something like this: 221 | 222 | The End-to-end Document Level Sentiment Analysis team tracks both bugs and 223 | enhancement requests using [GitHub 224 | issues](https://github.com/intel/document-level-sentiment-analysis/issues). 225 | Before submitting a suggestion or bug report, search the [DLSA GitHub 226 | issues](https://github.com/intel/document-level-sentiment-analysis/issues) to 227 | see if your issue has already been reported. 228 | -------------------------------------------------------------------------------- /transfer_learning/tensorflow/resnet50/inference/README.md: -------------------------------------------------------------------------------- 1 | # Tensorflow ResNet50 INFERENCE - Vision Transfer Learning 2 | ## Description 3 | This document contains instructions on how to run Vision Transfer Learning e2e pipeline with make and docker compose. 
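For example, a typical run points the pipeline at a checkpoint produced by the training workflow by overriding the Makefile variables described in the Quick Start section below (the paths here are placeholders for illustration):

```
# Hypothetical locations; adjust to wherever your training run wrote its output
CHECKPOINT_DIR=/output/colorectal DATASET_DIR=/data SCRIPT=colorectal make vision-transfer-learning
```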
4 | ## Project Structure 5 | ``` 6 | ├── transfer-learning-inference @ v1.0.1 7 | ├── DEVCATALOG.md 8 | ├── docker-compose.yml 9 | ├── Dockerfile.vision-transfer-learning 10 | ├── Makefile 11 | └── README.md 12 | ``` 13 | [_Makefile_](Makefile) 14 | ``` 15 | CHECKPOINT_DIR ?= /output/colorectal 16 | DATASET_DIR ?= /data 17 | FINAL_IMAGE_NAME ?= vision-transfer-learning 18 | OUTPUT_DIR ?= /output 19 | PLATFORM ?= None 20 | PRECISION ?= FP32 21 | SCRIPT ?= colorectal 22 | 23 | vision-transfer-learning: 24 | @CHECKPOINT_DIR=${CHECKPOINT_DIR} \ 25 | DATASET_DIR=${DATASET_DIR} \ 26 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 27 | OUTPUT_DIR=${OUTPUT_DIR} \ 28 | PLATFORM=${PLATFORM} \ 29 | PRECISION=${PRECISION} \ 30 | SCRIPT=${SCRIPT} \ 31 | docker compose up vision-transfer-learning --build 32 | 33 | clean: 34 | docker compose down 35 | ``` 36 | [_docker-compose.yml_](docker-compose.yml) 37 | ``` 38 | services: 39 | vision-transfer-learning: 40 | build: 41 | args: 42 | http_proxy: ${http_proxy} 43 | https_proxy: ${https_proxy} 44 | no_proxy: ${no_proxy} 45 | dockerfile: Dockerfile.vision-transfer-learning 46 | command: conda run --no-capture-output -n transfer_learning ./${SCRIPT}.sh --inference -cp "/workspace/checkpoint" 47 | environment: 48 | - DATASET_DIR=/workspace/data 49 | - OUTPUT_DIR=${OUTPUT_DIR}/${SCRIPT} 50 | - PLATFORM=${PLATFORM} 51 | - PRECISION=${PRECISION} 52 | - http_proxy=${http_proxy} 53 | - https_proxy=${https_proxy} 54 | - no_proxy=${no_proxy} 55 | image: ${FINAL_IMAGE_NAME}:inference-ubuntu-20.04 56 | privileged: true 57 | volumes: 58 | - /${CHECKPOINT_DIR}:/workspace/checkpoint 59 | - /${DATASET_DIR}:/workspace/data 60 | - ${OUTPUT_DIR}:${OUTPUT_DIR} 61 | - ./transfer-learning-inference:/workspace/transfer-learning 62 | working_dir: /workspace/transfer-learning 63 | ``` 64 | 65 | # Vision Transfer Learning 66 | End2End AI Workflow for transfer learning based image classification using ResNet50. 67 | 68 | ## Quick Start 69 | * Pull and configure the dependent repo submodule `git submodule update --init --recursive`. 70 | 71 | * Install [Pipeline Repository Dependencies](https://github.com/intel/ai-workflows/blob/main/pipelines/README.md) 72 | 73 | * Train weights using the [training version](https://github.com/intel/ai-workflows/blob/main/pipelines/transfer_learning/tensorflow/resnet50/training). And set `CHECKPOINT_DIR` to be equal to the `${OUTPUT_DIR}/${SCRIPT}`. 
74 | 75 | * Other variables: 76 | 77 | | Variable Name | Default | Notes | 78 | | --- | --- | --- | 79 | | CHECKPOINT_DIR | `/checkpoint` | Checkpoint directory, default is placeholder | 80 | | DATASET_DIR | `/data` | Dataset directory, optional for `SCRIPT=colorectal`, default is placeholder | 81 | | FINAL_IMAGE_NAME | `vision-transfer-learning` | Final Docker image name | 82 | | OUTPUT_DIR | `/output` | Output directory | 83 | | PLATFORM | `None` | `SPR` and `None` are supported, Hyperthreaded SPR systems are not currently working | 84 | | PRECISION | `FP32` | `bf16` and `FP32` are supported | 85 | | SCRIPT | `colorectal` | `sports`, `resisc`, and `colorectal` are supported scripts that use different datasets/checkpoints | 86 | 87 | ## Build and Run 88 | Build and Run with defaults: 89 | ``` 90 | make vision-transfer-learning 91 | ``` 92 | ## Build and Run Example 93 | ``` 94 | $ PLATFORM=SPR make vision-transfer-learning 95 | [+] Building 2.0s (9/9) FINISHED 96 | => [internal] load build definition from Dockerfile.vision-transfer-learning 0.0s 97 | => => transferring dockerfile: 2.36kB 0.0s 98 | => [internal] load .dockerignore 0.0s 99 | => => transferring context: 2B 0.0s 100 | => [internal] load metadata for docker.io/library/ubuntu:20.04 0.0s 101 | => [1/5] FROM docker.io/library/ubuntu:20.04 0.0s 102 | => CACHED [2/5] RUN apt-get update && apt-get install --no-install-recommends --fix-missing -y build-essential ca-certificates git gcc numactl wget 0.0s 103 | => CACHED [3/5] RUN apt-get update && wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh && bash miniconda.sh -b -p /opt/conda && rm m 0.0s 104 | => CACHED [4/5] RUN conda create -y -n transfer_learning python=3.8 && source activate transfer_learning && conda install -y -c conda-forge gperftools && conda install -y intel-openm 0.0s 105 | => [5/5] RUN mkdir -p /workspace/transfer-learning 1.8s 106 | => exporting to image 0.0s 107 | => => exporting layers 0.0s 108 | => => writing image sha256:15de220251a06ec9098c458f43c21239f1811fd5bc563bf99f322721960a717b 0.0s 109 | => => naming to docker.io/library/vision-transfer-learning:inference-23-2022-ubuntu-20.04 0.0s 110 | 111 | Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them 112 | [+] Running 1/0 113 | ⠿ Container inference-vision-transfer-learning-1 Recreated 0.1s 114 | Attaching to inference-vision-transfer-learning-1 115 | inference-vision-transfer-learning-1 | /usr/bin/bash: /opt/conda/envs/transfer_learning/lib/libtinfo.so.6: no version information available (required by /usr/bin/bash) 116 | inference-vision-transfer-learning-1 | INFERENCE Default value is zero 117 | inference-vision-transfer-learning-1 | Inference option is : 1 118 | inference-vision-transfer-learning-1 | Checkpoint File is : /workspace/checkpoint 119 | inference-vision-transfer-learning-1 | Platform is SPR 120 | inference-vision-transfer-learning-1 | 2022-08-25 17:59:38.778284: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX512_VNNI AVX512_BF16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 121 | ``` 122 | ... 
123 | ``` 124 | inference-vision-transfer-learning-1 | OMP: Info #255: KMP_AFFINITY: pid 22 tid 550 thread 100 bound to OS proc set 44 125 | inference-vision-transfer-learning-1 | OMP: Info #255: KMP_AFFINITY: pid 22 tid 551 thread 101 bound to OS proc set 45 126 | inference-vision-transfer-learning-1 | OMP: Info #255: KMP_AFFINITY: pid 22 tid 552 thread 102 bound to OS proc set 46 127 | inference-vision-transfer-learning-1 | OMP: Info #255: KMP_AFFINITY: pid 22 tid 553 thread 103 bound to OS proc set 47 128 | inference-vision-transfer-learning-1 | OMP: Info #255: KMP_AFFINITY: pid 22 tid 554 thread 104 bound to OS proc set 48 129 | inference-vision-transfer-learning-1 | OMP: Info #255: KMP_AFFINITY: pid 22 tid 555 thread 105 bound to OS proc set 49 130 | inference-vision-transfer-learning-1 | OMP: Info #255: KMP_AFFINITY: pid 22 tid 556 thread 106 bound to OS proc set 50 131 | inference-vision-transfer-learning-1 | OMP: Info #255: KMP_AFFINITY: pid 22 tid 557 thread 107 bound to OS proc set 51 132 | inference-vision-transfer-learning-1 | OMP: Info #255: KMP_AFFINITY: pid 22 tid 558 thread 108 bound to OS proc set 52 133 | inference-vision-transfer-learning-1 | OMP: Info #255: KMP_AFFINITY: pid 22 tid 559 thread 109 bound to OS proc set 53 134 | inference-vision-transfer-learning-1 | OMP: Info #255: KMP_AFFINITY: pid 22 tid 560 thread 110 bound to OS proc set 54 135 | inference-vision-transfer-learning-1 | OMP: Info #255: KMP_AFFINITY: pid 22 tid 561 thread 111 bound to OS proc set 55 136 | inference-vision-transfer-learning-1 exited with code 0 137 | ``` 138 | 139 | ## Check Results 140 | 141 | ``` 142 | $ tail -f result.txt 143 | Dataset directory is /workspace/data 144 | Setting Output Directory 145 | Is Tf32 enabled ? : False 146 | Test directory not present so using validation directory 147 | Total classes = 8 148 | 32/32 [==============================] - 5s 127ms/step - loss: 0.2466 - acc: 0.9140 149 | Test accuracy : 0.9139999747276306 150 | 32/32 [==============================] - 5s 137ms/step 151 | Classification report 152 | precision recall f1-score support 153 | 154 | 0 0.95 0.94 0.94 112 155 | 1 0.77 0.87 0.81 127 156 | 2 0.86 0.74 0.80 137 157 | 3 0.95 0.94 0.94 126 158 | 4 0.90 0.90 0.90 126 159 | 5 0.95 0.99 0.97 128 160 | 6 1.00 0.96 0.98 118 161 | 7 0.96 1.00 0.98 126 162 | 163 | accuracy 0.91 1000 164 | macro avg 0.92 0.92 0.92 1000 165 | weighted avg 0.92 0.91 0.91 1000 166 | 167 | Top 1 accuracy score: 0.914 168 | Top 5 accuracy score: 0.999 169 | ``` 170 | -------------------------------------------------------------------------------- /analytics/classical-ml/synthetic/inference/DEVCATALOG.md: -------------------------------------------------------------------------------- 1 | # Wafer Insights - Inference 2 | 3 | ## Overview 4 | Wafer Insights is a python application that allows users to predict FMAX/IDV tokens based on multiple data sources measured in the Intel fab. For detailed information about the workflow, go to [Wafer Insights](https://github.com/intel/wafer-insights-with-classical-ml) GitHub repository. 5 | 6 | ## How it Works 7 | Wafer Insights is an interactive data-visualization web application based on Dash and Plotly. It includes 2 major components: a data loader, which generates synthetic fab data for visualization, and a dash app that provides an interface for users to analyze the data and gain insight. Dash is written on top of Plotly.js and React.js, providing an ideal framework for building and deploying data apps with customized user interfaces. 
The `src/dashboard` folder contains the code for the dash app and the `src/loaders` folder contains the code for the data loader. 8 | 9 | ## Get Started 10 | 11 | ### **Prerequisites** 12 | 13 | #### Dependencies 14 | The following tools are required before you get started: 15 | 1. Git 16 | 2. Anaconda/Miniconda 17 | 3. Docker 18 | 4. Python3 19 | 20 | #### Download the repo 21 | Clone the [Wafer Insights](https://github.com/intel/wafer-insights-with-classical-ml) repository. 22 | ``` 23 | git clone https://github.com/intel/wafer-insights-with-classical-ml 24 | cd wafer-insights-with-classical-ml 25 | git checkout v1.0.0 26 | ``` 27 | #### Download the Dataset 28 | Actual measurement data from the Intel fab cannot be shared with the public. Therefore, we provide a synthetic data loader to generate synthetic data using the `make_regression` function from the sklearn library, which has the following format: 29 | | **Type** | **Format** | **Rows** | **Columns** | 30 | | ---------------- | ---------- | -------- | ----------- | 31 | | Feature Dataset | Parquet | 25000 | 2000 | 32 | | Response Dataset | Parquet | 25000 | 1 | 33 | 34 | Refer to [How to Run](#how-to-run) to construct the dataset. 35 | ### **Docker** 36 | The setup and how-to-run sections below are for users who want to use the provided Docker image. 37 | For a bare metal environment, go to [Bare Metal](#bare-metal). 38 | #### Setup 39 | 40 | ##### Pull Docker Image 41 | ``` 42 | docker pull intel/ai-workflows:wafer-insights 43 | ``` 44 | 45 | ##### Set Up Synthetic Data 46 | ``` 47 | docker run -a stdout \ 48 | -v $(pwd):/workspace \ 49 | --workdir /workspace/src/loaders/synthetic_loader \ 50 | --privileged --init --rm -it \ 51 | intel/ai-workflows:wafer-insights \ 52 | conda run --no-capture-output -n WI python loader.py 53 | ``` 54 | 55 | #### How to Run 56 | 57 | (Optional) Export your proxy settings into the Docker environment. 58 | ``` 59 | export DOCKER_RUN_ENVS="-e ftp_proxy=${ftp_proxy} \ 60 | -e FTP_PROXY=${FTP_PROXY} -e http_proxy=${http_proxy} \ 61 | -e HTTP_PROXY=${HTTP_PROXY} -e https_proxy=${https_proxy} \ 62 | -e HTTPS_PROXY=${HTTPS_PROXY} -e no_proxy=${no_proxy} \ 63 | -e NO_PROXY=${NO_PROXY} -e socks_proxy=${socks_proxy} \ 64 | -e SOCKS_PROXY=${SOCKS_PROXY}" 65 | ``` 66 | To run the pipeline, follow the instructions below outside of the Docker instance. 67 | ``` 68 | export OUTPUT_DIR=/output 69 | ``` 70 | 71 | ``` 72 | docker run -a stdout $DOCKER_RUN_ENVS \ 73 | --env OUTPUT_DIR=${OUTPUT_DIR} \ 74 | --env PYTHONPATH=$PYTHONPATH:$PWD \ 75 | --volume ${OUTPUT_DIR}:/output \ 76 | --volume $(pwd):/workspace \ 77 | --workdir /workspace \ 78 | -p 8050:8050 \ 79 | --privileged --init --rm -it \ 80 | intel/ai-workflows:wafer-insights \ 81 | conda run --no-capture-output -n WI python src/dashboard/app.py 82 | ``` 83 | 84 | #### Output 85 | ``` 86 | $ make wafer-insight 87 | WARN[0000] The "PYTHONPATH" variable is not set. Defaulting to a blank string. 
88 | [+] Building 0.1s (9/9) FINISHED 89 | => [internal] load build definition from Dockerfile.wafer-insights 0.0s 90 | => => transferring dockerfile: 47B 0.0s 91 | => [internal] load .dockerignore 0.0s 92 | => => transferring context: 2B 0.0s 93 | => [internal] load metadata for docker.io/library/ubuntu:20.04 0.0s 94 | => [1/5] FROM docker.io/library/ubuntu:20.04 0.0s 95 | => CACHED [2/5] RUN apt-get update && apt-get install --no-install-recommends --fix-missing -y ca-cer 0.0s 96 | => CACHED [3/5] RUN mkdir -p /workspace 0.0s 97 | => CACHED [4/5] RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O 0.0s 98 | => CACHED [5/5] RUN conda create -yn WI python=3.9 && source activate WI && conda install -y scik 0.0s 99 | => exporting to image 0.0s 100 | => => exporting layers 0.0s 101 | => => writing image sha256:dfa7411736694db4d3c8d0032f424fc88f0af98fabd163a659a90d0cc2dfe587 0.0s 102 | => => naming to docker.io/library/wafer-insights:inference-ubuntu-20.04 0.0s 103 | WARN[0000] Found orphan containers ([inference-wafer-analytics-1]) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up. 104 | [+] Running 1/1 105 | ⠿ Container inference-wafer-insight-1 Recreated 0.1s 106 | Attaching to inference-wafer-insight-1 107 | inference-wafer-insight-1 | [[-5.39543860e-04 2.39971569e-03 -3.42210731e-04 ... -2.35041980e-03 108 | inference-wafer-insight-1 | -1.81397056e-04 -2.09303234e-03] 109 | inference-wafer-insight-1 | [-1.00075542e-04 -5.41824409e-04 -2.38435358e-04 ... 3.39901582e-03 110 | inference-wafer-insight-1 | 3.35075678e-04 2.04678475e-03] 111 | inference-wafer-insight-1 | [-5.14076633e-04 -2.28770984e-03 3.52836617e-04 ... -3.59841471e-03 112 | inference-wafer-insight-1 | -2.57484490e-03 5.23169035e-04] 113 | inference-wafer-insight-1 | ... 114 | inference-wafer-insight-1 | [-3.13805323e-03 -3.16870576e-03 1.28447995e-03 ... -8.94258047e-05 115 | inference-wafer-insight-1 | 8.13668371e-04 -5.02239567e-04] 116 | inference-wafer-insight-1 | [-7.28863425e-04 2.32030465e-03 1.57134892e-03 ... 2.64884040e-04 117 | inference-wafer-insight-1 | -2.12739801e-03 -1.98500740e-04] 118 | inference-wafer-insight-1 | [-1.79534321e-03 6.97006847e-04 4.70415219e-04 ... -4.21349858e-04 119 | inference-wafer-insight-1 | 2.88895727e-03 4.20368128e-04]] 120 | inference-wafer-insight-1 | fcol`feature_0 fcol`feature_1 ... fcol`feature_1999 TEST_END_DATE 121 | inference-wafer-insight-1 | 0 -0.000540 0.002400 ... -0.002093 2022-06-24 17:57:44.060832 122 | inference-wafer-insight-1 | 1 -0.000100 -0.000542 ... 0.002047 2022-06-24 18:02:55.100832 123 | inference-wafer-insight-1 | 2 -0.000514 -0.002288 ... 0.000523 2022-06-24 18:08:06.140832 124 | inference-wafer-insight-1 | 3 -0.000020 -0.003073 ... 0.001036 2022-06-24 18:13:17.180832 125 | inference-wafer-insight-1 | 4 -0.001280 0.001955 ... -0.000343 2022-06-24 18:18:28.220832 126 | inference-wafer-insight-1 | 127 | inference-wafer-insight-1 | [5 rows x 2001 columns] 128 | inference-wafer-insight-1 | started_stacking 129 | inference-wafer-insight-1 | LOT7 WAFER3 PROCESS ... MEDIAN DEVREVSTEP TESTNAME`STRUCTURE_NAME 130 | inference-wafer-insight-1 | 0 DG0000000001 0 1234 ... -0.000540 DPMLD fcol`feature_0 131 | inference-wafer-insight-1 | 1 DG0000000001 1 1234 ... -0.000100 DPMLD fcol`feature_0 132 | inference-wafer-insight-1 | 2 DG0000000001 2 1234 ... 
-0.000514 DPMLD fcol`feature_0 133 | inference-wafer-insight-1 | 3 DG0000000001 3 1234 ... -0.000020 DPMLD fcol`feature_0 134 | inference-wafer-insight-1 | 4 DG0000000001 4 1234 ... -0.001280 DPMLD fcol`feature_0 135 | inference-wafer-insight-1 | 136 | inference-wafer-insight-1 | [5 rows x 10 columns] 137 | inference-wafer-insight-1 | 138 | inference-wafer-insight-1 | Dash is running on http://0.0.0.0:8050/ 139 | inference-wafer-insight-1 | 140 | inference-wafer-insight-1 | * Serving Flask app 'app' 141 | inference-wafer-insight-1 | * Debug mode: on 142 | ``` 143 | 144 | ### **Bare Metal** 145 | The setup and how-to-run sections below are for users who want to use a bare metal environment. 146 | For the Docker environment, go to [Docker](#docker). 147 | #### Setup 148 | First, set up the environment with conda using: 149 | ``` 150 | conda create -n WI 151 | conda activate WI 152 | pip install dash scikit-learn pandas pyarrow colorlover 153 | ``` 154 | #### How to Run 155 | To generate synthetic data for testing, run the following from the root directory: 156 | ``` 157 | cd src/loaders/synthetic_loader 158 | python loader.py 159 | ``` 160 | To run the dashboard: 161 | ``` 162 | export PYTHONPATH=$PYTHONPATH:$PWD 163 | python src/dashboard/app.py 164 | ``` 165 | The default dashboard URL is: http://0.0.0.0:8050/ 166 | 167 | ## Recommended Hardware 168 | The hardware below is recommended for use with this reference implementation. 169 | | **Name** | **Description** | 170 | | --------- | ---------------------------------------------------- | 171 | | CPU | Intel(R) Xeon(R) Gold 6252N CPU @ 2.30GHz (96 vCPUs) | 172 | | Free RAM | 367 GiB/376 GiB | 173 | | Disk Size | 2 TB | 174 | 175 | **Note: The code was developed and tested on a machine with this configuration. However, it may be sufficient to use a machine that is much less powerful than the recommended configuration.** 176 | 177 | ## Useful Resources 178 | [Intel AI Analytics Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html)
179 | [View All Containers and Solutions 🡢](https://www.intel.com/content/www/us/en/developer/tools/software-catalog/containers.html)
180 | 181 | ## Support 182 | [Report Issue](https://community.intel.com/t5/Intel-Optimized-AI-Frameworks/bd-p/optimized-ai-frameworks)
183 | -------------------------------------------------------------------------------- /analytics/classical-ml/recsys/training/README.md: -------------------------------------------------------------------------------- 1 | # Classical ML RecSys Training - Analytics with Python 2 | ## Description 3 | This document contains instructions on how to run RecSys Challenge Analytics with Python pipeline with make and docker compose. 4 | ## Project Structure 5 | ``` 6 | ├── analytics-with-python @ 1.0 7 | ├── docker-compose.yml 8 | ├── Makefile 9 | └── README.md 10 | ``` 11 | [_Makefile_](Makefile) 12 | ``` 13 | DATASET_DIR ?= /data/recsys2021 14 | FINAL_IMAGE_NAME ?= recsys-challenge 15 | OUTPUT_DIR ?= /output 16 | 17 | recsys-challenge: 18 | ./analytics-with-python/hadoop-folder-prep.sh . 19 | if ! docker network inspect hadoop ; then \ 20 | docker network create --driver=bridge hadoop; \ 21 | fi 22 | @DATASET_DIR=${DATASET_DIR} \ 23 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 24 | OUTPUT_DIR=${OUTPUT_DIR} \ 25 | docker compose up recsys-challenge --build 26 | 27 | clean: 28 | sudo rm -rf tmp 29 | docker network rm hadoop 30 | DATASET_DIR=${DATASET_DIR} CONFIG_DIR=${CONFIG_DIR} docker compose down 31 | ``` 32 | [_docker-compose.yml_](docker-compose.yml) 33 | ``` 34 | services: 35 | recsys-challenge: 36 | build: 37 | args: 38 | http_proxy: ${http_proxy} 39 | https_proxy: ${https_proxy} 40 | no_proxy: ${no_proxy} 41 | dockerfile: analytics-with-python/Dockerfile 42 | command: /mnt/code/run-all.sh 43 | container_name: hadoop-master 44 | environment: 45 | - http_proxy=${http_proxy} 46 | - https_proxy=${https_proxy} 47 | - no_proxy=${no_proxy} 48 | hostname: hadoop-master 49 | image: ${FINAL_IMAGE_NAME}:training-python-3.7-buster 50 | ports: 51 | - 8088:8088 52 | - 8888:8888 53 | - 8080:8080 54 | - 9870:9870 55 | - 9864:9864 56 | - 4040:4040 57 | - 18081:18081 58 | privileged: true 59 | volumes: 60 | - ${OUTPUT_DIR}:${OUTPUT_DIR} 61 | - /${DATASET_DIR}:/mnt/data 62 | - ./tmp:/mnt 63 | - ./analytics-with-python/config:/mnt/config 64 | - ./analytics-with-python:/mnt/code 65 | working_dir: /mnt/code 66 | ``` 67 | 68 | # RecSys Challenge 69 | End2End AI Workflow utilizing Analytics with Python. More information [here](https://github.com/intel-sandbox/applications.ai.appliedml.workflow.analyticswithpython) 70 | 71 | ## Quick Start 72 | * Pull and configure the dependent repo submodule `git submodule update --init --recursive`. 73 | 74 | * Install [Pipeline Repository Dependencies](../../../../README.md) 75 | 76 | * Other variables: 77 | 78 | | Variable Name | Default | Notes | 79 | | --- | --- | --- | 80 | | FINAL_IMAGE_NAME | `recsys-challenge` | Final Docker image name | 81 | | OUTPUT_DIR | `/output` | Output directory | 82 | | DATASET_DIR | `/data/recsys2021` | RecSys Dataset Directory | 83 | ## Build and Run 84 | Build and Run with defaults: 85 | ``` 86 | make recsys-challenge 87 | ``` 88 | ## Build and Run Example 89 | ``` 90 | $ make recsys-challenge 91 | ./analytics-with-python/hadoop-folder-prep.sh . 92 | -e 93 | remove path if already exists.... 94 | -e 95 | create folder for hadoop.... 96 | if ! 
docker network inspect hadoop ; then \ 97 | docker network create --driver=bridge hadoop; \ 98 | fi 99 | [] 100 | [+] Building 0.9s (13/13) FINISHED 101 | => [internal] load build definition from Dockerfile 0.0s 102 | => => transferring dockerfile: 2.32kB 0.0s 103 | => [internal] load .dockerignore 0.0s 104 | => => transferring context: 2B 0.0s 105 | => [internal] load metadata for docker.io/library/python:3.7-buster 0.8s 106 | => [auth] library/python:pull token for registry-1.docker.io 0.0s 107 | => [1/8] FROM docker.io/library/python:3.7-buster@sha256:2703aeb7b87e849ad2d4cdf25e1b21cf575ca1d2e1442a36f24017a481578222 0.0s 108 | => CACHED [2/8] RUN DEBIAN_FRONTEND=noninteractive apt-get -y update && DEBIAN_FRONTEND=noninteractive apt-get -y install --no-install-recommends openssh-server ssh wget vim net-tools git ht 0.0s 109 | => CACHED [3/8] RUN wget --no-check-certificate https://repo.huaweicloud.com/java/jdk/8u201-b09/jdk-8u201-linux-x64.tar.gz && tar -zxvf jdk-8u201-linux-x64.tar.gz && mv jdk1.8.0_201 /opt 0.0s 110 | => CACHED [4/8] RUN wget --no-check-certificate https://dlcdn.apache.org/hadoop/common/hadoop-3.3.3/hadoop-3.3.3.tar.gz && tar -zxvf hadoop-3.3.3.tar.gz && mv hadoop-3.3.3 /opt/hadoop-3. 0.0s 111 | => CACHED [5/8] RUN wget --no-check-certificate https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz && tar -zxvf spark-3.3.0-bin-hadoop3.tgz && mv spark-3.3.0-bin-hado 0.0s 112 | => CACHED [6/8] RUN wget --no-check-certificate http://distfiles.macports.org/scala2.12/scala-2.12.12.tgz && tar -zxvf scala-2.12.12.tgz && mv scala-2.12.12 /opt/scala-2.12.12 && rm 0.0s 113 | => CACHED [7/8] RUN ssh-keygen -t rsa -f /root/.ssh/id_rsa -P '' && cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys && sed -i 's/# Port 22/Port 12345/' /etc/ssh/ssh_config && 0.0s 114 | => CACHED [8/8] RUN pip install --no-cache-dir pyarrow findspark numpy pandas transformers torch pyrecdp sklearn xgboost 0.0s 115 | => exporting to image 0.0s 116 | => => exporting layers 0.0s 117 | => => writing image sha256:a76c8bf585a22bfffe825988f7cf6213bc8b737895694a0f55a7661f4805ffb9 0.0s 118 | => => naming to docker.io/library/recsys-challenge:training-python-3.7-buster 0.0s 119 | 120 | Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them 121 | [+] Running 1/0 122 | ⠿ Container hadoop-master Recreated 0.1s 123 | Attaching to hadoop-master 124 | hadoop-master | 125 | hadoop-master | prepare spark dev environment.... 126 | hadoop-master | 127 | hadoop-master | format namenode... 128 | ``` 129 | ... 130 | ``` 131 | hadoop-master | ######################### 132 | hadoop-master | ### retweet_timestamp 133 | hadoop-master | ######################### 134 | hadoop-master | Training..... 
135 | hadoop-master | [0] train-logloss:0.62301 valid-logloss:0.62302 136 | hadoop-master | [25] train-logloss:0.24346 valid-logloss:0.24299 137 | hadoop-master | [50] train-logloss:0.23107 valid-logloss:0.23059 138 | hadoop-master | [75] train-logloss:0.22883 valid-logloss:0.22877 139 | hadoop-master | [100] train-logloss:0.22766 valid-logloss:0.22803 140 | hadoop-master | [125] train-logloss:0.22674 valid-logloss:0.22753 141 | hadoop-master | [150] train-logloss:0.22602 valid-logloss:0.22720 142 | hadoop-master | [175] train-logloss:0.22534 valid-logloss:0.22693 143 | hadoop-master | [200] train-logloss:0.22477 valid-logloss:0.22675 144 | hadoop-master | [225] train-logloss:0.22422 valid-logloss:0.22658 145 | hadoop-master | [249] train-logloss:0.22381 valid-logloss:0.22648 146 | hadoop-master | Predicting... 147 | hadoop-master | took 228.5 seconds 148 | hadoop-master | ######################### 149 | hadoop-master | ### retweet_with_comment_timestamp 150 | hadoop-master | ######################### 151 | hadoop-master | Training..... 152 | hadoop-master | [0] train-logloss:0.60022 valid-logloss:0.60020 153 | hadoop-master | [25] train-logloss:0.05844 valid-logloss:0.05846 154 | hadoop-master | [50] train-logloss:0.03246 valid-logloss:0.03270 155 | hadoop-master | [75] train-logloss:0.03087 valid-logloss:0.03150 156 | hadoop-master | [100] train-logloss:0.03037 valid-logloss:0.03133 157 | hadoop-master | [125] train-logloss:0.03002 valid-logloss:0.03127 158 | hadoop-master | [150] train-logloss:0.02971 valid-logloss:0.03125 159 | hadoop-master | [175] train-logloss:0.02948 valid-logloss:0.03124 160 | hadoop-master | [200] train-logloss:0.02923 valid-logloss:0.03123 161 | hadoop-master | [219] train-logloss:0.02906 valid-logloss:0.03123 162 | hadoop-master | Predicting... 163 | hadoop-master | took 201.8 seconds 164 | hadoop-master | ######################### 165 | hadoop-master | ### like_timestamp 166 | hadoop-master | ######################### 167 | hadoop-master | Training..... 168 | hadoop-master | [0] train-logloss:0.67215 valid-logloss:0.67171 169 | hadoop-master | [25] train-logloss:0.55620 valid-logloss:0.55312 170 | hadoop-master | [50] train-logloss:0.54695 valid-logloss:0.54384 171 | hadoop-master | [75] train-logloss:0.54348 valid-logloss:0.54068 172 | hadoop-master | [100] train-logloss:0.54142 valid-logloss:0.53901 173 | hadoop-master | [125] train-logloss:0.53950 valid-logloss:0.53753 174 | hadoop-master | [150] train-logloss:0.53816 valid-logloss:0.53661 175 | hadoop-master | [175] train-logloss:0.53689 valid-logloss:0.53576 176 | hadoop-master | [200] train-logloss:0.53588 valid-logloss:0.53516 177 | hadoop-master | [225] train-logloss:0.53500 valid-logloss:0.53470 178 | hadoop-master | [249] train-logloss:0.53422 valid-logloss:0.53431 179 | hadoop-master | Predicting... 180 | hadoop-master | took 230.8 seconds 181 | hadoop-master | reply_timestamp AP:0.13177 RCE:17.21939 182 | hadoop-master | retweet_timestamp AP:0.34489 RCE:19.32879 183 | hadoop-master | retweet_with_comment_timestamp AP:0.02778 RCE:8.86315 184 | hadoop-master | like_timestamp AP:0.70573 RCE:20.61987 185 | hadoop-master | 0.1318 17.2194 0.3449 19.3288 0.0278 8.8631 0.7057 20.6199 186 | hadoop-master | AVG AP: 0.3025420714922875 187 | hadoop-master | AVG RCE: 16.507797035487055 188 | hadoop-master | This notebook took 888.9 seconds 189 | hadoop-master | 190 | hadoop-master | 191 | hadoop-master | 192 | hadoop-master | all training finished! 
193 | hadoop-master exited with code 0 194 | sudo rm -rf tmp 195 | ``` 196 | -------------------------------------------------------------------------------- /big-data/friesian/training/README.md: -------------------------------------------------------------------------------- 1 | # BigDL Friesian - Training 2 | 3 | ## Description 4 | This document contains instructions on how to train the Wide and Deep model with make and docker compose. 5 | ## Project Structure 6 | ``` 7 | ├── BigDL @ ai-workflow 8 | ├── DEVCATALOG.md 9 | ├── Dockerfile.friesian-training 10 | ├── Makefile 11 | ├── README.md 12 | └── docker-compose.yml 13 | ``` 14 | [_Makefile_](Makefile) 15 | ``` 16 | DATASET_DIR ?= /dataset 17 | FINAL_IMAGE_NAME ?= friesian-training 18 | MODEL_OUTPUT ?= /model_output 19 | 20 | friesian-training: 21 | wget https://labs.criteo.com/wp-content/uploads/2015/04/dac_sample.tar.gz 22 | tar -xvzf dac_sample.tar.gz 23 | mkdir -p ${DATASET_DIR}/data-csv 24 | mv dac_sample.txt ${DATASET_DIR}/data-csv/day_0.csv 25 | rm dac_sample.tar.gz 26 | @DATASET_DIR=${DATASET_DIR} \ 27 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 28 | MODEL_OUTPUT=${MODEL_OUTPUT} \ 29 | docker compose up friesian-training --build 30 | 31 | clean: 32 | @DATASET_DIR=${DATASET_DIR} \ 33 | OUTPUT_DIR=${MODEL_OUTPUT} \ 34 | docker compose down 35 | ``` 36 | [_docker-compose.yml_](docker-compose.yml) 37 | ``` 38 | services: 39 | csv-to-parquet: 40 | build: 41 | args: 42 | http_proxy: ${http_proxy} 43 | https_proxy: ${https_proxy} 44 | no_proxy: ${no_proxy} 45 | dockerfile: Dockerfile.friesian-training 46 | command: conda run -n bigdl --no-capture-output conda run -n bigdl --no-capture-output python3 csv_to_parquet.py --input /dataset/data-csv/day_0.csv --output /dataset/data-parquet/day_0.parquet 47 | environment: 48 | - DATASET_DIR=${DATASET_DIR} 49 | - MODEL_OUTPUT=${MODEL_OUTPUT} 50 | - http_proxy=${http_proxy} 51 | - https_proxy=${https_proxy} 52 | - no_proxy=${no_proxy} 53 | image: ${FINAL_IMAGE_NAME}:training-ubuntu-20.04 54 | privileged: true 55 | volumes: 56 | - ${DATASET_DIR}:/dataset 57 | - ${MODEL_OUTPUT}:/model 58 | - $PWD:/workspace 59 | working_dir: /workspace/BigDL/python/friesian/example/wnd 60 | preprocessing: 61 | build: 62 | args: 63 | http_proxy: ${http_proxy} 64 | https_proxy: ${https_proxy} 65 | no_proxy: ${no_proxy} 66 | dockerfile: Dockerfile.friesian-training 67 | command: conda run -n bigdl --no-capture-output python wnd_preprocessing.py --executor_cores 36 --executor_memory 50g --days 0-0 --input_folder /dataset/data-parquet --output_folder /dataset/data-processed --frequency_limit 15 --cross_sizes 10000,10000 68 | depends_on: 69 | csv-to-parquet: 70 | condition: service_completed_successfully 71 | environment: 72 | - DATASET_DIR=${DATASET_DIR} 73 | - MODEL_OUTPUT=${MODEL_OUTPUT} 74 | - http_proxy=${http_proxy} 75 | - https_proxy=${https_proxy} 76 | - no_proxy=${no_proxy} 77 | image: ${FINAL_IMAGE_NAME}:training-ubuntu-20.04 78 | privileged: true 79 | volumes: 80 | - ${DATASET_DIR}:/dataset 81 | - ${MODEL_OUTPUT}:/model 82 | - $PWD:/workspace 83 | working_dir: /workspace/BigDL/python/friesian/example/wnd 84 | friesian-training: 85 | build: 86 | args: 87 | http_proxy: ${http_proxy} 88 | https_proxy: ${https_proxy} 89 | no_proxy: ${no_proxy} 90 | dockerfile: Dockerfile.friesian-training 91 | command: conda run -n bigdl --no-capture-output python wnd_train.py --executor_cores 36 --executor_memory 50g --data_dir /dataset/data-processed --model_dir /model 92 | depends_on: 93 | preprocessing: 94 | condition: 
service_completed_successfully 95 | environment: 96 | - DATASET_DIR=${DATASET_DIR} 97 | - MODEL_OUTPUT=${MODEL_OUTPUT} 98 | - http_proxy=${http_proxy} 99 | - https_proxy=${https_proxy} 100 | - no_proxy=${no_proxy} 101 | image: ${FINAL_IMAGE_NAME}:training-ubuntu-20.04 102 | privileged: true 103 | volumes: 104 | - ${DATASET_DIR}:/dataset 105 | - ${MODEL_OUTPUT}:/model 106 | - $PWD:/workspace 107 | working_dir: /workspace/BigDL/python/friesian/example/wnd 108 | ``` 109 | 110 | # Train a WideAndDeep Model on the Criteo Dataset 111 | This example demonstrates how to use BigDL Friesian to preprocess the Criteo dataset and train the WideAndDeep model in a distributed fashion. 112 | 113 | End-to-End Recommendation Systems AI Workflow utilizing BigDL - Friesian. More information at [Intel Analytics - BigDL](https://github.com/intel-analytics/BigDL) 114 | 115 | ## Quick Start 116 | * Pull and configure the dependent repo submodule 117 | ``` 118 | git submodule update --init --recursive BigDL 119 | ``` 120 | 121 | * Install [Pipeline Repository Dependencies](../../../README.md) 122 | 123 | * Other variables: 124 | 125 | | Variable Name | Default | Notes | 126 | | --- | --- | --- | 127 | | DATASET_DIR | `/dataset` | Default directory for dataset to be downloaded. Dataset will be downloaded when running with `make` command | 128 | | FINAL_IMAGE_NAME | `friesian-training` | Final image name | 129 | | MODEL_OUTPUT | `/model_output` | Trained model will be produced in this directory | 130 | 131 | ## Build and Run 132 | Build and Run with defaults: 133 | ``` 134 | make friesian-training 135 | ``` 136 | 137 | ## Build and Run Example 138 | ``` 139 | $ DATASET_DIR=/localdisk/criteo-dataset MODEL_OUTPUT=/locadisk/model make friesian-training 140 | wget https://labs.criteo.com/wp-content/uploads/2015/04/dac_sample.tar.gz 141 | --2022-12-13 10:06:29-- https://labs.criteo.com/wp-content/uploads/2015/04/dac_sample.tar.gz 142 | Resolving proxy-dmz.intel.com (proxy-dmz.intel.com)... 10.7.211.16 143 | Connecting to proxy-dmz.intel.com (proxy-dmz.intel.com)|10.7.211.16|:912... connected. 144 | Proxy request sent, awaiting response... 
200 OK 145 | Length: 8787154 (8.4M) [application/x-gzip] 146 | Saving to: ‘dac_sample.tar.gz’ 147 | 148 | dac_sample.tar.gz 100%[============================================================>] 8.38M 6.27MB/s in 1.3s 149 | 150 | 2022-12-13 10:06:31 (6.27 MB/s) - ‘dac_sample.tar.gz’ saved [8787154/8787154] 151 | 152 | tar -xvzf dac_sample.tar.gz 153 | tar: Ignoring unknown extended header keyword 'LIBARCHIVE.creationtime' 154 | tar: Ignoring unknown extended header keyword 'SCHILY.dev' 155 | tar: Ignoring unknown extended header keyword 'SCHILY.ino' 156 | tar: Ignoring unknown extended header keyword 'SCHILY.nlink' 157 | dac_sample.txt 158 | tar: Ignoring unknown extended header keyword 'SCHILY.dev' 159 | tar: Ignoring unknown extended header keyword 'SCHILY.ino' 160 | tar: Ignoring unknown extended header keyword 'SCHILY.nlink' 161 | ./._readme.txt 162 | tar: Ignoring unknown extended header keyword 'SCHILY.dev' 163 | tar: Ignoring unknown extended header keyword 'SCHILY.ino' 164 | tar: Ignoring unknown extended header keyword 'SCHILY.nlink' 165 | readme.txt 166 | tar: Ignoring unknown extended header keyword 'SCHILY.dev' 167 | tar: Ignoring unknown extended header keyword 'SCHILY.ino' 168 | tar: Ignoring unknown extended header keyword 'SCHILY.nlink' 169 | license.txt 170 | mkdir -p /localdisk/criteo-dataset/data-csv 171 | mv dac_sample.txt /localdisk/criteo-dataset/data-csv/day_0.csv 172 | rm dac_sample.tar.gz 173 | [+] Building 0.5s (10/10) FINISHED 174 | => [internal] load build definition from Dockerfile.friesian-training 0.0s 175 | => => transferring dockerfile: 50B 0.0s 176 | => [internal] load .dockerignore 0.0s 177 | => => transferring context: 2B 0.0s 178 | => [internal] load metadata for docker.io/library/ubuntu:20.04 0.4s 179 | => [1/6] FROM docker.io/library/ubuntu:20.04@sha256:0e0402cd13f68137edb0266e1d2c682f217814420f2d43d300ed8f65479b14fb 0.0s 180 | => CACHED [2/6] RUN apt-get update && apt-get install --no-install-recommends --fix-missing -y ca-certificates vim 0.0s 181 | => CACHED [3/6] RUN wget --no-check-certificate -q https://repo.huaweicloud.com/java/jdk/8u201-b09/jdk-8u201-linux-x64.tar.gz & 0.0s 182 | => CACHED [4/6] RUN apt-get update && wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O 0.0s 183 | => CACHED [5/6] RUN conda create -yn bigdl python=3.7.5 && source activate bigdl && conda update -y -n base -c defaults 0.0s 184 | => CACHED [6/6] RUN mkdir -p /workspace 0.0s 185 | => exporting to image 0.0s 186 | => => exporting layers 0.0s 187 | => => writing image sha256:3a788bfb21c562aacdb45eed2eb9232c05a7394ec4c9588016654f57f99e1232 0.0s 188 | => => naming to docker.io/library/friesian-training:training-ubuntu-20.04 0.0s 189 | [+] Running 3/3 190 | ⠿ Container training-csv-to-parquet-1 Recreated 0.1s 191 | ⠿ Container training-preprocessing-1 Recreated 0.2s 192 | ⠿ Container training-friesian-training-1 Recreated 0.1s 193 | Attaching to training-friesian-training-1 194 | ``` 195 | ... 196 | ``` 197 | training-friesian-training-1 | 198 | 63/79 [======================>.......] - ETA: 1s - loss: 0.5001 - binary_accuracy: 0.7751 - binary_crossentropy: 0.5001 - auc: 0.6836 199 | 65/79 [=======================>......] - ETA: 1s - loss: 0.5003 - binary_accuracy: 0.7749 - binary_crossentropy: 0.5003 - auc: 0.6842 200 | 66/79 [========================>.....] - ETA: 0s - loss: 0.5003 - binary_accuracy: 0.7750 - binary_crossentropy: 0.5003 - auc: 0.6839 201 | 67/79 [========================>.....] 
- ETA: 0s - loss: 0.5002 - binary_accuracy: 0.7751 - binary_crossentropy: 0.5002 - auc: 0.6840 202 | 69/79 [=========================>....] - ETA: 0s - loss: 0.5001 - binary_accuracy: 0.7749 - binary_crossentropy: 0.5001 - auc: 0.6854 203 | 70/79 [=========================>....] - ETA: 0s - loss: 0.4999 - binary_accuracy: 0.7750 - binary_crossentropy: 0.4999 - auc: 0.6855 204 | 72/79 [==========================>...] - ETA: 0s - loss: 0.4997 - binary_accuracy: 0.7752 - binary_crossentropy: 0.4997 - auc: 0.6857 205 | 73/79 [==========================>...] - ETA: 0s - loss: 0.4997 - binary_accuracy: 0.7752 - binary_crossentropy: 0.4997 - auc: 0.6859 206 | 74/79 [===========================>..] - ETA: 0s - loss: 0.4997 - binary_accuracy: 0.7749 - binary_crossentropy: 0.4997 - auc: 0.6862 207 | 76/79 [===========================>..] - ETA: 0s - loss: 0.4995 - binary_accuracy: 0.7749 - binary_crossentropy: 0.4995 - auc: 0.6873 208 | 77/79 [============================>.] - ETA: 0s - loss: 0.4995 - binary_accuracy: 0.7749 - binary_crossentropy: 0.4995 - auc: 0.6874 209 | 79/79 [==============================] - ETA: 0s - loss: 0.4993 - binary_accuracy: 0.7750 - binary_crossentropy: 0.4993 - auc: 0.6875 210 | training-friesian-training-1 | Training time is: 26.327171802520752 211 | 79/79 [==============================] - 6s 81ms/step - loss: 0.4993 - binary_accuracy: 0.7750 - binary_crossentropy: 0.4993 - auc: 0.6875 - val_loss: 0.5382 - val_binary_accuracy: 0.7730 - val_binary_crossentropy: 0.5382 - val_auc: 0.6826 212 | training-friesian-training-1 | Stopping orca context 213 | training-friesian-training-1 exited with code 0 214 | ``` 215 | 216 | ## Cleanup 217 | Remove containers, copied files, and special configurations 218 | ``` 219 | make clean 220 | ``` 221 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 
34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /big-data/aiok-ray/inference/README.md: -------------------------------------------------------------------------------- 1 | # Ray DLRM INFERENCE - Recommendation-Ray 2 | ## Description 3 | This document contains instructions on how to run recommendation-ray pipeline with make and docker compose. 
4 | ## Project Structure 5 | ``` 6 | ├── AIOK_Ray @ aiok-ray-v0.2 7 | ├── DEVCATALOG.md 8 | ├── Makefile 9 | ├── README.md 10 | └── docker-compose.yml 11 | ``` 12 | [_Makefile_](Makefile) 13 | ``` 14 | DATASET_DIR ?= ./data 15 | FINAL_IMAGE_NAME ?= recommendation-ray 16 | CHECKPOINT_DIR ?= /output 17 | RUN_MODE ?= kaggle 18 | DOCKER_NETWORK_NAME = ray-inference 19 | 20 | recommendation-ray: 21 | if [ ! -d "AIOK_Ray/dlrm_all/dlrm/dlrm" ]; then \ 22 | CWD=${PWD}; \ 23 | cd AIOK_Ray/; \ 24 | sh dlrm_all/dlrm/patch_dlrm.sh; \ 25 | cd ${CWD}; \ 26 | fi 27 | @wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.12.0-Linux-x86_64.sh \ 28 | -P AIOK_Ray/Dockerfile-ubuntu18.04/ \ 29 | -O AIOK_Ray/Dockerfile-ubuntu18.04/miniconda.sh 30 | if [ ! "$(shell docker network ls | grep ${DOCKER_NETWORK_NAME})" ]; then \ 31 | docker network create --driver=bridge ${DOCKER_NETWORK_NAME} ; \ 32 | fi 33 | @DATASET_DIR=${DATASET_DIR} \ 34 | FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \ 35 | CHECKPOINT_DIR=${CHECKPOINT_DIR} \ 36 | RUN_MODE=${RUN_MODE} \ 37 | docker compose up recommendation-ray --build 38 | 39 | clean: 40 | docker network rm ${DOCKER_NETWORK_NAME} 41 | DATASET_DIR=${DATASET_DIR} OUTPUT_DIR=${OUTPUT_DIR} docker compose down 42 | ``` 43 | [_docker-compose.yml_](docker-compose.yml) 44 | ``` 45 | networks: 46 | ray-inference: 47 | external: true 48 | services: 49 | recommendation-ray: 50 | build: 51 | args: 52 | http_proxy: ${http_proxy} 53 | https_proxy: ${https_proxy} 54 | no_proxy: ${no_proxy} 55 | dockerfile: DockerfilePytorch 56 | context: AIOK_Ray/Dockerfile-ubuntu18.04 57 | command: 58 | - /bin/bash 59 | - -c 60 | - | 61 | bash $$APP_DIR/scripts/run_inference_docker.sh $RUN_MODE 62 | container_name: ray-inference 63 | hostname: ray 64 | networks: 65 | - ray-inference 66 | environment: 67 | - http_proxy=${http_proxy} 68 | - https_proxy=${https_proxy} 69 | - no_proxy=${no_proxy} 70 | - RUN_MODE=${RUN_MODE} 71 | - APP_DIR=/home/vmagent/app/e2eaiok 72 | - OUTPUT_DIR=/output 73 | image: ${FINAL_IMAGE_NAME}:training-inference-ubuntu-18.04 74 | privileged: true 75 | devices: 76 | - /dev/dri 77 | volumes: 78 | - ${DATASET_DIR}:/home/vmagent/app/dataset/criteo 79 | - ./AIOK_Ray:/home/vmagent/app/e2eaiok 80 | - ${CHECKPOINT_DIR}:/output 81 | working_dir: /home/vmagent/app/e2eaiok/dlrm_all/dlrm/ 82 | shm_size: 300g 83 | ``` 84 | 85 | 86 | # Ray Recommendation System 87 | End2End AI Workflow utilizing Ray framework for simplifying the end-to-end process at large scale. More information [here](https://github.com/intel/e2eAIOK/tree/aiok-ray-v0.2) 88 | 89 | ## Quick Start 90 | * Pull and configure the dependent repo submodule `git submodule update --init --recursive`. 91 | 92 | * Install [Pipeline Repository Dependencies](../../../README.md) 93 | 94 | * The model supports following three datasets: kaggle, criteo_small, criteo_full. The instructions to download each of them is provided at [README.md](https://github.com/intel/e2eAIOK/tree/aiok-ray-v0.2/README.md#dataset) 95 | 96 | * This pipeline requires the pre-trained model. Please run the [training pipeline](../training/README.md) before running inference to get the trained model. 97 | 98 | * Other variables: 99 | 100 | | Variable Name | Default | Notes | 101 | | --- | --- | --- | 102 | | DATASET_DIR | `./data` | Dataset directory | 103 | | RUN_MODE | `kaggle` | Dataset run mode from `kaggle`, `criteo_small`, `criteo_full` | 104 | | FINAL_IMAGE_NAME | `recommendation-ray` | Final Docker image name | 105 | | CHECKPOINT_DIR | `/output` | Checkpoint directory. 
Should contain a directory result/ with checkoints saved from training | 106 | 107 | ## Build and Run 108 | Build and Run with defaults: 109 | ``` 110 | make recommendation-ray 111 | ``` 112 | 113 | ## Build and Run Example 114 | ``` 115 | $ DATASET_DIR=/localdisk/sharvils/data/criteo_kaggle/ CHECKPOINT_DIR=.output/ RUN_MODE=kaggle make recommendation-ray 116 | 117 | => [internal] load build definition from DockerfilePytorch 0.0s 118 | => => transferring dockerfile: 39B 0.0s 119 | => [internal] load .dockerignore 0.0s 120 | => => transferring context: 2B 0.0s 121 | => [internal] load metadata for docker.io/library/ubuntu:18.04 0.3s 122 | => [internal] load build context 0.0s 123 | => => transferring context: 68B 0.0s 124 | => [ 1/40] FROM docker.io/library/ubuntu:18.04@sha256:daf3e62183e8aa9a56878a685ed26f3af3dd8c08c8fd11ef1c167a1aa9bd66a3 0.0s 125 | => CACHED [ 2/40] WORKDIR /root/ 0.0s 126 | => CACHED [ 3/40] RUN apt-get update -y && apt-get upgrade -y && apt-get install -y openjdk-8-jre build-essential cmake wget curl gi 0.0s 127 | => CACHED [ 4/40] COPY miniconda.sh . 0.0s 128 | => CACHED [ 5/40] COPY spark-env.sh . 0.0s 129 | => CACHED [ 6/40] RUN ls ~/ 0.0s 130 | => CACHED [ 7/40] RUN /bin/bash ~/miniconda.sh -b -p /opt/intel/oneapi/intelpython/latest 0.0s 131 | => CACHED [ 8/40] RUN yes | conda create -n pytorch_mlperf python=3.7 0.0s 132 | => CACHED [ 9/40] RUN conda install gxx_linux-64==8.4.0 0.0s 133 | => CACHED [10/40] RUN cp /opt/intel/oneapi/intelpython/latest/lib/python3.7/_sysconfigdata_x86_64_conda_cos6_linux_gnu.py /opt/intel 0.0s 134 | => CACHED [11/40] RUN cp /opt/intel/oneapi/intelpython/latest/envs/pytorch_mlperf/lib/python3.7/_sysconfigdata_x86_64_conda_cos6_lin 0.0s 135 | => CACHED [12/40] RUN cp -r /opt/intel/oneapi/intelpython/latest/envs/pytorch_mlperf/lib/* /opt/intel/oneapi/intelpython/latest/envs 0.0s 136 | => CACHED [13/40] RUN python -m pip install sklearn onnx tqdm lark-parser pyyaml 0.0s 137 | => CACHED [14/40] RUN conda install ninja cffi typing --no-update-deps 0.0s 138 | => CACHED [15/40] RUN conda install intel-openmp mkl mkl-include numpy -c intel --no-update-deps 0.0s 139 | => CACHED [16/40] RUN conda install -c conda-forge gperftools 0.0s 140 | => CACHED [17/40] RUN git clone https://github.com/pytorch/pytorch.git && cd pytorch && git checkout tags/v1.5.0-rc3 -b v1.5-rc3 && 0.0s 141 | => CACHED [18/40] RUN git clone https://github.com/intel/intel-extension-for-pytorch.git && cd intel-extension-for-pytorch && git ch 0.0s 142 | => CACHED [19/40] RUN cd intel-extension-for-pytorch && cp torch_patches/0001-enable-Intel-Extension-for-CPU-enable-CCL-backend.patc 0.0s 143 | => CACHED [20/40] RUN cp -r /opt/intel/oneapi/intelpython/latest/envs/pytorch_mlperf/lib/* /opt/intel/oneapi/intelpython/latest/envs 0.0s 144 | => CACHED [21/40] RUN cd pytorch && python setup.py install 0.0s 145 | => CACHED [22/40] RUN cd intel-extension-for-pytorch && python setup.py install 0.0s 146 | => CACHED [23/40] RUN git clone https://github.com/oneapi-src/oneCCL.git && cd oneCCL && git checkout 2021.1-beta07-1 && mkdir build 0.0s 147 | => CACHED [24/40] RUN git clone https://github.com/intel/torch-ccl.git && cd torch-ccl && git checkout 2021.1-beta07-1 0.0s 148 | => CACHED [25/40] RUN source /opt/intel/oneapi/intelpython/latest/envs/pytorch_mlperf/.local/env/setvars.sh && cd torch-ccl && pytho 0.0s 149 | => CACHED [26/40] RUN python -m pip install --no-cache-dir --ignore-installed sigopt==7.5.0 pandas pytest prefetch_generator tensorb 0.0s 150 | => CACHED [27/40] RUN python -m 
pip install "git+https://github.com/mlperf/logging.git@1.0.0" 0.0s 151 | => CACHED [28/40] RUN pip install ray==2.1.0 raydp-nightly pyrecdp pandas scikit-learn "pyarrow<7.0.0" 0.0s 152 | => CACHED [29/40] RUN apt-get update -y && apt-get install -y openssh-server pssh sshpass vim 0.0s 153 | => CACHED [30/40] RUN sed -i 's/#Port 22/Port 12346/g' /etc/ssh/sshd_config 0.0s 154 | => CACHED [31/40] RUN sed -i 's/# Port 22/ Port 12346/g' /etc/ssh/ssh_config 0.0s 155 | => CACHED [32/40] RUN echo 'PermitRootLogin yes' >> /etc/ssh/sshd_config 0.0s 156 | => CACHED [33/40] RUN conda init bash 0.0s 157 | => CACHED [34/40] RUN echo "source /opt/intel/oneapi/intelpython/latest/envs/pytorch_mlperf/.local/env/setvars.sh" >> /etc/bash.bash 0.0s 158 | => CACHED [35/40] RUN echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/oneapi/intelpython/latest/envs/pytorch_mlperf/lib/pyt 0.0s 159 | => CACHED [36/40] RUN echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/oneapi/intelpython/latest/envs/pytorch_mlperf/lib/pyt 0.0s 160 | => CACHED [37/40] RUN echo "source ~/spark-env.sh" >> /etc/bash.bashrc 0.0s 161 | => CACHED [38/40] RUN echo "KMP_BLOCKTIME=1" >> /etc/bash.bashrc 0.0s 162 | => CACHED [39/40] RUN echo "KMP_AFFINITY="granularity=fine,compact,1,0"" >> /etc/bash.bashrc 0.0s 163 | => CACHED [40/40] RUN echo "root:docker" | chpasswd 0.0s 164 | => exporting to image 0.0s 165 | => => exporting layers 0.0s 166 | => => writing image sha256:740242ef084e164945902d271a7edf9291015b5a3ad9fa79b8527e452ece03b3 0.0s 167 | => => naming to docker.io/library/recommendation-ray:training-inference-ubuntu-18.04 0.0s 168 | [+] Running 1/1 169 | ⠿ Container ray-inference Created 0.1s 170 | Attaching to ray-inference 171 | ray-inference | check cmd 172 | ray-inference | check dataset 173 | ray-inference | check data path: /home/vmagent/app/dataset/criteo 174 | ray-inference | check kaggle dataset 175 | ``` 176 | ... 177 | ``` 178 | ray-inference | [1] Start inference==========================: 179 | ray-inference | [0] Start inference==========================: 180 | ray-inference | [1] loss 0.462588, auc 0.7900 accuracy 78.372 % 181 | ray-inference | [1] Test time:1.955101728439331 182 | ray-inference | [1] Total results length:3274330 183 | ray-inference | [0] loss 0.462588, auc 0.7900 accuracy 78.372 % 184 | ray-inference | inference time is 27 seconds. 185 | ``` 186 | -------------------------------------------------------------------------------- /classification/tensorflow/bert_base/inference/DEVCATALOG.md: -------------------------------------------------------------------------------- 1 | # TensorFlow BERT Base Inference - AWS SageMaker 2 | 3 | Run inference on the BERT base model with TensorFlow on Intel's hardware on Amazon Sagemaker. 4 | 5 | Check out more workflow examples and reference implementations in the 6 | [Developer Catalog](https://developer.intel.com/aireferenceimplementations). 7 | 8 | ## Overview 9 | This workflow demonstrates how you can use Intel’s CPU hardware (Cascade Lake or above) and related optimized software to perform cloud inference on the Amazon Sagemaker platform. A step-by-step Jupyter notebook is provided to perform the following: 10 | 11 | 1. Specify AWS information 12 | 2. Build a custom docker image for inference 13 | 3. Deploy the TensorFlow model using Sagemaker, with options to change the instance type and number of nodes 14 | 4. 
Preprocess the input data and send it to the endpoint 15 | 16 | For detailed information about the workflow, go to the [Cloud Training and Cloud Inference on Amazon Sagemaker/Elastic Kubernetes Service](https://github.com/intel/NLP-Workflow-with-AWS) GitHub repository and follow the instructions for AWS inference. 17 | 18 | ## Hardware Requirements 19 | We recommend you use the following hardware for this reference implementation. 20 | | **Name** | **Description** 21 | | :--- | :--- 22 | | CPU | Intel® Xeon® processor family, 2nd Gen or newer 23 | | Usable RAM | 16 GB 24 | | Disk Size | 256 GB 25 | 26 | ## How it Works 27 | In this workflow, we'll use a pretrained BERT Base model and perform inference with the Amazon Sagemaker infrastructure. This diagram shows the architecture for both training and inference but only the inference path is demonstrated in this workflow. 28 | 29 | ### Architecture 30 | ![sagemaker_architecture](https://user-images.githubusercontent.com/43555799/207917598-ec21b0c5-0915-4a3b-a5e2-33458051f286.png) 31 | 32 | ### Model Spec 33 | In this workflow, we use the uncased BERT base model. Parameters can be changed depending on your requirements. 34 | 35 | ```python 36 | bert-base-uncased-config = { 37 | "architectures": [ 38 | "BertForMaskedLM" 39 | ], 40 | "attention_probs_dropout_prob": 0.1, 41 | "gradient_checkpointing": false, 42 | "hidden_act": "gelu", 43 | "hidden_dropout_prob": 0.1, 44 | "hidden_size": 768, 45 | "initializer_range": 0.02, 46 | "intermediate_size": 3072, 47 | "layer_norm_eps": 1e-12, 48 | "max_position_embeddings": 128, 49 | "model_type": "bert", 50 | "num_attention_heads": 12, 51 | "num_hidden_layers": 12, 52 | "pad_token_id": 0, 53 | "position_embedding_type": "absolute", 54 | "transformers_version": "4.21.1", 55 | "type_vocab_size": 2, 56 | "use_cache": true, 57 | "vocab_size": 30522 58 | } 59 | ``` 60 | 61 | ## Get Started 62 | 63 | ### Download the Workflow Repository 64 | Create a working directory for the workflow and clone the [Main Repository](https://github.com/intel/NLP-Workflow-with-AWS) repository into your working directory. 65 | ``` 66 | mkdir ~/work && cd ~/work 67 | git clone https://github.com/intel/NLP-Workflow-with-AWS.git 68 | cd NLP-Workflow-with-AWS 69 | git checkout main 70 | ``` 71 | ### Download the Datasets 72 | No Intel-supplied dataset is needed, but you will need your own input for inference. 73 | 74 | ## Run Using Docker 75 | Follow these instructions to set up and run our provided Docker image. 76 | For running on bare metal, see the [bare metal instructions](#run-using-bare-metal). 77 | 78 | 79 | ### Set Up Docker Engine 80 | You'll need to install Docker Engine on your development system. 81 | Note that while **Docker Engine** is free to use, **Docker Desktop** may require 82 | you to purchase a license. See the [Docker Engine Server installation 83 | instructions](https://docs.docker.com/engine/install/#server) for details.
84 | 85 | Because the Docker image is run on a cloud service, you will also need AWS credentials to perform inference-related operations; the next section describes how to set them up. 90 | 91 | 92 | ### Set Up AWS Credentials 93 | You will need AWS credentials and the related AWS CLI installed on the machine to push data/docker image to the Amazon Elastic Container Registry (Amazon ECR). 94 | 95 | Set up an [AWS Credential Account](https://aws.amazon.com/account/) and [configure](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) it. 96 | 97 | ### Set Up Docker Image 98 | Pull the provided docker image. 99 | ``` 100 | docker pull intel/ai-workflows:nlp-aws-sagemaker 101 | ``` 102 | If your environment requires a proxy to access the internet, export your 103 | development system's proxy settings to the docker environment: 104 | ``` 105 | export DOCKER_RUN_ENVS="-e ftp_proxy=${ftp_proxy} \ 106 | -e FTP_PROXY=${FTP_PROXY} -e http_proxy=${http_proxy} \ 107 | -e HTTP_PROXY=${HTTP_PROXY} -e https_proxy=${https_proxy} \ 108 | -e HTTPS_PROXY=${HTTPS_PROXY} -e no_proxy=${no_proxy} \ 109 | -e NO_PROXY=${NO_PROXY} -e socks_proxy=${socks_proxy} \ 110 | -e SOCKS_PROXY=${SOCKS_PROXY}" 111 | ``` 112 | 113 | ### Run Docker Image 114 | Provide your own dataset and a value for ``path to dataset`` and 115 | run the workflow using the ``docker run`` command, as shown. 116 | ``` 117 | export DATASET_DIR=<path to dataset> 118 | export OUTPUT_DIR=/output 119 | docker run -a stdout $DOCKER_RUN_ENVS \ 120 | --env DATASET_DIR=${DATASET_DIR} \ 121 | --env OUTPUT_DIR=${OUTPUT_DIR} \ 122 | --volume ${DATASET_DIR}:/workspace/data \ 123 | --volume ${OUTPUT_DIR}:/output \ 124 | --volume ${PWD}:/workspace \ 125 | --workdir /workspace \ 126 | --privileged --init -it --rm --pull always \ 127 | intel/ai-workflows:nlp-aws-sagemaker 128 | ``` 129 | 130 | ### Sagemaker Inference 131 | After starting the container, execute the following command in the interactive shell. 132 | ``` 133 | cd /root/notebooks 134 | jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root 135 | ``` 136 | Start the notebook with "intel-sagemaker-inference" in the filename. 137 | 138 | --- 139 | 140 | ## Run Using Bare Metal 141 | This workflow requires Docker and cannot be run using bare metal.
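Before looking at the expected output, it may help to see what step 4 of the workflow described above ("preprocess the input data") typically involves: turning a sentence pair into the fixed-length token IDs the uncased BERT base model expects (note `max_position_embeddings` of 128 in the Model Spec). The sketch below uses the Hugging Face tokenizer purely as an illustration; the provided notebook ships its own preprocessing code, so treat these function and argument choices as assumptions rather than the workflow's exact implementation.

```python
# Illustrative sketch only -- not the notebook's actual preprocessing code.
from transformers import BertTokenizer

# Uncased BERT base vocabulary, matching vocab_size 30522 in the Model Spec above.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

encoded = tokenizer(
    "The company posted record profits this quarter.",   # sentence 1
    "Profits at the company hit a record high.",          # sentence 2
    padding="max_length",
    truncation=True,
    max_length=128,   # matches max_position_embeddings in the Model Spec
)

print(len(encoded["input_ids"]))       # 128 token IDs for the padded sentence pair
print(encoded["token_type_ids"][:20])  # segment IDs distinguish the two sentences
```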
142 | 143 | ## Expected Output 144 | Running the Docker image should give the following output: 145 | ``` 146 | $ AWS_CSV_FILE=./aws_config.csv S3_MODEL_URI="s3://model.tar.gz" ROLE="role" make nlp-sagemaker 147 | [+] Building 0.1s (9/9) FINISHED 148 | => [internal] load build definition from Dockerfile 0.0s 149 | => => transferring dockerfile: 32B 0.0s 150 | => [internal] load .dockerignore 0.0s 151 | => => transferring context: 2B 0.0s 152 | => [internal] load metadata for docker.io/library/ubuntu:20.04 0.0s 153 | => [1/5] FROM docker.io/library/ubuntu:20.04 0.0s 154 | => CACHED [2/5] RUN apt-get update && DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get -y install tzdata=2022c-0ubuntu0.20.04.0 --no-install-recommends && rm -rf /var/lib/apt/lists/* 0.0s 155 | => CACHED [3/5] RUN apt-get -y update && apt-get install -y --no-install-recommends wget=1.20.3-1ubuntu2 nginx=1.18.0-0ubuntu1.3 cmake=3.16.3-1ubuntu1 software-prope 0.0s 156 | => CACHED [4/5] RUN pip install --no-cache-dir boto3==1.24.15 && pip install --no-cache-dir sagemaker==2.96.0 && pip install --no-cache-dir tensorflow-cpu==2.9.1 && pip install --no-cache-dir 0.0s 157 | => CACHED [5/5] RUN pip install --no-cache-dir virtualenv==20.14.1 && virtualenv intel_neural_compressor_venv && . intel_neural_compressor_venv/bin/activate && pip install --no-cache-dir Cy 0.0s 158 | => exporting to image 0.0s 159 | => => exporting layers 0.0s 160 | => => writing image sha256:91b43c6975feab4db06cf34a9635906d2781102a05d406b93c5bf2eb87c30a94 0.0s 161 | => => naming to docker.io/library/intel_amazon_cloud_trainandinf:inference-ubuntu-20.04 0.0s 162 | [+] Running 1/0 163 | ⠿ Container bert_uncased_base-aws-sagemaker-1 Created 0.0s 164 | Attaching to bert_uncased_base-aws-sagemaker-1 165 | bert_uncased_base-aws-sagemaker-1 | [NbConvertApp] Converting notebook 1.0-intel-sagemaker-inference.ipynb to python 166 | bert_uncased_base-aws-sagemaker-1 | [NbConvertApp] Writing 3597 bytes to 1.0-intel-sagemaker-inference.py 167 | bert_uncased_base-aws-sagemaker-1 | update_endpoint is a no-op in sagemaker>=2. 168 | bert_uncased_base-aws-sagemaker-1 | See: https://sagemaker.readthedocs.io/en/stable/v2.html for details. 169 | bert_uncased_base-aws-sagemaker-1 | ---!/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (3.0.4) doesn't match a supported version! 170 | bert_uncased_base-aws-sagemaker-1 | warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported " 171 | bert_uncased_base-aws-sagemaker-1 exited with code 0 172 | ``` 173 | 174 | After running inference, the Jupyter notebook will print out the prediction value as a number which represents the confidence or probability the two input sentences have the same meaning. A value of 0 means the sentences do not have the same meaning. A value of 1 means the sentences should have the same meaning. 175 | 176 | ## Summary and Next Steps 177 | In this workflow, you loaded a Docker image and performed inference on a TensorFlow BERT base model on Amazon Sagemaker using Intel® Xeon® Scalable Processors. The [GitHub repository](https://github.com/intel/NLP-Workflow-with-AWS/tree/main) also contains workflows for training on Sagemaker and training and inference on Elastic Kubernetes Service (EKS). 
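To make the prediction value described under Expected Output more concrete, the sketch below shows one way a client could send a preprocessed payload to the deployed SageMaker endpoint and threshold the returned score. The endpoint name and the request/response layout are assumptions for illustration only; the actual formats depend on the serving container the notebook deploys, and the notebook itself performs this call through the SageMaker SDK.

```python
# Hypothetical client-side call to the deployed endpoint (sketch only).
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Assumed request layout: a TensorFlow-Serving-style "instances" list carrying
# the sentence pair; the real payload must match the deployed model's signature.
payload = {"instances": [{"sentence1": "The company posted record profits this quarter.",
                          "sentence2": "Profits at the company hit a record high."}]}

response = runtime.invoke_endpoint(
    EndpointName="bert-base-paraphrase-endpoint",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)

# Assumed response layout: {"predictions": [score]}, where score is the
# probability that the two input sentences have the same meaning.
score = json.loads(response["Body"].read())["predictions"][0]
print(f"paraphrase probability: {score}")
print("same meaning" if score >= 0.5 else "different meaning")
```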
178 | 179 | ## Learn More 180 | For more information or to read about other relevant workflow 181 | examples, see these guides and software resources: 182 | 183 | - Put ref links and descriptions here, for example 184 | - [Intel® AI Analytics Toolkit (AI Kit)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html) 185 | - [Azure Machine Learning Documentation](https://learn.microsoft.com/en-us/azure/machine-learning/) 186 | - links to other similar or related items from the dev catalog 187 | 188 | ## Troubleshooting 189 | Issues, problems, and their workarounds if possible, will be listed here. 190 | 191 | ## Support 192 | We track bugs and enhancement requests using [GitHub issues](https://github.com/intel/NLP-Workflow-with-AWS/issues). Search through these issues before submitting your own bug or enhancement request. 193 | -------------------------------------------------------------------------------- /language_modeling/pytorch/bert_base/training/DEVCATALOG.md: -------------------------------------------------------------------------------- 1 | # Intel® NLP workflow for Azure* ML - Training 2 | Learn how to use Intel's XPU hardware and Intel optimized software to perform distributed training on the Azure Machine Learning Platform with PyTorch\*, Intel® Extension for PyTorch\*, Hugging Face, and Intel® Neural Compressor. 3 | 4 | Check out more workflow examples and reference implementations in the 5 | [Developer Catalog](https://developer.intel.com/aireferenceimplementations). 6 | 7 | ## Overview 8 | This workflow demonstrates how to use Intel’s XPU hardware (e.g.: CPU - Ice Lake or above) and related optimized software to perform distributed training on the Azure Machine Learning Platform (Azure ML). The main software packages used here are Intel® Extension for PyTorch\*, PyTorch\*, Hugging Face, Azure Machine Learning Platform, and Intel® Neural Compressor. 9 | 10 | Instructions are provided to perform the following: 11 | 12 | 1. Specify Azure ML information 13 | 2. Build a custom docker image for training 14 | 3. Train a PyTorch model using Azure ML, with options to change the instance type and number of nodes 15 | 16 | For more detailed information, please visit the [Intel® NLP workflow for Azure* ML](https://github.com/intel/Intel-NLP-workflow-for-Azure-ML) GitHub repository. 17 | 18 | ## Recommended Hardware 19 | We recommend you use the following hardware for this reference implementation. 20 | | **Name** | **Description** | 21 | | ---------- | ----------------------------- | 22 | | CPU | Intel CPU - Ice Lake or above | 23 | | Usable RAM | 16 GB | 24 | | Disk Size | 256 GB | 25 | 26 | ## How it Works 27 | This workflow uses the Azure ML infrastructure to fine-tune a pretrained BERT base model. While the following diagram shows the architecture for both training and inference, this specific workflow is focused on the training portion. See the [Intel® NLP workflow for Azure ML - Inference](https://github.com/intel/ai-workflows/blob/main/language_modeling/pytorch/bert_base/inference/DEVCATALOG.md) workflow that uses this trained model. 28 | 29 | ### Architecture 30 | 31 | AzureML: 32 | 33 | ![azureml_architecture](https://user-images.githubusercontent.com/43555799/205149722-e37dcec5-5ef2-4440-92f2-9dc243b9e556.jpg) 34 | 35 | ### Model Spec 36 | The uncased BERT base model is used to demonstrate this workflow. 
37 | 38 | ```python 39 | bert-base-uncased-config = { 40 | "architectures": [ 41 | "BertForMaskedLM" 42 | ], 43 | "attention_probs_dropout_prob": 0.1, 44 | "gradient_checkpointing": false, 45 | "hidden_act": "gelu", 46 | "hidden_dropout_prob": 0.1, 47 | "hidden_size": 768, 48 | "initializer_range": 0.02, 49 | "intermediate_size": 3072, 50 | "layer_norm_eps": 1e-12, 51 | "max_position_embeddings": 128, 52 | "model_type": "bert", 53 | "num_attention_heads": 12, 54 | "num_hidden_layers": 12, 55 | "pad_token_id": 0, 56 | "position_embedding_type": "absolute", 57 | "transformers_version": "4.21.1", 58 | "type_vocab_size": 2, 59 | "use_cache": true, 60 | "vocab_size": 30522 61 | } 62 | ``` 63 | 64 | ### Dataset 65 | [Microsoft Research Paraphrase Corpus](https://www.microsoft.com/en-us/research/publication/automatically-constructing-a-corpus-of-sentential-paraphrases/) is used as the dataset. 66 | 67 | | **Type** | **Format** | **Rows** 68 | | :--- | :--- | :--- 69 | | Training Dataset | HuggingFace Dataset | 3668 70 | | Testing Dataset | HuggingFace Dataset | 1725 71 | 72 | ## Get Started 73 | 74 | #### Download the workflow repository 75 | Clone [Intel® NLP workflow for Azure* ML](https://github.com/intel/Intel-NLP-workflow-for-Azure-ML) repository. 76 | ``` 77 | git clone https://github.com/intel/Intel-NLP-workflow-for-Azure-ML.git 78 | cd Intel-NLP-workflow-for-Azure-ML 79 | git checkout v1.0.1 80 | ``` 81 | 82 | #### Download the Datasets 83 | The dataset will be downloaded the first time the training runs. 84 | 85 | 86 | ## Run Using Docker 87 | *Follow these instructions to set up and run our provided Docker image. 88 | For running on bare metal, see the [bare metal instructions](#run-using-bare-metal) 89 | instructions.* 90 | 91 | ### Set Up Docker Engine 92 | You'll need to install Docker Engine on your development system. 93 | Note that while **Docker Engine** is free to use, **Docker Desktop** may require 94 | you to purchase a license. See the [Docker Engine Server installation 95 | instructions](https://docs.docker.com/engine/install/#server) for details. 96 | 97 | Because the Docker image is run on a cloud service, you will need Azure credentials to perform training and inference related operations: 98 | - [Set up the Azure Machine Learning Account](https://azure.microsoft.com/en-us/free/machine-learning) 99 | - [Configure the Azure credentials using the Command-Line Interface](https://docs.microsoft.com/en-us/cli/azure/authenticate-azure-cli) 100 | - [Compute targets in Azure Machine Learning](https://learn.microsoft.com/en-us/azure/machine-learning/concept-compute-target) 101 | - [Virtual Machine Products Available in Your Region](https://azure.microsoft.com/en-us/explore/global-infrastructure/products-by-region/?products=virtual-machines®ions=us-east) 102 | 103 | ### Set Up Docker Image 104 | Pull the provided docker image. 
105 | ``` 106 | docker pull intel/ai-workflows:nlp-azure-training 107 | ``` 108 | 109 | If your environment requires a proxy to access the internet, export your 110 | development system's proxy settings to the docker environment: 111 | ``` 112 | export DOCKER_RUN_ENVS="-e ftp_proxy=${ftp_proxy} \ 113 | -e FTP_PROXY=${FTP_PROXY} -e http_proxy=${http_proxy} \ 114 | -e HTTP_PROXY=${HTTP_PROXY} -e https_proxy=${https_proxy} \ 115 | -e HTTPS_PROXY=${HTTPS_PROXY} -e no_proxy=${no_proxy} \ 116 | -e NO_PROXY=${NO_PROXY} -e socks_proxy=${socks_proxy} \ 117 | -e SOCKS_PROXY=${SOCKS_PROXY}" 118 | ``` 119 | 120 | ### Run Docker Image 121 | The setup and how-to-run sections below are for users who want to use the provided docker image to run the entire pipeline. 122 | For an interactive setup, please go to [Interactive Docker](#interactive-docker). 123 | 124 | #### Setup 125 | Download the `config.json` file from your Azure ML Studio Workspace. 126 | 127 | #### How to run 128 | Run the workflow using the ``docker run`` command, as shown in this example: 129 | ``` 130 | export AZURE_CONFIG_FILE=<relative path to your config.json> 131 | 132 | docker run \ 133 | --volume ${PWD}/notebooks:/root/notebooks \ 134 | --volume ${PWD}/src:/root/src \ 135 | --volume ${PWD}/${AZURE_CONFIG_FILE}:/root/config.json \ 136 | --workdir /root/notebooks \ 137 | --privileged --init -it \ 138 | intel/ai-workflows:nlp-azure-training \ 139 | sh -c "jupyter nbconvert --to python 1.0-intel-azureml-training.ipynb && python3 1.0-intel-azureml-training.py" 140 | ``` 141 | ### Interactive Docker 142 | The setup and how-to-run sections below are for users who want to use an interactive environment. 143 | For the end-to-end Docker pipeline, please go to [Run Docker Image](#run-docker-image). 144 | #### Setup 145 | 146 | Build the docker image to prepare the environment for running the Jupyter notebooks. 147 | ``` 148 | cd scripts 149 | sh build_main_image.sh 150 | ``` 151 | 152 | Use the Docker image built by ``build_main_image.sh`` to run the Jupyter notebook. Execute the following command: 153 | ```bash 154 | sh start_script.sh 155 | ``` 156 | After starting the container, execute the following command in the interactive shell. 157 | ```bash 158 | cd notebooks 159 | jupyter notebook --allow-root 160 | ``` 161 | Start the notebook with "training" in the filename. 162 | 163 | ## Run Using Bare Metal 164 | This workflow requires Docker and currently cannot be run using bare metal.
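For reference, the `config.json` you download in the Setup step is the standard Azure ML workspace descriptor (subscription ID, resource group, and workspace name), and the training notebook connects to your workspace through it. A minimal sketch of that connection using the azureml-core SDK is shown below; the exact code inside the notebook may differ, so treat this as an illustration of what the file is for rather than the workflow's implementation.

```python
# Sketch: connect to the Azure ML workspace described by config.json.
from azureml.core import Workspace

# config.json is the file downloaded from Azure ML Studio and mounted into
# the container at /root/config.json by the `docker run` command above.
ws = Workspace.from_config(path="config.json")

print(ws.name)            # workspace name
print(ws.resource_group)  # resource group holding the workspace
print(ws.location)        # Azure region
```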
165 | 166 | ## Expected Output 167 | 168 | ``` 169 | training-nlp-azure-1 | 170 | training-nlp-azure-1 | 88%|████████▊ | 95/108 [00:40<00:05, 2.36it/s] 171 | training-nlp-azure-1 | 172 | training-nlp-azure-1 | 89%|████████▉ | 96/108 [00:40<00:05, 2.35it/s] 173 | training-nlp-azure-1 | 174 | training-nlp-azure-1 | 90%|████████▉ | 97/108 [00:41<00:04, 2.35it/s] 175 | training-nlp-azure-1 | 176 | training-nlp-azure-1 | 91%|█████████ | 98/108 [00:41<00:04, 2.33it/s] 177 | training-nlp-azure-1 | 178 | training-nlp-azure-1 | 92%|█████████▏| 99/108 [00:42<00:03, 2.32it/s] 179 | training-nlp-azure-1 | 180 | training-nlp-azure-1 | 93%|█████████▎| 100/108 [00:42<00:03, 2.30it/s] 181 | training-nlp-azure-1 | 182 | training-nlp-azure-1 | 94%|█████████▎| 101/108 [00:42<00:03, 2.29it/s] 183 | training-nlp-azure-1 | 184 | training-nlp-azure-1 | 94%|█████████▍| 102/108 [00:43<00:02, 2.23it/s] 185 | training-nlp-azure-1 | 186 | training-nlp-azure-1 | 95%|█████████▌| 103/108 [00:43<00:02, 2.27it/s] 187 | training-nlp-azure-1 | 188 | training-nlp-azure-1 | 96%|█████████▋| 104/108 [00:44<00:01, 2.29it/s] 189 | training-nlp-azure-1 | 190 | training-nlp-azure-1 | 97%|█████████▋| 105/108 [00:44<00:01, 2.27it/s] 191 | training-nlp-azure-1 | 192 | training-nlp-azure-1 | 98%|█████████▊| 106/108 [00:45<00:00, 2.28it/s] 193 | training-nlp-azure-1 | 194 | training-nlp-azure-1 | 99%|█████████▉| 107/108 [00:45<00:00, 2.28it/s] 195 | training-nlp-azure-1 | 196 | training-nlp-azure-1 | 100%|██████████| 108/108 [00:45<00:00, 2.32it/s] 197 | training-nlp-azure-1 | 198 | training-nlp-azure-1 | 199 | training-nlp-azure-1 | 200 | training-nlp-azure-1 | {'eval_loss': 0.60855633020401, 'eval_accuracy': 0.8573913043478261, 'eval_runtime': 46.8883, 'eval_samples_per_second': 36.79, 'eval_steps_per_second': 2.303, 'epoch': 3.0} 201 | training-nlp-azure-1 | 202 | training-nlp-azure-1 | 100%|██████████| 690/690 [31:31<00:00, 2.27s/it] 203 | training-nlp-azure-1 | 204 | training-nlp-azure-1 | 100%|██████████| 108/108 [00:46<00:00, 2.32it/s] 205 | training-nlp-azure-1 | 206 | training-nlp-azure-1 |  207 | training-nlp-azure-1 | 208 | training-nlp-azure-1 | Training completed. Do not forget to share your model on huggingface.co/models =) 209 | training-nlp-azure-1 | 210 | training-nlp-azure-1 | 211 | training-nlp-azure-1 | 212 | training-nlp-azure-1 | 213 | training-nlp-azure-1 | {'train_runtime': 1891.9246, 'train_samples_per_second': 5.816, 'train_steps_per_second': 0.365, 'train_loss': 0.31064462523529496, 'epoch': 3.0} 214 | training-nlp-azure-1 | 215 | training-nlp-azure-1 | 100%|██████████| 690/690 [31:31<00:00, 2.27s/it] 216 | training-nlp-azure-1 | 100%|██████████| 690/690 [31:31<00:00, 2.74s/it] 217 | training-nlp-azure-1 | Saving model checkpoint to ./outputs/trained_model 218 | training-nlp-azure-1 | Configuration saved in ./outputs/trained_model/config.json 219 | training-nlp-azure-1 | Model weights saved in ./outputs/trained_model/pytorch_model.bin 220 | training-nlp-azure-1 | Time for training: 1922.514419555664s 221 | training-nlp-azure-1 | Cleaning up all outstanding Run operations, waiting 300.0 seconds 222 | training-nlp-azure-1 | 1 items cleaning up... 
223 | training-nlp-azure-1 | Cleanup took 5.616384744644165 seconds 224 | training-nlp-azure-1 | 225 | training-nlp-azure-1 | Execution Summary 226 | training-nlp-azure-1 | ================= 227 | training-nlp-azure-1 | RunId: IntelIPEX_HuggingFace_DDP_1666115383_6ff5fb64 228 | training-nlp-azure-1 | Web View: https://ml.azure.com/runs/IntelIPEX_HuggingFace_DDP_1666115383_6ff5fb64?wsid=/subscriptions/0a5dbdd4-ee35-483f-b248-93e05a52cd9f/resourcegroups/intel_azureml_resource/workspaces/cloud_t7_i9&tid=46c98d88-e344-4ed4-8496-4ed7712e255d 229 | training-nlp-azure-1 | 230 | training-nlp-azure-1 | Length of output paths is not the same as the length of pathsor output_paths contains duplicates. Using paths as output_paths. 231 | training-nlp-azure-1 exited with code 0 232 | ``` 233 | ## Summary and Next Steps 234 | In this workflow, you loaded a docker image and performed distributed training on a PyTorch BERT base model on the Azure Machine Learning Platform using Intel® Xeon® Scalable Processors. See the [Intel® NLP workflow for Azure ML - Inference](https://github.com/intel/ai-workflows/blob/main/language_modeling/pytorch/bert_base/inference/DEVCATALOG.md) workflow that uses this trained model. 235 | 236 | ## Learn More 237 | For more information about Intel® NLP workflow for Azure* ML or to read about other relevant workflow 238 | examples, see these guides and software resources: 239 | 240 | - [Intel® AI Analytics Toolkit (AI Kit)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html) 241 | - [Azure Machine Learning Documentation](https://learn.microsoft.com/en-us/azure/machine-learning/) 242 | 243 | ## Troubleshooting 244 | Issues, problem spots, and their workarounds if possible, will be listed here. 245 | 246 | ## Support 247 | [Intel® NLP workflow for Azure* ML](https://github.com/intel/Intel-NLP-workflow-for-Azure-ML) tracks both bugs and enhancement requests using [GitHub issues](https://github.com/intel/Intel-NLP-workflow-for-Azure-ML/issues). Search there before submitting a new issue. 248 | --------------------------------------------------------------------------------