├── Dockerfile
├── README.md
├── Slides.pdf
├── dataset.csv
├── dataset.csv.out
└── linear-regression.py


/Dockerfile:
--------------------------------------------------------------------------------
# from Ubuntu 20.04 base image
FROM ubuntu:20.04

# provide your contact info if you wish (LABEL replaces the deprecated MAINTAINER)
LABEL maintainer="Noureddin Sadawi, myemail@mail.com"

# update the package index and install pip (this also pulls in Python 3)
RUN apt-get update && apt-get install -y python3-pip
RUN pip3 install pandas scikit-learn

# copy this Python script from the host machine into the Docker image
COPY linear-regression.py /

# as soon as a container starts, run this script using Python 3
ENTRYPOINT ["python3", "/linear-regression.py"]

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Building and using a Docker Image/Container

* This example helps you create an image that:
  * is based on Ubuntu 20.04 (the base image does not ship Python 3; it is installed together with `python3-pip`)
  * has the `pandas` and `scikit-learn` packages installed
  * runs a Python script that receives a CSV file, performs some calculations on it and writes the results into a results file (e.g. `dataset.csv.out`)
* The results file is saved on our host machine, so it persists after the container exits


* Let’s build our image (you can choose your own tag):
`$ docker build -t linreg .`


* Now we run the container, passing it our dataset file and specifying the folder we want the result to be stored in (using volumes):
`$ docker run -v /path/to/dockerfile/:/work/ linreg dataset.csv`

* `/path/to/dockerfile/` is the full path to the directory on the host machine that contains `dataset.csv` (here, your current directory)

* `/work/` is the directory inside the container (the script reads the dataset from there and writes the results there)

* The `dataset.csv` file name is appended to the `ENTRYPOINT` command, so the script receives it as its first argument

--------------------------------------------------------------------------------
/Slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nsadawi/DockerLinReg/55649043244ac282144c02867c269d9ff85034e8/Slides.pdf

--------------------------------------------------------------------------------
/dataset.csv:
--------------------------------------------------------------------------------
Rent_per_arable_acre_dollars,Milk_cows_per_square_mile,Difference_between_pasture_and_arable_land,Outcome
22.29,18.51,0.2,18.38
40.6,3.27,0.02,20.0
18.92,29.62,0.72,11.5
24.64,11.46,0.21,25.0
51.19,7.59,0.13,62.5
38.33,2.83,0.04,82.5
15.5,17.25,0.24,25.0
17.17,24.16,0.36,30.67
23.62,28.89,0.24,12.0
69.05,46.18,0.22,61.25
59.42,49.86,0.13,60.0
41.77,4.53,0.08,67.5
62.52,15.89,0.05,31.0
77.06,13.59,0.05,60.0
46.85,5.42,0.08,72.5
17.09,33.34,0.66,60.33
31.84,5.54,0.12,49.75
69.88,31.48,0.07,8.5
27.14,31.2,0.27,36.5
21.33,1.53,0.1,60.0
44.56,16.7,0.15,16.25
34.46,4.2,0.03,50.0
20.64,23.81,0.24,11.6
26.09,28.5,0.21,35.0
53.89,53.16,0.24,75.0
18.25,16.12,0.32,31.56
71.41,21.37,0.05,48.5
83.9,5.44,0.04,77.5
12.36,11.13,0.12,21.67
31.55,23.47,0.19,19.75
26.94,2.48,0.1,56.0
59.48,35.9,0.32,25.0
12.42,8.69,0.41,40.0
48.5,6.82,0.08,56.67
21.73,6.58,0.06,51.79
65.94,22.1,0.09,96.67
81.4,4.54,0.05,50.83
24.2,6.29,0.06,34.33
26.86,53.73,0.43,48.75
65.0,13.24,0.08,25.8
26.68,58.6,0.23,20.0
69.0,31.23,0.08,16.0
58.71,7.4,0.04,48.67
51.0,50.5,0.24,20.78
72.25,20.37,0.05,32.5
40.41,4.29,0.1,19.0
69.42,6.63,0.04,51.5
38.68,14.55,0.17,49.17
57.54,14.98,0.11,85.0
50.32,21.36,0.19,58.76
65.74,7.71,0.02,19.33
82.0,7.89,0.03,5.0
54.17,5.57,0.06,65.0
58.83,45.46,0.16,20.0
6.17,13.68,0.18,62.5
66.0,14.25,0.15,35.0
9.0,8.89,0.08,99.17
46.2,31.62,0.26,40.25
62.83,29.98,0.17,39.17
53.95,42.54,0.25,37.5
75.73,35.43,0.05,26.25
48.46,27.4,0.12,52.14
21.89,43.7,0.36,22.5
20.0,40.18,0.56,90.0
59.88,32.99,0.21,28.0
26.94,8.28,0.1,50.0
36.28,5.85,0.1,24.5

--------------------------------------------------------------------------------
/dataset.csv.out:
--------------------------------------------------------------------------------
-0.19135555475584037
--------------------------------------------------------------------------------
/linear-regression.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
import sys
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

# the dataset file name is the 1st command line argument
dataset = sys.argv[1]

# load the dataset from the mounted /work/ directory
df = pd.read_csv("/work/" + dataset)

# get X data: only features, no target variable
X = df.iloc[:, :-1]

# the last column is the target variable
y = df.iloc[:, -1]

# create a Linear Regression object
lreg = LinearRegression()
# apply 5-fold cross-validation and get an array of scores
# (the default score for a regressor is R^2, which can be negative)
scores = cross_val_score(lreg, X, y, cv=5)
# compute the average score
score = sum(scores) / scores.size

# write the score next to the dataset; "w" overwrites an earlier result
# instead of appending a second number to it
with open("/work/" + dataset + ".out", "w") as text_file:
    text_file.write(str(score))

--------------------------------------------------------------------------------
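The README explains that the `-v` volume maps a host directory to `/work/`, and the script builds its input and output paths from the file name it receives. A minimal sketch of that mapping, using only the standard library, might look like the following. The helper name `container_paths` and the host directory `/home/user/DockerLinReg` are hypothetical and only serve the illustration; the sketch does not run Docker.

```python
import posixpath


def container_paths(dataset, host_dir, work_dir="/work/"):
    """Return where the input and result files live in the container and on the host.

    Hypothetical helper: it mirrors the string concatenation used in
    linear-regression.py ("/work/" + dataset and "/work/" + dataset + ".out").
    """
    out_name = dataset + ".out"
    return {
        "container_input": posixpath.join(work_dir, dataset),
        "container_output": posixpath.join(work_dir, out_name),
        "host_input": posixpath.join(host_dir, dataset),
        "host_output": posixpath.join(host_dir, out_name),
    }


if __name__ == "__main__":
    # /home/user/DockerLinReg is an assumed host directory, not one from the repository
    paths = container_paths("dataset.csv", "/home/user/DockerLinReg")
    for name, path in paths.items():
        print(f"{name}: {path}")
```

Because the host directory and `/work/` are the same folder seen from two sides, the file the script writes to `container_output` is the one that shows up at `host_output` once the container exits.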