├── .gitignore
├── images
│   ├── architecture.png
│   ├── february_sales_plot.png
│   ├── january_sales_excel.png
│   ├── january_sales_plot.png
│   ├── january_sales_table.png
│   ├── process_flow.png
│   ├── reach_out.png
│   └── template_notebook_screenshot.png
├── report-automation-part-1.md
├── report-automation-part-2.md
└── src
    ├── base.ipynb
    ├── cloud_reporter.py
    ├── sales_february.ipynb
    ├── sales_february.xlsx
    ├── sales_january.html
    ├── sales_january.ipynb
    ├── sales_january.xlsx
    └── template.ipynb
/.gitignore:
--------------------------------------------------------------------------------
1 | .ipynb_checkpoints
--------------------------------------------------------------------------------
/images/architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/duarteocarmo/automation-post/5e1d3f80b2e8e6d5961397e8349469e262fe3da1/images/architecture.png
--------------------------------------------------------------------------------
/images/february_sales_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/duarteocarmo/automation-post/5e1d3f80b2e8e6d5961397e8349469e262fe3da1/images/february_sales_plot.png
--------------------------------------------------------------------------------
/images/january_sales_excel.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/duarteocarmo/automation-post/5e1d3f80b2e8e6d5961397e8349469e262fe3da1/images/january_sales_excel.png
--------------------------------------------------------------------------------
/images/january_sales_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/duarteocarmo/automation-post/5e1d3f80b2e8e6d5961397e8349469e262fe3da1/images/january_sales_plot.png
--------------------------------------------------------------------------------
/images/january_sales_table.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/duarteocarmo/automation-post/5e1d3f80b2e8e6d5961397e8349469e262fe3da1/images/january_sales_table.png
--------------------------------------------------------------------------------
/images/process_flow.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/duarteocarmo/automation-post/5e1d3f80b2e8e6d5961397e8349469e262fe3da1/images/process_flow.png
--------------------------------------------------------------------------------
/images/reach_out.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/duarteocarmo/automation-post/5e1d3f80b2e8e6d5961397e8349469e262fe3da1/images/reach_out.png
--------------------------------------------------------------------------------
/images/template_notebook_screenshot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/duarteocarmo/automation-post/5e1d3f80b2e8e6d5961397e8349469e262fe3da1/images/template_notebook_screenshot.png
--------------------------------------------------------------------------------
/report-automation-part-1.md:
--------------------------------------------------------------------------------
1 | About the author:
2 |
3 | *My name is [Duarte Carmo](https://duarteocarmo.com/) and I'm a product manager and digital consultant. Originally from Lisbon - Portugal, but currently living and working in Copenhagen - Denmark. Find more about my work and leisure in [my website](https://duarteocarmo.com/).*
4 |
5 | # Automating report generation with Papermill and Rclone: Part 1 - Tool roundup
6 |
7 | Welcome to part 1 of this two-part series about automating report generation using python, jupyter, papermill, and a couple of other tools.
8 |
9 | In the first part, we will cover four important workflows that are part of the automation process. In the second and final part, we will bring everything together and build our own report automation system.
10 |
11 | *Note: This code was written in python 3.7. You might have to adapt the code for older versions of python.*
12 |
13 | Alright, let's get to work.
14 |
15 | ## Automating report generation with Python - Why?
16 |
17 | Not everyone can code. This might seem like an obvious statement, but once you start using python to automate or analyze things around you, you start to encounter a big problem: **reproducibility**. Not everyone knows how to run your scripts, use your tools, or even use a modern browser.
18 |
19 | Let us say you built a killer script. How exactly do you make someone who has never heard the word "python" use it? You could teach them python, but that would take a long time.
20 |
21 | In this series, we will teach you how you can automatically generate shareable Html reports from any excel file using a combination of tools, centered around python.
22 |
23 | ## Creating Jupyter Notebook reports from Excel files
24 |
25 | Let us say you have an excel file (`sales_january.xlsx`) with a list of the sales generated by a group of employees. Just like this:
26 |
27 | ![The January sales excel file](images/january_sales_excel.png)
28 |
29 |
30 |
31 |
32 |
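If you want to follow along without downloading the repository, you can create a similar spreadsheet yourself. This is only a sketch with made-up names and numbers; the real file's columns and values may differ:

```python
import pandas as pd

# hypothetical sales figures, one row per employee
sales = pd.DataFrame(
    {"Sales": [1000, 2500, 1750]},
    index=["Alice", "Bob", "Carol"],
)
# writing .xlsx files requires an excel writer such as openpyxl
sales.to_excel("sales_january.xlsx")
```
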
33 | Let's start by using a jupyter notebook `sales_january.ipynb` to create a very simple analysis of that sales data.
34 |
35 | We start by importing the [pandas](https://pandas.pydata.org/) and [matplotlib](https://matplotlib.org/) libraries. After that, we specify the name of our file using the `filename` variable. Finally, we use the `read_excel` function to read our data into a pandas DataFrame.
36 |
37 | ```python
38 | import pandas as pd
39 | import matplotlib.pyplot as plt
40 | # so plots are displayed automatically
41 | %matplotlib inline
42 | filename = "sales_january.xlsx"
43 | data = pd.read_excel(filename, index_col=0)
44 | ```
45 | When printing the `data` dataframe, we get the following:
46 |
47 | ![The sales data as a pandas DataFrame](images/january_sales_table.png)
48 |
49 |
50 |
51 |
52 |
53 | After that, we plot the data using pandas:
54 |
55 | ```python
56 | data.plot(kind="bar", title=f"Sales report from {filename}")
57 | ```
58 |
59 | And we get the following:
60 |
61 | ![Bar plot of the January sales](images/january_sales_plot.png)
62 |
63 |
64 |
65 |
66 |
67 | And that's it! We got ourselves a [jupyter notebook](src/sales_january.ipynb) that runs a (very simple) analysis of an excel sales report. Now let's say we want to share that report with other people in the organization. What do we do?
68 |
69 |
70 | ## Generating Html reports from Jupyter Notebooks to share with colleagues
71 |
72 | In my experience, the easiest way to share a report with colleagues is to use a little tool called [nbconvert](https://nbconvert.readthedocs.io/en/latest/). Nbconvert allows you to generate an Html version of your notebook. To install it simply run `pip install nbconvert`.
73 |
74 | To do this, start by navigating to the same directory where your notebook is and run the following from your terminal:
75 |
76 | ```bash
77 | $ jupyter nbconvert sales_january.ipynb
78 | ```
79 | You will see that a new file named `sales_january.html` was created. Html files are more convenient than `ipynb` files because they can easily be shared via email, chat, or any other channel. Just make sure the person receiving the file opens it with a relatively modern browser.
80 |
81 | But let us say that this sales report comes in every month. How can we automatically run this notebook with any excel file that has the same format?
82 |
83 | ## Automating report generation using papermill
84 |
85 | [Papermill](https://papermill.readthedocs.io/en/latest/) is a handy tool that allows us to "parameterize and execute" Jupyter Notebooks. This basically means that papermill allows you to execute the same jupyter notebook, with different variables defined outside its context.
86 |
87 | To install it, run `pip install papermill`, or follow the more complete [installation instructions](https://papermill.readthedocs.io/en/latest/installation.html).
88 |
89 | Let us say we want to generate the same report as above, but with another excel file: `sales_february.xlsx`. Your directory should contain the following:
90 |
91 | ```bash
92 | ├── sales_february.xlsx
93 | ├── sales_january.html
94 | ├── sales_january.ipynb
95 | └── sales_january.xlsx
96 | ```
97 |
98 | The first step is to parameterize our notebook. To do this, let us create a `template.ipynb` file. This notebook is very similar to `sales_january.ipynb`, but with a small difference: a new cell tagged `parameters`. Just like this:
99 |
100 | ![The template notebook with a cell tagged parameters](images/template_notebook_screenshot.png)
101 |
102 |
103 |
104 |
105 |
106 | (If you have trouble adding a tag to your notebook, visit [this link](https://papermill.readthedocs.io/en/latest/usage-parameterize.html#notebook))
107 |
108 |
109 | The cell with the `parameters` tag allows you to run this notebook from another python script and feed the `filename` variable any value you would like.
110 |
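In our template, that tagged cell only needs to hold a default value for `filename`; papermill injects the new value on top of it when the notebook is executed. A minimal sketch of that cell:

```python
# cell tagged "parameters" in template.ipynb
filename = "sales_january.xlsx"  # default value, overridden by papermill at run time
```
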
111 | Your directory should look like this:
112 |
113 | ```bash
114 | ├── sales_february.xlsx
115 | ├── sales_january.html
116 | ├── sales_january.ipynb
117 | ├── sales_january.xlsx
118 | └── template.ipynb
119 | ```
120 |
121 | You can always browse the code in the [GitHub repo](https://github.com/duarteocarmo/automation-post).
122 |
123 | Now that we have everything in place, let's generate a report for the new `sales_february.xlsx` excel file.
124 |
125 | To do so, in a new python file or python console, run the following:
126 |
127 | ```python
128 | import papermill as pm
129 |
130 | pm.execute_notebook(
131 | 'template.ipynb',
132 | 'sales_february.ipynb',
133 | parameters=dict(filename="sales_february.xlsx")
134 | )
135 | ```
136 |
137 | Let's break this down: the `pm.execute_notebook` function takes 3 arguments. The first, `template.ipynb`, is the name of the file that we will use as a base to run our notebook, the one with the `parameters` tag. The second argument is the name of the new notebook that will be generated with the new arguments. Finally, `parameters` is a dictionary of the variables that we want to pass to our template, in this case the `filename` variable, which will now point to our February sales report.
138 |
139 | After running the above code, you will notice a new file in your directory:
140 |
141 |
142 | ```bash
143 | ├── sales_february.ipynb <- This one!
144 | ├── sales_february.xlsx
145 | ├── sales_january.html
146 | ├── sales_january.ipynb
147 | ├── sales_january.xlsx
148 | └── template.ipynb
149 | ```
150 |
151 | This means that Papermill has generated a new notebook for us, based on the `sales_february.xlsx` sales report. When opening this notebook, we see a new graph with the new February numbers:
152 |
153 | ![Bar plot of the February sales](images/february_sales_plot.png)
154 |
155 |
156 |
157 |
158 |
159 | This is pretty handy! We could have a continuous script that always runs this notebook with different sales reports from different months. But how can we automate the process even more? Stay tuned to learn how!
160 |
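As a rough sketch of that idea (assuming the monthly files keep the `sales_<month>.xlsx` naming used above), such a script could simply loop over the reports:

```python
import subprocess

import papermill as pm

# hypothetical list of monthly sales files, all with the same format
monthly_files = ["sales_january.xlsx", "sales_february.xlsx"]

for excel_file in monthly_files:
    notebook_name = excel_file.replace(".xlsx", ".ipynb")
    # run the template notebook against this month's file
    pm.execute_notebook(
        "template.ipynb",
        notebook_name,
        parameters=dict(filename=excel_file),
    )
    # convert the executed notebook into a shareable Html report
    subprocess.run(["jupyter", "nbconvert", notebook_name, "--to=html"])
```
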
161 | In the second part of this series, you will learn how to bring all of this together to build a full report automation workflow that your colleagues can use! Sign up to the [mailing list](https://pbpython.com/pages/mailinglist.html) to make sure you are alerted when the next part comes out!
162 |
163 | Browse all of the notebooks and files in the [GitHub repo](https://github.com/duarteocarmo/automation-post).
--------------------------------------------------------------------------------
/report-automation-part-2.md:
--------------------------------------------------------------------------------
3 | About the author:
4 |
5 | *My name is [Duarte Carmo](https://duarteocarmo.com/) and I'm a product manager and digital consultant. Originally from Lisbon - Portugal, but currently living and working in Copenhagen - Denmark. Find more about my work and leisure in [my website](https://duarteocarmo.com/).*
6 |
7 | # Automating report generation with Papermill and Rclone: Part 2 - Designing a solution
8 |
9 | Welcome to part 2 of this two-part series about automating report generation using python, jupyter, papermill, and a couple of other tools.
10 |
11 | In the [first part](https://pbpython.com/papermil-rclone-report-1.html), we covered four important workflows that are part of the automation process. In this second and final part, we will bring everything together and build our report automation system.
12 |
13 | *Note: This code was written in python 3.7. You might have to adapt the code for older versions of python.*
14 |
15 | All of the code for this article is available on [GitHub](https://github.com/duarteocarmo/automation-post).
16 |
17 | ## A workflow to automatically generate reports in a shared cloud folder
18 |
19 | Let's imagine you want to automatically generate reports for every similar excel sales file that comes in. You also want to share them with your colleagues. Your colleagues are interested in the reports, but not in learning how to program python. How would you proceed?
20 |
21 | There are a lot of options, and hardly any incorrect ones, but one I found particularly interesting was using what we already had as a company: a cloud folder (Google Drive, OneDrive, Dropbox).
22 |
23 | Shared cloud folders are very popular in companies. So a good idea would be to create a shared folder where everyone can upload excel sales reports, automatically generate Html reports from them, and make those reports available for everyone to read!
24 |
25 | Here is the basic architecture of the solution:
26 |
27 |
28 |
29 |
30 |
31 |
32 |
33 | Let's describe each one of the steps:
34 | - A user uploads a new excel sales report to a shared cloud folder.
35 | - We sync the cloud folder with a local folder and detect that new excel sales report.
36 | - We use papermill to generate a new notebook file from that new excel sales report.
37 | - We use nbconvert to generate an Html file from that new notebook file.
38 | - We upload the Html file to the cloud folder, so the user can read it.
39 |
40 | Let's start building this step by step:
41 |
42 | #### 1. Sync a cloud folder with a local folder and detect new files
43 | To sync cloud directories with local directories, we will use a tool called [Rclone](https://rclone.org/). Of course, we will integrate it with python.
44 |
45 | Start by installing rclone on the same machine as your local folder (your personal computer or a virtual private server, for example).
46 |
47 | To do so, on a Mac or Linux machine, run:
48 |
49 | ```bash
50 | $ curl https://rclone.org/install.sh | sudo bash
51 | ```
52 | On Windows, download the executable from the [Rclone downloads page](https://rclone.org/downloads/).
53 |
54 | Once rclone is installed, we must configure it. Depending on your cloud provider (Dropbox, Google Drive, OneDrive), the instructions will vary, so make sure to follow the [configuration instructions](https://rclone.org/).
55 |
56 | Once configured, let us do a first sync from the command line:
57 |
58 | ```bash
59 | $ rclone sync remote:REMOTE_FOLDER_NAME LOCAL_FOLDER_NAME
60 | ```
61 | This will sync your local folder with your remote folder.
62 |
63 | We can also run this command from a python script using the [`subprocess` library](https://docs.python.org/3/library/subprocess.html) from the standard library, which allows you to run command-line programs from python:
64 |
65 | ```python
66 | import subprocess
67 |
68 | # define our variables
69 | REMOTE_FOLDER_NAME = "shared folder"
70 | LOCAL_FOLDER = "local folder"
71 |
72 | # run the rclone sync command from python
73 | subprocess.run(
74 | ["rclone", "sync", f"remote:{REMOTE_FOLDER_NAME}", LOCAL_FOLDER]
75 | )
76 | ```
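If you want to check that the sync actually succeeded, `subprocess.run` returns a `CompletedProcess` object whose `returncode` you can inspect. A small sketch, reusing the hypothetical folder names from above:

```python
import subprocess

REMOTE_FOLDER_NAME = "shared folder"
LOCAL_FOLDER = "local folder"

sync = subprocess.run(
    ["rclone", "sync", f"remote:{REMOTE_FOLDER_NAME}", LOCAL_FOLDER]
)

# rclone exits with code 0 when the sync succeeded
if sync.returncode != 0:
    print("Sync failed - check your rclone configuration.")
```
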
77 | Now that we know how to sync a local and a cloud directory, how do we detect whether a user has uploaded a new file to our cloud directory? Well, one option would be to navigate to our local directory and use the `ls` command to see what is there.
78 |
79 | Rclone also [allows us](https://rclone.org/commands/rclone_ls/) to list files in our cloud directory. With this, we can create a python function that detects whether new files have been uploaded to the cloud folder:
80 |
81 | ```python
82 | def get_new_files(remote_folder, local_folder):
83 | """
84 | A function that returns files that were uploaded to the cloud folder and
85 | do not exist in our local folder.
86 | """
87 | # list the files in our cloud folder
88 | list_cloud = subprocess.run(
89 | ["rclone", "lsf", f"remote:{remote_folder}"],
90 | capture_output=True,
91 | text=True,
92 | )
93 |
94 | # transform the command output into a list
95 |     cloud_directories = list_cloud.stdout.split("\n")[0:-1]
96 |
97 | print(f"In the cloud we have: \n{cloud_directories}")
98 |
99 | # list the files in our local folder
100 |     list_local = subprocess.run(
101 | ["ls", local_folder], capture_output=True, text=True
102 | )
103 |
104 | # transform the command output into a list
105 |     local_directories = list_local.stdout.split("\n")[0:-1]
106 |
107 | print(f"In the local copy we have: \n{local_directories}")
108 |
109 | # create a list with the differences between the two lists above
110 | new_files = list(set(cloud_directories) - set(local_directories))
111 |
112 | return new_files
113 | ```
114 | A couple of notes about the script above:
115 | - The `capture_output` flag in the `subprocess.run` function allows us to capture the output of the command. The `text` flag tells `subprocess.run` to return that output as text instead of raw bytes.
116 | - After running `subprocess.run`, we apply the `.split` function. This is because the output of the command is a single string with the file names separated by line breaks (`\n`). Splitting on `\n` allows us to put all the elements into a nicely formatted python list.
117 | - The `new_files` list will contain only files that are in the cloud directory, but not in the local directory, or in other words: the excel files that users have uploaded to the cloud drive. In case there are no differences, the function returns an empty list, as in the short usage sketch below.
118 |
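For example, reusing the folder names from the sync snippet earlier (hypothetical names, adapt them to your own setup), a quick call could look like this:

```python
new_files = get_new_files(
    remote_folder="shared folder", local_folder="local folder"
)

if new_files:
    print(f"New reports to process: {new_files}")
```
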
119 | #### 2. Using Papermill and Nbconvert to generate new reports
120 |
121 | Once we have a reliable way of detecting whether new files were uploaded to the cloud, we need to process each new file and generate an `html` report from it.
122 |
123 | We will use two of the tools mentioned in the [first part of this article](https://pbpython.com/papermil-rclone-report-1.html): papermill, and nbconvert.
124 |
125 | We start by creating a function that produces a new notebook file based on an excel report, using, of course, a notebook template (for example `template.ipynb`) as [previously described in part 1](https://pbpython.com/papermil-rclone-report-1.html).
126 |
127 | ```python
128 | import papermill as pm
129 |
130 | def run_notebook(excel_report, notebook_template):
131 | # take only the name of the file, and ignore the .xlsx ending
132 | no_extension_name = excel_report.split(".")[0]
133 | # run with papermill
134 | pm.execute_notebook(
135 | notebook_template,
136 | f"{no_extension_name}.ipynb",
137 | parameters=dict(filename=excel_report),
138 | )
139 | return no_extension_name
140 | ```
141 |
142 | Then, we must convert the notebook to an Html file. To do this, we create another function that calls the `nbconvert` command from the python interpreter.
143 |
144 | ```python
145 | import subprocess
146 |
147 | def generate_html_report(notebook_file):
148 | generate = subprocess.run(
149 | [
150 | "jupyter",
151 | "nbconvert",
152 | notebook_file,
153 | "--to=html",
154 | ]
155 | )
156 | print("HTML Report was generated")
157 | return True
158 | ```
159 |
160 | This function runs, from a python script, the same nbconvert command we previously ran from the command line in the first part of this series.
161 |
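Chaining the two functions for a single report could then look like this (a sketch, assuming a hypothetical `sales_march.xlsx` and the `template.ipynb` from part 1 sit in the working directory):

```python
clean_name = run_notebook("sales_march.xlsx", "template.ipynb")
generate_html_report(f"{clean_name}.ipynb")
# this leaves sales_march.ipynb and sales_march.html next to the excel file
```
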
162 | #### 3. Uploading an Html file back to the cloud folder
163 |
164 | There is another Rclone command that is pretty handy. If you want to push a file from a local folder to a cloud folder, you can use the following from the command line:
165 |
166 | ```bash
167 | $ rclone copy FILENAME remote:REMOTE_FOLDER_NAME
168 | ```
169 |
170 | We could do it from the command line, but why not do it from python? With the subprocess library, it's pretty straightforward:
171 |
172 |
173 | ```python
174 | import subprocess
175 |
176 | def push_to_cloud(remote_folder, html_report):
177 | push = subprocess.run(
178 | ["rclone", "copy", html_report, f"remote:{remote_folder}"]
179 | )
180 | print("Report Published!!!")
181 | ```
182 |
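Continuing the hypothetical `sales_march` example from above, publishing the generated report is then a single call:

```python
push_to_cloud(remote_folder="shared folder", html_report="sales_march.html")
```
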
183 | #### 4. Bringing it all together
184 | So finally, after giving you a rundown of all of the major tools and processes, here is the full script that scans the cloud folder for new excel sales reports, and then generates and uploads an Html analysis of them.
185 |
186 | The script, `cloud_reporter.py`, follows:
187 |
188 | ```python
189 | import subprocess
190 | import sys
191 | import papermill as papermill
192 |
193 |
194 | REMOTE_FOLDER = "your cloud folder name"
195 | LOCAL_FOLDER = "your local folder name"
196 | TEMPLATE_NOTEBOOK = "template_notebook.ipynb"
197 |
198 |
199 | def get_new_files(remote_folder, local_folder):
200 | """
201 | A function that returns files that were uploaded to the cloud folder and do not exist in our local folder.
202 | """
203 | # list the files in our cloud folder
204 | list_cloud = subprocess.run(
205 | ["rclone", "lsf", f"remote:{remote_folder}"],
206 | capture_output=True,
207 | text=True,
208 | )
209 |
210 | # transform the command output into a list
211 |     cloud_directories = list_cloud.stdout.split("\n")[0:-1]
212 |
213 | print(f"In the cloud we have: \n{cloud_directories}")
214 |
215 | # list the files in our local folder
216 |     list_local = subprocess.run(
217 | ["ls", local_folder], capture_output=True, text=True
218 | )
219 |
220 | # transform the command output into a list
221 |     local_directories = list_local.stdout.split("\n")[0:-1]
222 |
223 | print(f"In the local copy we have: \n{local_directories}")
224 |
225 | # create a list with the differences between the two lists above
226 | new_files = list(set(cloud_directories) - set(local_directories))
227 |
228 | return new_files
229 |
230 |
231 | def sync_directories(remote_folder, local_folder):
232 | """
233 | A function that syncs a remote folder with a local folder
234 | with rclone.
235 | """
236 | sync = subprocess.run(
237 | ["rclone", "sync", f"remote:{remote_folder}", local_folder]
238 | )
239 |
240 | print("Syncing local directory with cloud....")
241 | return sync.returncode
242 |
243 |
244 | def run_notebook(excel_report, template_notebook):
245 | """
246 | A function that runs a notebook against an excel report
247 | via papermill.
248 | """
249 | no_extension_name = excel_report.split(".")[0]
250 | papermill.execute_notebook(
251 | template_notebook,
252 | f"{no_extension_name}.ipynb",
253 | parameters=dict(filename=excel_report),
254 | )
255 | return no_extension_name
256 |
257 |
258 | def generate_html_report(notebook_file):
259 | """
260 | A function that converts a notebook into an html
261 | file.
262 | """
263 | generate = subprocess.run(
264 | ["jupyter", "nbconvert", notebook_file, "--to=html"]
265 | )
266 | print("HTML Report was generated")
267 | return True
268 |
269 |
270 | def push_to_cloud(remote_folder, filename):
271 | """
272 | A function that pushes to a remote cloud folder
273 | a specific file.
274 | """
275 |
276 | push = subprocess.run(
277 | ["rclone", "copy", filename, f"remote:{remote_folder}"]
278 | )
279 | print("Report Published!!!")
280 |
281 | def main():
282 | print("Starting updater..")
283 |
284 | # detect if there are new files in the remote folder
285 | new_files = get_new_files(
286 | remote_folder=REMOTE_FOLDER, local_folder=LOCAL_FOLDER
287 | )
288 |
289 | # if there are none, exit
290 | if not new_files:
291 | print("Everything is synced. No new files.")
292 | sys.exit()
293 | # else, continue
294 | else:
295 | print("There are files missing.")
296 | print(new_files)
297 |
298 | # sync directories to get new excel report
299 | sync_directories(remote_folder=REMOTE_FOLDER, local_folder=LOCAL_FOLDER)
300 |
301 | # generate new notebook and extract the name
302 |     clean_name = run_notebook(new_files[0], TEMPLATE_NOTEBOOK)
303 |
304 |     # the new notebook generated will have the following name
305 | notebook_name = f"{clean_name}.ipynb"
306 |
307 | # generate the html report from the notebook
308 | generate_html_report(notebook_name)
309 |
310 |     # the html report name will be the following
311 | html_report_name = f"{clean_name}.html"
312 |
313 | # push the new notebook to the cloud
314 |     push_to_cloud(filename=html_report_name, remote_folder=REMOTE_FOLDER)
315 |
316 | # make sure everything is synced again
317 | sync_directories(remote_folder=REMOTE_FOLDER, local_folder=LOCAL_FOLDER)
318 |
319 | print("Updater finished.")
320 |
321 | return True
322 |
323 |
324 | if __name__ == "__main__":
325 | main()
326 | ```
327 |
328 | #### 5. Running the updater regularly
329 |
330 | Once you get the script running, one option is to copy it to a virtual private server (you can get one in [digitalocean.com](https://www.digitalocean.com/products/linux-distribution/ubuntu/) for example) and have it run regularly via something like `cron`.
331 |
332 | ⚠️Warning: If you are going to sync sensitive company information to a virtual private server, please make sure that you have permission, and that you take [necessary security measures](https://www.digitalocean.com/community/tutorials/7-security-measures-to-protect-your-servers) to protect the server.
333 |
334 | You should [read more about `cron`](https://www.ostechnix.com/a-beginners-guide-to-cron-jobs/) before messing with it. It allows you to run scripts at regular intervals. A simple approach to our problem would be:
335 |
336 | 1. Make sure the script is running successfully on your server by:
337 | - Installing and configuring rclone.
338 | - Installing jupyter and nbconvert.
339 | - Creating a local folder to serve as a remote copy.
340 | - Modifying the script above with your variables (base notebook, remote folder name, and local folder name).
341 | - Making sure the script runs.
342 |
343 | 2. Edit your crontab by running:
344 |
345 | ```bash
346 | $ crontab -e
347 | ```
348 |
349 | 3. Add a crontab job that navigates to the directory where `cloud_reporter.py` lives and runs it with python every X minutes.
350 |
351 | Here is an example of it running every 4 minutes:
352 |
353 | ```bash
354 | */4 * * * * cd /path/to/your/folder && python cloud_reporter.py
355 | ```
356 |
357 | 4. Upload a new excel file to your cloud folder, wait a minimum of 4 minutes, and a new Html report should be generated and uploaded automatically!
358 |
359 | 5. Give access to the shared cloud folder (Dropbox, Google Drive) to your colleagues, and let them upload any excel report.
360 |
361 | ## Final thoughts
362 |
363 | And just like this, we reach the end of this article series!
364 |
365 | Hopefully, these tools and scripts will inspire you to go out and automate report generation or any other process around you, making it as simple as possible for your colleagues to generate reports.
366 |
367 | I would like to thank [Chris](https://twitter.com/chris1610) for allowing me to collaborate with him on these posts. I really had a blast building these tools and writing these "guides". [A team effort](https://github.com/duarteocarmo/automation-post/issues/1) that started with a simple reach-out on Twitter:
368 |
369 | ![Reaching out on Twitter](images/reach_out.png)
370 |
371 |
372 |
373 |
374 |
375 |
376 |
377 | All of the code for this article series is in [this GitHub repo](https://github.com/duarteocarmo/automation-post).
378 |
379 |
380 |
381 |
--------------------------------------------------------------------------------
/src/cloud_reporter.py:
--------------------------------------------------------------------------------
1 | import subprocess
2 | import sys
3 | import papermill as papermill
4 |
5 |
6 | REMOTE_FOLDER = "your cloud folder name"
7 | LOCAL_FOLDER = "your local folder name"
8 | TEMPLATE_NOTEBOOK = "template_notebook.ipynb"
9 |
10 |
11 | def get_new_files(remote_folder, local_folder):
12 | """
13 | A function that returns files that were uploaded to the cloud folder and
14 | do not exist in our local folder.
15 | """
16 | # list the files in our cloud folder
17 | list_cloud = subprocess.run(
18 | ["rclone", "lsf", f"remote:{remote_folder}"],
19 | capture_output=True,
20 | text=True,
21 | )
22 |
23 | # transform the command output into a list
24 |     cloud_directories = list_cloud.stdout.split("\n")[0:-1]
25 |
26 | print(f"In the cloud we have: \n{cloud_directories}")
27 |
28 | # list the files in our local folder
29 |     list_local = subprocess.run(
30 | ["ls", local_folder], capture_output=True, text=True
31 | )
32 |
33 | # transform the command output into a list
34 |     local_directories = list_local.stdout.split("\n")[0:-1]
35 |
36 | print(f"In the local copy we have: \n{local_directories}")
37 |
38 | # create a list with the differences between the two lists above
39 | new_files = list(set(cloud_directories) - set(local_directories))
40 |
41 | return new_files
42 |
43 |
44 | def sync_directories(remote_folder, local_folder):
45 | """
46 | A function that syncs a remote folder with a local folder
47 | with rclone.
48 | """
49 | sync = subprocess.run(
50 | ["rclone", "sync", f"remote:{remote_folder}", local_folder]
51 | )
52 |
53 | print("Syncing local directory with cloud....")
54 | return sync.returncode
55 |
56 |
57 | def run_notebook(excel_report, template_notebook):
58 | """
59 | A function that runs a notebook against an excel report
60 | via papermill.
61 | """
62 | no_extension_name = excel_report.split(".")[0]
63 | papermill.execute_notebook(
64 | template_notebook,
65 | f"{no_extension_name}.ipynb",
66 | parameters=dict(filename=excel_report),
67 | )
68 | return no_extension_name
69 |
70 |
71 | def generate_html_report(notebook_file):
72 | """
73 | A function that converts a notebook into an html
74 | file.
75 | """
76 | generate = subprocess.run(
77 | ["jupyter", "nbconvert", notebook_file, "--to=html"]
78 | )
79 | print("HTML Report was generated")
80 | return True
81 |
82 |
83 | def push_to_cloud(remote_folder, filename):
84 | """
85 | A function that pushes to a remote cloud folder
86 | a specific file.
87 | """
88 |
89 | push = subprocess.run(
90 | ["rclone", "copy", filename, f"remote:{remote_folder}"]
91 | )
92 | print("Report Published!!!")
93 |
94 | def main():
95 | print("Starting updater..")
96 |
97 | # detect if there are new files in the remote folder
98 | new_files = get_new_files(
99 | remote_folder=REMOTE_FOLDER, local_folder=LOCAL_FOLDER
100 | )
101 |
102 | # if there are none, exit
103 | if not new_files:
104 | print("Everything is synced. No new files.")
105 | sys.exit()
106 | # else, continue
107 | else:
108 | print("There are files missing.")
109 | print(new_files)
110 |
111 | # sync directories to get new excel report
112 | sync_directories(remote_folder=REMOTE_FOLDER, local_folder=LOCAL_FOLDER)
113 |
114 | # generate new notebook and extract the name
115 |     clean_name = run_notebook(new_files[0], TEMPLATE_NOTEBOOK)
116 |
117 |     # the new notebook generated will have the following name
118 | notebook_name = f"{clean_name}.ipynb"
119 |
120 | # generate the html report from the notebook
121 | generate_html_report(notebook_name)
122 |
123 |     # the html report name will be the following
124 | html_report_name = f"{clean_name}.html"
125 |
126 | # push the new notebook to the cloud
127 |     push_to_cloud(filename=html_report_name, remote_folder=REMOTE_FOLDER)
128 |
129 | # make sure everything is synced again
130 | sync_directories(remote_folder=REMOTE_FOLDER, local_folder=LOCAL_FOLDER)
131 |
132 | print("Updater finished.")
133 |
134 | return True
135 |
136 |
137 | if __name__ == "__main__":
138 | main()
--------------------------------------------------------------------------------
/src/sales_february.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "papermill": {
7 | "duration": 0.021948,
8 | "end_time": "2019-07-16T08:19:31.413195",
9 | "exception": false,
10 | "start_time": "2019-07-16T08:19:31.391247",
11 | "status": "completed"
12 | },
13 | "tags": []
14 | },
15 | "source": [
16 | "# Sales Report"
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": 1,
22 | "metadata": {
23 | "papermill": {
24 | "duration": 0.826242,
25 | "end_time": "2019-07-16T08:19:32.251470",
26 | "exception": false,
27 | "start_time": "2019-07-16T08:19:31.425228",
28 | "status": "completed"
29 | },
30 | "tags": []
31 | },
32 | "outputs": [],
33 | "source": [
34 | "import pandas as pd\n",
35 | "import matplotlib.pyplot as plt\n",
36 | "%matplotlib inline"
37 | ]
38 | },
39 | {
40 | "cell_type": "code",
41 | "execution_count": 2,
42 | "metadata": {
43 | "papermill": {
44 | "duration": 0.028103,
45 | "end_time": "2019-07-16T08:19:32.291574",
46 | "exception": false,
47 | "start_time": "2019-07-16T08:19:32.263471",
48 | "status": "completed"
49 | },
50 | "tags": [
51 | "parameters"
52 | ]
53 | },
54 | "outputs": [],
55 | "source": [
56 | "filename = \"sales_january.xlsx\""
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": 3,
62 | "metadata": {
63 | "papermill": {
64 | "duration": 0.022058,
65 | "end_time": "2019-07-16T08:19:32.327141",
66 | "exception": false,
67 | "start_time": "2019-07-16T08:19:32.305083",
68 | "status": "completed"
69 | },
70 | "tags": [
71 | "injected-parameters"
72 | ]
73 | },
74 | "outputs": [],
75 | "source": [
76 | "# Parameters\n",
77 | "filename = \"sales_february.xlsx\"\n"
78 | ]
79 | },
80 | {
81 | "cell_type": "code",
82 | "execution_count": 4,
83 | "metadata": {
84 | "papermill": {
85 | "duration": 0.071755,
86 | "end_time": "2019-07-16T08:19:32.412446",
87 | "exception": false,
88 | "start_time": "2019-07-16T08:19:32.340691",
89 | "status": "completed"
90 | },
91 | "tags": []
92 | },
93 | "outputs": [],
94 | "source": [
95 | "data = pd.read_excel(filename, index_col=0)"
96 | ]
97 | },
98 | {
99 | "cell_type": "code",
100 | "execution_count": 5,
101 | "metadata": {
102 | "papermill": {
103 | "duration": 0.03584,
104 | "end_time": "2019-07-16T08:19:32.460286",
105 | "exception": false,
106 | "start_time": "2019-07-16T08:19:32.424446",
107 | "status": "completed"
108 | },
109 | "tags": []
110 | },
111 | "outputs": [
112 | {
113 | "data": {
114 | "text/html": [
115 | "