├── images
│   ├── 2021-02-04-04-07-46.png
│   └── 2021-02-04-04-53-08.png
├── requirements.txt
├── server.py
├── master.py
├── README.md
├── .gitignore
├── device.py
└── templates
    └── index.html

/images/2021-02-04-04-07-46.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kehanlu/server-monitor/HEAD/images/2021-02-04-04-07-46.png
--------------------------------------------------------------------------------
/images/2021-02-04-04-53-08.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kehanlu/server-monitor/HEAD/images/2021-02-04-04-53-08.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | click==7.1.2
2 | fastapi==0.63.0
3 | gunicorn==20.0.4
4 | h11==0.12.0
5 | httptools==0.1.1
6 | pydantic==1.7.3
7 | starlette==0.13.6
8 | uvicorn==0.13.3
9 | uvloop==0.15.2
10 | 
--------------------------------------------------------------------------------
/server.py:
--------------------------------------------------------------------------------
1 | from fastapi import FastAPI
2 | from fastapi.middleware.cors import CORSMiddleware
3 | 
4 | from device import NvidiaSMI, RAM
5 | 
6 | app = FastAPI()
7 | 
8 | origins = [
9 |     "http://localhost:3000",
10 |     "http://140.118.127.80:3000",
11 | ]
12 | 
13 | app.add_middleware(
14 |     CORSMiddleware,
15 |     allow_origins=origins,
16 |     # Raw string: the backslash escapes the dot in the regex,
17 |     # not in the Python string literal ("\/" is an invalid escape).
18 |     allow_origin_regex=r"https?://.*\.ntust\.edu\.tw",
19 |     allow_methods=["*"],
20 |     allow_headers=["*"],
21 | )
22 | 
23 | 
24 | @app.get("/")
25 | async def root():
26 |     return {
27 |         "nvidia_smi": NvidiaSMI(),
28 |         "ram": RAM()
29 |     }
30 | 
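The `allow_origin_regex` in `server.py` is matched against the request's `Origin` header. The pattern can be sanity-checked outside the app with Python's `re` module (a standalone sketch; the origin URLs below are illustrative, not from the project):

```python
import re

# The origin pattern used in server.py, written as a raw string so the
# backslash escapes the regex dot rather than acting as a string escape.
ORIGIN_RE = re.compile(r"https?://.*\.ntust\.edu\.tw")

# Origins under the ntust.edu.tw domain match in full.
assert ORIGIN_RE.fullmatch("https://lab.ntust.edu.tw")
assert ORIGIN_RE.fullmatch("http://gpu.cs.ntust.edu.tw")

# An unrelated origin does not match.
assert ORIGIN_RE.fullmatch("https://example.com") is None
```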
--------------------------------------------------------------------------------
/master.py:
--------------------------------------------------------------------------------
1 | from flask import Flask
2 | from flask import render_template
3 | import requests
4 | import json
5 | from datetime import datetime
6 | import pytz
7 | from config import CONFIG
8 | 
9 | tz = pytz.timezone("Asia/Taipei")
10 | 
11 | app = Flask(__name__)
12 | app.config["TEMPLATES_AUTO_RELOAD"] = True
13 | 
14 | SITE_TITLE = CONFIG.get("site_title", "Server status")
15 | TOP_MESSAGE = CONFIG.get("top_message", "Hello world")
16 | 
17 | if CONFIG.get("server_ips") is None:
18 |     raise ValueError("config.py must define a 'server_ips' list")
19 | SERVER_IPS = CONFIG.get("server_ips")
20 | 
21 | 
22 | @app.route('/')
23 | def server():
24 |     servers = list()
25 |     now = datetime.now(tz=tz).strftime("%Y-%m-%d %T")
26 |     for ip in SERVER_IPS:
27 |         try:
28 |             resp = requests.get(f"http://{ip}:23333", timeout=5)
29 |             resp.raise_for_status()
30 |             data = json.loads(resp.text)
31 |             data["ip"] = ip
32 |             data["active"] = True
33 |         except (requests.RequestException, json.JSONDecodeError):
34 |             # An unreachable or failing server is shown as inactive
35 |             # instead of crashing the whole status page.
36 |             data = {"ip": ip, "active": False}
37 |         servers.append(data)
38 | 
39 |     context = {"title": SITE_TITLE,
40 |                "top_message": TOP_MESSAGE,
41 |                "now": now,
42 |                "servers": servers}
43 |     return render_template("index.html", **context)
44 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | To monitor RAM and GPU usage on multiple servers.
2 | 
3 | In a computer science lab or company, you usually have multiple servers and GPUs running many deep learning experiments. You want to see at a glance, with minimal setup, which devices are busy and which are available.
4 | 
5 | ## Screenshots
6 | 
7 | ![](images/2021-02-04-04-07-46.png)
8 | 
9 | ## Installation
10 | 
11 | ```shell
12 | git clone https://github.com/kehanlu/server-monitor
13 | cd server-monitor
14 | pip install -r requirements.txt
15 | ```
16 | 
17 | - `nvidia-smi` (ships with the NVIDIA driver): https://www.nvidia.com
18 | 
19 | 
20 | ## Usage
21 | 
22 | ![](images/2021-02-04-04-53-08.png)
23 | 
24 | ### Server
25 | 
26 | A "server" is a machine you want to monitor.
27 | 
28 | 1. Go to the server you want to monitor.
29 |    - Make sure the `nvidia-smi` command is installed.
30 | 
31 | 2. Run the following command to start the API:
32 | 
33 | ```shell
34 | gunicorn -w 1 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:23333 server:app --daemon
35 | ```
36 | 
37 | ### Master
38 | 
39 | The "master" is the web server that fetches data from each server. You can run it on any computer. In some cases, you might want the master to be reachable from the public network while keeping the monitored servers behind a firewall.
40 | 
41 | 1. Create a file named `config.py`.
42 | 
43 | 2. In `config.py`, define a list of server IPs. The master iterates over the list and issues a GET request to `http://{ip}:23333` for each one.
44 | 
45 |    - `server_ips`: the servers to monitor
46 |    - `site_title` (optional): the title of the website
47 |    - `top_message` (optional): the message shown at the top
48 | 
49 | ```python
50 | CONFIG = {
51 |     "site_title": "Server status",
52 |     "top_message": "Hello world",
53 |     "server_ips": [
54 |         "192.168.0.2",
55 |         "192.168.0.3",
56 |         "192.168.0.4",
57 |     ],
58 | }
59 | ```
60 | 
61 | 
62 | 3. Run the following command to start the master:
63 | 
64 | ```shell
65 | gunicorn -w 1 -b 0.0.0.0:8787 master:app
66 | ```
67 | 
68 | 4. Visit `127.0.0.1:8787` (or `<master-ip>:8787` from another machine) to see the website.
69 | 
70 | ## Contribution
71 | 
72 | Pull requests are welcome. This is still an early project (and just for fun).
73 | 
74 | TODOs:
75 | 
76 | - Fast installation script.
77 | - Error handling.
78 | - Use Nginx to serve the sites.
79 | - Use CI/CD to automatically update projects on servers. -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | config.py 2 | 3 | # Byte-compiled / optimized / DLL files 4 | __pycache__/ 5 | *.py[cod] 6 | *$py.class 7 | 8 | # C extensions 9 | *.so 10 | 11 | # Distribution / packaging 12 | .Python 13 | build/ 14 | develop-eggs/ 15 | dist/ 16 | downloads/ 17 | eggs/ 18 | .eggs/ 19 | lib/ 20 | lib64/ 21 | parts/ 22 | sdist/ 23 | var/ 24 | wheels/ 25 | pip-wheel-metadata/ 26 | share/python-wheels/ 27 | *.egg-info/ 28 | .installed.cfg 29 | *.egg 30 | MANIFEST 31 | 32 | # PyInstaller 33 | # Usually these files are written by a python script from a template 34 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 35 | *.manifest 36 | *.spec 37 | 38 | # Installer logs 39 | pip-log.txt 40 | pip-delete-this-directory.txt 41 | 42 | # Unit test / coverage reports 43 | htmlcov/ 44 | .tox/ 45 | .nox/ 46 | .coverage 47 | .coverage.* 48 | .cache 49 | nosetests.xml 50 | coverage.xml 51 | *.cover 52 | *.py,cover 53 | .hypothesis/ 54 | .pytest_cache/ 55 | 56 | # Translations 57 | *.mo 58 | *.pot 59 | 60 | # Django stuff: 61 | *.log 62 | local_settings.py 63 | db.sqlite3 64 | db.sqlite3-journal 65 | 66 | # Flask stuff: 67 | instance/ 68 | .webassets-cache 69 | 70 | # Scrapy stuff: 71 | .scrapy 72 | 73 | # Sphinx documentation 74 | docs/_build/ 75 | 76 | # PyBuilder 77 | target/ 78 | 79 | # Jupyter Notebook 80 | .ipynb_checkpoints 81 | 82 | # IPython 83 | profile_default/ 84 | ipython_config.py 85 | 86 | # pyenv 87 | .python-version 88 | 89 | # pipenv 90 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 
91 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 92 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 93 | # install all needed dependencies. 94 | #Pipfile.lock 95 | 96 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow 97 | __pypackages__/ 98 | 99 | # Celery stuff 100 | celerybeat-schedule 101 | celerybeat.pid 102 | 103 | # SageMath parsed files 104 | *.sage.py 105 | 106 | # Environments 107 | .env 108 | .venv 109 | env/ 110 | venv/ 111 | ENV/ 112 | env.bak/ 113 | venv.bak/ 114 | 115 | # Spyder project settings 116 | .spyderproject 117 | .spyproject 118 | 119 | # Rope project settings 120 | .ropeproject 121 | 122 | # mkdocs documentation 123 | /site 124 | 125 | # mypy 126 | .mypy_cache/ 127 | .dmypy.json 128 | dmypy.json 129 | 130 | # Pyre type checker 131 | .pyre/ 132 | -------------------------------------------------------------------------------- /device.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | 4 | class GPU(): 5 | def __init__(self, index, name, memory_total, memory_free, memory_used, utilization_gpu): 6 | self.index = index 7 | self.name = name 8 | self.memory_total = int(memory_total.split()[0]) 9 | self.memory_free = int(memory_free.split()[0]) 10 | self.memory_used = int(memory_used.split()[0]) 11 | self.utilization_gpu = int(utilization_gpu.split()[0]) 12 | def __repr__(self): 13 | return f"GPU {self.index}: {self.name:20.20},{self.memory_used:>10}/{self.memory_total:<10},{self.utilization_gpu}" 14 | 15 | def get_info(self): 16 | return { 17 | "index": self.index, 18 | "name": self.name, 19 | "memory_total": self.memory_total, 20 | "memory_free": self.memory_free, 21 | "memory_used": self.memory_used, 22 | "utilization_gpu": self.utilization_gpu 23 | } 24 | 25 | class NvidiaSMI(): 26 | def __init__(self): 27 | query = [ 28 | "index", 29 | "name", 30 | "memory.total", 31 | 
"memory.free", 32 | "memory.used", 33 | "utilization.gpu", 34 | ] 35 | self.gpu_list = list() 36 | for gpu_info in os.popen(f"nvidia-smi --query-gpu={','.join(query)} --format=csv,noheader").readlines(): 37 | gpu_info = gpu_info.strip().split(',') 38 | gpu = dict() 39 | for c,info in zip(query, gpu_info): 40 | c = c.replace(".","_") 41 | gpu[c] = info.strip() 42 | self.gpu_list.append(GPU(**gpu)) 43 | 44 | self.processes = list() 45 | for process in os.popen("nvidia-smi --query-compute-apps=gpu_name,name,used_gpu_memory --format=csv,noheader").readlines(): 46 | p = process.split(',') 47 | self.processes.append({ 48 | "gpu_name": p[0].strip(), 49 | "process_name": p[1].strip(), 50 | "memory_used": p[2].strip() 51 | }) 52 | 53 | def show(self): 54 | for gpu in self.gpu_list: 55 | print(gpu) 56 | 57 | def to_json(self): 58 | return json.dumps({ 59 | "gpus": [gpu.get_info() for gpu in self.gpu_list], 60 | "processes": self.processes 61 | }) 62 | def get_info(self): 63 | return { 64 | "gpus": [gpu.get_info() for gpu in self.gpu_list], 65 | "processes": self.processes 66 | } 67 | 68 | class RAM(): 69 | def __init__(self): 70 | info = os.popen("free -g").read().split('\n')[1].split() 71 | self.info = { 72 | 'total': int(info[1]), 73 | 'used': int(info[2]), 74 | 'available': int(info[6]) 75 | } 76 | 77 | def to_json(self): 78 | return self.info 79 | 80 | def get_info(self): 81 | return self.info -------------------------------------------------------------------------------- /templates/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | {{title}} 8 | 9 | 33 | 34 | 35 | 36 | 37 |
38 |

39 | {{top_message}} 40 |

41 |

Request time: {{now}}

42 | {% for server in servers %} 43 | 44 |
45 |

46 | {{server.ip}} 47 |

48 | 49 | {% if server.active %} 50 |

RAM: {{server.ram.info.used}}GB / {{server.ram.info.total}}GB

51 |
52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | {% for gpu in server.nvidia_smi.gpu_list %} 63 | 64 | 65 | 66 | 71 | 73 | 74 | {% endfor %} 75 | 76 |
IDNameMemory usedUtil gpu
{{gpu.index}}{{gpu.name}} 67 |
{{"%6sMiB /%6sMiB"|format(gpu.memory_used,gpu.memory_total) }}
68 | 70 |
{{gpu.utilization_gpu}} %
77 |
78 |
79 | {% for process in server.nvidia_smi.processes%} 80 |
{{ "%-20s %-30.30s %-10s" | format(process.gpu_name, process.process_name,
81 |                             process.memory_used)}}
82 | {% endfor %} 83 | {% else %} 84 | service down! 85 | 86 | {% endif %} 87 |
88 | {% endfor %} 89 | 90 | 91 | 92 |
93 | 94 | 95 | 96 | 97 | --------------------------------------------------------------------------------
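For reference, the CSV parsing that `device.py` performs on `nvidia-smi --query-gpu` output can be exercised without a GPU by feeding it a canned line (a standalone sketch; the sample GPU and its values are invented for illustration):

```python
# Standalone sketch of the parsing done in device.py's NvidiaSMI class,
# run against a canned line instead of real `nvidia-smi` output.
QUERY = ["index", "name", "memory.total", "memory.free",
         "memory.used", "utilization.gpu"]

# One row of `--format=csv,noheader` output (sample values, made up).
sample_line = "0, GeForce RTX 2080 Ti, 11019 MiB, 10000 MiB, 1019 MiB, 35 %"

# Split on commas and map each field to its query key,
# replacing dots with underscores as device.py does.
fields = [f.strip() for f in sample_line.split(",")]
gpu = {key.replace(".", "_"): value for key, value in zip(QUERY, fields)}

# Numeric fields carry units ("MiB", "%"); keep only the leading number.
memory_total = int(gpu["memory_total"].split()[0])
memory_used = int(gpu["memory_used"].split()[0])
utilization = int(gpu["utilization_gpu"].split()[0])

print(gpu["name"], memory_used, memory_total, utilization)
```

Note that this simple comma split assumes GPU names never contain commas, which holds for NVIDIA product names in practice.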