├── .gitignore ├── LICENSE ├── README.md ├── gppmc ├── gppmc.py ├── gppmc_config.yaml └── requirements.txt ├── gppmd ├── config.yaml ├── gppmd.py ├── gppmd_config.yaml.example ├── llamacpp_configs │ └── examples.yaml ├── requirements.txt └── templates │ └── home.html └── tools ├── build_gppmc_deb.sh ├── build_gppmd_deb.sh ├── run_instance_1.sh ├── run_instance_2.sh ├── start_over.sh └── sync_to_remote.sh /.gitignore: -------------------------------------------------------------------------------- 1 | gppmd_config.yaml 2 | gppm_config.yaml 3 | *.swp 4 | src 5 | debian 6 | .env 7 | venv 8 | *.spec 9 | build/ 10 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Roni 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # gppm 2 | ![gppm-banner](https://github.com/user-attachments/assets/af0a6d7b-818c-476f-b3e3-9217b848c5c7) 3 | 4 | 5 | gppm power process manager 6 | 7 | gppm is designed for use with llama.cpp and NVIDIA Tesla P40 GPUs. The standalone llama.cpp currently lacks functionality to reduce the power consumption of these GPUs in idle mode. Although there is a patch for llama.cpp, it switches the performance mode for all GPUs simultaneously, which can disrupt setups where multiple llama.cpp instances share one or more GPUs. Implementing a communication mechanism within llama.cpp to manage task distribution and GPU status is complex. gppm addresses this challenge externally, providing a more efficient solution. 8 | gppm allows you to define llama.cpp instances as code, enabling automatic spawning, termination, and respawning. 9 | 10 | > [!NOTE] 11 | > Both the configuration and the API will most likely continue to change for a while. When changing to a newer version, please always take a look at the current documentation. 
12 | 13 | ## Table of Contents 14 | 15 | - [How it works](#how-it-works) 16 | - [Quickstart](#quickstart) 17 | - [Installation](#installation) 18 | - [Command line interface](#command-line-interface) 19 | - [Configuration](#configuration) 20 | 21 | ## How it works 22 | 23 | gppm uses [nvidia-pstate](https://github.com/sasha0552/nvidia-pstate) under the hood, which is what makes it possible to switch the performance state of P40 GPUs in the first place. gppm must be installed on the host where the GPUs are installed and llama.cpp is running. gppm monitors llama.cpp's output to recognize tasks and the GPUs llama.cpp runs them on, and uses this information to change the performance modes of the installed P40 GPUs accordingly. It can manage any number of GPUs and llama.cpp instances. gppm switches each GPU to a low performance state as soon as none of the existing llama.cpp instances is running a task on that particular GPU, and sets it to high performance mode as soon as the next task is about to run. In doing so, gppm is able to control all GPUs independently of each other. gppm is designed as a wrapper, and as such you have all llama.cpp instances configured in one place. 24 | 25 | ## Quickstart 26 | 27 | Clone the repository and cd into it: 28 | 29 | ```shell 30 | git clone https://github.com/crashr/gppm 31 | cd gppm 32 | ``` 33 | 34 | Edit the following files to fit your needs: 35 | 36 | * gppmd/config.yaml 37 | * gppmd/llamacpp_configs/examples.yaml 38 | 39 | In a separate terminal, run nvidia-smi to monitor the llama.cpp instances we are going to run: 40 | 41 | ```shell 42 | watch -n 0.1 nvidia-smi 43 | ``` 44 | 45 | Run the gppm daemon: 46 | 47 | ```shell 48 | python3 gppmd/gppmd.py --config ./gppmd/config.yaml --llamacpp_configs_dir ./gppmd/llamacpp_configs 49 | ``` 50 | 51 | Wait for the instances to show up in the nvidia-smi terminal. 52 | gppm ships with a command line client (see details below). In another terminal, run the cli like this to list the instances you just started: 53 | 54 | ```shell 55 | python3 gppmc/gppmc.py get instances 56 | ``` 57 | 58 | 59 | ## Installation 60 | 61 | ### Build binaries and DEB packages 62 | 63 | ```shell 64 | ./tools/build_gppmd_deb.sh 65 | ./tools/build_gppmc_deb.sh 66 | ``` 67 | 68 | You should now find binaries for the daemon and the cli in the build folder: 69 | 70 | ```shell 71 | ls -1 build/gppmd-$(git describe --tags --abbrev=0)-amd64/usr/bin/gppmd 72 | ls -1 build/gppmc-$(git describe --tags --abbrev=0)-amd64/usr/bin/gppmc 73 | ``` 74 | 75 | Copy them wherever you want, or install the DEB packages (described in the next step): 76 | 77 | ```shell 78 | ls -1 build/*.deb 79 | ``` 80 | 81 | ### Install DEB packages 82 | 83 | The DEB packages are tested on the following distributions: 84 | 85 | * Ubuntu 22.04 86 | 87 | Install the DEB packages like this: 88 | 89 | ```sh 90 | sudo dpkg -i build/gppmd-$(git describe --tags --abbrev=0)-amd64.deb 91 | sudo dpkg -i build/gppmc-$(git describe --tags --abbrev=0)-amd64.deb 92 | ``` 93 | 94 | gppmd expects its config file at /etc/gppmd/config.yaml, so put your config there. It can be as minimal as this: 95 | 96 | ```yaml 97 | host: '0.0.0.0' 98 | port: 5001 99 | ``` 100 | 101 | gppmd looks for llama.cpp config files in /etc/gppmd/llamacpp_configs, so put your configs there (see below for a detailed explanation of how the configuration works). 102 | 103 | Enable and run the daemon: 104 | 105 | ```shell 106 | sudo systemctl enable --now gppmd.service 107 | ``` 108 | 
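Once the service is running, a quick way to confirm that the daemon answers is to query its HTTP API directly. This is only a sanity check, not a required step; 5002 is the daemon's built-in default port, so substitute the port from your own setup if it differs:

```shell
# List the llama.cpp instances gppmd currently manages (empty right after a fresh install)
curl http://localhost:5002/get_llamacpp_instances

# Show the llama.cpp configurations gppmd has loaded from /etc/gppmd/llamacpp_configs
curl http://localhost:5002/get_llamacpp_configs
```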
109 | ## Command line interface 110 | 111 | gppm comes with a cli client. It provides basic functionality to interact with the daemon: 112 | 113 | ```sh 114 | $ gppmc 115 | Usage: gppmc [OPTIONS] COMMAND [ARGS]... 116 | 117 | Group of commands for managing llama.cpp instances and configurations. 118 | 119 | Options: 120 | --host TEXT The host to connect to. 121 | --port INTEGER The port to connect to. 122 | --help Show this message and exit. 123 | 124 | Commands: 125 | apply Apply LlamaCpp configurations from a YAML file. 126 | disable Disable a LlamaCpp instance. 127 | enable Enable a LlamaCpp instance. 128 | get Get various resources. 129 | reload Reload LlamaCpp configurations. 130 | ``` 131 | 132 | For some usage examples, take a look at the configuration section. 133 | 134 | ## Configuration 135 | 136 | After changing llama.cpp instance configuration files, they can be reloaded with the cli: 137 | 138 | ```shell 139 | gppmc reload 140 | ``` 141 | 142 | This affects only instances whose configs were changed. All other instances remain untouched. 143 | 144 | The most basic configuration for a llama.cpp instance looks like this: 145 | 146 | ```yaml 147 | - name: Biggie_SmolLM_0.15B_Base_q8_0_01 148 | enabled: True 149 | env: 150 | CUDA_VISIBLE_DEVICES: "0" 151 | command: 152 | "/usr/local/bin/llama-server \ 153 | --host 0.0.0.0 \ 154 | -ngl 100 \ 155 | -m /models/Biggie_SmolLM_0.15B_Base_q8_0.gguf \ 156 | --port 8061 \ 157 | -sm none \ 158 | --no-mmap \ 159 | --log-format json" # Remove this for version >=1.2.0 160 | ``` 161 | 162 | To enable gppmd to perform power state switching with NVIDIA Tesla P40 GPUs, it is essential to specify CUDA_VISIBLE_DEVICES and the JSON log format. 163 | 164 | gppm allows you to configure post-launch hooks, which makes it possible to bundle complex setups. As an example, the following configuration creates a setup consisting of two llama.cpp instances running Codestral on three GPUs behind a load balancer. 
For the load balancer [Paddler](https://github.com/distantmagic/paddler) is used: 165 | 166 | ```yaml 167 | - name: "Codestral-22B-v0.1-Q8_0 (paddler balancer)" 168 | enabled: True 169 | command: 170 | "/usr/local/bin/paddler balancer \ 171 | --management-host 0.0.0.0 \ 172 | --management-port 8085 \ 173 | --management-dashboard-enable=true \ 174 | --reverseproxy-host 192.168.178.56 \ 175 | --reverseproxy-port 8081" 176 | 177 | - name: "Codestral-22B-v0.1-Q8_0 (llama.cpp 01)" 178 | enabled: True 179 | env: 180 | CUDA_VISIBLE_DEVICES: "0,1,2" 181 | command: 182 | "/usr/local/bin/llama-server \ 183 | --host 0.0.0.0 \ 184 | -ngl 100 \ 185 | -m /models/Codestral-22B-v0.1-Q8_0.gguf \ 186 | --port 8082 \ 187 | -fa \ 188 | -sm row \ 189 | -mg 0 \ 190 | --no-mmap \ 191 | --slots \ 192 | --log-format json" # Remove this for version >=1.2.0 193 | post_launch_hooks: 194 | - name: Codestral-22B-v0.1-Q8_0_(paddler_01) 195 | enabled: True 196 | command: 197 | "/usr/local/bin/paddler agent \ 198 | --name 'Codestral-22B-v0.1-Q8_0 (llama.cpp 01)' \ 199 | --external-llamacpp-host 192.168.178.56 \ 200 | --external-llamacpp-port 8082 \ 201 | --local-llamacpp-host 192.168.178.56 \ 202 | --local-llamacpp-port 8082 \ 203 | --management-host 192.168.178.56 \ 204 | --management-port 8085" 205 | 206 | - name: "Codestral-22B-v0.1-Q8_0_(llama.cpp_02)" 207 | enabled: True 208 | env: 209 | CUDA_VISIBLE_DEVICES: "0,1,2" 210 | command: 211 | "/usr/local/bin/llama-server \ 212 | --host 0.0.0.0 \ 213 | -ngl 100 \ 214 | -m /models/Codestral-22B-v0.1-Q8_0.gguf \ 215 | --port 8083 \ 216 | -fa \ 217 | -sm row \ 218 | -mg 1 \ 219 | --no-mmap \ 220 | --log-format json" # Remove this for version >=1.2.0 221 | post_launch_hooks: 222 | - name: "Codestral-22B-v0.1-Q8_0_Paddler_02" 223 | enabled: True 224 | command: 225 | "/usr/local/bin/paddler agent \ 226 | --name 'Codestral-22B-v0.1-Q8_0 (llama.cpp 02)' \ 227 | --external-llamacpp-host 192.168.178.56 \ 228 | --external-llamacpp-port 8083 \ 229 | --local-llamacpp-host 192.168.178.56 \ 230 | --local-llamacpp-port 8083 \ 231 | --management-host 192.168.178.56 \ 232 | --management-port 8085" 233 | ``` 234 | 235 | ![image](https://github.com/user-attachments/assets/777e4c96-b960-449e-8647-6f28753d3d8b) 236 | 237 | 238 | ***More to come soon*** 239 | -------------------------------------------------------------------------------- /gppmc/gppmc.py: -------------------------------------------------------------------------------- 1 | import click 2 | import requests 3 | import json 4 | import yaml 5 | from halo import Halo 6 | import click_completion 7 | 8 | 9 | click_completion.init() 10 | 11 | 12 | @click.group() 13 | @click.option("--host", default="localhost", help="The host to connect to.") 14 | @click.option("--port", default=5002, type=int, help="The port to connect to.") 15 | @click.pass_context 16 | def gppmc(ctx, host, port): 17 | """Group of commands for managing LlamaCpp instances and configurations.""" 18 | ctx.obj["BASE_URL"] = f"http://{host}:{port}" 19 | 20 | 21 | @gppmc.group("get") 22 | def get_group(): 23 | """Get various resources.""" 24 | pass 25 | 26 | 27 | @get_group.command("instances") 28 | @click.option( 29 | "--format", 30 | default="text", 31 | type=click.Choice(["json", "text"]), 32 | help="Print output in JSON or text format.", 33 | ) 34 | @click.pass_context 35 | def get_instances(ctx, format): 36 | """Get all LlamaCpp instances.""" 37 | base_url = ctx.obj["BASE_URL"] 38 | if format == "text": 39 | with Halo(text="Loading instances", spinner="dots"): 40 | 
response = requests.get(f"{base_url}/get_llamacpp_instances") 41 | else: 42 | response = requests.get(f"{base_url}/get_llamacpp_instances") 43 | 44 | if format == "json": 45 | print(response.json()) 46 | else: 47 | instances = response.json()["llamacpp_instances"] 48 | for instance in instances: 49 | print(instance) 50 | 51 | 52 | @get_group.command("configs") 53 | @click.option( 54 | "--format", 55 | default="text", 56 | type=click.Choice(["json", "text"]), 57 | help="Print output in JSON or text format.", 58 | ) 59 | @click.pass_context 60 | def get_configs(ctx, format): 61 | """Get all LlamaCpp configurations.""" 62 | base_url = ctx.obj["BASE_URL"] 63 | if format == "text": 64 | with Halo(text="Loading configurations", spinner="dots"): 65 | response = requests.get(f"{base_url}/get_llamacpp_configs") 66 | 67 | if format == "json": 68 | print(response.json()) 69 | else: 70 | # print(response.json()) 71 | configs = response.json()["llamacpp_configs"] 72 | for config in configs: 73 | print(config) 74 | 75 | 76 | @gppmc.command("apply") 77 | @click.argument("file", type=click.File("rb")) 78 | @click.option( 79 | "--format", 80 | default="text", 81 | type=click.Choice(["json", "text"]), 82 | help="Print output in JSON or text format.", 83 | ) 84 | @click.pass_context 85 | def apply_configs(ctx, file, format): 86 | """Apply LlamaCpp configurations from a YAML file.""" 87 | data = yaml.safe_load(file) 88 | base_url = ctx.obj["BASE_URL"] 89 | if format == "text": 90 | with Halo(text="Applying configurations", spinner="dots"): 91 | response = requests.post(f"{base_url}/apply_llamacpp_configs", json=data) 92 | else: 93 | pass 94 | # response = requests.post(f"{base_url}/apply_llamacpp_configs", json=data) 95 | # print(response.json()) 96 | 97 | 98 | @get_group.command("subprocesses") 99 | @click.pass_context 100 | def get_subprocesses(ctx): 101 | """Get all LlamaCpp subprocesses.""" 102 | base_url = ctx.obj["BASE_URL"] 103 | with Halo(text="Loading subprocesses", spinner="dots"): 104 | response = requests.get(f"{base_url}/get_llamacpp_subprocesses") 105 | print(json.dumps(response.json(), indent=4)) 106 | # print(response) 107 | 108 | 109 | @gppmc.command("reload") 110 | @click.pass_context 111 | def reload_configs(ctx): 112 | """Reload LlamaCpp configurations.""" 113 | base_url = ctx.obj["BASE_URL"] 114 | with Halo(text="Reloading configurations", spinner="dots"): 115 | response = requests.get(f"{base_url}/reload_llamacpp_configs") 116 | # print(json.dumps(response.json(), indent=4)) # TODO 117 | # print(response) 118 | 119 | 120 | @gppmc.command("enable") 121 | @click.argument("name") 122 | @click.pass_context 123 | def enable_instance(ctx, name): 124 | """Enable a LlamaCpp instance.""" 125 | base_url = ctx.obj["BASE_URL"] 126 | with Halo(text=f"Enabling instance {name}", spinner="dots"): 127 | response = requests.post( 128 | f"{base_url}/enable_llamacpp_instance", json={"name": name} 129 | ) 130 | # print(response.json()) 131 | # print(response) 132 | 133 | 134 | @gppmc.command("disable") 135 | @click.argument("name") 136 | @click.pass_context 137 | def disable_instance(ctx, name): 138 | """Disable a LlamaCpp instance.""" 139 | base_url = ctx.obj["BASE_URL"] 140 | with Halo(text=f"Disabling instance {name}", spinner="dots"): 141 | response = requests.post( 142 | f"{base_url}/disable_llamacpp_instance", json={"name": name} 143 | ) 144 | # print(response.json()) 145 | # print(response) 146 | 147 | 148 | if __name__ == "__main__": 149 | gppmc(obj={}) 150 | 
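Taken together, each gppmc subcommand above maps to one HTTP endpoint of gppmd. A few illustrative invocations are sketched below; host, port, file and instance names are placeholders, and `python3 gppmc/gppmc.py` can stand in for the installed `gppmc` binary:

```shell
# List the managed instances of a remote daemon as JSON
gppmc --host 192.168.178.56 --port 5002 get instances --format json

# Apply additional instance definitions from a local YAML file
gppmc apply my_llamacpp_configs.yaml

# Temporarily stop an instance by name and bring it back later
gppmc disable Biggie_SmolLM_0.15B_Base_q8_0_01
gppmc enable Biggie_SmolLM_0.15B_Base_q8_0_01
```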
-------------------------------------------------------------------------------- /gppmc/gppmc_config.yaml: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/crashr/gppm/ff920f77d64c5769f9b5778cc44b78ce3368d34f/gppmc/gppmc_config.yaml -------------------------------------------------------------------------------- /gppmc/requirements.txt: -------------------------------------------------------------------------------- 1 | blinker==1.8.2 2 | certifi==2024.6.2 3 | charset-normalizer==3.3.2 4 | click==8.1.7 5 | Flask==3.0.3 6 | idna==3.7 7 | itsdangerous==2.2.0 8 | Jinja2==3.1.4 9 | MarkupSafe==2.1.5 10 | nvidia_pstate==1.0.5 11 | PyYAML==6.0.1 12 | requests==2.32.3 13 | urllib3==2.2.2 14 | Werkzeug==3.0.3 15 | pyinstaller==6.9.0 16 | click-completion 17 | halo -------------------------------------------------------------------------------- /gppmd/config.yaml: -------------------------------------------------------------------------------- 1 | host: '127.0.0.1' 2 | port: 5002 3 | -------------------------------------------------------------------------------- /gppmd/gppmd.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import yaml 3 | import logging 4 | import time 5 | import threading 6 | import json 7 | import os 8 | from nvidia_pstate import set_pstate_low, set_pstate_high, set_pstate 9 | from flask import Flask 10 | from flask import jsonify 11 | from flask import render_template 12 | from flask import request 13 | import subprocess 14 | import tempfile 15 | import re 16 | import shlex 17 | from werkzeug.serving import make_server 18 | import select 19 | import signal 20 | import sys 21 | 22 | 23 | global llamacpp_configs_dir 24 | global configs 25 | global threads 26 | configs = [] 27 | threads = [] 28 | subprocesses = {} 29 | 30 | app = Flask(__name__, template_folder=os.path.abspath("/etc/gppmd/templates")) 31 | app.config["DEBUG"] = True 32 | 33 | parser = argparse.ArgumentParser(description="gppm power process manager") 34 | parser.add_argument( 35 | "--config", 36 | type=str, 37 | default="/etc/gppmd/config.yaml", 38 | help="Path to the configuration file", 39 | ) 40 | parser.add_argument( 41 | "--llamacpp_configs_dir", 42 | type=str, 43 | default="/etc/gppmd/llamacpp_configs", 44 | help="Path to the llama.cpp configuration file", 45 | ) 46 | parser.add_argument( 47 | "--port", 48 | type=int, 49 | default=5002, 50 | help="Port number for the API to listen on", 51 | ) 52 | args = parser.parse_args() 53 | 54 | with open(args.config, "r") as file: 55 | config = yaml.safe_load(file) 56 | 57 | # for key, value in config.items(): 58 | # parser.add_argument(f"--{key}", type=type(value), default=value, help=f"Set {key}") 59 | for key, value in config.items(): 60 | if key != "port": # Exclude "--port" from the loop 61 | parser.add_argument( 62 | f"--{key}", type=type(value), default=value, help=f"Set {key}" 63 | ) 64 | 65 | args = parser.parse_args() 66 | 67 | for key, value in vars(args).items(): 68 | config[key] = value 69 | 70 | # logging.basicConfig( 71 | # filename=config.get("log_file", "/var/log/gppmd/gppmd.log"), level=logging.INFO 72 | # ) 73 | 74 | result = subprocess.run( 75 | ["nvidia-smi", "-L"], stdout=subprocess.PIPE, stderr=subprocess.PIPE 76 | ) 77 | num_gpus = len(result.stdout.decode("utf-8").strip().split("\n")) 78 | 79 | gpu_semaphores = {} 80 | for gpu in range(num_gpus): 81 | set_pstate([gpu], int(os.getenv("NVIDIA_PSTATE_LOW", "8")), 
silent=True) 82 | gpu_semaphores[gpu] = threading.Semaphore(config.get("max_llamacpp_instances", 10)) 83 | 84 | 85 | def run_post_launch_hooks(config, subprocesses): 86 | if "post_launch_hooks" in config: 87 | for post_launch_hook in config["post_launch_hooks"]: 88 | if post_launch_hook["enabled"]: 89 | with open("/dev/null", "w") as devnull: 90 | new_subprocess = subprocess.Popen( 91 | shlex.split(post_launch_hook["command"]), 92 | shell=False, 93 | stdout=devnull, 94 | stderr=devnull, 95 | ) 96 | subprocesses.append(new_subprocess) 97 | run_post_launch_hooks(post_launch_hook, subprocesses) 98 | 99 | 100 | def list_thread_names(): 101 | thread_names = "\n".join([thread._args[0]["name"] for thread in threads]) 102 | return thread_names 103 | 104 | 105 | def process_line(data, config): # Need config for hooks 106 | 107 | gpus = [int(x) for x in data["gppm"]["gppm_cvd"].split(",")] 108 | 109 | pid = data["gppm"][ 110 | "llamacpp_pid" 111 | ] # TODO This needs to be changed to work with ollama 112 | 113 | tid = data["tid"] 114 | 115 | if "processing task" in data["msg"]: 116 | # logging.info(f"Task {tid} started") 117 | for gpu in gpus: 118 | gpu_semaphores[gpu].acquire(blocking=True) 119 | # logging.info(f"Aquired semaphore for GPU {gpu}") 120 | for gpu in gpus: 121 | # logging.info(f"Setting GPU {gpu} into high performance mode") 122 | set_pstate([gpu], int(os.getenv("NVIDIA_PSTATE_HIGH", "16")), silent=True) 123 | elif "stop processing: " in data["msg"]: 124 | # logging.info(f"Task {tid} terminated") 125 | for gpu in gpus: 126 | gpu_semaphores[gpu].release() 127 | # logging.info(f"Released semaphore for GPU {gpu}") 128 | if gpu_semaphores[gpu]._value is config.get("max_llamacpp_instances", 10): 129 | # logging.info(f"Setting GPU {gpu} into low performance mode") 130 | set_pstate([gpu], int(os.getenv("NVIDIA_PSTATE_LOW", "8")), silent=True) 131 | 132 | # for gpu, semaphore in gpu_semaphores.items(): 133 | # logging.info(f"Semaphore value for GPU {gpu}: {semaphore._value}") 134 | 135 | 136 | def launch_llamacpp(llamacpp_config, stop_event): 137 | """ 138 | tmp_dir = tempfile.TemporaryDirectory(dir="/tmp") 139 | os.makedirs(tmp_dir.name, exist_ok=True) 140 | pipe = os.path.join(tmp_dir.name, "pipe") 141 | os.mkfifo(pipe) 142 | """ 143 | 144 | env = os.environ.copy() 145 | # env["CUDA_VISIBLE_DEVICES"] = llamacpp_config[ 146 | # "cuda_visible_devices" 147 | # ] # TODO remove this 148 | if "env" in llamacpp_config: 149 | for key, value in llamacpp_config["env"].items(): 150 | # print(f"ENV: {key}:{value}") 151 | env[key] = value 152 | 153 | llamacpp_cmd = shlex.split(llamacpp_config["command"]) 154 | 155 | llamacpp_process = subprocess.Popen( 156 | llamacpp_cmd, 157 | env=env, 158 | stdout=subprocess.PIPE, 159 | stderr=subprocess.PIPE, 160 | shell=False, 161 | bufsize=1, 162 | universal_newlines=True, 163 | ) 164 | 165 | if llamacpp_process.pid not in subprocesses: 166 | subprocesses[llamacpp_process.pid] = [] 167 | 168 | run_post_launch_hooks(llamacpp_config, subprocesses[llamacpp_process.pid]) 169 | 170 | pattern = re.compile(r"slot .* \| .* \| .+") 171 | 172 | while not stop_event.is_set(): 173 | # Wait for data to be available for reading 174 | ready_to_read, _, _ = select.select([llamacpp_process.stderr], [], [], 0.1) 175 | if ready_to_read: 176 | # New data available, read it 177 | line = llamacpp_process.stderr.readline() 178 | # if line: 179 | # print(f"DEBUG line: {line}", end="") 180 | # pass 181 | if pattern.search(line): 182 | # FIXME 183 | line = line.strip() 184 | parts = 
line.split(" | ") 185 | id_slot = 0 # TODO 186 | id_task = 0 # TODO 187 | 188 | data = { 189 | "tid": "", 190 | "timestamp": "", 191 | "msg": parts[2], 192 | "id_slot": id_slot, 193 | "id_task": id_task, 194 | "gppm": { 195 | "llamacpp_pid": llamacpp_process.pid, 196 | "gppm_cvd": env["CUDA_VISIBLE_DEVICES"], 197 | }, 198 | } 199 | 200 | process_line(data, llamacpp_config) 201 | else: 202 | # No new data available, check if the subprocess has terminated 203 | if llamacpp_process.poll() is not None: 204 | break 205 | 206 | # Check if subprocesses are still running 207 | for llamacpp_subprocess in subprocesses[llamacpp_process.pid]: 208 | if llamacpp_subprocess.poll() is None: 209 | llamacpp_subprocess.terminate() 210 | while llamacpp_subprocess.poll() is None: 211 | pass 212 | 213 | llamacpp_process.terminate() 214 | llamacpp_process.wait() 215 | 216 | 217 | # WIP with very low prio 218 | def launch_ollama(ollama_config, stop_event): 219 | tmp_dir = tempfile.TemporaryDirectory(dir="/tmp") 220 | os.makedirs(tmp_dir.name, exist_ok=True) 221 | pipe = os.path.join(tmp_dir.name, "pipe") 222 | os.mkfifo(pipe) 223 | 224 | ollama_options = [] 225 | 226 | for option in ollama_config["options"]: 227 | if isinstance(option, dict): 228 | pass 229 | else: 230 | ollama_options.append(str(option)) 231 | 232 | ollama_cmd = shlex.split(ollama_config["command"] + " " + " ".join(ollama_options)) 233 | 234 | env = os.environ.copy() 235 | env["CUDA_VISIBLE_DEVICES"] = ollama_config["cuda_visible_devices"] 236 | env["OLLAMA_HOST"] = ollama_config["ollama_host"] 237 | env["OLLAMA_DEBUG"] = "1" 238 | 239 | # for env_var in ollama_config["env_vars"]: 240 | # for k, v in env_var.items(): 241 | # env[k] = v 242 | 243 | ollama_process = subprocess.Popen( 244 | ollama_cmd, 245 | env=env, 246 | stdout=subprocess.PIPE, 247 | stderr=subprocess.PIPE, 248 | shell=False, 249 | bufsize=1, 250 | universal_newlines=True, 251 | ) 252 | 253 | data = {"model": ollama_config["model"], "keep_alive": -1} 254 | response = requests.post( 255 | f"http://" + ollama_config["ollama_host"] + "/api/generate", data 256 | ) 257 | 258 | pattern = re.compile(r"slot is processing task|slot released") 259 | 260 | while not stop_event.is_set(): 261 | # Wait for data to be available for reading 262 | ready_to_read, _, _ = select.select([ollama_process.stdout], [], [], 0.1) 263 | if ready_to_read: 264 | # New data available, read it 265 | line = ollama_process.stdout.readline() 266 | # print(line) # FIXME 267 | if pattern.search(line): 268 | try: 269 | data = json.loads(line) 270 | data["gppm"] = { 271 | "ollama_pid": ollama_process.pid, 272 | "gppm_cvd": env["CUDA_VISIBLE_DEVICES"], 273 | } 274 | except: 275 | data = {} 276 | data["gppm"] = { # FIXME 277 | "llamacpp_pid": ollama_process.pid, 278 | "ollama_pid": ollama_process.pid, 279 | "gppm_cvd": env["CUDA_VISIBLE_DEVICES"], 280 | } 281 | data["tid"] = 0 282 | data["msg"] = line 283 | process_line(data) 284 | else: 285 | # No new data available, check if the subprocess has terminated 286 | if ollama_process.poll() is not None: 287 | break 288 | 289 | ollama_process.terminate() 290 | ollama_process.wait() 291 | 292 | 293 | llamacpp_configs_dir = config.get("llamacpp_configs_dir", "/etc/gppmd/llamacpp_configs") 294 | 295 | 296 | def load_llamacpp_configs(llamacpp_configs_dir=llamacpp_configs_dir): 297 | new_configs = [] 298 | for filename in os.listdir(llamacpp_configs_dir): 299 | if filename.endswith(".yaml"): 300 | with open(os.path.join(llamacpp_configs_dir, filename), "r") as f: 301 | configs = 
yaml.safe_load(f) 302 | for config in configs: 303 | new_configs.append(config) 304 | return new_configs 305 | 306 | 307 | def purge_thread(thread): 308 | thread._args[1].set() # Signal to stop 309 | thread.join() # Wait for the thread to finish 310 | threads.remove(thread) # Remove the thread from the list 311 | 312 | 313 | def sync_threads_with_configs(threads, configs, launch_llamacpp): 314 | existing_config_names = [thread._args[0]["name"] for thread in threads] 315 | 316 | # new_config_names = [config['name'] for config in configs] 317 | new_config_names = [] 318 | for config in configs: 319 | new_config_names.append(config["name"]) 320 | 321 | # Remove threads that are not in the configs 322 | for thread in threads[:]: 323 | if thread._args[0]["name"] not in new_config_names: 324 | purge_thread(thread) 325 | # threads.remove(thread) 326 | 327 | # Add or update threads based on the configs 328 | for config in configs: 329 | if config["name"] in existing_config_names: # FIXME is that needed? 330 | # Update existing thread 331 | for thread in threads: 332 | if thread._args[0]["name"] == config["name"]: 333 | if ( 334 | thread._args[0] != config or config["enabled"] == False 335 | ): # FIXME does order of options in config file matter? 336 | # print("Thread config has changed. Purging thread.") 337 | purge_thread(thread) 338 | # print("DEBUG: Purged.") 339 | if config["enabled"] == True: 340 | stop_event = threading.Event() 341 | new_thread = threading.Thread( 342 | target=launch_llamacpp, args=(config, stop_event) 343 | ) 344 | new_thread.start() 345 | threads.append(new_thread) 346 | else: 347 | if config["enabled"] == True: 348 | # Add new thread 349 | stop_event = threading.Event() # FIXME 350 | new_thread = threading.Thread() # FIXME 351 | 352 | if config.get("type", "llamacpp") == "llamacpp": 353 | new_thread = threading.Thread( 354 | target=launch_llamacpp, args=(config, stop_event) 355 | ) 356 | elif config.get("type") == "ollama": 357 | new_thread = threading.Thread( 358 | target=launch_ollama, args=(config, stop_event) 359 | ) 360 | else: 361 | # Handle unknown type or default to llamacpp 362 | new_thread = threading.Thread( 363 | target=launch_llamacpp, args=(config, stop_event) 364 | ) 365 | 366 | new_thread.start() 367 | threads.append(new_thread) 368 | 369 | return threads 370 | 371 | 372 | @app.route("/apply_llamacpp_configs", methods=["POST"]) 373 | def api_apply_llamacpp_configs(): 374 | global configs 375 | global threads 376 | configs = load_llamacpp_configs() 377 | configs_to_apply = request.json 378 | for config in configs_to_apply: 379 | configs.append(config) 380 | threads = sync_threads_with_configs(threads, configs, launch_llamacpp) 381 | return {"status": "OK"} 382 | 383 | 384 | @app.route("/reload_llamacpp_configs", methods=["GET"]) 385 | def api_reload_llamacpp_configs(): 386 | global configs 387 | global threads 388 | configs = load_llamacpp_configs() 389 | threads = sync_threads_with_configs(threads, configs, launch_llamacpp) 390 | return {"status": "OK"} 391 | 392 | 393 | @app.route("/get_llamacpp_instances", methods=["GET"]) 394 | def api_get_llamacpp_instances(): 395 | instances = {"llamacpp_instances": []} 396 | for thread in threads: 397 | thread_name = thread._args[0]["name"] 398 | instances["llamacpp_instances"].append(thread_name) 399 | return jsonify(instances) 400 | 401 | 402 | @app.route("/get_llamacpp_subprocesses", methods=["GET"]) 403 | def api_get_llamacpp_subprocesses(): 404 | serialized_subprocesses = { 405 | pid: [ 406 | {"pid": p.pid, 
"args": p.args, "returncode": p.returncode} 407 | for p in processes 408 | ] 409 | for pid, processes in subprocesses.items() 410 | } 411 | print(serialized_subprocesses) 412 | return jsonify(serialized_subprocesses) 413 | 414 | 415 | @app.route("/get_llamacpp_configs", methods=["GET"]) 416 | def api_get_llamacpp_configs(): 417 | return jsonify(configs) 418 | 419 | 420 | @app.route("/gui", methods=["GET"]) 421 | def gui(): 422 | return render_template("home.html") 423 | 424 | 425 | @app.route("/get_instances", methods=["GET"]) 426 | def api_get_instances(): 427 | instances = {"instances": []} 428 | for thread in threads: 429 | thread_data = { 430 | "name": thread._args[0]["name"], 431 | "pid": thread._args[0]["pid"], 432 | "cvd": thread._args[0]["cuda_visible_devices"], 433 | # add more data if needed 434 | } 435 | instances["instances"].append(thread_data) 436 | return jsonify(instances) 437 | 438 | 439 | @app.route("/enable_llamacpp_instance", methods=["POST"]) 440 | def api_enable_llamacpp_instance(): 441 | global configs 442 | global threads 443 | name = request.json.get("name") 444 | 445 | for config in configs: 446 | print(config) 447 | if config["name"] == name: 448 | config["enabled"] = True 449 | threads = sync_threads_with_configs(threads, configs, launch_llamacpp) 450 | return {"status": "OK", "message": f"Instance {name} enabled"} 451 | return {"status": "ERROR", "message": f"Instance {name} not found"} 452 | 453 | 454 | @app.route("/disable_llamacpp_instance", methods=["POST"]) 455 | def api_disable_llamacpp_instance(): 456 | global configs 457 | global threads 458 | name = request.json.get("name") 459 | 460 | for config in configs: 461 | print(config) 462 | if config["name"] == name: 463 | config["enabled"] = False 464 | threads = sync_threads_with_configs(threads, configs, launch_llamacpp) 465 | return {"status": "OK", "message": f"Instance {name} disabled"} 466 | return {"status": "ERROR", "message": f"Instance {name} not found"} 467 | 468 | 469 | server = make_server("0.0.0.0", args.port, app) 470 | server_thread = threading.Thread(target=server.serve_forever) 471 | server_thread.start() 472 | 473 | 474 | if __name__ == "__main__": 475 | 476 | logging.info(f"Reading llama.cpp configs") 477 | 478 | configs = load_llamacpp_configs() 479 | threads = sync_threads_with_configs(threads, configs, launch_llamacpp) 480 | 481 | logging.info(f"All llama.cpp instances launched") 482 | 483 | for thread in threads: 484 | thread.join() 485 | -------------------------------------------------------------------------------- /gppmd/gppmd_config.yaml.example: -------------------------------------------------------------------------------- 1 | log_file: '/var/log/gppm/gppmd.log' 2 | sleep_time: 0.1 3 | timeout_time: 0.0 4 | log_file_to_monitor: '/var/log/llama.cpp/llama-server.log' 5 | host: '0.0.0.0' 6 | port: 5000 7 | max_llamacpp_instances: 10 -------------------------------------------------------------------------------- /gppmd/llamacpp_configs/examples.yaml: -------------------------------------------------------------------------------- 1 | # To try the following examples edit it's configuration and set enabled to True. 2 | # Multiple can be activated at the same time as long as their configurations do not interfere with each other. 
3 | 4 | - name: Biggie_SmolLM_0.15B_Base_q8_0_01 5 | enabled: False 6 | env: 7 | CUDA_VISIBLE_DEVICES: "0" 8 | command: 9 | "/usr/local/bin/llama-server \ 10 | --host 0.0.0.0 \ 11 | -ngl 100 \ 12 | -m /models/Biggie_SmolLM_0.15B_Base_q8_0.gguf \ 13 | --port 8061 \ 14 | -sm none \ 15 | --no-mmap \ 16 | --log-format json" 17 | 18 | - name: Biggie_SmolLM_0.15B_Base_q8_0_02 19 | enabled: False 20 | env: 21 | CUDA_VISIBLE_DEVICES: "1" 22 | command: 23 | "/usr/local/bin/llama-server \ 24 | --host 0.0.0.0 \ 25 | -ngl 100 \ 26 | -m /models/Biggie_SmolLM_0.15B_Base_q8_0.gguf \ 27 | --port 8062 \ 28 | -sm none \ 29 | --no-mmap \ 30 | --log-format json" 31 | -------------------------------------------------------------------------------- /gppmd/requirements.txt: -------------------------------------------------------------------------------- 1 | blinker==1.8.2 2 | certifi==2024.6.2 3 | charset-normalizer==3.3.2 4 | click==8.1.7 5 | Flask==3.0.3 6 | idna==3.7 7 | itsdangerous==2.2.0 8 | Jinja2==3.1.4 9 | MarkupSafe==2.1.5 10 | nvidia_pstate==1.0.5 11 | PyYAML==6.0.1 12 | requests==2.32.3 13 | urllib3==2.2.2 14 | Werkzeug==3.0.3 15 | pyinstaller==6.9.0 16 | -------------------------------------------------------------------------------- /gppmd/templates/home.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | Hello, World! 5 | 26 | 27 | 28 | -------------------------------------------------------------------------------- /tools/build_gppmc_deb.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Ensure necessary tools are installed 4 | for tool in python3 pip dpkg-deb 5 | do 6 | if ! command -v $tool &> /dev/null 7 | then 8 | echo "$tool could not be found" 9 | exit 10 | fi 11 | done 12 | 13 | PACKAGE_NAME="gppmc" 14 | VERSION="$(git describe --tags --abbrev=0)" 15 | MAINTAINER="Roni" 16 | DESCRIPTION="gppm power process manager command line tool" 17 | ARCHITECTURE="amd64" 18 | 19 | DEST_DIR="build/$PACKAGE_NAME-$VERSION-$ARCHITECTURE" 20 | 21 | # Clean build dir 22 | sudo rm -rf $DEST_DIR 23 | 24 | # Ensure the Python project file exists 25 | if [ ! -f "$PACKAGE_NAME/$PACKAGE_NAME.py" ] 26 | then 27 | echo "$PACKAGE_NAME/$PACKAGE_NAME.py could not be found" 28 | exit 29 | fi 30 | 31 | # Ensure the configuration file exists 32 | if [ ! -f "$PACKAGE_NAME/${PACKAGE_NAME}_config.yaml" ] 33 | then 34 | echo "$PACKAGE_NAME/${PACKAGE_NAME}_config.yaml could not be found" 35 | exit 36 | fi 37 | 38 | ## Ensure the requirements.txt file exists 39 | #if [ ! -f "$PACKAGE_NAME/requirements.txt" ] 40 | #then 41 | # echo "$PACKAGE_NAME/requirements.txt could not be found" 42 | # exit 43 | #fi 44 | 45 | # Ensure PyInstaller is installed 46 | #if ! 
pip show PyInstaller &> /dev/null 47 | #then 48 | # echo "PyInstaller is not installed" 49 | # exit 50 | #fi 51 | 52 | # Create the directory structure 53 | mkdir -p $DEST_DIR/DEBIAN 54 | mkdir -p $DEST_DIR/etc/bash_completion.d 55 | mkdir -p $DEST_DIR/usr/bin 56 | mkdir -p $DEST_DIR/usr/share/bash-completion/completions 57 | mkdir -p $DEST_DIR/usr/share/$PACKAGE_NAME 58 | 59 | # Create the control file 60 | cat > $DEST_DIR/DEBIAN/control < $DEST_DIR/usr/share/bash-completion/completions/$PACKAGE_NAME 81 | 82 | # Deactivate the virtual environment 83 | deactivate 84 | 85 | # Create a basic configuration file 86 | cp $PACKAGE_NAME/${PACKAGE_NAME}_config.yaml $DEST_DIR/usr/share/$PACKAGE_NAME/$PACKAGE_NAME.yaml 87 | 88 | sudo chown -R root:root build/* 89 | 90 | # Build the package 91 | sudo dpkg-deb --build $DEST_DIR 92 | 93 | # Ensure the package is built successfully 94 | if [ ! -f "$DEST_DIR.deb" ] 95 | then 96 | echo "Failed to build $PACKAGE_NAME.deb" 97 | exit 98 | fi 99 | 100 | # Cleanup 101 | sudo rm -rf ./build/$PACKAGE_NAME 102 | rm -rf ./$PACKAGE_NAME/venv 103 | rm -rf ./*.spec 104 | -------------------------------------------------------------------------------- /tools/build_gppmd_deb.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Ensure necessary tools are installed 4 | for tool in python3 pip dpkg-deb 5 | do 6 | if ! command -v $tool &> /dev/null 7 | then 8 | echo "$tool could not be found" 9 | exit 10 | fi 11 | done 12 | 13 | PACKAGE_NAME="gppmd" 14 | VERSION="$(git describe --tags --abbrev=0)" 15 | MAINTAINER="Roni" 16 | DESCRIPTION="gppm power process manager daemon" 17 | ARCHITECTURE="amd64" 18 | 19 | DEST_DIR="build/$PACKAGE_NAME-$VERSION-$ARCHITECTURE" 20 | 21 | # Clean build dir 22 | sudo rm -rf $DEST_DIR 23 | 24 | # Ensure the Python project file exists 25 | if [ ! -f "$PACKAGE_NAME/$PACKAGE_NAME.py" ] 26 | then 27 | echo "$PACKAGE_NAME/$PACKAGE_NAME.py could not be found" 28 | exit 29 | fi 30 | 31 | ## Ensure the configuration file exists 32 | #if [ ! -f "$PACKAGE_NAME/${PACKAGE_NAME}_config.yaml" ] 33 | #then 34 | # echo "$PACKAGE_NAME/${PACKAGE_NAME}_config.yaml could not be found" 35 | # exit 36 | #fi 37 | 38 | # Ensure the requirements.txt file exists 39 | if [ ! -f "$PACKAGE_NAME/requirements.txt" ] 40 | then 41 | echo "$PACKAGE_NAME/requirements.txt could not be found" 42 | exit 43 | fi 44 | 45 | ## Ensure PyInstaller is installed 46 | #if ! pip show PyInstaller &> /dev/null 47 | #then 48 | # echo "PyInstaller is not installed" 49 | # exit 50 | #fi 51 | 52 | # Create the directory structure 53 | mkdir -p $DEST_DIR/DEBIAN 54 | mkdir -p $DEST_DIR/usr/bin 55 | mkdir -p $DEST_DIR/etc/$PACKAGE_NAME 56 | mkdir -p $DEST_DIR/etc/$PACKAGE_NAME/llamacpp_configs 57 | mkdir -p $DEST_DIR/etc/$PACKAGE_NAME/templates 58 | mkdir -p $DEST_DIR/var/log/$PACKAGE_NAME 59 | mkdir -p $DEST_DIR/lib/systemd/system 60 | 61 | # Create the control file 62 | cat > $DEST_DIR/DEBIAN/control < $DEST_DIR/DEBIAN/postinst < $DEST_DIR/DEBIAN/postrm < $DEST_DIR/lib/systemd/system/$PACKAGE_NAME.service < llama_server_output_1 & 6 | llamacpp_pid=$! 
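# The pipeline below tags each JSON log line emitted by llama-server with the server's PID and its
# CUDA_VISIBLE_DEVICES value (as .gppm.llamacpp_pid / .gppm.gppm_cvd), keeps only the
# "slot is processing task" / "slot released" events, and appends them to ~/llama-server.log.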
7 | jq --unbuffered -c --arg pid "$llamacpp_pid" --arg cvd "$CUDA_VISIBLE_DEVICES" ".gppm = {\"llamacpp_pid\":\$pid,\"gppm_cvd\":\$cvd}" < llama_server_output_1 \ 8 | | egrep "slot is processing task|slot released" --line-buffered \ 9 | | tee -a ~/llama-server.log 10 | rm llama_server_output_1 11 | 12 | -------------------------------------------------------------------------------- /tools/run_instance_2.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | mkfifo llama_server_output_2 4 | CUDA_VISIBLE_DEVICES=1,2 5 | CUDA_VISIBLE_DEVICES=1,2 llama-server --host 0.0.0.0 -ngl 100 -m ~/models/Replete-Coder-Llama3-8B-Q4_K_M.gguf --port 8082 -fa -sm row -mg 0 --no-mmap --log-format json > llama_server_output_2 & 6 | llamacpp_pid=$! 7 | jq --unbuffered -c --arg pid "$llamacpp_pid" --arg cvd "$CUDA_VISIBLE_DEVICES" ".gppm = {\"llamacpp_pid\":\$pid,\"gppm_cvd\":\$cvd}" < llama_server_output_2 \ 8 | | egrep "slot is processing task|slot released" --line-buffered \ 9 | | tee -a ~/llama-server.log 10 | rm llama_server_output_2 11 | 12 | -------------------------------------------------------------------------------- /tools/start_over.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | SRC_DIR="./" 4 | 5 | COMMAND="python3 gppmd.py --config gppmd_config.yaml" 6 | 7 | start_command() { 8 | echo "Starting command..." 9 | $COMMAND & 10 | COMMAND_PID=$! 11 | echo "Command started with PID $COMMAND_PID" 12 | } 13 | 14 | stop_command() { 15 | if [ -n "$COMMAND_PID" ]; then 16 | echo "Stopping command with PID $COMMAND_PID..." 17 | kill $COMMAND_PID 18 | wait $COMMAND_PID 2>/dev/null 19 | echo "Command stopped." 20 | fi 21 | } 22 | 23 | trap stop_command EXIT 24 | 25 | start_command 26 | 27 | inotifywait --exclude ".*\.log|\..*" -m -r -e modify,create,delete,move "$SRC_DIR" | 28 | while read -r directory events filename; do 29 | echo "Change detected in $directory$filename: $events" 30 | stop_command 31 | start_command 32 | done 33 | -------------------------------------------------------------------------------- /tools/sync_to_remote.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Description: 4 | # This script synchronizes a source directory to a remote destination using rsync. 5 | # It uses inotifywait to monitor the source directory for changes and then synchronizes the changes to the destination. 6 | # The source and destination directories are specified as command line arguments. 7 | 8 | # Check if source and destination are provided 9 | if [ $# -ne 2 ]; then 10 | echo "Usage: $0 " 11 | exit 1 12 | fi 13 | 14 | SRC="$1" 15 | DEST="$2" 16 | #LOGFILE="/var/log/rsync.log" 17 | 18 | inotifywait -m -r -e modify,create,delete,move "$SRC" --format '%w%f' | 19 | while read file; do 20 | #rsync -avz --include='build/*.deb' --exclude='build/*' "$SRC" "$DEST" #>> "$LOGFILE" 2>&1 21 | rsync -avz --include='build/*' "$SRC" "$DEST" #>> "$LOGFILE" 2>&1 22 | done 23 | --------------------------------------------------------------------------------