├── README.md
└── monitoring
    ├── readme.md
    └── nodemonitor.sh


/README.md:
--------------------------------------------------------------------------------
# Quicksilver

For the monitoring tools, please head over to the `monitoring` folder and follow the readme there.

--------------------------------------------------------------------------------
/monitoring/readme.md:
--------------------------------------------------------------------------------
# Monitoring Quicksilver node

To enable automatic monitoring of your Quicksilver Node & Validator, follow this guide.

## Script nodemonitor.sh

To monitor the status of the Quicksilver Node & Validator, run the script **nodemonitor.sh** available in this repository.
This script is based on the version already available.
When the script is started, it creates a log file whose entries track the most important metrics of the node.

Since the script creates its own logfile, it's advised to run it in a separate directory, e.g. **_monitoring_**.
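
As a minimal sketch of the advice above, the snippet below creates a dedicated working directory and starts the script from there. Both `MONITOR_DIR` and `SCRIPT` are hypothetical locations chosen for illustration, not paths prescribed by this repository; adjust them to your setup.

```shell
#!/bin/bash
# Hypothetical locations (assumptions): adjust both to your own setup.
MONITOR_DIR="${MONITOR_DIR:-$PWD/monitoring}"
SCRIPT="${SCRIPT:-$HOME/QCK-tools/monitoring/nodemonitor.sh}"

# Create the working directory and switch to it. LOGPATH in nodemonitor.sh
# defaults to $(pwd), so the log file is written here.
mkdir -p "$MONITOR_DIR"
cd "$MONITOR_DIR" || exit 1

if [ -x "$SCRIPT" ]; then
    # Discard stderr, as suggested in the script's own comments.
    "$SCRIPT" 2> /dev/null
else
    echo "nodemonitor.sh not found or not executable at $SCRIPT"
fi
```

Keeping the log in its own directory also makes it easier to follow with `tail -f` without mixing it with other files.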

## What is monitored by the script

The script creates a log entry in the following format:

```bash
2021-10-06 01:33:56+00:00 status=synced blockheight=1557207 node_stuck=NO tfromnow=7 npeers=12 npersistentpeersoff=1 isvalidator=yes pctprecommits=1.00 pcttotcommits=1.0 mpc_eligibility=OK
```

The log line entries are:

* **status** can be {scriptstarted | error | catchingup | synced}; 'error' can have various causes
* **blockheight** latest block height from the RPC `status` call
* **node_stuck** YES when the last block read is the same as in the previous iteration, otherwise NO
* **tfromnow** time in seconds since the block at blockheight was produced
* **npeers** number of connected peers
* **npersistentpeersoff** number of disconnected persistent peers
* **isvalidator** if validator metrics are enabled, can be {yes | no}
* **pctprecommits** if validator metrics are enabled, percentage of the last n precommits from blockheight, as configured in nodemonitor.sh
* **pcttotcommits** if validator metrics are enabled, percentage of total commits of the validator set at blockheight
* **mpc_eligibility** OK if the MPC eligibility test succeeds (i.e. stake % above min_eligible_threshold), else NOK. ERR occurs if curl fails

## Telegram Alerting

For Telegram alerts, set `enable_telegram="true"` in nodemonitor.sh and update:

```text
#TELEGRAM
BOT_ID="bot"
CHAT_ID=""
```

You can create your Telegram bot following this : and obtain the chat_id

## Running the script as a service

To have the script monitor the node constantly and keep alerting active, you can run it as a service.
The following example shows what the service file looks like when running on Ubuntu 20.04.

The service assumes:

* you have the script placed in your **_$HOME/QCK-tools/monitoring_** directory
* you have run `chmod +x /home/$USER/QCK-tools/monitoring/nodemonitor.sh`
* you used the rootless docker installation in the same repo

Make sure to run the service as a user with sufficient rights to access this directory (normally this is the user you log on to the system with). Best practice would be to create a separate user for the monitoring service, but this guide doesn't cover that!

Create a file called **Quicksilver-nodemonitor.service** in **~/.config/systemd/user/** with the following commands:

```bash
mkdir -p ~/.config/systemd/user
cat<<-EOF > ~/.config/systemd/user/Quicksilver-nodemonitor.service
[Unit]
Description=Quicksilver NodeMonitor
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
Restart=always
RestartSec=1
ExecStart=/bin/bash -c '. "\$0" && exec "\$@"' /home/$USER/.profile /home/$USER/QCK-tools/monitoring/nodemonitor.sh

[Install]
WantedBy=default.target
EOF
```

Now that the service file is created, start it with the following command:

```bash
systemctl --user start Quicksilver-nodemonitor
```

To make sure the service stays active after a reboot, use:

```bash
systemctl --user enable Quicksilver-nodemonitor
```

Check the status of the service with:

```bash
systemctl --user status Quicksilver-nodemonitor
```

If you change the service file after it was first started, reload the systemd configuration:

```bash
systemctl --user daemon-reload
```

Check the nodemonitor log:

```bash
journalctl --user -fu Quicksilver-nodemonitor
```

Update nodemonitor.sh:

```bash
git stash
git pull
git stash pop
systemctl --user stop Quicksilver-nodemonitor
systemctl --user start Quicksilver-nodemonitor
```

--------------------------------------------------------------------------------
/monitoring/nodemonitor.sh:
--------------------------------------------------------------------------------
#!/bin/bash

### packages required: jq, bc
REQUIRED_PKG="bc"
PKG_OK=$(dpkg-query -W --showformat='${Status}\n' $REQUIRED_PKG | grep "install ok installed")
echo Checking for $REQUIRED_PKG: $PKG_OK
if [ "" = "$PKG_OK" ]; then
    echo "No $REQUIRED_PKG. Setting up $REQUIRED_PKG."
    sudo apt-get --yes install $REQUIRED_PKG
fi

REQUIRED_PKG="jq"
PKG_OK=$(dpkg-query -W --showformat='${Status}\n' $REQUIRED_PKG | grep "install ok installed")
echo Checking for $REQUIRED_PKG: $PKG_OK
if [ "" = "$PKG_OK" ]; then
    echo "No $REQUIRED_PKG. Setting up $REQUIRED_PKG."
    sudo apt-get --yes install $REQUIRED_PKG
fi

### if suppressing error messages is preferred, run as './nodemonitor.sh 2> /dev/null'

### CONFIG ##################################################################################################
CONFIG=""                      # config.toml file for node, e.g. $HOME/.quicksilverd/config/config.toml (the default when left empty)
### optional:                  #
NAME=""                        # enter the name of the validator key
KEYRINGPW=""                   # enter keyring password
NPRECOMMITS="20"               # check last n precommits, can be 0 for no checking
VALIDATORADDRESS=""            # if left empty default is from status call (validator)
QuicksilverVALIDATORADDRESS="" # if left empty default is from status call (Quicksilver validator)
CHECKPERSISTENTPEERS="1"       # if 1 the number of disconnected persistent peers is checked (when persistent peers are configured in config.toml)
VALIDATORMETRICS="on"          # metrics for validator node
LOGNAME=""                     # a custom log file name can be chosen, if left empty default is nodemonitor-${USER}.log
LOGPATH="$(pwd)"               # the directory where the log file is stored, for customization insert path like: /my/path
LOGSIZE=200                    # the max number of lines; beyond that the log is trimmed to reduce its size
LOGROTATION="1"                # options for log rotation: (1) rotate to $LOGNAME.1 every $LOGSIZE lines; (2) append to $LOGNAME.1 every $LOGSIZE lines; (3) truncate $logfile to $LOGSIZE every iteration
SLEEP1="15s"                   # polls every SLEEP1 sec
### internal:                  #
colorI='\033[0;32m'            # black 30, red 31, green 32, yellow 33, blue 34, magenta 35, cyan 36, white 37
colorD='\033[0;90m'            # for light color 9 instead of 3
colorE='\033[0;31m'            #
colorW='\033[0;33m'            #
noColor='\033[0m'              # no color
### END CONFIG ##################################################################################################

################### NOTIFICATION CONFIG ###################
enable_notification="true" # true or false
# TELEGRAM
enable_telegram="false"
BOT_ID="bot"
CHAT_ID=""
# DISCORD
enable_discord="false"
DISCORD_URL=""

# the variables below avoid spamming the same notification state, along with their notification messages
# catchup
synced_n="catchingup" # notification state, either synced or catchingup (possible values: catchingup/synced)
nmsg_synced="Your Quicksilver node is now synced"
nmsg_unsynced="Your Quicksilver node is no longer synced"

# node stuck
lastblockheight=0
node_stuck_n="false" # true or false indicating the notification state of a node stuck
nmsg_nodestuck="Your Quicksilver node is now stuck"
nmsg_node_no_longer_stuck="Your Quicksilver node is no longer stuck, Yeah !"
node_stuck_status="NA" # node stuck test status to print out to log file

# Quicksilver validator run (Quicksilverd)
Quicksilverd_run_n="true" # true or false indicating whether Quicksilverd is running or not
nmsg_Quicksilverd_run_ok="$HOSTNAME: Your Quicksilverd is running ok now"
nmsg_Quicksilverd_run_nok="@here $HOSTNAME: Your Quicksilverd has just stopped running, fix it !"
Quicksilverd_run_status="NA" # Quicksilverd test status to print out to log file

################### END NOTIFICATION CONFIG ###################

echo "Notification enabled on telegram : ${enable_telegram} / on discord : ${enable_discord}"

# Send a message to every enabled notification channel.
send_notification() {
    if [ "$enable_notification" == "true" ]; then
        message=$1

        if [ "$enable_telegram" == "true" ]; then
            curl -s -X POST https://api.telegram.org/${BOT_ID}/sendMessage -d parse_mode=html -d chat_id=${CHAT_ID} -d text="$(hostname) - $(date) : ${message}" > /dev/null 2>&1
        fi
        if [ "$enable_discord" == "true" ]; then
            curl -s -X POST $DISCORD_URL -H "Content-Type: application/json" -d "{\"content\": \"${message}\"}" > /dev/null 2>&1
        fi
    fi
}

if [ -z "$CONFIG" ]; then
    CONFIG=~/.quicksilverd/config/config.toml
fi

if [ -z "$CONFIG" ]; then
    echo "please configure config.toml in script"
    exit 1
fi

# Read the RPC listen address from the [rpc] section of config.toml.
url=$(sudo sed '/^\[rpc\]/,/^\[/!d;//d' $CONFIG | grep "^laddr\b" | awk -v FS='("tcp://|")' '{print $2}')
if [ -z "$url" ]; then
    send_notification "nodemonitor exited : please configure config.toml in script correctly"
    echo "please configure config.toml in script correctly"
    exit 1
fi
url="http://${url}"
chainid=$(jq -r '.result.node_info.network' <<<$(curl -s "$url"/status))

if [ -z "$LOGNAME" ]; then LOGNAME="nodemonitor-${USER}.log"; fi

logfile="${LOGPATH}/${LOGNAME}"
touch $logfile

echo "log file: ${logfile}"
echo "rpc url: ${url}"
echo "chain id: ${chainid}"

if [ -z "$VALIDATORADDRESS" ]; then VALIDATORADDRESS=$(jq -r '.result.validator_info.address' <<<$(curl -s "$url"/status)); fi
if [ -z "$VALIDATORADDRESS" ]; then
    echo "rpc appears to be down, start script again when data can be obtained"
    exit 1
fi

if [ -z "$QuicksilverVALIDATORADDRESS" ]; then
    if [ -z "$NAME" ]; then
        read -p "The name of your validator address :" NAME
    fi
    if [ -z "$KEYRINGPW" ]; then
        read -p "Please enter your keyring password :" KEYRINGPW
    fi
    QuicksilverVALIDATORADDRESS=$(echo $KEYRINGPW | quicksilverd keys show $NAME --bech val -a)
fi

# Checking validator RPC endpoints status
consdump=$(curl -s "$url"/dump_consensus_state)
validators=$(jq -r '.result.round_state.validators[]' <<<$consdump)
isvalidator=$(grep -c "$VALIDATORADDRESS" <<<$validators)

echo "validator address: $QuicksilverVALIDATORADDRESS"

if [ "$CHECKPERSISTENTPEERS" -eq 1 ]; then
    persistentpeers=$(sudo sed '/^\[p2p\]/,/^\[/!d;//d' $CONFIG | grep "^persistent_peers\b" | awk -v FS='("|")' '{print $2}')
    persistentpeerids=$(sed 's/,//g' <<<$(sed 's/@[^ ^,]\+/ /g' <<<$persistentpeers))
    totpersistentpeerids=$(wc -w <<<$persistentpeerids)
    npersistentpeersmatchcount=0
    netinfo=$(curl -s "$url"/net_info)
    if [ -z "$netinfo" ]; then
        echo "lcd appears to be down, start script again when data can be obtained"
        exit 1
    fi
    # Collect the persistent peers that do not show up in net_info.
    for id in $persistentpeerids; do
        npersistentpeersmatch=$(grep -c "$id" <<<$netinfo)
        if [ $npersistentpeersmatch -eq 0 ]; then
            persistentpeersmatch="$id $persistentpeersmatch"
            npersistentpeersmatchcount=$(expr $npersistentpeersmatchcount + 1)
        fi
    done
    npersistentpeersoff=$(expr $totpersistentpeerids - $npersistentpeersmatchcount)
    echo "$totpersistentpeerids persistent peer(s): $persistentpeerids"
    echo "$npersistentpeersmatchcount persistent peer(s) off: $persistentpeersmatch"
fi

if [ $NPRECOMMITS -eq 0 ]; then echo "precommit checks: off"; else echo "precommit checks: on"; fi
if [ $CHECKPERSISTENTPEERS -eq 0 ]; then echo "persistent peer checks: off"; else echo "persistent peer checks: on"; fi
echo ""

status=$(curl -s "$url"/status)
blockheight=$(jq -r '.result.sync_info.latest_block_height' <<<$status)
blockinfo=$(curl -s "$url"/block?height="$blockheight")
if [ $blockheight -gt $NPRECOMMITS ]; then
    if [ "$(grep -c 'precommits' <<<$blockinfo)" != "0" ]; then
        versionstring="precommits"
    elif [ "$(grep -c 'signatures' <<<$blockinfo)" != "0" ]; then
        versionstring="signatures"
    else
        echo "json parameters of this version not recognised"
        exit 1
    fi
else
    echo "wait for $NPRECOMMITS blocks and start again..."
    exit 1
fi

nloglines=$(wc -l <$logfile)
if [ $nloglines -gt $LOGSIZE ]; then sed -i "1,$(expr $nloglines - $LOGSIZE)d" $logfile; fi # the log file is trimmed for logsize

date=$(date --rfc-3339=seconds)
echo "$date status=scriptstarted chainid=$chainid" >>$logfile

# Main polling loop: the original listing closes it with 'done' but never opens it.
while true; do

    # Checking Quicksilverd process running
    if pgrep quicksilverd >/dev/null; then
        echo "Is Quicksilverd binary running: Yes"
        Quicksilverd_run_status="OK"
        if [ "$Quicksilverd_run_n" == "false" ]; then # Quicksilverd process was not ok
            send_notification "$nmsg_Quicksilverd_run_ok"
            Quicksilverd_run_n="true"
        fi
    else
        echo "Is Quicksilverd binary running: No, please restart it;"
        Quicksilverd_run_status="NOK"
        if [ "$Quicksilverd_run_n" == "true" ]; then # Quicksilverd process was ok
            send_notification "$nmsg_Quicksilverd_run_nok"
            Quicksilverd_run_n="false"
        fi
    fi

    # Checking validator status
    consdump=$(curl -s "$url"/dump_consensus_state)
    validators=$(jq -r '.result.round_state.validators[]' <<<$consdump)
    isvalidator=$(grep -c "$VALIDATORADDRESS" <<<$validators)

    echo

    # testing machine/host resource
    free -m | awk 'NR==2{printf "Memory Usage: %s/%sMB (%.2f%%)\n", $3,$2,$3*100/$2 }'
    df -h | awk '$NF=="/"{printf "Disk Usage: %d/%dGB (%s)\n", $3,$2,$5}'
    top -bn1 | grep load | awk '{printf "CPU Load: %.2f\n", $(NF-2)}'

    echo
    # TBD Alert on resource monitoring

    status=$(curl -s "$url"/status)
    result=$(grep -c "result" <<<$status)
    if [ "$result" != "0" ]; then
        npeers=$(curl -s "$url"/net_info | jq -r '.result.n_peers')
        if [ -z "$npeers" ]; then npeers="na"; fi
        blockheight=$(jq -r '.result.sync_info.latest_block_height' <<<$status)
        blocktime=$(jq -r '.result.sync_info.latest_block_time' <<<$status)
        catchingup=$(jq -r '.result.sync_info.catching_up' <<<$status)
        if [ "$catchingup" == "false" ]; then
            catchingup="synced"
            if [ "$synced_n" == "catchingup" ]; then # it was previously syncing
                send_notification "$nmsg_synced"
                synced_n="synced" # change notification state
            fi
        elif [ "$catchingup" == "true" ]; then
            catchingup="catchingup"
            if [ "$synced_n" == "synced" ]; then # it was previously synced
                send_notification "$nmsg_unsynced"
                synced_n="catchingup" # change notification state
            fi
        fi

        if [ "$CHECKPERSISTENTPEERS" -eq 1 ]; then
            npersistentpeersmatch=0
            netinfo=$(curl -s "$url"/net_info)
            for id in $persistentpeerids; do
                npersistentpeersmatch=$(expr $npersistentpeersmatch + $(grep -c "$id" <<<$netinfo))
            done
            npersistentpeersoff=$(expr $totpersistentpeerids - $npersistentpeersmatch)
        else
            npersistentpeersoff=0
        fi
        if [ "$VALIDATORMETRICS" == "on" ]; then
            #isvalidator=$(grep -c "$VALIDATORADDRESS" <<<$(curl -s "$url"/block?height="$blockheight"))
            consdump=$(curl -s "$url"/dump_consensus_state)
            validators=$(jq -r '.result.round_state.validators[]' <<<$consdump)
            isvalidator=$(grep -c "$VALIDATORADDRESS" <<<$validators)
            pcttotcommits=$(jq -r '.result.round_state.last_commit.votes_bit_array' <<<$consdump)
            pcttotcommits=$(grep -Po "=\s+\K[^ ^]+" <<<$pcttotcommits)
            if [ "$isvalidator" != "0" ]; then
                isvalidator="yes"
                precommitcount=0
                # Count how many of the last NPRECOMMITS blocks carry this validator's signature.
                for ((i = $(expr $blockheight - $NPRECOMMITS + 1); i <= $blockheight; i++)); do
                    validatoraddresses=$(curl -s "$url"/block?height="$i")
                    validatoraddresses=$(jq ".result.block.last_commit.${versionstring}[].validator_address" <<<$validatoraddresses)
                    validatorprecommit=$(grep -c "$VALIDATORADDRESS" <<<$validatoraddresses)
                    precommitcount=$(expr $precommitcount + $validatorprecommit)
                done
                if [ $NPRECOMMITS -eq 0 ]; then pctprecommits="1.0"; else pctprecommits=$(echo "scale=2 ; $precommitcount / $NPRECOMMITS" | bc); fi

                validatorinfo="isvalidator=$isvalidator pctprecommits=$pctprecommits pcttotcommits=$pcttotcommits"
            else
                isvalidator="no"
                validatorinfo="isvalidator=$isvalidator"
            fi
        fi

        # test if last block saved and new block height are the same
        if [ $lastblockheight -eq $blockheight ]; then # blocks are the same
            node_stuck_status="YES"
            if [ "$node_stuck_n" == "false" ]; then # node_stuck notification state was false
                node_stuck_n="true"
                send_notification "$nmsg_nodestuck"
            fi
        else # new node block is different
            node_stuck_status="NO"
            if [ "$node_stuck_n" == "true" ]; then # means it was previously stuck
                node_stuck_n="false"
                send_notification "$nmsg_node_no_longer_stuck"
            fi
            lastblockheight=$blockheight
        fi

        # finalize the log output
        status="$catchingup"
        now=$(date --rfc-3339=seconds)
        blockheightfromnow=$(expr $(date +%s -d "$now") - $(date +%s -d "$blocktime"))
        variables="status=$status blockheight=$blockheight node_stuck=$node_stuck_status tfromnow=$blockheightfromnow npeers=$npeers npersistentpeersoff=$npersistentpeersoff $validatorinfo"
    else
        status="error"
        now=$(date --rfc-3339=seconds)
        variables="status=$status"
    fi

    logentry="[$now] $variables"
    echo "$logentry" >>$logfile

    nloglines=$(wc -l <$logfile)
    if [ $nloglines -gt $LOGSIZE ]; then
        case $LOGROTATION in
        1)
            mv $logfile "${logfile}.1"
            touch $logfile
            ;;
        2)
            echo "$(cat $logfile)" >>${logfile}.1
            >$logfile
            ;;
        3)
            sed -i '1d' $logfile
            if [ -f ${logfile}.1 ]; then rm ${logfile}.1; fi # no log rotation with option (3)
            ;;
        *) ;;

        esac
    fi

    case $status in
    synced)
        color=$colorI
        ;;
    error)
        color=$colorE
        ;;
    catchingup)
        color=$colorW
        ;;
    *)
        color=$noColor
        ;;
    esac

    pctprecommits=$(awk '{printf "%f", $0}' <<<"$pctprecommits")
    if [[ "$isvalidator" == "yes" ]] && [[ "$pctprecommits" < "1.0" ]]; then color=$colorW; fi
    if [[ "$isvalidator" == "no" ]] && [[ "$VALIDATORMETRICS" == "on" ]]; then color=$colorW; fi

    logentry="$(sed 's/[^ ]*[\=]/'\\${color}'&'\\${noColor}'/g' <<<$logentry)"
    echo -e $logentry
    echo -e "${colorD}sleep ${SLEEP1}${noColor}"
    echo

    # Reset every logged variable to an empty string before the next iteration.
    variables_=""
    for var in $variables; do
        var_=$(grep -Po '^[0-9a-zA-Z_-]*' <<<$var)
        var_="$var_=\"\""
        variables_="$var_; $variables_"
    done
    #echo $variables_
    eval $variables_

    sleep $SLEEP1
done

--------------------------------------------------------------------------------