├── CODE_OF_CONDUCT.md ├── CONTRIBUTIONS.md ├── LICENSE ├── README.md ├── aks.md ├── arch.png ├── bhive.png ├── build └── lib │ └── krs │ ├── __init__.py │ ├── krs.py │ ├── main.py │ └── utils │ ├── __init__.py │ ├── cluster_scanner.py │ ├── constants.py │ ├── fetch_tools_krs.py │ ├── functional.py │ └── llm_client.py ├── demo.gif ├── dist └── krs-0.1.0-py3.10.egg ├── dokc ├── dokc.md ├── eks.md ├── gke.md ├── kind.md ├── krs.egg-info ├── PKG-INFO ├── SOURCES.txt ├── dependency_links.txt ├── entry_points.txt ├── requires.txt └── top_level.txt ├── krs ├── __init__.py ├── krs.py ├── main.py ├── requirements.txt └── utils │ ├── __init__.py │ ├── cluster_scanner.py │ ├── constants.py │ ├── fetch_tools_krs.py │ ├── functional.py │ └── llm_client.py ├── mkc.md ├── samples ├── install-tools.sh └── uninstall-tools.sh └── setup.py /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | We as members, contributors, and leaders pledge to make participation in our 6 | community a harassment-free experience for everyone, regardless of age, body 7 | size, visible or invisible disability, ethnicity, sex characteristics, gender 8 | identity and expression, level of experience, education, socio-economic status, 9 | nationality, personal appearance, race, religion, or sexual identity 10 | and orientation. 11 | 12 | We pledge to act and interact in ways that contribute to an open, welcoming, 13 | diverse, inclusive, and healthy community. 
14 | 15 | ## Our Standards 16 | 17 | Examples of behavior that contributes to a positive environment for our 18 | community include: 19 | 20 | * Demonstrating empathy and kindness toward other people 21 | * Being respectful of differing opinions, viewpoints, and experiences 22 | * Giving and gracefully accepting constructive feedback 23 | * Accepting responsibility and apologizing to those affected by our mistakes, 24 | and learning from the experience 25 | * Focusing on what is best not just for us as individuals, but for the 26 | overall community 27 | 28 | Examples of unacceptable behavior include: 29 | 30 | * The use of sexualized language or imagery, and sexual attention or 31 | advances of any kind 32 | * Trolling, insulting or derogatory comments, and personal or political attacks 33 | * Public or private harassment 34 | * Publishing others' private information, such as a physical or email 35 | address, without their explicit permission 36 | * Other conduct which could reasonably be considered inappropriate in a 37 | professional setting 38 | 39 | ## Enforcement Responsibilities 40 | 41 | Community leaders are responsible for clarifying and enforcing our standards of 42 | acceptable behavior and will take appropriate and fair corrective action in 43 | response to any behavior that they deem inappropriate, threatening, offensive, 44 | or harmful. 45 | 46 | Community leaders have the right and responsibility to remove, edit, or reject 47 | comments, commits, code, wiki edits, issues, and other contributions that are 48 | not aligned to this Code of Conduct, and will communicate reasons for moderation 49 | decisions when appropriate. 50 | 51 | ## Scope 52 | 53 | This Code of Conduct applies within all community spaces, and also applies when 54 | an individual is officially representing the community in public spaces. 
55 | Examples of representing our community include using an official e-mail address, 56 | posting via an official social media account, or acting as an appointed 57 | representative at an online or offline event. 58 | 59 | ## Enforcement 60 | 61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 62 | reported to the community leaders responsible for enforcement at 63 | . 64 | All complaints will be reviewed and investigated promptly and fairly. 65 | 66 | All community leaders are obligated to respect the privacy and security of the 67 | reporter of any incident. 68 | 69 | ## Enforcement Guidelines 70 | 71 | Community leaders will follow these Community Impact Guidelines in determining 72 | the consequences for any action they deem in violation of this Code of Conduct: 73 | 74 | ### 1. Correction 75 | 76 | **Community Impact**: Use of inappropriate language or other behavior deemed 77 | unprofessional or unwelcome in the community. 78 | 79 | **Consequence**: A private, written warning from community leaders, providing 80 | clarity around the nature of the violation and an explanation of why the 81 | behavior was inappropriate. A public apology may be requested. 82 | 83 | ### 2. Warning 84 | 85 | **Community Impact**: A violation through a single incident or series 86 | of actions. 87 | 88 | **Consequence**: A warning with consequences for continued behavior. No 89 | interaction with the people involved, including unsolicited interaction with 90 | those enforcing the Code of Conduct, for a specified period of time. This 91 | includes avoiding interactions in community spaces as well as external channels 92 | like social media. Violating these terms may lead to a temporary or 93 | permanent ban. 94 | 95 | ### 3. Temporary Ban 96 | 97 | **Community Impact**: A serious violation of community standards, including 98 | sustained inappropriate behavior. 
99 | 100 | **Consequence**: A temporary ban from any sort of interaction or public 101 | communication with the community for a specified period of time. No public or 102 | private interaction with the people involved, including unsolicited interaction 103 | with those enforcing the Code of Conduct, is allowed during this period. 104 | Violating these terms may lead to a permanent ban. 105 | 106 | ### 4. Permanent Ban 107 | 108 | **Community Impact**: Demonstrating a pattern of violation of community 109 | standards, including sustained inappropriate behavior, harassment of an 110 | individual, or aggression toward or disparagement of classes of individuals. 111 | 112 | **Consequence**: A permanent ban from any sort of public interaction within 113 | the community. 114 | 115 | ## Attribution 116 | 117 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], 118 | version 2.0, available at 119 | https://www.contributor-covenant.org/version/2/0/code_of_conduct.html. 120 | 121 | Community Impact Guidelines were inspired by [Mozilla's code of conduct 122 | enforcement ladder](https://github.com/mozilla/diversity). 123 | 124 | [homepage]: https://www.contributor-covenant.org 125 | 126 | For answers to common questions about this code of conduct, see the FAQ at 127 | https://www.contributor-covenant.org/faq. Translations are available at 128 | https://www.contributor-covenant.org/translations. 129 | -------------------------------------------------------------------------------- /CONTRIBUTIONS.md: -------------------------------------------------------------------------------- 1 | # Contribution Guidelines 2 | 3 | Thank you for your interest in contributing to the Kubernetes Resource Scanner (KRS) tool! We welcome your contributions and are excited to help make this project better. 4 | 5 | ## Code of Conduct 6 | 7 | Before contributing, please review and adhere to the [Code of Conduct](CODE_OF_CONDUCT.md). 
We expect all contributors to follow these guidelines to ensure a positive and inclusive environment. 8 | 9 | ## Release Management and Pull Request Submission Guidelines 10 | 11 | ### Repository Branch Structure 12 | 13 | Our project employs a three-branch workflow to manage the development and release of new features and fixes: 14 | 15 | - **main**: Stable release branch that contains production-ready code. 16 | - **pre-release**: Staging branch for final testing before merging into main. 17 | - **release-v0.x.x**: Active development branch where all new changes, bug fixes, and features are made and tested. 18 | 19 | ### Contributing to the Project 20 | 21 | To contribute to our project, follow these steps to ensure your changes are properly integrated: 22 | 23 | **Selecting the Correct Branch** 24 | 25 | - Always base your work on the latest **release-vx.x.x** branch. This branch will be named according to the version, for example, release-v0.1.0. 26 | - Ensure you check the branch name in the repository for the most current version branch. 27 | 28 | **Working on Issues** 29 | 30 | - Before you start working on an issue, comment on that issue stating that you are taking it on. This helps prevent multiple contributors from working on the same issue simultaneously. 31 | - Include the issue number in your branch name for traceability, e.g., 123-fix-login-bug. 
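The branch-naming convention above (issue number followed by a short description) can be sketched as a small helper. This is only an illustration: the function name `branch_name` and the slug rules are assumptions, not part of the project's tooling.

```python
import re


def branch_name(issue_number: int, summary: str) -> str:
    """Build a branch name such as '123-fix-login-bug' (hypothetical helper)."""
    # Lowercase the summary and collapse every run of non-alphanumeric
    # characters into a single hyphen, then trim hyphens at both ends.
    slug = re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")
    return f"{issue_number}-{slug}"


if __name__ == "__main__":
    print(branch_name(123, "Fix login bug"))
```

The resulting string can be passed to `git checkout -b` when creating the working branch.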
32 | 33 | ## **Pull Request (PR) Process** 34 | 35 | To maintain code quality and orderly management, all contributors must follow this PR process: 36 | 37 | 38 | ### **Step 1: Sync Your Fork** 39 | 40 | Before starting your work, sync your fork with the upstream repository to ensure you have the latest changes from the release-v0.x.x branch: 41 | ``` 42 | git checkout release-vx.x.x 43 | git pull origin release-vx.x.x 44 | ``` 45 | 46 | 47 | ### **Step 2: Create a New Branch** 48 | 49 | Create a new branch from the **release-vx.x.x** branch for your work: 50 | ``` 51 | git checkout -b your-branch-name 52 | ``` 53 | 54 | 55 | ### **Step 3: Make Changes and Commit** 56 | 57 | Make your changes locally and commit them with clear, concise commit messages. Your commits should reference the issue number: 58 | ``` 59 | git commit -m "Fix issue #123: resolve login bug" 60 | ``` 61 | 62 | 63 | ### **Step 4: Push Changes** 64 | 65 | Push your branch and changes to your fork: 66 | 67 | ``` 68 | git push -u origin your-branch-name 69 | ``` 70 | 71 | 72 | ### **Step 5: Open a Pull Request** 73 | 74 | - Go to the original repository on GitHub and open a pull request from your branch to the **release-vx.x.x** branch. 75 | - Clearly describe the changes you are proposing in the PR description. Link the PR to any relevant issues. 76 | 77 | 78 | ### **Step 6: PR Review** 79 | 80 | - All pull requests must undergo review by at least two peers before merging. 81 | - Address any feedback and make required updates to your PR; this may involve additional commits. 82 | 83 | 84 | ### **Step 7: Final Merging** 85 | 86 | - Once your PR is approved by the reviewers, one of the maintainers will merge it into the release-v0.x.x branch. 87 | - The changes will eventually be merged into pre-release and then main as part of our release process. 88 | 89 | 90 | **Notes on Contribution** 91 | 92 | - Please make sure all tests pass before submitting a PR. 
93 | - Adhere to the coding standards and guidelines provided in our repository to ensure consistency and quality. 94 | 95 | ## Additional Resources 96 | 97 | - [GitHub Guides: Contributing to Open Source](https://guides.github.com/activities/contributing-to-open-source/) 98 | - [How to Contribute to an Open Source Project](https://opensource.guide/how-to-contribute/) 99 | - [The Art of Readable Code](https://www.goodreads.com/book/show/86770.The_Art_of_Readable_Code) 100 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 
29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 
61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/kubetoolsca/krs) 2 | ![GitHub release (latest by date)](https://img.shields.io/github/v/release/kubetoolsca/krs) 3 | ![stars](https://img.shields.io/github/stars/kubetoolsca/krs) 4 | ![forks](https://img.shields.io/github/forks/kubetoolsca/krs) 5 | ![issues](https://img.shields.io/github/issues/kubetoolsca/krs) 6 | ![Twitter](https://img.shields.io/twitter/follow/kubetools?style=social) 7 | 8 | 9 | 10 | # Kubetools Recommender System (a.k.a. Krs) 11 | 12 | A GenAI-powered Kubetools Recommender System for your Kubernetes cluster. 
13 | 14 | krs_logo 15 | 16 | 17 | 18 | 19 | 20 | 21 | # Table of Contents 22 | 23 | - [Kubetools Recommender System](#kubetools-recommender-system) 24 | - [Getting Started](#getting-started) 25 | - [Clone the repository](#clone-the-repository) 26 | - [Install the Krs Tool](#install-the-krs-tool) 27 | - [Krs CLI](#krs-cli) 28 | - [Initialise and load the scanner](#initialise-and-load-the-scanner) 29 | - [Scan your cluster](#scan-your-cluster) 30 | - [Lists all the namespaces](#lists-all-the-namespaces) 31 | - [Installing sample Kubernetes Tools](#installing-sample-kubernetes-tools) 32 | - [Use scanner](#use-scanner) 33 | - [Kubetools Recommender System](#kubetools-recommender-system-1) 34 | - [Krs health](#krs-health) 35 | - [Using OpenAI](#using-openai) 36 | - [Using Hugging Face](#using-hugging-face) 37 | - [FAQs](#faqs) 38 | 39 | 40 | 41 | The main functionalities of the project include: 42 | 43 | 44 | - **Scanning the Kubernetes cluster**: The tool scans the cluster to identify the deployed pods, services, and deployments. It retrieves information about the tools used in the cluster and their rankings. 45 | - **Detecting tools from the repository**: The tool detects the tools used in the cluster by analyzing the names of the pods and deployments. 46 | - **Extracting rankings**: The tool extracts the rankings of the detected tools based on predefined criteria. It categorizes the tools into different categories and provides the rankings for each category. 47 | - **Generating recommendations**: The tool generates recommendations for Kubernetes tools based on the detected tools and their rankings. It suggests the best tools for each category and compares them with the tools already used in the cluster. 48 | - **Health check**: The tool provides a health check for a selected pod in the cluster. It extracts logs and events from the pod and analyzes them using a language model (LLM) to identify potential issues and provide recommendations for resolving them. 
49 | - **Exporting pod information**: The tool exports information about the pods, services, and deployments in the cluster to a JSON file. 50 | - **Cleaning up**: The tool provides an option to clean up the project's data directory by deleting all files and directories within it. 51 | - **Model**: Supports OpenAI and Hugging Face models. 52 | 53 | ## How does it work? 54 | 55 | image 56 | 57 | 58 | The project uses various Python libraries, such as typer, requests, kubernetes, tabulate, and pickle, to implement these features. 59 | It also uses a language model (LLM) for the health check feature. 60 | Package dependencies are managed using requirements.txt. 61 | The project's data, such as tool rankings, CNCF status, and Kubernetes cluster information, is stored in JSON and pickle files. 62 | 63 | image 64 | 65 | 66 | 67 | 68 | 69 | ## Prerequisites 70 | 71 | 1. A Kubernetes cluster up and running locally or in the cloud. 72 | 2. Python 3.6+ 73 | 74 | Note: If the kubeconfig path for your cluster is not the default *(~/.kube/config)*, provide it when you run `krs init`. 75 | 76 | ## Tested Environment 77 | 78 | - [Docker Desktop (Mac, Linux and Windows)](https://github.com/kubetoolsca/krs?tab=readme-ov-file#getting-started) 79 | - [Minikube](https://github.com/kubetoolsca/krs/blob/main/mkc.md) 80 | - [Google Kubernetes Engine](https://github.com/kubetoolsca/krs/blob/main/gke.md) 81 | - [Amazon Elastic Kubernetes Service](eks.md) 82 | - [Azure Kubernetes Service](aks.md) 83 | - [DigitalOcean Kubernetes Cluster](dokc.md) 84 | 85 | 86 | ## Getting Started 87 | 88 | 89 | ### Clone the repository 90 | 91 | ``` 92 | git clone https://github.com/kubetoolsca/krs.git 93 | ``` 94 | 95 | ### Install the Krs Tool 96 | 97 | Change into the cloned `krs` directory and run the following command to install krs locally on your system: 98 | 99 | ``` 100 | pip install . 
101 | ```` 102 | 103 | 104 | ## Krs CLI 105 | 106 | ``` 107 | 108 | krs --help 109 | 110 | Usage: krs [OPTIONS] COMMAND [ARGS]... 111 | 112 | krs: A command line interface to scan your Kubernetes Cluster, detect errors, provide resolutions using LLMs and recommend latest tools for your cluster 113 | 114 | ╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ 115 | │ --install-completion Install completion for the current shell. │ 116 | │ --show-completion Show completion for the current shell, to copy it or customize the installation. │ 117 | │ --help Show this message and exit. │ 118 | ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ 119 | ╭─ Commands ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ 120 | │ exit Ends krs services safely and deletes all state files from system. Removes all cached data. │ 121 | │ export Exports pod info with logs and events. │ 122 | │ health Starts an interactive terminal using an LLM of your choice to detect and fix issues with your cluster │ 123 | │ init Initializes the services and loads the scanner. │ 124 | │ namespaces Lists all the namespaces. │ 125 | │ pods Lists all the pods with namespaces, or lists pods under a specified namespace. │ 126 | │ recommend Generates a table of recommended tools from our ranking database and their CNCF project status. │ 127 | │ scan Scans the cluster and extracts a list of tools that are currently used. 
│ 128 | ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ 129 | ``` 130 | 131 | ## Initialise and load the scanner 132 | 133 | Run the following command to initialize the services and load the scanner. 134 | 135 | 136 | ``` 137 | krs init 138 | ``` 139 | 140 | ## Scan your cluster 141 | 142 | Run the following command to scan the cluster and extract a list of the tools that are currently in use. 143 | 144 | ``` 145 | krs scan 146 | ``` 147 | 148 | You will see the following results: 149 | 150 | ``` 151 | 152 | Scanning your cluster... 153 | 154 | Cluster scanned successfully... 155 | 156 | Extracted tools used in cluster... 157 | 158 | 159 | The cluster is using the following tools: 160 | 161 | +-------------+--------+------------+---------------+ 162 | | Tool Name | Rank | Category | CNCF Status | 163 | +=============+========+============+===============+ 164 | +-------------+--------+------------+---------------+ 165 | ``` 166 | 167 | 168 | ## Lists all the namespaces 169 | 170 | ``` 171 | krs namespaces 172 | Namespaces in your cluster are: 173 | 174 | 1. default 175 | 2. kube-node-lease 176 | 3. kube-public 177 | 4. kube-system 178 | ``` 179 | 180 | ## Installing sample Kubernetes Tools 181 | 182 | This section assumes that you already have some Kubernetes tools running in your cluster. 183 | If not, you can use the [samples/install-tools.sh](samples/install-tools.sh) script to install a set of sample tools. 184 | 185 | ``` 186 | cd samples 187 | sh install-tools.sh 188 | ``` 189 | 190 | ## Use scanner 191 | 192 | ``` 193 | krs scan 194 | 195 | Scanning your cluster... 196 | 197 | Cluster scanned successfully... 198 | 199 | Extracted tools used in cluster... 
200 | 201 | 202 | The cluster is using the following tools: 203 | 204 | +-------------+--------+----------------------+---------------+ 205 | | Tool Name | Rank | Category | CNCF Status | 206 | +=============+========+======================+===============+ 207 | | kubeshark | 4 | Alert and Monitoring | unlisted | 208 | +-------------+--------+----------------------+---------------+ 209 | | portainer | 39 | Cluster Management | listed | 210 | +-------------+--------+----------------------+---------------+ 211 | ``` 212 | 213 | ## Kubetools Recommender System 214 | 215 | Run the following command to generate a table of recommended tools from our ranking database, along with their CNCF project status. 216 | 217 | 218 | ``` 219 | krs recommend 220 | 221 | Our recommended tools for this deployment are: 222 | 223 | +----------------------+------------------+-------------+---------------+ 224 | | Category | Recommendation | Tool Name | CNCF Status | 225 | +======================+==================+=============+===============+ 226 | | Alert and Monitoring | Recommended tool | grafana | listed | 227 | +----------------------+------------------+-------------+---------------+ 228 | | Cluster Management | Recommended tool | rancher | unlisted | 229 | +----------------------+------------------+-------------+---------------+ 230 | ``` 231 | 232 | 233 | ## Krs health 234 | 235 | 236 | 237 | Assume that there is an Nginx Pod in the namespace ns1: 238 | 239 | ``` 240 | krs pods --namespace ns1 241 | 242 | Pods in namespace 'ns1': 243 | 244 | 1. nginx-pod 245 | ``` 246 | 247 | ``` 248 | krs health 249 | 250 | Starting interactive terminal... 251 | 252 | 253 | Choose the model provider for healthcheck: 254 | 255 | [1] OpenAI 256 | [2] Huggingface 257 | 258 | >> 259 | ``` 260 | 261 | You are prompted to choose a model provider for the health check. 262 | The options are "OpenAI" and "Huggingface". The selected option determines which LLM is used for the health check. 
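The provider prompt shown above can be sketched as a simple lookup from the typed menu entry to a provider label. This is a minimal illustration and not the krs implementation: the names `PROVIDERS` and `resolve_provider` are hypothetical, and only the two menu labels are taken from the prompt.

```python
# Hypothetical sketch: map the entry typed at the ">>" prompt to a provider label.
# The two labels mirror the menu shown above; everything else is an assumption.
PROVIDERS = {"1": "OpenAI", "2": "Huggingface"}


def resolve_provider(choice: str) -> str:
    """Return the provider label for a menu choice, or raise ValueError."""
    key = choice.strip()  # tolerate surrounding whitespace in the typed input
    if key not in PROVIDERS:
        raise ValueError(
            f"Unknown option {choice!r}; expected one of {sorted(PROVIDERS)}"
        )
    return PROVIDERS[key]


if __name__ == "__main__":
    print(resolve_provider("1"))
```

Rejecting unknown entries explicitly, rather than falling back to a default, keeps a mistyped choice from silently selecting the wrong provider.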
263 | 264 | If you choose option "1", Krs installs the necessary libraries and then asks for your OpenAI API key and model name. 265 | 266 | 267 | ``` 268 | Enter your OpenAI API key: sk-3iXXXXXTpTyyOq2mR 269 | 270 | Enter the OpenAI model name: gpt-3.5-turbo 271 | API key and model are valid. 272 | 273 | Namespaces in the cluster: 274 | 275 | 1. default 276 | 2. kube-node-lease 277 | 3. kube-public 278 | 4. kube-system 279 | 5. ns1 280 | 281 | Which namespace do you want to check the health for? Select a namespace by entering its number: >> 5 282 | 283 | Pods in the namespace ns1: 284 | 285 | 1. nginx-pod 286 | 287 | Which pod from ns1 do you want to check the health for? Select a pod by entering its number: >> 288 | Checking status of the pod... 289 | 290 | Extracting logs and events from the pod... 291 | 292 | Logs and events from the pod extracted successfully! 293 | 294 | 295 | Interactive session started. Type 'end chat' to exit from the session! 296 | 297 | >> The provided log entries are empty, as there is nothing between the curly braces {}. Therefore, everything looks good and there are no warnings or errors to report. 298 | ``` 299 | 300 | Now consider a Pod that reports errors: 301 | 302 | ``` 303 | krs health 304 | 305 | Starting interactive terminal... 306 | 307 | 308 | Do you want to continue fixing the previously selected pod ? (y/n): >> n 309 | 310 | Loading LLM State.. 311 | 312 | Model: gpt-3.5-turbo 313 | 314 | Namespaces in the cluster: 315 | 316 | 1. default 317 | 2. kube-node-lease 318 | 3. kube-public 319 | 4. kube-system 320 | 5. portainer 321 | 322 | Which namespace do you want to check the health for? Select a namespace by entering its number: >> 4 323 | 324 | Pods in the namespace kube-system: 325 | 326 | 1. coredns-76f75df574-mdk6w 327 | 2. coredns-76f75df574-vg6z2 328 | 3. etcd-docker-desktop 329 | 4. kube-apiserver-docker-desktop 330 | 5. kube-controller-manager-docker-desktop 331 | 6. kube-proxy-p5hw4 332 | 7. kube-scheduler-docker-desktop 333 | 8. 
storage-provisioner 334 | 9. vpnkit-controller 335 | 336 | Which pod from kube-system do you want to check the health for? Select a pod by entering its number: >> 4 337 | 338 | Checking status of the pod... 339 | 340 | Extracting logs and events from the pod... 341 | 342 | Logs and events from the pod extracted successfully! 343 | 344 | 345 | Interactive session started. Type 'end chat' to exit from the session! 346 | 347 | >> Warning/Error 1: 348 | "Unable to authenticate the request" with err="[invalid bearer token, service account token has expired]" 349 | This indicates that there was an issue with authenticating the request due to an invalid bearer token and an expired service account token. To resolve this issue, the bearer token needs to be updated or regenerated, and the service account token needs to be renewed. 350 | 351 | Warning/Error 2: 352 | "Failed to update lease" with err="StorageError: invalid object, Code: 4, Key: /registry/leases/kube-system/apiserver-457xdkumpkdqmkwikdt6khrbya" 353 | This error indicates a storage error while trying to update a lease, possibly due to an invalid object or a conflict in UIDs. To resolve this issue, the lease object needs to be checked and possibly recreated with the correct UID. 354 | 355 | Warning/Error 3: 356 | "Error removing old endpoints from kubernetes service: no API server IP addresses were listed in storage, refusing to erase all endpoints for the kubernetes Service" 357 | This warning indicates that there were no API server IP addresses listed in storage, causing the removal of old endpoints from the Kubernetes service to fail. To resolve this issue, the API server IP addresses need to be correctly listed in storage. 358 | 359 | Warning/Error 4: 360 | "Skipping API apps/v1beta1 because it has no resources." 361 | This warning indicates that the API version 'apps/v1beta1' is being skipped because it has no associated resources. 
This might be an informational warning and may not necessarily require immediate action, depending on the use case. 362 | 363 | Warning/Error 5: 364 | "watch chan error: etcdserver: mvcc: required revision has been compacted" 365 | This error indicates a watch channel error related to the revision being compacted in etcdserver. To resolve this issue, the etcd database revision or configuration may need to be checked and potentially adjusted. 366 | 367 | Overall, it is important to investigate and address each warning and error to ensure the stability and reliability of the Kubernetes cluster. If there are no concerns in the log entries provided, the message 'Everything looks good!' can be returned. 368 | 369 | >> can you help me in fixing this error message 370 | >> To help you fix the error messages mentioned in the log entries, here are some steps you can take: 371 | 372 | 1. For the "Unable to authenticate the request" error: 373 | - Generate a new bearer token and update it in the relevant configuration files. 374 | - Renew the service account token to prevent expiration issues. 375 | 376 | 2. For the "Failed to update lease" error: 377 | - Investigate the storage error and check for any invalid objects. 378 | - Recreate the lease object with the correct UID if needed. 379 | 380 | 3. For the "Error removing old endpoints from Kubernetes service" error: 381 | - Ensure that the API server IP addresses are correctly listed in the storage. 382 | - Verify the configuration of the Kubernetes service and update it if necessary. 383 | 384 | 4. For the "watch chan error: etcdserver: mvcc: required revision has been compacted" error: 385 | - Check the etcd server configuration and ensure that the revision is maintained correctly. 386 | - Monitor the etcd server for any potential issues causing revision compaction. 387 | 388 | After performing these steps, restart relevant services or components as needed to apply the changes. 
Monitor the Kubernetes cluster for any further errors and ensure that the issues have been resolved successfully. 389 | 390 | Feel free to provide more specific details or additional logs if you need further assistance with resolving the error messages. 391 | ``` 392 | 393 | 394 | ## Using Hugging Face 395 | 396 | ``` 397 | krs health 398 | 399 | Starting interactive terminal... 400 | 401 | 402 | Choose the model provider for healthcheck: 403 | 404 | [1] OpenAI 405 | [2] Huggingface 406 | 407 | >> 2 408 | 409 | Installing necessary libraries.......... 410 | 411 | transformers is already installed. 412 | 413 | torch is already installed. 414 | /opt/homebrew/lib/python3.11/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead. 415 | torch.utils._pytree._register_pytree_node( 416 | 417 | Enter the Huggingface model name: codellama/CodeLlama-13b-hf 418 | tokenizer_config.json: 100%|█████████████████████████████████████████████| 749/749 [00:00<00:00, 768kB/s] 419 | tokenizer.model: 100%|████████████████████████████████████████████████| 500k/500k [00:00<00:00, 1.94MB/s] 420 | tokenizer.json: 100%|███████████████████████████████████████████████| 1.84M/1.84M [00:01<00:00, 1.78MB/s] 421 | special_tokens_map.json: 100%|██████████████████████████████████████████| 411/411 [00:00<00:00, 1.49MB/s] 422 | config.json: 100%|██████████████████████████████████████████████████████| 589/589 [00:00<00:00, 1.09MB/s] 423 | model.safetensors.index.json: 100%|█████████████████████████████████| 31.4k/31.4k [00:00<00:00, 13.9MB/s] 424 | ... 425 | ``` 426 | 427 | ## FAQs 428 | 429 |
430 | How safe is Krs for production environments? 431 | 432 | Krs is designed to be non-invasive: it provides insights into the current state of a Kubernetes cluster without making any changes to the cluster itself. It does not store any sensitive data or credentials, and it only retrieves information from the cluster and external data sources. 433 | 
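One way to sanity-check the read-only claim is to look at the Kubernetes API methods that `krs/utils/cluster_scanner.py` calls. The sketch below lists those method names as they appear in the scanner and checks them against a set of mutating verb prefixes. The prefix list and the helper `mutating_calls` are assumptions made for illustration, based on the usual naming convention of the Kubernetes Python client; this is not an exhaustive audit of the tool.

```python
# API methods called by KubetoolsScanner in krs/utils/cluster_scanner.py.
SCANNER_CALLS = [
    "list_deployment_for_all_namespaces",
    "list_namespace",
    "list_pod_for_all_namespaces",
    "list_namespaced_pod",
    "list_namespaced_event",
    "read_namespaced_pod",
    "read_namespaced_pod_log",
]

# Assumption: client methods that modify the cluster start with one of these verbs.
MUTATING_PREFIXES = ("create_", "patch_", "replace_", "delete_")


def mutating_calls(calls):
    """Return the call names that start with a mutating verb (hypothetical helper)."""
    return [name for name in calls if name.startswith(MUTATING_PREFIXES)]


print(mutating_calls(SCANNER_CALLS))  # prints []
```

Every call in the list is a `list_` or `read_` operation, which is consistent with the statement that the scanner only retrieves information.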
434 | 435 | 436 | 437 | ## Community 438 | Find us on [Slack](https://www.launchpass.com/kubetoolsio) 439 | 440 | 441 | 442 | 443 | 444 | 445 | 446 | 447 | 448 | 449 | 450 | -------------------------------------------------------------------------------- /aks.md: -------------------------------------------------------------------------------- 1 | # Setting up Krs for an AKS cluster on Microsoft Azure 2 | 3 | Enhance your Kubernetes cluster management on Azure with KRS, a tool that provides recommendations and health checks using AI. KRS scans your cluster to identify deployed pods, services, and deployments, analyzes the tools used, and provides rankings based on their popularity. It can generate recommendations, perform health checks, and export pod information, and it supports both OpenAI and Hugging Face models for the health checks. This guide walks you through setting up KRS for an AKS cluster on Azure, from installation to advanced usage. 4 | 5 | ## Prerequisites: 6 | 7 | - An Azure account 8 | - Azure CLI installed on your laptop 9 | 10 | ## Installation of KRS: 11 | 12 | ## 1. Clone the repository using the command: 13 | 14 | ``` 15 | git clone https://github.com/kubetoolsca/krs.git 16 | ``` 17 | 18 | ## 2. Install the Krs Tool: 19 | 20 | Change into the cloned krs directory and run the following command to install krs locally on your system: 21 | 22 | ``` 23 | pip install . 24 | ``` 25 | 26 | ## 3. Check if the tool has been successfully installed using: 27 | 28 | ``` 29 | krs --help 30 | ``` 31 | 32 | Once you get a list of commands, you can move on to the next part. 33 | 34 | ## Create an AKS cluster on your Azure account 35 | 36 | ## 1. Create an AKS Cluster: 37 | 38 | To create an AKS cluster, log into your Azure account and search for Azure Kubernetes Service. 
39 | 40 | Screenshot 2024-07-01 at 10 17 54 PM 41 | 42 | 43 | Once you click create, you can name your cluster, add a node pool (I used the default agent pool but you can create your own), and leave everything else to its default state. This will help you create a cluster. 44 | 45 | ## 2. Install Azure CLI: 46 | 47 | 48 | ``` 49 | brew update && brew install azure-cli 50 | ``` 51 | 52 | ## 3. Log into your Azure account: 53 | 54 | Once the CLI is installed, log into your Azure account using the command: 55 | 56 | ``` 57 | az login 58 | ``` 59 | 60 | ## 4. Connect to Your Cluster: 61 | 62 | Retrieve the connection command from your cluster details on the Azure portal and execute it to connect to your cluster. 63 | 64 | Screenshot 2024-07-01 at 10 18 25 PM 65 | Screenshot 2024-07-01 at 10 18 51 PM 66 | 67 | 68 | ## Using Krs 69 | 70 | ## 1. Initialise Krs: 71 | 72 | ``` 73 | % krs init 74 | ``` 75 | 76 | ## 2. Scan the Clusters: 77 | 78 | ``` 79 | % krs scan 80 | Scanning your cluster... 81 | Cluster scanned successfully... 82 | Extracted tools used in cluster... 83 | The cluster is using the following tools: 84 | +-------------+--------+-----------------------------+---------------+ 85 | | Tool Name | Rank | Category | CNCF Status | 86 | +=============+========+=============================+===============+ 87 | | autoscaler | 5 | Cluster with Core CLI tools | unlisted | 88 | +-------------+--------+-----------------------------+---------------+ 89 | ``` 90 | 91 | ## 3. 
Get Recommended Tools: 92 | 93 | ``` 94 | % krs recommend 95 | Our recommended tools for this deployment are: 96 | +-----------------------------+------------------+-------------+---------------+ 97 | | Category | Recommendation | Tool Name | CNCF Status | 98 | +=============================+==================+=============+===============+ 99 | | Cluster with Core CLI tools | Recommended tool | k9s | unlisted | 100 | +-----------------------------+------------------+-------------+---------------+ 101 | ``` 102 | 103 | ## 4. Install a few tools: 104 | 105 | ``` 106 | % brew install helm 107 | % helm install kubeview kubeview 108 | ``` 109 | ``` 110 | NAME: kubeview 111 | LAST DEPLOYED: Sat Jun 29 21:44:17 2024 112 | NAMESPACE: default 113 | STATUS: deployed 114 | REVISION: 1 115 | NOTES: 116 | ===================================== 117 | ==== KubeView has been deployed! ==== 118 | ===================================== 119 | To get the external IP of your application, run the following: 120 | export SERVICE_IP=$(kubectl get svc --namespace default kubeview -o jsonpath='{.status.loadBalancer.ingress[0].ip}') 121 | echo http://$SERVICE_IP 122 | NOTE: It may take a few minutes for the LoadBalancer IP to be available. 123 | You can watch the status of by running 'kubectl get --namespace default svc -w kubeview' 124 | ``` 125 | 126 | ## 5. Export pod info with logs and events: 127 | 128 | ``` 129 | % krs export 130 | Pod info with logs and events exported. Json file saved to current directory! 131 | 132 | 133 | meetsimarkaur@meetsimars-MBP krs % ls 134 | CODE_OF_CONDUCT.md arch.png gke.md kubeview 135 | CONTRIBUTIONS.md bhive.png krs samples 136 | LICENSE build krs.egg-info setup.py 137 | README.md exported_pod_info.json kubetail 138 | ``` 139 | 140 | ## 6. Detect and fix issues with your cluster: 141 | 142 | ``` 143 | % krs health 144 | Starting interactive terminal... 
145 | Choose the model provider for healthcheck: 146 | [1] OpenAI 147 | [2] Huggingface 148 | >> 1 149 | Installing necessary libraries......... 150 | openai is already installed. 151 | Enter your OpenAI API key: sk-proj-xxxxxxx 152 | Enter the OpenAI model name: gpt-3.5-turbo 153 | API key and model are valid. 154 | Namespaces in the cluster: 155 | 1. default 156 | 2. kube-node-lease 157 | 3. kube-public 158 | 4. kube-system 159 | Which namespace do you want to check the health for? Select a namespace by entering its number: 160 | >> 1 161 | Pods in the namespace default: 162 | 1. kubeview-64fd5d8b8c-khv8v 163 | Which pod from default do you want to check the health for? Select a pod by entering its number: 164 | >> 1 165 | Checking status of the pod... 166 | Extracting logs and events from the pod... 167 | Logs and events from the pod extracted successfully! 168 | Interactive session started. Type 'end chat' to exit from the session! 169 | >> Everything looks good! 170 | Since the log entries provided are empty, there are no warnings or errors to analyze or address. If there were actual log entries to review, common steps to resolve potential issues in a Kubernetes environment could include: 171 | 1. Checking the configuration files for any errors or inconsistencies. 172 | 2. Verifying that all necessary resources (e.g. pods, services, deployments) are running as expected. 173 | 3. Monitoring the cluster for any performance issues or resource constraints. 174 | 4. Troubleshooting any networking problems that may be impacting connectivity. 175 | 5. Updating Kubernetes components or applying patches as needed to ensure system stability and security. 176 | 6. Checking logs of specific pods or services for more detailed error messages to pinpoint the root cause of any issues. 177 | >> 2 178 | >> Since the log entries are still empty, the response remains the same: Everything looks good! 
If you encounter any specific issues or errors in the future, feel free to provide the logs for further analysis and troubleshooting. 179 | ``` 180 | 181 | Using KRS, you can effortlessly identify and optimize the tools within your Kubernetes clusters, whether on-premises or in the public cloud. The `krs` command feature, in particular, stands out by suggesting tools that are better suited for your cluster's specific needs. Discovering this functionality was a revelation, showcasing the tool's ingenuity in enhancing cluster management. It's a testament to the advanced capabilities of KRS, making it an indispensable asset for SRE and DevOps engineers and teams. 182 | -------------------------------------------------------------------------------- /arch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kubetoolsca/krs/f2af89aee7c3317d67dc51eb016168a948bc70d3/arch.png -------------------------------------------------------------------------------- /bhive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kubetoolsca/krs/f2af89aee7c3317d67dc51eb016168a948bc70d3/bhive.png -------------------------------------------------------------------------------- /build/lib/krs/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kubetoolsca/krs/f2af89aee7c3317d67dc51eb016168a948bc70d3/build/lib/krs/__init__.py -------------------------------------------------------------------------------- /build/lib/krs/krs.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import typer, os 4 | from krs.main import KrsMain 5 | from krs.utils.constants import KRSSTATE_PICKLE_FILEPATH, KRS_DATA_DIRECTORY 6 | 7 | app = typer.Typer(help="krs: A command line interface to scan your Kubernetes Cluster, detect errors, provide 
resolutions using LLMs and recommend latest tools for your cluster") 8 | krs = KrsMain() 9 | 10 | def check_initialized(): 11 | if not os.path.exists(KRSSTATE_PICKLE_FILEPATH): 12 | typer.echo("KRS is not initialized. Please run 'krs init' first.") 13 | raise typer.Exit() 14 | 15 | if not os.path.exists(KRS_DATA_DIRECTORY): 16 | os.mkdir(KRS_DATA_DIRECTORY) 17 | 18 | @app.command() 19 | def init(kubeconfig: str = typer.Option('~/.kube/config', help="Custom path for kubeconfig file if not default")): 20 | """ 21 | Initializes the services and loads the scanner. 22 | """ 23 | krs.initialize(kubeconfig) 24 | typer.echo("Services initialized and scanner loaded.") 25 | 26 | @app.command() 27 | def scan(): 28 | """ 29 | Scans the cluster and extracts a list of tools that are currently used. 30 | """ 31 | check_initialized() 32 | krs.scan_cluster() 33 | 34 | 35 | @app.command() 36 | def namespaces(): 37 | """ 38 | Lists all the namespaces. 39 | """ 40 | check_initialized() 41 | namespaces = krs.list_namespaces() 42 | typer.echo("Namespaces in your cluster are: \n") 43 | for i, namespace in enumerate(namespaces): 44 | typer.echo(str(i+1)+ ". "+ namespace) 45 | 46 | @app.command() 47 | def pods(namespace: str = typer.Option(None, help="Specify namespace to list pods from")): 48 | """ 49 | Lists all the pods with namespaces, or lists pods under a specified namespace. 50 | """ 51 | check_initialized() 52 | if namespace: 53 | pods = krs.list_pods(namespace) 54 | if pods == 'wrong namespace name': 55 | typer.echo(f"\nWrong namespace name entered, try again!\n") 56 | raise typer.Abort() 57 | typer.echo(f"\nPods in namespace '{namespace}': \n") 58 | else: 59 | pods = krs.list_pods_all() 60 | typer.echo("\nAll pods in the cluster: \n") 61 | 62 | for i, pod in enumerate(pods): 63 | typer.echo(str(i+1)+ '. '+ pod) 64 | 65 | @app.command() 66 | def recommend(): 67 | """ 68 | Generates a table of recommended tools from our ranking database and their CNCF project status. 
69 | """ 70 | check_initialized() 71 | krs.generate_recommendations() 72 | 73 | @app.command() 74 | def health(change_model: bool = typer.Option(False, help="Option to reinitialize/change the LLM, if set to True"), 75 | device: str = typer.Option('cpu', help='Option to run Huggingface models on GPU by entering the option as "gpu"')): 76 | """ 77 | Starts an interactive terminal using an LLM of your choice to detect and fix issues with your cluster 78 | """ 79 | check_initialized() 80 | typer.echo("\nStarting interactive terminal...\n") 81 | krs.health_check(change_model, device) 82 | 83 | @app.command() 84 | def export(): 85 | """ 86 | Exports pod info with logs and events. 87 | """ 88 | check_initialized() 89 | krs.export_pod_info() 90 | typer.echo("Pod info with logs and events exported. Json file saved to current directory!") 91 | 92 | @app.command() 93 | def exit(): 94 | """ 95 | Ends krs services safely and deletes all state files from system. Removes all cached data. 96 | """ 97 | check_initialized() 98 | krs.exit() 99 | typer.echo("Krs services closed safely.") 100 | 101 | if __name__ == "__main__": 102 | app() 103 | -------------------------------------------------------------------------------- /build/lib/krs/main.py: -------------------------------------------------------------------------------- 1 | from krs.utils.fetch_tools_krs import krs_tool_ranking_info 2 | from krs.utils.cluster_scanner import KubetoolsScanner 3 | from krs.utils.llm_client import KrsGPTClient 4 | from krs.utils.functional import extract_log_entries, CustomJSONEncoder 5 | import os, pickle, time, json 6 | from tabulate import tabulate 7 | from krs.utils.constants import (KRSSTATE_PICKLE_FILEPATH, LLMSTATE_PICKLE_FILEPATH, POD_INFO_FILEPATH, KRS_DATA_DIRECTORY) 8 | 9 | class KrsMain: 10 | 11 | def __init__(self): 12 | 13 | self.pod_info = None 14 | self.pod_list = None 15 | self.namespaces = None 16 | self.deployments = None 17 | self.state_file = KRSSTATE_PICKLE_FILEPATH 18 | 
self.isClusterScanned = False 19 | self.continue_chat = False 20 | self.logs_extracted = [] 21 | self.scanner = None 22 | self.get_events = True 23 | self.get_logs = True 24 | self.cluster_tool_list = None 25 | self.detailed_cluster_tool_list = None 26 | self.category_cluster_tools_dict = None 27 | 28 | self.load_state() 29 | 30 | def initialize(self, config_file='~/.kube/config'): 31 | self.config_file = config_file 32 | self.tools_dict, self.category_dict, cncf_status_dict = krs_tool_ranking_info() 33 | self.cncf_status = cncf_status_dict['cncftools'] 34 | self.scanner = KubetoolsScanner(self.get_events, self.get_logs, self.config_file) 35 | self.save_state() 36 | 37 | def save_state(self): 38 | state = { 39 | 'pod_info': self.pod_info, 40 | 'pod_list': self.pod_list, 41 | 'namespaces': self.namespaces, 42 | 'deployments': self.deployments, 43 | 'cncf_status': self.cncf_status, 44 | 'tools_dict': self.tools_dict, 45 | 'category_tools_dict': self.category_dict, 46 | 'extracted_logs': self.logs_extracted, 47 | 'kubeconfig': self.config_file, 48 | 'isScanned': self.isClusterScanned, 49 | 'cluster_tool_list': self.cluster_tool_list, 50 | 'detailed_tool_list': self.detailed_cluster_tool_list, 51 | 'category_tool_list': self.category_cluster_tools_dict 52 | } 53 | os.makedirs(os.path.dirname(self.state_file), exist_ok=True) 54 | with open(self.state_file, 'wb') as f: 55 | pickle.dump(state, f) 56 | 57 | def load_state(self): 58 | if os.path.exists(self.state_file): 59 | with open(self.state_file, 'rb') as f: 60 | state = pickle.load(f) 61 | self.pod_info = state.get('pod_info') 62 | self.pod_list = state.get('pod_list') 63 | self.namespaces = state.get('namespaces') 64 | self.deployments = state.get('deployments') 65 | self.cncf_status = state.get('cncf_status') 66 | self.tools_dict = state.get('tools_dict') 67 | self.category_dict = state.get('category_tools_dict') 68 | self.logs_extracted = state.get('extracted_logs') 69 | self.config_file = state.get('kubeconfig') 
70 | self.isClusterScanned = state.get('isScanned') 71 | self.cluster_tool_list = state.get('cluster_tool_list') 72 | self.detailed_cluster_tool_list = state.get('detailed_tool_list') 73 | self.category_cluster_tools_dict = state.get('category_tool_list') 74 | self.scanner = KubetoolsScanner(self.get_events, self.get_logs, self.config_file) 75 | 76 | def check_scanned(self): 77 | if not self.isClusterScanned: 78 | self.pod_list, self.pod_info, self.deployments, self.namespaces = self.scanner.scan_kubernetes_deployment() 79 | self.save_state() 80 | 81 | def list_namespaces(self): 82 | self.check_scanned() 83 | return self.scanner.list_namespaces() 84 | 85 | def list_pods(self, namespace): 86 | self.check_scanned() 87 | if namespace not in self.list_namespaces(): 88 | return "wrong namespace name" 89 | return self.scanner.list_pods(namespace) 90 | 91 | def list_pods_all(self): 92 | self.check_scanned() 93 | return self.scanner.list_pods_all() 94 | 95 | def detect_tools_from_repo(self): 96 | tool_set = set() 97 | for pod in self.pod_list: 98 | for service_name in pod.split('-'): 99 | if service_name in self.tools_dict.keys(): 100 | tool_set.add(service_name) 101 | 102 | for dep in self.deployments: 103 | for service_name in dep.split('-'): 104 | if service_name in self.tools_dict.keys(): 105 | tool_set.add(service_name) 106 | 107 | return list(tool_set) 108 | 109 | def extract_rankings(self): 110 | tool_dict = {} 111 | category_tools_dict = {} 112 | for tool in self.cluster_tool_list: 113 | tool_details = self.tools_dict[tool] 114 | for detail in tool_details: 115 | rank = detail['rank'] 116 | category = detail['category'] 117 | if category not in category_tools_dict: 118 | category_tools_dict[category] = [] 119 | category_tools_dict[category].append(rank) 120 | 121 | tool_dict[tool] = tool_details 122 | 123 | return tool_dict, category_tools_dict 124 | 125 | def generate_recommendations(self): 126 | 127 | if not self.isClusterScanned: 128 | self.scan_cluster() 129 | 
130 | self.print_recommendations() 131 | 132 | def scan_cluster(self): 133 | 134 | print("\nScanning your cluster...\n") 135 | self.pod_list, self.pod_info, self.deployments, self.namespaces = self.scanner.scan_kubernetes_deployment() 136 | self.isClusterScanned = True 137 | print("Cluster scanned successfully...\n") 138 | self.cluster_tool_list = self.detect_tools_from_repo() 139 | print("Extracted tools used in cluster...\n") 140 | self.detailed_cluster_tool_list, self.category_cluster_tools_dict = self.extract_rankings() 141 | 142 | self.print_scan_results() 143 | self.save_state() 144 | 145 | def print_scan_results(self): 146 | scan_results = [] 147 | 148 | for tool, details in self.detailed_cluster_tool_list.items(): 149 | first_entry = True 150 | for detail in details: 151 | row = [tool if first_entry else "", detail['rank'], detail['category'], self.cncf_status.get(tool, 'unlisted')] 152 | scan_results.append(row) 153 | first_entry = False 154 | 155 | print("\nThe cluster is using the following tools:\n") 156 | print(tabulate(scan_results, headers=["Tool Name", "Rank", "Category", "CNCF Status"], tablefmt="grid")) 157 | 158 | def print_recommendations(self): 159 | recommendations = [] 160 | 161 | for category, ranks in self.category_cluster_tools_dict.items(): 162 | rank = ranks[0] 163 | recommended_tool = self.category_dict[category][1]['name'] 164 | status = self.cncf_status.get(recommended_tool, 'unlisted') 165 | if rank == 1: 166 | row = [category, "Already using the best", recommended_tool, status] 167 | else: 168 | row = [category, "Recommended tool", recommended_tool, status] 169 | recommendations.append(row) 170 | 171 | print("\nOur recommended tools for this deployment are:\n") 172 | print(tabulate(recommendations, headers=["Category", "Recommendation", "Tool Name", "CNCF Status"], tablefmt="grid")) 173 | 174 | 175 | def health_check(self, change_model=False, device='cpu'): 176 | 177 | if os.path.exists(LLMSTATE_PICKLE_FILEPATH) and not 
change_model: 178 | continue_previous_chat = input("\nDo you want to continue fixing the previously selected pod ? (y/n): >> ") 179 | while True: 180 | if continue_previous_chat not in ['y', 'n']: 181 | continue_previous_chat = input("\nPlease enter one of the given options ? (y/n): >> ") 182 | else: 183 | break 184 | 185 | if continue_previous_chat=='y': 186 | krsllmclient = KrsGPTClient(device=device) 187 | self.continue_chat = True 188 | else: 189 | krsllmclient = KrsGPTClient(reset_history=True, device=device) 190 | 191 | else: 192 | krsllmclient = KrsGPTClient(reinitialize=True, device=device) 193 | self.continue_chat = False 194 | 195 | if not self.continue_chat: 196 | 197 | self.check_scanned() 198 | 199 | print("\nNamespaces in the cluster:\n") 200 | namespaces = self.list_namespaces() 201 | namespace_len = len(namespaces) 202 | for i, namespace in enumerate(namespaces, start=1): 203 | print(f"{i}. {namespace}") 204 | 205 | self.selected_namespace_index = int(input("\nWhich namespace do you want to check the health for? Select a namespace by entering its number: >> ")) 206 | while True: 207 | if self.selected_namespace_index not in list(range(1, namespace_len+1)): 208 | self.selected_namespace_index = int(input(f"\nWrong input! Select a namespace number between {1} to {namespace_len}: >> ")) 209 | else: 210 | break 211 | 212 | self.selected_namespace = namespaces[self.selected_namespace_index - 1] 213 | pod_list = self.list_pods(self.selected_namespace) 214 | pod_len = len(pod_list) 215 | print(f"\nPods in the namespace {self.selected_namespace}:\n") 216 | for i, pod in enumerate(pod_list, start=1): 217 | print(f"{i}. {pod}") 218 | self.selected_pod_index = int(input(f"\nWhich pod from {self.selected_namespace} do you want to check the health for? Select a pod by entering its number: >> ")) 219 | 220 | while True: 221 | if self.selected_pod_index not in list(range(1, pod_len+1)): 222 | self.selected_pod_index = int(input(f"\nWrong input! 
Select a pod number between {1} to {pod_len}: >> ")) 223 | else: 224 | break 225 | 226 | print("\nChecking status of the pod...") 227 | 228 | print("\nExtracting logs and events from the pod...") 229 | 230 | logs_from_pod = self.get_logs_from_pod(self.selected_namespace_index, self.selected_pod_index) 231 | 232 | self.logs_extracted = extract_log_entries(logs_from_pod) 233 | 234 | print("\nLogs and events from the pod extracted successfully!\n") 235 | 236 | prompt_to_llm = self.create_prompt(self.logs_extracted) 237 | 238 | krsllmclient.interactive_session(prompt_to_llm) 239 | 240 | self.save_state() 241 | 242 | def get_logs_from_pod(self, namespace_index, pod_index): 243 | try: 244 | namespace_index -= 1 245 | pod_index -= 1 246 | namespace = list(self.list_namespaces())[namespace_index] 247 | return list(self.pod_info[namespace][pod_index]['info']['Logs'].values())[0] 248 | except KeyError as e: 249 | print("\nKindly enter a value from the available namespaces and pods") 250 | return None 251 | 252 | def create_prompt(self, log_entries): 253 | prompt = "You are a DevOps expert with experience in Kubernetes. Analyze the following log entries:\n{\n" 254 | for entry in sorted(log_entries): # Sort to maintain consistent order 255 | prompt += f"{entry}\n" 256 | prompt += "}\nIf there is nothing of concern in between { }, return a message stating that 'Everything looks good!'. Explain the warnings and errors and the steps that should be taken to resolve the issues, only if they exist." 
257 | return prompt 258 | 259 | def export_pod_info(self): 260 | 261 | self.check_scanned() 262 | 263 | with open(POD_INFO_FILEPATH, 'w') as f: 264 | json.dump(self.pod_info, f, cls=CustomJSONEncoder) 265 | 266 | 267 | def exit(self): 268 | 269 | try: 270 | # List all files and directories in the given directory 271 | files = os.listdir(KRS_DATA_DIRECTORY) 272 | for file in files: 273 | file_path = os.path.join(KRS_DATA_DIRECTORY, file) 274 | # Check if it's a file and not a directory 275 | if os.path.isfile(file_path): 276 | os.remove(file_path) # Delete the file 277 | print(f"Deleted file: {file_path}") 278 | 279 | except Exception as e: 280 | print(f"Error occurred: {e}") 281 | 282 | def main(self): 283 | self.scan_cluster() 284 | self.generate_recommendations() 285 | self.health_check() 286 | 287 | 288 | if __name__=='__main__': 289 | recommender = KrsMain() 290 | recommender.main() 291 | # logs_info = recommender.get_logs_from_pod(4,2) 292 | # print(logs_info) 293 | # logs = recommender.extract_log_entries(logs_info) 294 | # print(logs) 295 | # print(recommender.create_prompt(logs)) 296 | 297 | -------------------------------------------------------------------------------- /build/lib/krs/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kubetoolsca/krs/f2af89aee7c3317d67dc51eb016168a948bc70d3/build/lib/krs/utils/__init__.py -------------------------------------------------------------------------------- /build/lib/krs/utils/cluster_scanner.py: -------------------------------------------------------------------------------- 1 | from kubernetes import client, config 2 | import logging 3 | 4 | class KubetoolsScanner: 5 | def __init__(self, get_events=True, get_logs=True, config_file='~/.kube/config'): 6 | self.get_events = get_events 7 | self.get_logs = get_logs 8 | self.config_file = config_file 9 | self.v1 = None 10 | self.v2 = None 11 | self.setup_kubernetes_client() 12 | 13 | 
def setup_kubernetes_client(self): 14 | try: 15 | config.load_kube_config(config_file=self.config_file) 16 | self.v1 = client.AppsV1Api() 17 | self.v2 = client.CoreV1Api() 18 | except Exception as e: 19 | logging.error("Failed to load Kubernetes configuration: %s", e) 20 | raise 21 | 22 | def scan_kubernetes_deployment(self): 23 | try: 24 | deployments = self.v1.list_deployment_for_all_namespaces() 25 | namespaces = self.list_namespaces() 26 | except Exception as e: 27 | logging.error("Error fetching data from Kubernetes API: %s", e) 28 | return [], {}, [], [] # Same four-value shape as the success path below 29 | 30 | pod_dict = {} 31 | pod_list = [] 32 | for name in namespaces: 33 | pods = self.list_pods(name) 34 | pod_list += pods 35 | pod_dict[name] = [{'name': pod, 'info': self.get_pod_info(name, pod)} for pod in pods] 36 | 37 | deployment_list = [dep.metadata.name for dep in deployments.items] 38 | return pod_list, pod_dict, deployment_list, namespaces 39 | 40 | def list_namespaces(self): 41 | namespaces = self.v2.list_namespace() 42 | return [namespace.metadata.name for namespace in namespaces.items] 43 | 44 | def list_pods_all(self): 45 | pods = self.v2.list_pod_for_all_namespaces() 46 | return [pod.metadata.name for pod in pods.items] 47 | 48 | def list_pods(self, namespace): 49 | pods = self.v2.list_namespaced_pod(namespace) 50 | return [pod.metadata.name for pod in pods.items] 51 | 52 | def get_pod_info(self, namespace, pod, include_events=True, include_logs=True): 53 | """ 54 | Retrieves information about a specific pod in a given namespace. 55 | 56 | Args: 57 | namespace (str): The namespace of the pod. 58 | pod (str): The name of the pod. 59 | include_events (bool): Flag indicating whether to include events associated with the pod. 60 | include_logs (bool): Flag indicating whether to include logs of the pod. 61 | 62 | Returns: 63 | dict: A dictionary containing the pod information, events (if include_events is True), and logs (if include_logs is True). 
64 | """ 65 | pod_info = self.v2.read_namespaced_pod(pod, namespace) 66 | pod_info_map = pod_info.to_dict() 67 | pod_info_map["metadata"]["managed_fields"] = None # Clean up metadata 68 | 69 | info = {'PodInfo': pod_info_map} 70 | 71 | if include_events: 72 | info['Events'] = self.fetch_pod_events(namespace, pod) 73 | 74 | if include_logs: 75 | # Retrieve logs for all containers within the pod 76 | container_logs = {} 77 | for container in pod_info.spec.containers: 78 | try: 79 | logs = self.v2.read_namespaced_pod_log(name=pod, namespace=namespace, container=container.name) 80 | container_logs[container.name] = logs 81 | except Exception as e: 82 | logging.error("Failed to fetch logs for container %s in pod %s: %s", container.name, pod, e) 83 | container_logs[container.name] = "Error fetching logs: " + str(e) 84 | info['Logs'] = container_logs 85 | 86 | return info 87 | 88 | def fetch_pod_events(self, namespace, pod): 89 | events = self.v2.list_namespaced_event(namespace) 90 | return [{ 91 | 'Name': event.metadata.name, 92 | 'Message': event.message, 93 | 'Reason': event.reason 94 | } for event in events.items if event.involved_object.name == pod] 95 | 96 | 97 | if __name__ == '__main__': 98 | 99 | scanner = KubetoolsScanner() 100 | pod_list, pod_info, deployments, namespaces = scanner.scan_kubernetes_deployment() 101 | print("POD List: \n\n", pod_list) 102 | print("\n\nPOD Info: \n\n", pod_info.keys()) 103 | print("\n\nNamespaces: \n\n", namespaces) 104 | print("\n\nDeployments : \n\n", deployments) 105 | 106 | -------------------------------------------------------------------------------- /build/lib/krs/utils/constants.py: -------------------------------------------------------------------------------- 1 | KUBETOOLS_JSONPATH = 'krs/data/kubetools_data.json' 2 | KUBETOOLS_DATA_JSONURL = 'https://raw.githubusercontent.com/Kubetools-Technologies-Inc/kubetools_data/main/data/kubetools_data.json' 3 | 4 | CNCF_YMLPATH = 'krs/data/landscape.yml' 5 | CNCF_YMLURL = 
'https://raw.githubusercontent.com/cncf/landscape/master/landscape.yml' 6 | CNCF_TOOLS_JSONPATH = 'krs/data/cncf_tools.json' 7 | 8 | TOOLS_RANK_JSONPATH = 'krs/data/tools_rank.json' 9 | CATEGORY_RANK_JSONPATH = 'krs/data/category_rank.json' 10 | 11 | LLMSTATE_PICKLE_FILEPATH = 'krs/data/llmstate.pkl' 12 | KRSSTATE_PICKLE_FILEPATH = 'krs/data/krsstate.pkl' 13 | 14 | POD_INFO_FILEPATH = './exported_pod_info.json' 15 | 16 | MAX_OUTPUT_TOKENS = 512 17 | 18 | KRS_DATA_DIRECTORY = 'krs/data' 19 | -------------------------------------------------------------------------------- /build/lib/krs/utils/fetch_tools_krs.py: -------------------------------------------------------------------------------- 1 | import json 2 | import requests 3 | import yaml 4 | from krs.utils.constants import (KUBETOOLS_DATA_JSONURL, KUBETOOLS_JSONPATH, CNCF_YMLPATH, CNCF_YMLURL, CNCF_TOOLS_JSONPATH, TOOLS_RANK_JSONPATH, CATEGORY_RANK_JSONPATH) 5 | 6 | # Function to convert 'githubStars' to a float, or return 0 if it cannot be converted 7 | def get_github_stars(tool): 8 | stars = tool.get('githubStars', 0) 9 | try: 10 | return float(stars) 11 | except ValueError: 12 | return 0.0 13 | 14 | # Function to download and save a file 15 | def download_file(url, filename): 16 | response = requests.get(url) 17 | response.raise_for_status() # Ensure we notice bad responses 18 | with open(filename, 'wb') as file: 19 | file.write(response.content) 20 | 21 | def parse_yaml_to_dict(yaml_file_path): 22 | with open(yaml_file_path, 'r') as file: 23 | data = yaml.safe_load(file) 24 | 25 | cncftools = {} 26 | 27 | for category in data.get('landscape', []): 28 | for subcategory in category.get('subcategories', []): 29 | for item in subcategory.get('items', []): 30 | item_name = item.get('name').lower() 31 | project_status = item.get('project', 'listed') 32 | cncftools[item_name] = project_status 33 | 34 | return {'cncftools': cncftools} 35 | 36 | def save_json_file(jsondict, jsonpath): 37 | 38 | # Write the category 
dictionary to a new JSON file 39 | with open(jsonpath, 'w') as f: 40 | json.dump(jsondict, f, indent=4) 41 | 42 | 43 | def krs_tool_ranking_info(): 44 | # New dictionaries 45 | tools_dict = {} 46 | category_tools_dict = {} 47 | 48 | download_file(KUBETOOLS_DATA_JSONURL, KUBETOOLS_JSONPATH) 49 | download_file(CNCF_YMLURL, CNCF_YMLPATH) 50 | 51 | with open(KUBETOOLS_JSONPATH) as f: 52 | data = json.load(f) 53 | 54 | for category in data: 55 | # Sort the tools in the current category by the number of GitHub stars 56 | sorted_tools = sorted(category['tools'], key=get_github_stars, reverse=True) 57 | 58 | for i, tool in enumerate(sorted_tools, start=1): 59 | tool["name"] = tool['name'].replace("\t", "").lower() 60 | tool['ranking'] = i 61 | 62 | # Update tools_dict 63 | tools_dict.setdefault(tool['name'], []).append({ 64 | 'rank': i, 65 | 'category': category['category']['name'], 66 | 'url': tool['link'] 67 | }) 68 | 69 | # Update ranked_tools_dict 70 | category_tools_dict.setdefault(category['category']['name'], {}).update({i: {'name': tool['name'], 'url': tool['link']}}) 71 | 72 | 73 | cncf_tools_dict = parse_yaml_to_dict(CNCF_YMLPATH) 74 | save_json_file(cncf_tools_dict, CNCF_TOOLS_JSONPATH) 75 | save_json_file(tools_dict, TOOLS_RANK_JSONPATH) 76 | save_json_file(category_tools_dict, CATEGORY_RANK_JSONPATH) 77 | 78 | return tools_dict, category_tools_dict, cncf_tools_dict 79 | 80 | if __name__=='__main__': 81 | tools_dict, category_tools_dict, cncf_tools_dict = krs_tool_ranking_info() 82 | print(cncf_tools_dict) 83 | 84 | -------------------------------------------------------------------------------- /build/lib/krs/utils/functional.py: -------------------------------------------------------------------------------- 1 | from difflib import SequenceMatcher 2 | import re, json 3 | from datetime import datetime 4 | 5 | class CustomJSONEncoder(json.JSONEncoder): 6 | """JSON Encoder for complex objects not serializable by default json code.""" 7 | def default(self, obj): 
8 | if isinstance(obj, datetime): 9 | # Format datetime object as a string in ISO 8601 format 10 | return obj.isoformat() 11 | # Let the base class default method raise the TypeError 12 | return json.JSONEncoder.default(self, obj) 13 | 14 | def similarity(a, b): 15 | return SequenceMatcher(None, a, b).ratio() 16 | 17 | def filter_similar_entries(log_entries): 18 | unique_entries = list(log_entries) 19 | to_remove = set() 20 | 21 | # Compare each pair of log entries 22 | for i in range(len(unique_entries)): 23 | for j in range(i + 1, len(unique_entries)): 24 | if similarity(unique_entries[i], unique_entries[j]) > 0.85: 25 | # Choose the shorter entry to remove, or either if they are the same length 26 | if len(unique_entries[i]) > len(unique_entries[j]): 27 | to_remove.add(unique_entries[i]) 28 | else: 29 | to_remove.add(unique_entries[j]) 30 | 31 | # Filter out the highly similar entries 32 | filtered_entries = {entry for entry in unique_entries if entry not in to_remove} 33 | return filtered_entries 34 | 35 | def extract_log_entries(log_contents): 36 | # Patterns to match different log formats 37 | patterns = [ 38 | re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{6}Z\s+(warn|error)\s+\S+\s+(.*)', re.IGNORECASE), 39 | re.compile(r'[WE]\d{4} \d{2}:\d{2}:\d{2}.\d+\s+\d+\s+(.*)'), 40 | re.compile(r'({.*})') 41 | ] 42 | 43 | log_entries = set() 44 | # Attempt to match each line with all patterns 45 | for line in log_contents.split('\n'): 46 | for pattern in patterns: 47 | match = pattern.search(line) 48 | if match: 49 | if match.groups()[0].startswith('{'): 50 | # Handle JSON formatted log entries 51 | try: 52 | log_json = json.loads(match.group(1)) 53 | if 'severity' in log_json and log_json['severity'].lower() in ['error', 'warning']: 54 | level = "Error" if log_json['severity'] == "ERROR" else "Warning" 55 | message = log_json.get('error', '') if 'error' in log_json.keys() else line 56 | log_entries.add(f"{level}: {message.strip()}") 57 | elif 'level' in 
log_json: 58 | level = "Error" if log_json['level'] == "error" else "Warning" 59 | message = log_json.get('msg', '') + log_json.get('error', '') 60 | log_entries.add(f"{level}: {message.strip()}") 61 | except json.JSONDecodeError: 62 | continue # Skip if JSON is not valid 63 | else: 64 | if len(match.groups()) == 2: 65 | level, message = match.groups() 66 | elif len(match.groups()) == 1: 67 | message = match.group(1) # Assuming error as default 68 | level = "ERROR" # Default if not specified in the log 69 | 70 | level = "Error" if "error" in level.lower() else "Warning" 71 | formatted_message = f"{level}: {message.strip()}" 72 | log_entries.add(formatted_message) 73 | break # Stop after the first match 74 | 75 | return filter_similar_entries(log_entries) -------------------------------------------------------------------------------- /build/lib/krs/utils/llm_client.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | import subprocess 3 | import os, time 4 | from krs.utils.constants import (MAX_OUTPUT_TOKENS, LLMSTATE_PICKLE_FILEPATH) 5 | 6 | class KrsGPTClient: 7 | 8 | def __init__(self, reinitialize=False, reset_history=False, device='cpu'): 9 | 10 | self.reinitialize = reinitialize 11 | self.client = None 12 | self.pipeline = None 13 | self.provider = None 14 | self.model = None 15 | self.openai_api_key = None 16 | self.continue_chat = False 17 | self.history = [] 18 | self.max_tokens = MAX_OUTPUT_TOKENS 19 | self.device = device 20 | 21 | 22 | if not self.reinitialize: 23 | print("\nLoading LLM State..") 24 | self.load_state() 25 | print("\nModel: ", self.model) 26 | if not self.model: 27 | self.initialize_client() 28 | 29 | self.history = [] if reset_history == True else self.history 30 | 31 | if self.history: 32 | continue_chat = input("\n\nDo you want to continue previous chat ? 
(y/n) >> ") 33 | while continue_chat not in ['y', 'n']: 34 | print("Please enter either y or n!") 35 | continue_chat = input("\nDo you want to continue previous chat ? (y/n) >> ") 36 | if continue_chat == 'n': 37 | self.history = [] 38 | else: 39 | self.continue_chat = True 40 | 41 | def save_state(self, filename=LLMSTATE_PICKLE_FILEPATH): 42 | state = { 43 | 'provider': self.provider, 44 | 'model': self.model, 45 | 'history': self.history, 46 | 'openai_api_key': self.openai_api_key 47 | } 48 | with open(filename, 'wb') as output: 49 | pickle.dump(state, output, pickle.HIGHEST_PROTOCOL) 50 | 51 | def load_state(self): 52 | try: 53 | with open(LLMSTATE_PICKLE_FILEPATH, 'rb') as f: 54 | state = pickle.load(f) 55 | self.provider = state['provider'] 56 | self.model = state['model'] 57 | self.history = state.get('history', []) 58 | self.openai_api_key = state.get('openai_api_key', '') 59 | if self.provider == 'OpenAI': 60 | self.init_openai_client(reinitialize=True) 61 | elif self.provider == 'huggingface': 62 | self.init_huggingface_client(reinitialize=True) 63 | except (FileNotFoundError, EOFError): 64 | pass 65 | 66 | def install_package(self, package_name): 67 | import importlib 68 | try: 69 | importlib.import_module(package_name) 70 | print(f"\n{package_name} is already installed.") 71 | except ImportError: 72 | print(f"\nInstalling {package_name}...", end='', flush=True) 73 | result = subprocess.run(['pip', 'install', package_name], stdout=subprocess.PIPE, stderr=subprocess.PIPE) 74 | if result.returncode == 0: 75 | print(f" \n{package_name} installed successfully.") 76 | else: 77 | print(f" \nFailed to install {package_name}.") 78 | 79 | 80 | def initialize_client(self): 81 | if not self.client and not self.pipeline: 82 | choice = input("\nChoose the model provider for healthcheck: \n\n[1] OpenAI \n[2] Huggingface\n\n>> ") 83 | if choice == '1': 84 | self.init_openai_client() 85 | elif choice == '2': 86 | self.init_huggingface_client() 87 | else: 88 | raise 
ValueError("Invalid option selected") 89 | 90 | def init_openai_client(self, reinitialize=False): 91 | 92 | if not reinitialize: 93 | print("\nInstalling necessary libraries..........") 94 | self.install_package('openai') 95 | 96 | import openai 97 | from openai import OpenAI 98 | 99 | self.provider = 'OpenAI' 100 | self.openai_api_key = input("\nEnter your OpenAI API key: ") if not reinitialize else self.openai_api_key 101 | self.model = input("\nEnter the OpenAI model name: ") if not reinitialize else self.model 102 | 103 | self.client = OpenAI(api_key=self.openai_api_key) 104 | 105 | if not reinitialize or self.reinitialize: 106 | while True: 107 | try: 108 | self.validate_openai_key() 109 | break 110 | except openai.AuthenticationError: 111 | self.openai_api_key = input("\nInvalid Key! Please enter the correct OpenAI API key: ") 112 | except (openai.BadRequestError, openai.NotFoundError) as e: 113 | print(e) 114 | self.model = input("\nEnter an OpenAI model name from latest OpenAI docs: ") 115 | except openai.APIConnectionError as e: 116 | print(e) 117 | self.init_openai_client(reinitialize=False) 118 | 119 | self.save_state() 120 | 121 | def init_huggingface_client(self, reinitialize=False): 122 | 123 | if not reinitialize: 124 | print("\nInstalling necessary libraries..........") 125 | self.install_package('transformers') 126 | self.install_package('torch') 127 | 128 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 129 | 130 | import warnings 131 | from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer 132 | 133 | warnings.filterwarnings("ignore", category=FutureWarning) 134 | 135 | self.provider = 'huggingface' 136 | self.model = input("\nEnter the Huggingface model name: ") if not reinitialize else self.model 137 | 138 | try: 139 | self.tokenizer = AutoTokenizer.from_pretrained(self.model) 140 | self.model_hf = AutoModelForCausalLM.from_pretrained(self.model) 141 | self.pipeline = pipeline('text-generation', model=self.model_hf, tokenizer=self.tokenizer, 
device=0 if self.device == 'gpu' else -1) 142 | 143 | except OSError as e: 144 | print("\nError loading model: ", e) 145 | print("\nPlease enter a valid Huggingface model name.") 146 | self.init_huggingface_client(reinitialize=False) # Prompt for a new model name instead of retrying the same one 147 | 148 | self.save_state() 149 | 150 | def validate_openai_key(self): 151 | """Validate the OpenAI API key by attempting a small request.""" 152 | response = self.client.chat.completions.create( 153 | model=self.model, 154 | messages=[{"role": "user", "content": "Test prompt, do nothing"}], 155 | max_tokens=5 156 | ) 157 | print("API key and model are valid.") 158 | 159 | def infer(self, prompt): 160 | self.history.append({"role": "user", "content": prompt}) 161 | input_prompt = self.history_to_prompt() 162 | 163 | if self.provider == 'OpenAI': 164 | response = self.client.chat.completions.create( 165 | model=self.model, 166 | messages=input_prompt, 167 | max_tokens = self.max_tokens 168 | ) 169 | output = response.choices[0].message.content.strip() 170 | 171 | elif self.provider == 'huggingface': 172 | responses = self.pipeline(input_prompt, max_new_tokens=self.max_tokens) 173 | output = responses[0]['generated_text'] 174 | 175 | self.history.append({"role": "assistant", "content": output}) 176 | print(">> ", output) 177 | 178 | def interactive_session(self, prompt_input): 179 | print("\nInteractive session started. 
Type 'end chat' to exit from the session!\n") 180 | 181 | if self.continue_chat: 182 | print('>> ', self.history[-1]['content']) 183 | else: 184 | initial_prompt = prompt_input 185 | self.infer(initial_prompt) 186 | 187 | while True: 188 | prompt = input("\n>> ") 189 | if prompt.lower() == 'end chat': 190 | break 191 | self.infer(prompt) 192 | self.save_state() 193 | 194 | def history_to_prompt(self): 195 | if self.provider == 'OpenAI': 196 | return self.history 197 | elif self.provider == 'huggingface': 198 | return " ".join([item["content"] for item in self.history]) 199 | 200 | if __name__ == "__main__": 201 | client = KrsGPTClient(reinitialize=False) 202 | # client.interactive_session("You are an 8th grade math tutor. Ask questions to gauge my expertise so that you can generate a training plan for me.") 203 | 204 | -------------------------------------------------------------------------------- /demo.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kubetoolsca/krs/f2af89aee7c3317d67dc51eb016168a948bc70d3/demo.gif -------------------------------------------------------------------------------- /dist/krs-0.1.0-py3.10.egg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kubetoolsca/krs/f2af89aee7c3317d67dc51eb016168a948bc70d3/dist/krs-0.1.0-py3.10.egg -------------------------------------------------------------------------------- /dokc: -------------------------------------------------------------------------------- 1 | ## Prerequisites 2 | -------------------------------------------------------------------------------- /dokc.md: -------------------------------------------------------------------------------- 1 | # Install and Configure Krs with Digital Ocean Kubernetes Cluster 2 | 3 | 4 | ## Prerequisites 5 | 6 | - Digital Ocean Account 7 | - Homebrew(if you're on Mac) 8 | 9 | ## Getting Started 10 | 11 | ### 1. 
Setup a Kubernetes Cluster on Digital Ocean 12 | 13 | ![DGO_Kube_Cluster](https://github.com/kubetoolsca/krs/assets/171302280/6bd344c8-1a3f-4006-b187-49f01d0f5e56) 14 | 15 | ### 2. Install and Setup doctl on your Local Machine 16 | 17 | If on Ubuntu use: 18 | 19 | ``` 20 | sudo snap install doctl 21 | sudo snap connect doctl:kube-config 22 | sudo snap connect doctl:ssh-keys :ssh-keys 23 | sudo snap connect doctl:dot-docker 24 | ``` 25 | 26 | If on Mac, use: 27 | 28 | ``` 29 | brew install doctl 30 | ``` 31 | 32 | ### 3. Authenticate your Digital Ocean Account 33 | 34 | ``` 35 | doctl auth init 36 | ``` 37 | 38 | ### 4. Connect your Local Machine to your Digital Ocean Kubernetes Cluster 39 | 40 | ``` 41 | doctl kubernetes cluster kubeconfig save ea3a5a97-fdba-4455-bd81-46df80c68267 42 | ``` 43 | 44 | ### 5. Setup KRS using these commands: 45 | 46 | ``` 47 | git clone https://github.com/kubetoolsca/krs.git 48 | cd krs 49 | pip install . 50 | ``` 51 | 52 | ### 6. Initialize KRS to permit it access to your cluster using the given command, 53 | 54 | ``` 55 | krs init 56 | ``` 57 | 58 | ### 7. Get a view of all possible actions with KRS, by running the given command 59 | ``` 60 | krs --help 61 | ``` 62 | 63 | ``` 64 | krs --help 65 | 66 | Usage: krs [OPTIONS] COMMAND [ARGS]... 67 | 68 | krs: A command line interface to scan your Kubernetes Cluster, detect errors, 69 | provide resolutions using LLMs and recommend latest tools for your cluster 70 | 71 | ╭─ Options ────────────────────────────────────────────────────────────────────╮ 72 | │ --install-completion Install completion for the current shell. │ 73 | │ --show-completion Show completion for the current shell, to copy │ 74 | │ it or customize the installation. │ 75 | │ --help Show this message and exit. 
│ 76 | ╰──────────────────────────────────────────────────────────────────────────────╯ 77 | ╭─ Commands ───────────────────────────────────────────────────────────────────╮ 78 | │ exit Ends krs services safely and deletes all state files from │ 79 | │ system. Removes all cached data. │ 80 | │ export Exports pod info with logs and events. │ 81 | │ health Starts an interactive terminal using an LLM of your choice to │ 82 | │ detect and fix issues with your cluster │ 83 | │ init Initializes the services and loads the scanner. │ 84 | │ namespaces Lists all the namespaces. │ 85 | │ pods Lists all the pods with namespaces, or lists pods under a │ 86 | │ specified namespace. │ 87 | │ recommend Generates a table of recommended tools from our ranking │ 88 | │ database and their CNCF project status. │ 89 | │ scan Scans the cluster and extracts a list of tools that are │ 90 | │ currently used. │ 91 | ╰──────────────────────────────────────────────────────────────────────────────╯ 92 | ``` 93 | ### 8. Permit KRS to get information on the tools utilized in your cluster by running the given command 94 | 95 | ``` 96 | krs scan 97 | ``` 98 | 99 | ``` 100 | krs scan 101 | 102 | Scanning your cluster... 103 | 104 | Cluster scanned successfully... 105 | 106 | Extracted tools used in cluster... 107 | 108 | 109 | The cluster is using the following tools: 110 | 111 | +-------------+--------+------------------+---------------+ 112 | | Tool Name | Rank | Category | CNCF Status | 113 | +=============+========+==================+===============+ 114 | | cilium | 1 | Network Policies | graduated | 115 | +-------------+--------+------------------+---------------+ 116 | | hubble | 7 | Security Tools | listed | 117 | +-------------+--------+------------------+---------------+ 118 | 119 | ``` 120 | 121 | ### 9. 
Get recommendations on possible tools to use in your cluster by running the given command 122 | 123 | ``` 124 | krs recommend 125 | ``` 126 | 127 | ``` 128 | krs recommend 129 | 130 | Our recommended tools for this deployment are: 131 | 132 | +------------------+------------------------+-------------+---------------+ 133 | | Category | Recommendation | Tool Name | CNCF Status | 134 | +==================+========================+=============+===============+ 135 | | Network Policies | Already using the best | cilium | graduated | 136 | +------------------+------------------------+-------------+---------------+ 137 | | Security Tools | Recommended tool | trivy | listed | 138 | +------------------+------------------------+-------------+---------------+ 139 | 140 | ``` 141 | 142 | ### 10. Check the pod and namespace status in your Kubernetes cluster, including errors. 143 | 144 | ``` 145 | krs health 146 | ``` 147 | 148 | ``` 149 | krs health 150 | 151 | Starting interactive terminal... 152 | 153 | 154 | Choose the model provider for healthcheck: 155 | 156 | [1] OpenAI 157 | [2] Huggingface 158 | 159 | >> 1 160 | 161 | Installing necessary libraries.......... 162 | 163 | openai is already installed. 164 | 165 | Enter your OpenAI API key: sk-proj-qxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxP 166 | 167 | Enter the OpenAI model name: gpt-3.5-turbo 168 | API key and model are valid. 169 | 170 | Namespaces in the cluster: 171 | 172 | 1. default 173 | 2. kube-node-lease 174 | 3. kube-public 175 | 4. kube-system 176 | 5. portainer 177 | 178 | Which namespace do you want to check the health for? Select a namespace by entering its number: >> 4 179 | 180 | Pods in the namespace kube-system: 181 | 182 | 1. cilium-9lqbq 183 | 2. cilium-ffpct 184 | 3. cilium-pvknr 185 | 4. coredns-85f59d8784-nvr2n 186 | 5. coredns-85f59d8784-p9jcv 187 | 6. cpc-bridge-proxy-c6xzr 188 | 7. cpc-bridge-proxy-p7r4p 189 | 8. cpc-bridge-proxy-tkfrd 190 | 9. csi-do-node-hwxn7 191 | 10. 
csi-do-node-q27rc 192 | 11. csi-do-node-rn7dm 193 | 12. do-node-agent-6t5ms 194 | 13. do-node-agent-85r8b 195 | 14. do-node-agent-m7bvr 196 | 15. hubble-relay-74686df4df-856pj 197 | 16. hubble-ui-86cc69bddc-xc745 198 | 17. konnectivity-agent-9k8vk 199 | 18. konnectivity-agent-h5fm2 200 | 19. konnectivity-agent-kf4xh 201 | 20. kube-proxy-94945 202 | 21. kube-proxy-qgv4j 203 | 22. kube-proxy-vztzf 204 | 205 | Which pod from kube-system do you want to check the health for? Select a pod by entering its number: >> 1 206 | 207 | Checking status of the pod... 208 | 209 | Extracting logs and events from the pod... 210 | 211 | Logs and events from the pod extracted successfully! 212 | 213 | 214 | Interactive session started. Type 'end chat' to exit from the session! 215 | 216 | >> The log entries provided are empty {}, so there is nothing to analyze. Therefore, I can confirm that 'Everything looks good!' in this case. 217 | 218 | If there were warnings or errors in the log entries, I would have analyzed them thoroughly to identify the root cause. Depending on the specific warnings or errors, potential steps to resolve the issues could include: 219 | 220 | 1. Analyzing the specific error message to understand the problem 221 | 2. Checking Kubernetes resources (e.g., pods, deployments, configmaps) for any misconfigurations 222 | 3. Verifying connectivity to external resources or dependencies 223 | 4. Checking for resource limitations or constraints that could be causing issues 224 | 5. Reviewing recent changes in the Kubernetes environment that could have introduced problems 225 | 6. Using Kubernetes troubleshooting tools like kubectl logs, describe, or events to gather more information 226 | 227 | By following these steps and addressing any identified issues, you can resolve warnings or errors in the Kubernetes environment. 228 | 229 | >> Wonderful, anything else to note? 
230 | >> In addition to resolving warnings or errors in Kubernetes logs, it's important to regularly monitor and maintain the Kubernetes environment to ensure smooth operation. Some best practices for Kubernetes maintenance include: 231 | 232 | 1. Regularly updating Kubernetes components and cluster nodes to the latest stable versions to benefit from bug fixes and security patches. 233 | 2. Implementing automated backups of critical data and configurations to prevent data loss in case of failures. 234 | 3. Monitoring resource utilization and scaling components as needed to optimize performance and cost efficiency. 235 | 4. Implementing security best practices, such as network policies, RBAC, and pod security policies to protect the cluster from unauthorized access or malicious activities. 236 | 5. Conducting regular health checks and performance tuning to identify and address any bottlenecks or inefficiencies in the cluster. 237 | 6. Developing a disaster recovery plan and testing it periodically to ensure business continuity in case of unexpected events. 238 | 239 | By following these maintenance practices, you can ensure the stability, scalability, and security of your Kubernetes environment for optimal DevOps operations. 240 | 241 | >> Alright, so are we done? 242 | >> Yes, we have covered the analysis of the Kubernetes log entries and discussed best practices for Kubernetes maintenance. If you have any more questions or need further assistance, feel free to ask. Otherwise, we can consider this conversation complete. 243 | 244 | >> Wonderful. Thanks! end chat 245 | >> You're welcome! If you have any more questions in the future, feel free to reach out. Have a great day! Goodbye! 
246 | 247 | >> end chat 248 | 249 | 250 | ``` 251 | 252 | 253 | -------------------------------------------------------------------------------- /eks.md: -------------------------------------------------------------------------------- 1 | # Install and Configure Krs with EKS (AWS) 2 | 3 | 4 | ## Prerequisites 5 | 6 | - AWS Account 7 | - AWS CLI installed on your system 8 | - Homebrew (if you're on Mac) 9 | 10 | 11 | ## Getting Started 12 | 13 | ### 1. Set up an Amazon EKS Cluster 14 | 15 | ``` 16 | $ eksctl create cluster --name --version --region --nodegroup-name --node-type --nodes --zones= 17 | ``` 18 | 19 | 20 | 21 | ![EKS_Clusters](https://github.com/kubetoolsca/krs/assets/171302280/edd250c6-12d6-4380-b430-302b06c98a73) 22 | 23 | 24 | ### 2. Authenticate your AWS account 25 | 26 | 27 | ``` 28 | aws configure 29 | ``` 30 | 31 | 32 | 33 | ### 3. List the EKS clusters in your AWS account using this command: 34 | 35 | ``` 36 | $ aws eks list-clusters 37 | ``` 38 | 39 | ### 4. Update your kubeconfig so that KRS can access the EKS cluster using this command:
40 | 41 | ``` 42 | aws eks update-kubeconfig --name 43 | ``` 44 | 45 | 46 | ### 5. Set up KRS using these commands:
47 | 48 | ``` 49 | $ git clone https://github.com/kubetoolsca/krs.git 50 | $ cd krs 51 | $ pip install . 52 | ``` 53 | 54 | ### 6. Initialize KRS to give it access to your cluster using the given command:
55 | 56 | ``` 57 | krs init 58 | ``` 59 | 60 | ### 7. View all available KRS commands by running the given command
61 | 62 | 63 | ``` 64 | krs --help 65 | 66 | Usage: krs [OPTIONS] COMMAND [ARGS]... 67 | 68 | krs: A command line interface to scan your Kubernetes Cluster, detect errors, 69 | provide resolutions using LLMs and recommend latest tools for your cluster 70 | 71 | ╭─ Options ────────────────────────────────────────────────────────────────────╮ 72 | │ --install-completion Install completion for the current shell. │ 73 | │ --show-completion Show completion for the current shell, to copy │ 74 | │ it or customize the installation. │ 75 | │ --help Show this message and exit. │ 76 | ╰──────────────────────────────────────────────────────────────────────────────╯ 77 | ╭─ Commands ───────────────────────────────────────────────────────────────────╮ 78 | │ exit Ends krs services safely and deletes all state files from │ 79 | │ system. Removes all cached data. │ 80 | │ export Exports pod info with logs and events. │ 81 | │ health Starts an interactive terminal using an LLM of your choice to │ 82 | │ detect and fix issues with your cluster │ 83 | │ init Initializes the services and loads the scanner. │ 84 | │ namespaces Lists all the namespaces. │ 85 | │ pods Lists all the pods with namespaces, or lists pods under a │ 86 | │ specified namespace. │ 87 | │ recommend Generates a table of recommended tools from our ranking │ 88 | │ database and their CNCF project status. │ 89 | │ scan Scans the cluster and extracts a list of tools that are │ 90 | │ currently used. │ 91 | ╰──────────────────────────────────────────────────────────────────────────────╯ 92 | ``` 93 | 94 | ### 8. Permit KRS to get information on the tools utilized in your cluster by running the given command
95 | 96 | ``` 97 | krs scan 98 | 99 | Scanning your cluster... 100 | 101 | Cluster scanned successfully... 102 | 103 | Extracted tools used in cluster... 104 | 105 | The cluster is using the following tools: 106 | 107 | +-------------+--------+-----------------------------+---------------+ 108 | | Tool Name | Rank | Category | CNCF Status | 109 | +=============+========+=============================+===============+ 110 | | autoscaler | 5 | Cluster with Core CLI tools | unlisted | 111 | +-------------+--------+-----------------------------+---------------+ 112 | | istio | 2 | Service Mesh | graduated | 113 | +-------------+--------+-----------------------------+---------------+ 114 | | kserve | 3 | Artificial Intelligence | listed | 115 | +-------------+--------+-----------------------------+---------------+ 116 | ``` 117 | 118 | ### 9. Get recommendations on possible tools to use in your cluster by running the given command
119 | 120 | ``` 121 | krs recommend 122 | ``` 123 | ``` 124 | +-----------------------------+------------------+-------------+---------------+ 125 | | Category | Recommendation | Tool Name | CNCF Status | 126 | +=============================+==================+=============+===============+ 127 | | Cluster with Core CLI tools | Recommended tool | k9s | unlisted | 128 | +-----------------------------+------------------+-------------+---------------+ 129 | | Service Mesh | Recommended tool | traefik | listed | 130 | +-----------------------------+------------------+-------------+---------------+ 131 | | Artificial Intelligence | Recommended tool | k8sgpt | sandbox | 132 | +-----------------------------+------------------+-------------+---------------+ 133 | ``` 134 | 135 | ### 10. Check the pod and namespace status in your Kubernetes cluster, including errors. 136 | 137 | ``` 138 | krs health 139 | ``` 140 | ``` 141 | Starting interactive terminal... 142 | 143 | Choose the model provider for healthcheck: 144 | 145 | [1] OpenAI 146 | [2] Huggingface 147 | 148 | >> 1 149 | 150 | Installing necessary libraries.......... 151 | 152 | openai is already installed. 153 | 154 | Enter your OpenAI API key: sk-proj-xxxxxxxxxx 155 | 156 | Enter the OpenAI model name: gpt-3.5-turbo 157 | API key and model are valid. 158 | 159 | Namespaces in the cluster: 160 | 161 | 1. cert-manager 162 | 2. default 163 | 3. istio-system 164 | 4. knative-serving 165 | 5. kserve 166 | 6. kserve-test 167 | 7. kube-node-lease 168 | 8. kube-public 169 | 9. kube-system 170 | 171 | Which namespace do you want to check the health for? Select a namespace by entering its number: >> 9 172 | 173 | Pods in the namespace kube-system: 174 | 175 | 1. aws-node-46hzm 176 | 2. aws-node-wdgnn 177 | 3. coredns-586b798467-54t6h 178 | 4. coredns-586b798467-jmlrp 179 | 5. kube-proxy-hfmjl 180 | 6. kube-proxy-n8lc6 181 | 182 | Which pod from kube-system do you want to check the health for? 
Select a pod by entering its number: >> 1 183 | 184 | Checking status of the pod... 185 | 186 | Extracting logs and events from the pod... 187 | 188 | Logs and events from the pod extracted successfully! 189 | 190 | 191 | Interactive session started. Type 'end chat' to exit from the session! 192 | 193 | >> The provided log entries are empty, so there is nothing to analyze. Everything looks good! 194 | 195 | >> Wonderful, so what next 196 | >> If you have any specific questions or another set of log entries you would like me to analyze, feel free to provide them. I'm here to help with any DevOps or Kubernetes-related queries you may have. Just let me know how I can assist you further! 197 | ``` 198 | 199 | 200 | 201 | 202 | 203 | 204 | 205 | -------------------------------------------------------------------------------- /gke.md: -------------------------------------------------------------------------------- 1 | ## Setting up Krs for Google Kubernetes Engine 2 | 3 | ## Prerequisites 4 | 5 | - A Google Cloud account 6 | - The Google Cloud SDK installed on macOS 7 | 8 | After downloading the Google Cloud SDK archive, run the following commands to extract and install it: 9 | 10 | ``` 11 | tar xfz google-cloud-sdk-195.0.0-darwin-x86_64.tar.gz 12 | ./google-cloud-sdk/install.sh 13 | ``` 14 | 15 | 16 | - Enable the Kubernetes Engine API 17 | 18 | ![image](https://github.com/kubetoolsca/krs/assets/313480/6c441226-9e8e-4a91-ba8a-e0c595173faa) 19 | 20 | 21 | - Authenticate with Google Cloud using `gcloud init` 22 | 23 | 24 | ``` 25 | gcloud init 26 | ``` 27 | 28 | In your browser, log in to your Google user account when prompted and click Allow to grant permission to access Google Cloud Platform resources. 
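The API step above is shown only as a console screenshot, so here is a hedged command-line sketch of the same prerequisite checks. It assumes the API this guide needs is the Kubernetes Engine API (`container.googleapis.com`) and that the project selected during `gcloud init` is the one that will host the cluster. The function name `enable_gke_api` is made up for illustration and is not part of KRS or the Cloud SDK.

```shell
# Sketch only: CLI equivalent of the console steps above.
# Assumption: the required API is the Kubernetes Engine API
# (container.googleapis.com) and the active project hosts the cluster.
enable_gke_api() {
  # Stop early if the Cloud SDK is not on PATH.
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud not found on PATH; install the Google Cloud SDK first"
    return 1
  fi
  # Show the authenticated account(s), then the active project.
  gcloud auth list || return 1
  gcloud config get-value project || return 1
  # Enable the Kubernetes Engine API for that project.
  gcloud services enable container.googleapis.com
}
```

Once `gcloud init` has completed, running `enable_gke_api` in the same shell should print the active account and project before enabling the API. If the account or project is not the one you expect, rerun `gcloud init` before creating the cluster.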
29 | 30 | 31 | ## Creating a GKE Cluster 32 | 33 | ``` 34 | gcloud container clusters create k8s-lab1 --disk-size 10 --zone asia-east1-a --machine-type n1-standard-2 --num-nodes 3 --scopes compute-rw 35 | ``` 36 | 37 | ## Viewing the Cluster on Google Cloud Platform 38 | 39 | ![image](https://github.com/kubetoolsca/krs/assets/313480/733cfe3a-c951-4ea0-b7f5-4a28f7393c8e) 40 | 41 | 42 | ## Viewing the new context on Docker Desktop 43 | 44 | image 45 | 46 | ## Verifying the Google Kubernetes Engine Cluster 47 | 48 | ``` 49 | kubectl get nodes 50 | NAME STATUS ROLES AGE VERSION 51 | gke-k8s-lab1-default-pool-5dfb7153-3fr7 Ready <none> 3m1s v1.29.4-gke.1043002 52 | gke-k8s-lab1-default-pool-5dfb7153-nl3v Ready <none> 3m1s v1.29.4-gke.1043002 53 | gke-k8s-lab1-default-pool-5dfb7153-rkg8 Ready <none> 3m2s v1.29.4-gke.1043002 54 | ``` 55 | 56 | ## Initializing KRS 57 | 58 | ``` 59 | krs init 60 | Services initialized and scanner loaded. 61 | ``` 62 | 63 | ## Running the scanner 64 | 65 | ``` 66 | krs scan 67 | 68 | Scanning your cluster... 69 | 70 | Cluster scanned successfully... 71 | 72 | Extracted tools used in cluster... 
73 | 74 | 75 | The cluster is using the following tools: 76 | 77 | +-------------+--------+-----------------------------+---------------+ 78 | | Tool Name | Rank | Category | CNCF Status | 79 | +=============+========+=============================+===============+ 80 | | autoscaler | 5 | Cluster with Core CLI tools | unlisted | 81 | +-------------+--------+-----------------------------+---------------+ 82 | | fluentbit | 4 | Logging and Tracing | unlisted | 83 | +-------------+--------+-----------------------------+---------------+ 84 | ``` 85 | 86 | ## Checking the Krs Recommendation 87 | 88 | ``` 89 | krs recommend 90 | 91 | Our recommended tools for this deployment are: 92 | 93 | +-----------------------------+------------------+-------------+---------------+ 94 | | Category | Recommendation | Tool Name | CNCF Status | 95 | +=============================+==================+=============+===============+ 96 | | Cluster with Core CLI tools | Recommended tool | k9s | unlisted | 97 | +-----------------------------+------------------+-------------+---------------+ 98 | | Logging and Tracing | Recommended tool | elk | unlisted | 99 | ``` 100 | 101 | 102 | ## Installing Kubeview 103 | 104 | ``` 105 | git clone https://github.com/benc-uk/kubeview 106 | cd kubeview/charts/ 107 | helm install kubeview kubeview 108 | ``` 109 | 110 | ## Running the scanner again 111 | 112 | ``` 113 | krs scan 114 | 115 | Scanning your cluster... 116 | 117 | Cluster scanned successfully... 118 | 119 | Extracted tools used in cluster... 
120 | 121 | 122 | The cluster is using the following tools: 123 | 124 | +-------------+--------+-----------------------------+---------------+ 125 | | Tool Name | Rank | Category | CNCF Status | 126 | +=============+========+=============================+===============+ 127 | | kubeview | 30 | Cluster with Core CLI tools | unlisted | 128 | +-------------+--------+-----------------------------+---------------+ 129 | | | 3 | Cluster Management | unlisted | 130 | +-------------+--------+-----------------------------+---------------+ 131 | | autoscaler | 5 | Cluster with Core CLI tools | unlisted | 132 | +-------------+--------+-----------------------------+---------------+ 133 | | fluentbit | 4 | Logging and Tracing | unlisted | 134 | +-------------+--------+-----------------------------+---------------+ 135 | ``` 136 | -------------------------------------------------------------------------------- /kind.md: -------------------------------------------------------------------------------- 1 | # Install and Configure Krs with Kind 2 | 3 | ## Prerequisites 4 | 5 | - Docker or Podman (container runtime) 6 | - kubectl 7 | - Go (version 1.16+) 8 | 9 | ## Getting Started 10 | 11 | ### 1. Set up a Kind Kubernetes cluster on your local machine 12 | ``` 13 | go install sigs.k8s.io/kind@v0.23.0 && kind create cluster 14 | ``` 15 | ![Kind_Cluster](https://github.com/user-attachments/assets/f8eeace2-86ef-45f7-bbed-657b4f7cffa0) 16 | 17 | ### 2. Set up KRS using these commands: 18 | 19 | ``` 20 | git clone https://github.com/kubetoolsca/krs.git 21 | cd krs 22 | pip install . 23 | ``` 24 | 25 | ### 3. Initialize KRS so that it can access your cluster: 26 | 27 | ``` 28 | krs init 29 | ``` 30 | 31 | ### 4. List all available KRS commands: 32 | ``` 33 | krs --help 34 | ``` 35 | 36 | ``` 37 | krs --help 38 | 39 | Usage: krs [OPTIONS] COMMAND [ARGS]... 
40 | 41 | krs: A command line interface to scan your Kubernetes Cluster, detect errors, 42 | provide resolutions using LLMs and recommend latest tools for your cluster 43 | 44 | ╭─ Options ────────────────────────────────────────────────────────────────────╮ 45 | │ --install-completion Install completion for the current shell. │ 46 | │ --show-completion Show completion for the current shell, to copy │ 47 | │ it or customize the installation. │ 48 | │ --help Show this message and exit. │ 49 | ╰──────────────────────────────────────────────────────────────────────────────╯ 50 | ╭─ Commands ───────────────────────────────────────────────────────────────────╮ 51 | │ exit Ends krs services safely and deletes all state files from │ 52 | │ system. Removes all cached data. │ 53 | │ export Exports pod info with logs and events. │ 54 | │ health Starts an interactive terminal using an LLM of your choice to │ 55 | │ detect and fix issues with your cluster │ 56 | │ init Initializes the services and loads the scanner. │ 57 | │ namespaces Lists all the namespaces. │ 58 | │ pods Lists all the pods with namespaces, or lists pods under a │ 59 | │ specified namespace. │ 60 | │ recommend Generates a table of recommended tools from our ranking │ 61 | │ database and their CNCF project status. │ 62 | │ scan Scans the cluster and extracts a list of tools that are │ 63 | │ currently used. │ 64 | ╰──────────────────────────────────────────────────────────────────────────────╯ 65 | ``` 66 | ### 5. Permit KRS to get information on the tools utilized in your cluster by running the given command 67 | 68 | ``` 69 | krs scan 70 | ``` 71 | 72 | ``` 73 | krs scan 74 | 75 | Scanning your cluster... 76 | 77 | Cluster scanned successfully... 78 | 79 | Extracted tools used in cluster... 
80 | 81 | 82 | The cluster is using the following tools: 83 | 84 | +-------------+--------+-----------------------------------+---------------+ 85 | | Tool Name | Rank | Category | CNCF Status | 86 | +=============+========+===================================+===============+ 87 | | kind | 3 | Alternative Tools for Development | listed | 88 | +-------------+--------+-----------------------------------+---------------+ 89 | | | 4 | Cluster Management | listed | 90 | +-------------+--------+-----------------------------------+---------------+ 91 | 92 | ``` 93 | 94 | ### 6. Get recommendations on possible tools to use in your cluster by running the given command 95 | 96 | ``` 97 | krs recommend 98 | ``` 99 | 100 | ``` 101 | krs recommend 102 | 103 | Our recommended tools for this deployment are: 104 | 105 | +-----------------------------------+------------------+-------------+---------------+ 106 | | Category | Recommendation | Tool Name | CNCF Status | 107 | +===================================+==================+=============+===============+ 108 | | Alternative Tools for Development | Recommended tool | minikube | listed | 109 | +-----------------------------------+------------------+-------------+---------------+ 110 | | Cluster Management | Recommended tool | rancher | unlisted | 111 | +-----------------------------------+------------------+-------------+---------------+ 112 | 113 | 114 | ``` 115 | 116 | ### 7. Check the pod and namespace status in your Kubernetes cluster, including errors. 117 | 118 | ``` 119 | krs health 120 | ``` 121 | 122 | ``` 123 | krs health 124 | 125 | Starting interactive terminal... 126 | 127 | 128 | Choose the model provider for healthcheck: 129 | 130 | [1] OpenAI 131 | [2] Huggingface 132 | 133 | >> 1 134 | 135 | Installing necessary libraries.......... 136 | 137 | openai is already installed. 
138 | 139 | Enter your OpenAI API key: sk-proj-qxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxP 140 | 141 | Enter the OpenAI model name: gpt-3.5-turbo 142 | API key and model are valid. 143 | 144 | Namespaces in the cluster: 145 | 146 | 1. default 147 | 2. kube-node-lease 148 | 3. kube-public 149 | 4. kube-system 150 | 5. local-path-storage 151 | 152 | Which namespace do you want to check the health for? Select a namespace by entering its number: >> 4 153 | 154 | Pods in the namespace kube-system: 155 | 156 | 1. cilium-9lqbq 157 | 2. cilium-ffpct 158 | 3. cilium-pvknr 159 | 4. coredns-85f59d8784-nvr2n 160 | 5. coredns-85f59d8784-p9jcv 161 | 6. cpc-bridge-proxy-c6xzr 162 | 7. cpc-bridge-proxy-p7r4p 163 | 8. cpc-bridge-proxy-tkfrd 164 | 9. csi-do-node-hwxn7 165 | 10. csi-do-node-q27rc 166 | 11. csi-do-node-rn7dm 167 | 12. do-node-agent-6t5ms 168 | 13. do-node-agent-85r8b 169 | 14. do-node-agent-m7bvr 170 | 15. hubble-relay-74686df4df-856pj 171 | 16. hubble-ui-86cc69bddc-xc745 172 | 17. konnectivity-agent-9k8vk 173 | 18. konnectivity-agent-h5fm2 174 | 19. konnectivity-agent-kf4xh 175 | 20. kube-proxy-94945 176 | 21. kube-proxy-qgv4j 177 | 22. kube-proxy-vztzf 178 | 179 | Which pod from kube-system do you want to check the health for? Select a pod by entering its number: >> 1 180 | 181 | Checking status of the pod... 182 | 183 | Extracting logs and events from the pod... 184 | 185 | Logs and events from the pod extracted successfully! 186 | 187 | 188 | Interactive session started. Type 'end chat' to exit from the session! 189 | 190 | >> The log entries provided are empty {}, so there is nothing to analyze. Therefore, I can confirm that 'Everything looks good!' in this case. 191 | 192 | If there were warnings or errors in the log entries, I would have analyzed them thoroughly to identify the root cause. Depending on the specific warnings or errors, potential steps to resolve the issues could include: 193 | 194 | 1. 
Analyzing the specific error message to understand the problem 195 | 2. Checking Kubernetes resources (e.g., pods, deployments, configmaps) for any misconfigurations 196 | 3. Verifying connectivity to external resources or dependencies 197 | 4. Checking for resource limitations or constraints that could be causing issues 198 | 5. Reviewing recent changes in the Kubernetes environment that could have introduced problems 199 | 6. Using Kubernetes troubleshooting tools like kubectl logs, describe, or events to gather more information 200 | 201 | By following these steps and addressing any identified issues, you can resolve warnings or errors in the Kubernetes environment. 202 | 203 | >> Wonderful, anything else to note? 204 | >> In addition to resolving warnings or errors in Kubernetes logs, it's important to regularly monitor and maintain the Kubernetes environment to ensure smooth operation. Some best practices for Kubernetes maintenance include: 205 | 206 | 1. Regularly updating Kubernetes components and cluster nodes to the latest stable versions to benefit from bug fixes and security patches. 207 | 2. Implementing automated backups of critical data and configurations to prevent data loss in case of failures. 208 | 3. Monitoring resource utilization and scaling components as needed to optimize performance and cost efficiency. 209 | 4. Implementing security best practices, such as network policies, RBAC, and pod security policies to protect the cluster from unauthorized access or malicious activities. 210 | 5. Conducting regular health checks and performance tuning to identify and address any bottlenecks or inefficiencies in the cluster. 211 | 6. Developing a disaster recovery plan and testing it periodically to ensure business continuity in case of unexpected events. 212 | 213 | By following these maintenance practices, you can ensure the stability, scalability, and security of your Kubernetes environment for optimal DevOps operations. 
214 | 215 | >> Alright, so are we done? 216 | >> Yes, we have covered the analysis of the Kubernetes log entries and discussed best practices for Kubernetes maintenance. If you have any more questions or need further assistance, feel free to ask. Otherwise, we can consider this conversation complete. 217 | 218 | >> Wonderful. Thanks! end chat 219 | >> You're welcome! If you have any more questions in the future, feel free to reach out. Have a great day! Goodbye! 220 | 221 | >> end chat 222 | 223 | 224 | ``` 225 | -------------------------------------------------------------------------------- /krs.egg-info/PKG-INFO: -------------------------------------------------------------------------------- 1 | Metadata-Version: 2.1 2 | Name: krs 3 | Version: 0.1.0 4 | Summary: Kubernetes Recommendation Service with LLM integration 5 | Home-page: https://github.com/KrsGPTs/krs 6 | Author: Abhijeet Mazumdar , Karan Singh & Ajeet Singh Raina 7 | Author-email: abhijeet@kubetools.ca, karan@kubetools.ca, ajeet@kubetools.ca 8 | Classifier: Programming Language :: Python :: 3 9 | Classifier: License :: OSI Approved :: MIT License 10 | Classifier: Operating System :: OS Independent 11 | Requires-Python: >=3.6 12 | License-File: LICENSE 13 | Requires-Dist: typer==0.12.3 14 | Requires-Dist: requests==2.32.2 15 | Requires-Dist: kubernetes==29.0.0 16 | Requires-Dist: tabulate==0.9.0 17 | -------------------------------------------------------------------------------- /krs.egg-info/SOURCES.txt: -------------------------------------------------------------------------------- 1 | LICENSE 2 | README.md 3 | setup.py 4 | krs/__init__.py 5 | krs/krs.py 6 | krs/main.py 7 | krs.egg-info/PKG-INFO 8 | krs.egg-info/SOURCES.txt 9 | krs.egg-info/dependency_links.txt 10 | krs.egg-info/entry_points.txt 11 | krs.egg-info/requires.txt 12 | krs.egg-info/top_level.txt 13 | krs/utils/__init__.py 14 | krs/utils/cluster_scanner.py 15 | krs/utils/constants.py 16 | krs/utils/fetch_tools_krs.py 17 | 
krs/utils/functional.py 18 | krs/utils/llm_client.py -------------------------------------------------------------------------------- /krs.egg-info/dependency_links.txt: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /krs.egg-info/entry_points.txt: -------------------------------------------------------------------------------- 1 | [console_scripts] 2 | krs = krs.krs:app 3 | -------------------------------------------------------------------------------- /krs.egg-info/requires.txt: -------------------------------------------------------------------------------- 1 | typer==0.12.3 2 | requests==2.32.2 3 | kubernetes==29.0.0 4 | tabulate==0.9.0 5 | -------------------------------------------------------------------------------- /krs.egg-info/top_level.txt: -------------------------------------------------------------------------------- 1 | krs 2 | -------------------------------------------------------------------------------- /krs/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kubetoolsca/krs/f2af89aee7c3317d67dc51eb016168a948bc70d3/krs/__init__.py -------------------------------------------------------------------------------- /krs/krs.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import typer, os 4 | from krs.main import KrsMain 5 | from krs.utils.constants import KRSSTATE_PICKLE_FILEPATH, KRS_DATA_DIRECTORY 6 | 7 | app = typer.Typer(help="krs: A command line interface to scan your Kubernetes Cluster, detect errors, provide resolutions using LLMs and recommend latest tools for your cluster") 8 | krs = KrsMain() 9 | 10 | def check_initialized(): 11 | if not os.path.exists(KRSSTATE_PICKLE_FILEPATH): 12 | typer.echo("KRS is not initialized. 
Please run 'krs init' first.") 13 | raise typer.Exit() 14 | 15 | if not os.path.exists(KRS_DATA_DIRECTORY): 16 | os.mkdir(KRS_DATA_DIRECTORY) 17 | 18 | @app.command() 19 | def init(kubeconfig: str = typer.Option('~/.kube/config', help="Custom path for kubeconfig file if not default")): 20 | """ 21 | Initializes the services and loads the scanner. 22 | """ 23 | krs.initialize(kubeconfig) 24 | typer.echo("Services initialized and scanner loaded.") 25 | 26 | @app.command() 27 | def scan(): 28 | """ 29 | Scans the cluster and extracts a list of tools that are currently used. 30 | """ 31 | check_initialized() 32 | krs.scan_cluster() 33 | 34 | 35 | @app.command() 36 | def namespaces(): 37 | """ 38 | Lists all the namespaces. 39 | """ 40 | check_initialized() 41 | namespaces = krs.list_namespaces() 42 | typer.echo("Namespaces in your cluster are: \n") 43 | for i, namespace in enumerate(namespaces): 44 | typer.echo(str(i+1)+ ". "+ namespace) 45 | 46 | @app.command() 47 | def pods(namespace: str = typer.Option(None, help="Specify namespace to list pods from")): 48 | """ 49 | Lists all the pods with namespaces, or lists pods under a specified namespace. 50 | """ 51 | check_initialized() 52 | if namespace: 53 | pods = krs.list_pods(namespace) 54 | if pods == 'wrong namespace name': 55 | typer.echo(f"\nWrong namespace name entered, try again!\n") 56 | raise typer.Abort() 57 | typer.echo(f"\nPods in namespace '{namespace}': \n") 58 | else: 59 | pods = krs.list_pods_all() 60 | typer.echo("\nAll pods in the cluster: \n") 61 | 62 | for i, pod in enumerate(pods): 63 | typer.echo(str(i+1)+ '. '+ pod) 64 | 65 | @app.command() 66 | def recommend(): 67 | """ 68 | Generates a table of recommended tools from our ranking database and their CNCF project status. 
69 | """ 70 | check_initialized() 71 | krs.generate_recommendations() 72 | 73 | @app.command() 74 | def health(change_model: bool = typer.Option(False, help="Option to reinitialize/change the LLM, if set to True"), 75 | device: str = typer.Option('cpu', help='Option to run Huggingface models on GPU by entering the option as "gpu"')): 76 | """ 77 | Starts an interactive terminal using an LLM of your choice to detect and fix issues with your cluster 78 | """ 79 | check_initialized() 80 | typer.echo("\nStarting interactive terminal...\n") 81 | krs.health_check(change_model, device) 82 | 83 | @app.command() 84 | def export(): 85 | """ 86 | Exports pod info with logs and events. 87 | """ 88 | check_initialized() 89 | krs.export_pod_info() 90 | typer.echo("Pod info with logs and events exported. Json file saved to current directory!") 91 | 92 | @app.command() 93 | def exit(): 94 | """ 95 | Ends krs services safely and deletes all state files from system. Removes all cached data. 96 | """ 97 | check_initialized() 98 | krs.exit() 99 | typer.echo("Krs services closed safely.") 100 | 101 | if __name__ == "__main__": 102 | app() 103 | -------------------------------------------------------------------------------- /krs/main.py: -------------------------------------------------------------------------------- 1 | from krs.utils.fetch_tools_krs import krs_tool_ranking_info 2 | from krs.utils.cluster_scanner import KubetoolsScanner 3 | from krs.utils.llm_client import KrsGPTClient 4 | from krs.utils.functional import extract_log_entries, CustomJSONEncoder 5 | import os, pickle, time, json 6 | from tabulate import tabulate 7 | from krs.utils.constants import (KRSSTATE_PICKLE_FILEPATH, LLMSTATE_PICKLE_FILEPATH, POD_INFO_FILEPATH, KRS_DATA_DIRECTORY) 8 | 9 | class KrsMain: 10 | 11 | def __init__(self): 12 | 13 | self.pod_info = None 14 | self.pod_list = None 15 | self.namespaces = None 16 | self.deployments = None 17 | self.state_file = KRSSTATE_PICKLE_FILEPATH 18 | 
self.isClusterScanned = False 19 | self.continue_chat = False 20 | self.logs_extracted = [] 21 | self.scanner = None 22 | self.get_events = True 23 | self.get_logs = True 24 | self.cluster_tool_list = None 25 | self.detailed_cluster_tool_list = None 26 | self.category_cluster_tools_dict = None 27 | 28 | self.load_state() 29 | 30 | def initialize(self, config_file='~/.kube/config'): 31 | self.config_file = config_file 32 | self.tools_dict, self.category_dict, cncf_status_dict = krs_tool_ranking_info() 33 | self.cncf_status = cncf_status_dict['cncftools'] 34 | self.scanner = KubetoolsScanner(self.get_events, self.get_logs, self.config_file) 35 | self.save_state() 36 | 37 | def save_state(self): 38 | state = { 39 | 'pod_info': self.pod_info, 40 | 'pod_list': self.pod_list, 41 | 'namespaces': self.namespaces, 42 | 'deployments': self.deployments, 43 | 'cncf_status': self.cncf_status, 44 | 'tools_dict': self.tools_dict, 45 | 'category_tools_dict': self.category_dict, 46 | 'extracted_logs': self.logs_extracted, 47 | 'kubeconfig': self.config_file, 48 | 'isScanned': self.isClusterScanned, 49 | 'cluster_tool_list': self.cluster_tool_list, 50 | 'detailed_tool_list': self.detailed_cluster_tool_list, 51 | 'category_tool_list': self.category_cluster_tools_dict 52 | } 53 | os.makedirs(os.path.dirname(self.state_file), exist_ok=True) 54 | with open(self.state_file, 'wb') as f: 55 | pickle.dump(state, f) 56 | 57 | def load_state(self): 58 | if os.path.exists(self.state_file): 59 | with open(self.state_file, 'rb') as f: 60 | state = pickle.load(f) 61 | self.pod_info = state.get('pod_info') 62 | self.pod_list = state.get('pod_list') 63 | self.namespaces = state.get('namespaces') 64 | self.deployments = state.get('deployments') 65 | self.cncf_status = state.get('cncf_status') 66 | self.tools_dict = state.get('tools_dict') 67 | self.category_dict = state.get('category_tools_dict') 68 | self.logs_extracted = state.get('extracted_logs') 69 | self.config_file = state.get('kubeconfig') 
70 | self.isClusterScanned = state.get('isScanned') 71 | self.cluster_tool_list = state.get('cluster_tool_list') 72 | self.detailed_cluster_tool_list = state.get('detailed_tool_list') 73 | self.category_cluster_tools_dict = state.get('category_tool_list') 74 | self.scanner = KubetoolsScanner(self.get_events, self.get_logs, self.config_file) 75 | 76 | def check_scanned(self): 77 | if not self.isClusterScanned: 78 | self.pod_list, self.pod_info, self.deployments, self.namespaces = self.scanner.scan_kubernetes_deployment() 79 | self.save_state() 80 | 81 | def list_namespaces(self): 82 | self.check_scanned() 83 | return self.scanner.list_namespaces() 84 | 85 | def list_pods(self, namespace): 86 | self.check_scanned() 87 | if namespace not in self.list_namespaces(): 88 | return "wrong namespace name" 89 | return self.scanner.list_pods(namespace) 90 | 91 | def list_pods_all(self): 92 | self.check_scanned() 93 | return self.scanner.list_pods_all() 94 | 95 | def detect_tools_from_repo(self): 96 | tool_set = set() 97 | for pod in self.pod_list: 98 | for service_name in pod.split('-'): 99 | if service_name in self.tools_dict.keys(): 100 | tool_set.add(service_name) 101 | 102 | for dep in self.deployments: 103 | for service_name in dep.split('-'): 104 | if service_name in self.tools_dict.keys(): 105 | tool_set.add(service_name) 106 | 107 | return list(tool_set) 108 | 109 | def extract_rankings(self): 110 | tool_dict = {} 111 | category_tools_dict = {} 112 | for tool in self.cluster_tool_list: 113 | tool_details = self.tools_dict[tool] 114 | for detail in tool_details: 115 | rank = detail['rank'] 116 | category = detail['category'] 117 | if category not in category_tools_dict: 118 | category_tools_dict[category] = [] 119 | category_tools_dict[category].append(rank) 120 | 121 | tool_dict[tool] = tool_details 122 | 123 | return tool_dict, category_tools_dict 124 | 125 | def generate_recommendations(self): 126 | 127 | if not self.isClusterScanned: 128 | self.scan_cluster() 129 | 
130 | self.print_recommendations() 131 | 132 | def scan_cluster(self): 133 | 134 | print("\nScanning your cluster...\n") 135 | self.pod_list, self.pod_info, self.deployments, self.namespaces = self.scanner.scan_kubernetes_deployment() 136 | self.isClusterScanned = True 137 | print("Cluster scanned successfully...\n") 138 | self.cluster_tool_list = self.detect_tools_from_repo() 139 | print("Extracted tools used in cluster...\n") 140 | self.detailed_cluster_tool_list, self.category_cluster_tools_dict = self.extract_rankings() 141 | 142 | self.print_scan_results() 143 | self.save_state() 144 | 145 | def print_scan_results(self): 146 | scan_results = [] 147 | 148 | for tool, details in self.detailed_cluster_tool_list.items(): 149 | first_entry = True 150 | for detail in details: 151 | row = [tool if first_entry else "", detail['rank'], detail['category'], self.cncf_status.get(tool, 'unlisted')] 152 | scan_results.append(row) 153 | first_entry = False 154 | 155 | print("\nThe cluster is using the following tools:\n") 156 | print(tabulate(scan_results, headers=["Tool Name", "Rank", "Category", "CNCF Status"], tablefmt="grid")) 157 | 158 | def print_recommendations(self): 159 | recommendations = [] 160 | 161 | for category, ranks in self.category_cluster_tools_dict.items(): 162 | rank = ranks[0] 163 | recommended_tool = self.category_dict[category][1]['name'] 164 | status = self.cncf_status.get(recommended_tool, 'unlisted') 165 | if rank == 1: 166 | row = [category, "Already using the best", recommended_tool, status] 167 | else: 168 | row = [category, "Recommended tool", recommended_tool, status] 169 | recommendations.append(row) 170 | 171 | print("\nOur recommended tools for this deployment are:\n") 172 | print(tabulate(recommendations, headers=["Category", "Recommendation", "Tool Name", "CNCF Status"], tablefmt="grid")) 173 | 174 | 175 | def health_check(self, change_model=False, device='cpu'): 176 | 177 | if os.path.exists(LLMSTATE_PICKLE_FILEPATH) and not 
change_model: 178 | continue_previous_chat = input("\nDo you want to continue fixing the previously selected pod ? (y/n): >> ") 179 | while True: 180 | if continue_previous_chat not in ['y', 'n']: 181 | continue_previous_chat = input("\nPlease enter one of the given options ? (y/n): >> ") 182 | else: 183 | break 184 | 185 | if continue_previous_chat=='y': 186 | krsllmclient = KrsGPTClient(device=device) 187 | self.continue_chat = True 188 | else: 189 | krsllmclient = KrsGPTClient(reset_history=True, device=device) 190 | 191 | else: 192 | krsllmclient = KrsGPTClient(reinitialize=True, device=device) 193 | self.continue_chat = False 194 | 195 | if not self.continue_chat: 196 | 197 | self.check_scanned() 198 | 199 | print("\nNamespaces in the cluster:\n") 200 | namespaces = self.list_namespaces() 201 | namespace_len = len(namespaces) 202 | for i, namespace in enumerate(namespaces, start=1): 203 | print(f"{i}. {namespace}") 204 | 205 | self.selected_namespace_index = int(input("\nWhich namespace do you want to check the health for? Select a namespace by entering its number: >> ")) 206 | while True: 207 | if self.selected_namespace_index not in list(range(1, namespace_len+1)): 208 | self.selected_namespace_index = int(input(f"\nWrong input! Select a namespace number between {1} to {namespace_len}: >> ")) 209 | else: 210 | break 211 | 212 | self.selected_namespace = namespaces[self.selected_namespace_index - 1] 213 | pod_list = self.list_pods(self.selected_namespace) 214 | pod_len = len(pod_list) 215 | print(f"\nPods in the namespace {self.selected_namespace}:\n") 216 | for i, pod in enumerate(pod_list, start=1): 217 | print(f"{i}. {pod}") 218 | self.selected_pod_index = int(input(f"\nWhich pod from {self.selected_namespace} do you want to check the health for? Select a pod by entering its number: >> ")) 219 | 220 | while True: 221 | if self.selected_pod_index not in list(range(1, pod_len+1)): 222 | self.selected_pod_index = int(input(f"\nWrong input! 
Select a pod number between {1} to {pod_len}: >> ")) 223 | else: 224 | break 225 | 226 | print("\nChecking status of the pod...") 227 | 228 | print("\nExtracting logs and events from the pod...") 229 | 230 | logs_from_pod = self.get_logs_from_pod(self.selected_namespace_index, self.selected_pod_index) 231 | 232 | self.logs_extracted = extract_log_entries(logs_from_pod) 233 | 234 | print("\nLogs and events from the pod extracted successfully!\n") 235 | 236 | prompt_to_llm = self.create_prompt(self.logs_extracted) 237 | 238 | krsllmclient.interactive_session(prompt_to_llm) 239 | 240 | self.save_state() 241 | 242 | def get_logs_from_pod(self, namespace_index, pod_index): 243 | try: 244 | namespace_index -= 1 245 | pod_index -= 1 246 | namespace = list(self.list_namespaces())[namespace_index] 247 | return list(self.pod_info[namespace][pod_index]['info']['Logs'].values())[0] 248 | except (KeyError, IndexError): # out-of-range indices or empty logs raise IndexError, not KeyError 249 | print("\nKindly enter a value from the available namespaces and pods") 250 | return None 251 | 252 | def create_prompt(self, log_entries): 253 | prompt = "You are a DevOps expert with experience in Kubernetes. Analyze the following log entries:\n{\n" 254 | for entry in sorted(log_entries): # Sort to maintain consistent order 255 | prompt += f"{entry}\n" 256 | prompt += "}\nIf there is nothing of concern in between { }, return a message stating that 'Everything looks good!'. Explain the warnings and errors and the steps that should be taken to resolve the issues, only if they exist." 
257 | return prompt 258 | 259 | def export_pod_info(self): 260 | 261 | self.check_scanned() 262 | 263 | with open(POD_INFO_FILEPATH, 'w') as f: 264 | json.dump(self.pod_info, f, cls=CustomJSONEncoder) 265 | 266 | 267 | def exit(self): 268 | 269 | try: 270 | # List all files and directories in the given directory 271 | files = os.listdir(KRS_DATA_DIRECTORY) 272 | for file in files: 273 | file_path = os.path.join(KRS_DATA_DIRECTORY, file) 274 | # Check if it's a file and not a directory 275 | if os.path.isfile(file_path): 276 | os.remove(file_path) # Delete the file 277 | print(f"Deleted file: {file_path}") 278 | 279 | except Exception as e: 280 | print(f"Error occurred: {e}") 281 | 282 | def main(self): 283 | self.scan_cluster() 284 | self.generate_recommendations() 285 | self.health_check() 286 | 287 | 288 | if __name__=='__main__': 289 | recommender = KrsMain() 290 | recommender.main() 291 | # logs_info = recommender.get_logs_from_pod(4,2) 292 | # print(logs_info) 293 | # logs = recommender.extract_log_entries(logs_info) 294 | # print(logs) 295 | # print(recommender.create_prompt(logs)) 296 | 297 | -------------------------------------------------------------------------------- /krs/requirements.txt: -------------------------------------------------------------------------------- 1 | typer==0.12.3 2 | requests==2.32.2 3 | kubernetes==29.0.0 4 | tabulate==0.9.0 5 | 6 | -------------------------------------------------------------------------------- /krs/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kubetoolsca/krs/f2af89aee7c3317d67dc51eb016168a948bc70d3/krs/utils/__init__.py -------------------------------------------------------------------------------- /krs/utils/cluster_scanner.py: -------------------------------------------------------------------------------- 1 | from kubernetes import client, config 2 | import logging 3 | 4 | class KubetoolsScanner: 5 | def __init__(self, 
get_events=True, get_logs=True, config_file='~/.kube/config'): 6 | self.get_events = get_events 7 | self.get_logs = get_logs 8 | self.config_file = config_file 9 | self.v1 = None 10 | self.v2 = None 11 | self.setup_kubernetes_client() 12 | 13 | def setup_kubernetes_client(self): 14 | try: 15 | config.load_kube_config(config_file=self.config_file) 16 | self.v1 = client.AppsV1Api() 17 | self.v2 = client.CoreV1Api() 18 | except Exception as e: 19 | logging.error("Failed to load Kubernetes configuration: %s", e) 20 | raise 21 | 22 | def scan_kubernetes_deployment(self): 23 | try: 24 | deployments = self.v1.list_deployment_for_all_namespaces() 25 | namespaces = self.list_namespaces() 26 | except Exception as e: 27 | logging.error("Error fetching data from Kubernetes API: %s", e) 28 | return [], {}, [], [] # Same four-value shape as the success path below 29 | 30 | pod_dict = {} 31 | pod_list = [] 32 | for name in namespaces: 33 | pods = self.list_pods(name) 34 | pod_list += pods 35 | pod_dict[name] = [{'name': pod, 'info': self.get_pod_info(name, pod)} for pod in pods] 36 | 37 | deployment_list = [dep.metadata.name for dep in deployments.items] 38 | return pod_list, pod_dict, deployment_list, namespaces 39 | 40 | def list_namespaces(self): 41 | namespaces = self.v2.list_namespace() 42 | return [namespace.metadata.name for namespace in namespaces.items] 43 | 44 | def list_pods_all(self): 45 | pods = self.v2.list_pod_for_all_namespaces() 46 | return [pod.metadata.name for pod in pods.items] 47 | 48 | def list_pods(self, namespace): 49 | pods = self.v2.list_namespaced_pod(namespace) 50 | return [pod.metadata.name for pod in pods.items] 51 | 52 | def get_pod_info(self, namespace, pod, include_events=True, include_logs=True): 53 | """ 54 | Retrieves information about a specific pod in a given namespace. 55 | 56 | Args: 57 | namespace (str): The namespace of the pod. 58 | pod (str): The name of the pod. 59 | include_events (bool): Flag indicating whether to include events associated with the pod. 
60 | include_logs (bool): Flag indicating whether to include logs of the pod. 61 | 62 | Returns: 63 | dict: A dictionary containing the pod information, events (if include_events is True), and logs (if include_logs is True). 64 | """ 65 | pod_info = self.v2.read_namespaced_pod(pod, namespace) 66 | pod_info_map = pod_info.to_dict() 67 | pod_info_map["metadata"]["managed_fields"] = None # Clean up metadata 68 | 69 | info = {'PodInfo': pod_info_map} 70 | 71 | if include_events: 72 | info['Events'] = self.fetch_pod_events(namespace, pod) 73 | 74 | if include_logs: 75 | # Retrieve logs for all containers within the pod 76 | container_logs = {} 77 | for container in pod_info.spec.containers: 78 | try: 79 | logs = self.v2.read_namespaced_pod_log(name=pod, namespace=namespace, container=container.name) 80 | container_logs[container.name] = logs 81 | except Exception as e: 82 | logging.error("Failed to fetch logs for container %s in pod %s: %s", container.name, pod, e) 83 | container_logs[container.name] = "Error fetching logs: " + str(e) 84 | info['Logs'] = container_logs 85 | 86 | return info 87 | 88 | def fetch_pod_events(self, namespace, pod): 89 | events = self.v2.list_namespaced_event(namespace) 90 | return [{ 91 | 'Name': event.metadata.name, 92 | 'Message': event.message, 93 | 'Reason': event.reason 94 | } for event in events.items if event.involved_object.name == pod] 95 | 96 | 97 | if __name__ == '__main__': 98 | 99 | scanner = KubetoolsScanner() 100 | pod_list, pod_info, deployments, namespaces = scanner.scan_kubernetes_deployment() 101 | print("POD List: \n\n", pod_list) 102 | print("\n\nPOD Info: \n\n", pod_info.keys()) 103 | print("\n\nNamespaces: \n\n", namespaces) 104 | print("\n\nDeployments : \n\n", deployments) 105 | 106 | -------------------------------------------------------------------------------- /krs/utils/constants.py: -------------------------------------------------------------------------------- 1 | KUBETOOLS_JSONPATH = 
'krs/data/kubetools_data.json' 2 | KUBETOOLS_DATA_JSONURL = 'https://raw.githubusercontent.com/Kubetools-Technologies-Inc/kubetools_data/main/data/kubetools_data.json' 3 | 4 | CNCF_YMLPATH = 'krs/data/landscape.yml' 5 | CNCF_YMLURL = 'https://raw.githubusercontent.com/cncf/landscape/master/landscape.yml' 6 | CNCF_TOOLS_JSONPATH = 'krs/data/cncf_tools.json' 7 | 8 | TOOLS_RANK_JSONPATH = 'krs/data/tools_rank.json' 9 | CATEGORY_RANK_JSONPATH = 'krs/data/category_rank.json' 10 | 11 | LLMSTATE_PICKLE_FILEPATH = 'krs/data/llmstate.pkl' 12 | KRSSTATE_PICKLE_FILEPATH = 'krs/data/krsstate.pkl' 13 | 14 | POD_INFO_FILEPATH = './exported_pod_info.json' 15 | 16 | MAX_OUTPUT_TOKENS = 512 17 | 18 | KRS_DATA_DIRECTORY = 'krs/data' 19 | -------------------------------------------------------------------------------- /krs/utils/fetch_tools_krs.py: -------------------------------------------------------------------------------- 1 | import json 2 | import requests 3 | import yaml 4 | from krs.utils.constants import (KUBETOOLS_DATA_JSONURL, KUBETOOLS_JSONPATH, CNCF_YMLPATH, CNCF_YMLURL, CNCF_TOOLS_JSONPATH, TOOLS_RANK_JSONPATH, CATEGORY_RANK_JSONPATH) 5 | 6 | # Function to convert 'githubStars' to a float, or return 0 if it cannot be converted 7 | def get_github_stars(tool): 8 | stars = tool.get('githubStars', 0) 9 | try: 10 | return float(stars) 11 | except ValueError: 12 | return 0.0 13 | 14 | # Function to download and save a file 15 | def download_file(url, filename): 16 | response = requests.get(url) 17 | response.raise_for_status() # Ensure we notice bad responses 18 | with open(filename, 'wb') as file: 19 | file.write(response.content) 20 | 21 | def parse_yaml_to_dict(yaml_file_path): 22 | with open(yaml_file_path, 'r') as file: 23 | data = yaml.safe_load(file) 24 | 25 | cncftools = {} 26 | 27 | for category in data.get('landscape', []): 28 | for subcategory in category.get('subcategories', []): 29 | for item in subcategory.get('items', []): 30 | item_name = 
item.get('name').lower() 31 | project_status = item.get('project', 'listed') 32 | cncftools[item_name] = project_status 33 | 34 | return {'cncftools': cncftools} 35 | 36 | def save_json_file(jsondict, jsonpath): 37 | 38 | # Write the category dictionary to a new JSON file 39 | with open(jsonpath, 'w') as f: 40 | json.dump(jsondict, f, indent=4) 41 | 42 | 43 | def krs_tool_ranking_info(): 44 | # New dictionaries 45 | tools_dict = {} 46 | category_tools_dict = {} 47 | 48 | download_file(KUBETOOLS_DATA_JSONURL, KUBETOOLS_JSONPATH) 49 | download_file(CNCF_YMLURL, CNCF_YMLPATH) 50 | 51 | with open(KUBETOOLS_JSONPATH) as f: 52 | data = json.load(f) 53 | 54 | for category in data: 55 | # Sort the tools in the current category by the number of GitHub stars 56 | sorted_tools = sorted(category['tools'], key=get_github_stars, reverse=True) 57 | 58 | for i, tool in enumerate(sorted_tools, start=1): 59 | tool["name"] = tool['name'].replace("\t", "").lower() 60 | tool['ranking'] = i 61 | 62 | # Update tools_dict 63 | tools_dict.setdefault(tool['name'], []).append({ 64 | 'rank': i, 65 | 'category': category['category']['name'], 66 | 'url': tool['link'] 67 | }) 68 | 69 | # Update ranked_tools_dict 70 | category_tools_dict.setdefault(category['category']['name'], {}).update({i: {'name': tool['name'], 'url': tool['link']}}) 71 | 72 | 73 | cncf_tools_dict = parse_yaml_to_dict(CNCF_YMLPATH) 74 | save_json_file(cncf_tools_dict, CNCF_TOOLS_JSONPATH) 75 | save_json_file(tools_dict, TOOLS_RANK_JSONPATH) 76 | save_json_file(category_tools_dict, CATEGORY_RANK_JSONPATH) 77 | 78 | return tools_dict, category_tools_dict, cncf_tools_dict 79 | 80 | if __name__=='__main__': 81 | tools_dict, category_tools_dict, cncf_tools_dict = krs_tool_ranking_info() 82 | print(cncf_tools_dict) 83 | 84 | -------------------------------------------------------------------------------- /krs/utils/functional.py: -------------------------------------------------------------------------------- 1 | from difflib 
import SequenceMatcher 2 | import re, json 3 | from datetime import datetime 4 | 5 | class CustomJSONEncoder(json.JSONEncoder): 6 | """JSON Encoder for complex objects not serializable by default json code.""" 7 | def default(self, obj): 8 | if isinstance(obj, datetime): 9 | # Format datetime object as a string in ISO 8601 format 10 | return obj.isoformat() 11 | # Let the base class default method raise the TypeError 12 | return json.JSONEncoder.default(self, obj) 13 | 14 | def similarity(a, b): 15 | return SequenceMatcher(None, a, b).ratio() 16 | 17 | def filter_similar_entries(log_entries): 18 | unique_entries = list(log_entries) 19 | to_remove = set() 20 | 21 | # Compare each pair of log entries 22 | for i in range(len(unique_entries)): 23 | for j in range(i + 1, len(unique_entries)): 24 | if similarity(unique_entries[i], unique_entries[j]) > 0.85: 25 | # Remove the longer entry of the pair, or the second one if they are the same length 26 | if len(unique_entries[i]) > len(unique_entries[j]): 27 | to_remove.add(unique_entries[i]) 28 | else: 29 | to_remove.add(unique_entries[j]) 30 | 31 | # Filter out the highly similar entries 32 | filtered_entries = {entry for entry in unique_entries if entry not in to_remove} 33 | return filtered_entries 34 | 35 | def extract_log_entries(log_contents): 36 | # Patterns to match different log formats (the literal dot before the fractional seconds is escaped) 37 | patterns = [ 38 | re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}Z\s+(warn|error)\s+\S+\s+(.*)', re.IGNORECASE), 39 | re.compile(r'[WE]\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+\s+(.*)'), 40 | re.compile(r'({.*})') 41 | ] 42 | 43 | log_entries = set() 44 | # Attempt to match each line with all patterns 45 | for line in log_contents.split('\n'): 46 | for pattern in patterns: 47 | match = pattern.search(line) 48 | if match: 49 | if match.groups()[0].startswith('{'): 50 | # Handle JSON formatted log entries 51 | try: 52 | log_json = json.loads(match.group(1)) 53 | if 'severity' in log_json and log_json['severity'].lower() in ['error', 
'warning']: 54 | level = "Error" if log_json['severity'].lower() == "error" else "Warning" 55 | message = log_json.get('error', '') if 'error' in log_json.keys() else line 56 | log_entries.add(f"{level}: {message.strip()}") 57 | elif str(log_json.get('level', '')).lower() in ['fatal', 'error', 'warn', 'warning']: # Skip info/debug entries 58 | level = "Error" if str(log_json['level']).lower() in ['fatal', 'error'] else "Warning" 59 | message = log_json.get('msg', '') + log_json.get('error', '') 60 | log_entries.add(f"{level}: {message.strip()}") 61 | except json.JSONDecodeError: 62 | continue # Skip if JSON is not valid 63 | else: 64 | if len(match.groups()) == 2: 65 | level, message = match.groups() 66 | elif len(match.groups()) == 1: 67 | message = match.group(1) # Assuming error as default 68 | level = "ERROR" # Default if not specified in the log 69 | 70 | level = "Error" if "error" in level.lower() else "Warning" 71 | formatted_message = f"{level}: {message.strip()}" 72 | log_entries.add(formatted_message) 73 | break # Stop after the first match 74 | 75 | return filter_similar_entries(log_entries) -------------------------------------------------------------------------------- /krs/utils/llm_client.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | import subprocess 3 | import os, time 4 | from krs.utils.constants import (MAX_OUTPUT_TOKENS, LLMSTATE_PICKLE_FILEPATH) 5 | 6 | class KrsGPTClient: 7 | 8 | def __init__(self, reinitialize=False, reset_history=False, device='cpu'): 9 | 10 | self.reinitialize = reinitialize 11 | self.client = None 12 | self.pipeline = None 13 | self.provider = None 14 | self.model = None 15 | self.openai_api_key = None 16 | self.continue_chat = False 17 | self.history = [] 18 | self.max_tokens = MAX_OUTPUT_TOKENS 19 | self.device = device 20 | 21 | 22 | if not self.reinitialize: 23 | print("\nLoading LLM State..") 24 | self.load_state() 25 | print("\nModel: ", self.model) 26 | if not self.model: 27 | self.initialize_client() 28 | 29 | self.history = [] if reset_history else self.history 
30 | 31 | if self.history: 32 | continue_chat = input("\n\nDo you want to continue the previous chat? (y/n) >> ") 33 | while continue_chat not in ['y', 'n']: 34 | print("Please enter either y or n!") 35 | continue_chat = input("\nDo you want to continue the previous chat? (y/n) >> ") 36 | if continue_chat == 'n': # The loop above only accepts 'y' or 'n' 37 | self.history = [] 38 | else: 39 | self.continue_chat = True 40 | 41 | def save_state(self, filename=LLMSTATE_PICKLE_FILEPATH): 42 | state = { 43 | 'provider': self.provider, 44 | 'model': self.model, 45 | 'history': self.history, 46 | 'openai_api_key': self.openai_api_key 47 | } 48 | with open(filename, 'wb') as output: 49 | pickle.dump(state, output, pickle.HIGHEST_PROTOCOL) 50 | 51 | def load_state(self): 52 | try: 53 | with open(LLMSTATE_PICKLE_FILEPATH, 'rb') as f: 54 | state = pickle.load(f) 55 | self.provider = state['provider'] 56 | self.model = state['model'] 57 | self.history = state.get('history', []) 58 | self.openai_api_key = state.get('openai_api_key', '') 59 | if self.provider == 'OpenAI': 60 | self.init_openai_client(reinitialize=True) 61 | elif self.provider == 'huggingface': 62 | self.init_huggingface_client(reinitialize=True) 63 | except (FileNotFoundError, EOFError): 64 | pass 65 | 66 | def install_package(self, package_name): 67 | import importlib 68 | try: 69 | importlib.import_module(package_name) 70 | print(f"\n{package_name} is already installed.") 71 | except ImportError: 72 | print(f"\nInstalling {package_name}...", end='', flush=True) 73 | result = subprocess.run(['pip', 'install', package_name], stdout=subprocess.PIPE, stderr=subprocess.PIPE) 74 | if result.returncode == 0: 75 | print(f" \n{package_name} installed successfully.") 76 | else: 77 | print(f" \nFailed to install {package_name}.") 78 | 79 | 80 | def initialize_client(self): 81 | if not self.client and not self.pipeline: 82 | choice = input("\nChoose the model provider for healthcheck: \n\n[1] OpenAI \n[2] Huggingface\n\n>> ") 83 | if choice == '1': 84 | 
self.init_openai_client() 85 | elif choice == '2': 86 | self.init_huggingface_client() 87 | else: 88 | raise ValueError("Invalid option selected") 89 | 90 | def init_openai_client(self, reinitialize=False): 91 | 92 | if not reinitialize: 93 | print("\nInstalling necessary libraries..........") 94 | self.install_package('openai') 95 | 96 | import openai 97 | from openai import OpenAI 98 | import getpass 99 | 100 | self.provider = 'OpenAI' 101 | self.openai_api_key = getpass.getpass("\nEnter your OpenAI API key: ") if not reinitialize else self.openai_api_key 102 | self.model = input("\nEnter the OpenAI model name: ") if not reinitialize else self.model 103 | self.client = OpenAI(api_key=self.openai_api_key) 104 | 105 | if not reinitialize or self.reinitialize: 106 | while True: 107 | try: 108 | self.validate_openai_key() 109 | break 110 | except openai.AuthenticationError: # openai>=1.0 exposes error classes at the package top level 111 | self.openai_api_key = getpass.getpass("\nInvalid Key! Please enter the correct OpenAI API key: ") 112 | self.client = OpenAI(api_key=self.openai_api_key) # Rebuild the client so the new key is used 113 | except (openai.BadRequestError, openai.NotFoundError) as e: # e.g. unknown model name 114 | print(e) 115 | self.model = input("\nEnter an OpenAI model name from latest OpenAI docs: ") 116 | except openai.APIConnectionError as e: 117 | print(e) 118 | self.init_openai_client(reinitialize=False) 119 | 120 | self.save_state() 121 | 122 | def init_huggingface_client(self, reinitialize=False): 123 | 124 | if not reinitialize: 125 | print("\nInstalling necessary libraries..........") 126 | self.install_package('transformers') 127 | self.install_package('torch') 128 | 129 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 130 | 131 | import warnings 132 | from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer 133 | 134 | warnings.filterwarnings("ignore", category=FutureWarning) 135 | 136 | self.provider = 'huggingface' 137 | self.model = input("\nEnter the Huggingface model name: ") if not reinitialize else self.model 138 | 139 | try: 140 | self.tokenizer = AutoTokenizer.from_pretrained(self.model) 141 | self.model_hf = 
AutoModelForCausalLM.from_pretrained(self.model) 142 | self.pipeline = pipeline('text-generation', model=self.model_hf, tokenizer=self.tokenizer, device=0 if self.device == 'gpu' else -1) 143 | 144 | except OSError as e: 145 | print("\nError loading model: ", e) 146 | print("\nPlease enter a valid Huggingface model name.") 147 | self.init_huggingface_client(reinitialize=False) # Prompt for a new model name instead of retrying the same one forever 148 | 149 | self.save_state() 150 | 151 | def validate_openai_key(self): 152 | """Validate the OpenAI API key by attempting a small request.""" 153 | response = self.client.chat.completions.create( 154 | model=self.model, 155 | messages=[{"role": "user", "content": "Test prompt, do nothing"}], 156 | max_tokens=5 157 | ) 158 | print("API key and model are valid.") 159 | 160 | def infer(self, prompt): 161 | self.history.append({"role": "user", "content": prompt}) 162 | input_prompt = self.history_to_prompt() 163 | 164 | if self.provider == 'OpenAI': 165 | response = self.client.chat.completions.create( 166 | model=self.model, 167 | messages=input_prompt, 168 | max_tokens=self.max_tokens 169 | ) 170 | output = response.choices[0].message.content.strip() 171 | 172 | elif self.provider == 'huggingface': 173 | responses = self.pipeline(input_prompt, max_new_tokens=self.max_tokens) 174 | output = responses[0]['generated_text'] 175 | 176 | self.history.append({"role": "assistant", "content": output}) 177 | print(">> ", output) 178 | 179 | def interactive_session(self, prompt_input): 180 | print("\nInteractive session started. 
Type 'end chat' to exit from the session!\n") 181 | 182 | if self.continue_chat: 183 | print('>> ', self.history[-1]['content']) 184 | else: 185 | initial_prompt = prompt_input 186 | self.infer(initial_prompt) 187 | 188 | while True: 189 | prompt = input("\n>> ") 190 | if prompt.lower() == 'end chat': 191 | break 192 | self.infer(prompt) 193 | self.save_state() 194 | 195 | def history_to_prompt(self): 196 | if self.provider == 'OpenAI': 197 | return self.history 198 | elif self.provider == 'huggingface': 199 | return " ".join([item["content"] for item in self.history]) 200 | 201 | if __name__ == "__main__": 202 | client = KrsGPTClient(reinitialize=False) 203 | # client.interactive_session("You are an 8th grade math tutor. Ask questions to gauge my expertise so that you can generate a training plan for me.") 204 | 205 | -------------------------------------------------------------------------------- /mkc.md: -------------------------------------------------------------------------------- 1 | # Install and Configure KRS with Minikube 2 | 3 | ## Prerequisites 4 | 5 | - Podman, Docker, or VirtualBox (Minikube driver) 6 | - kubectl 7 | 8 | ## Getting Started 9 | 10 | ### 1. Set up a Minikube Kubernetes cluster on your local machine 11 | ``` 12 | curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 13 | sudo install minikube-linux-amd64 /usr/local/bin/minikube && rm minikube-linux-amd64 14 | minikube start 15 | ``` 16 | ![minikube_cluster_done](https://github.com/kubetoolsca/krs/assets/171302280/23638e61-8416-43eb-943f-d8da34346585) 17 | 18 | ### 2. Set up KRS using these commands: 19 | 20 | ``` 21 | git clone https://github.com/kubetoolsca/krs.git 22 | cd krs 23 | pip install . 24 | ``` 25 | 26 | ### 3. Initialize KRS so that it can access your cluster: 27 | 28 | ``` 29 | krs init 30 | ``` 31 | 32 | ### 4. 
Get a view of all possible actions with KRS, by running the given command 33 | ``` 34 | krs --help 35 | ``` 36 | 37 | ``` 38 | krs --help 39 | 40 | Usage: krs [OPTIONS] COMMAND [ARGS]... 41 | 42 | krs: A command line interface to scan your Kubernetes Cluster, detect errors, 43 | provide resolutions using LLMs and recommend latest tools for your cluster 44 | 45 | ╭─ Options ────────────────────────────────────────────────────────────────────╮ 46 | │ --install-completion Install completion for the current shell. │ 47 | │ --show-completion Show completion for the current shell, to copy │ 48 | │ it or customize the installation. │ 49 | │ --help Show this message and exit. │ 50 | ╰──────────────────────────────────────────────────────────────────────────────╯ 51 | ╭─ Commands ───────────────────────────────────────────────────────────────────╮ 52 | │ exit Ends krs services safely and deletes all state files from │ 53 | │ system. Removes all cached data. │ 54 | │ export Exports pod info with logs and events. │ 55 | │ health Starts an interactive terminal using an LLM of your choice to │ 56 | │ detect and fix issues with your cluster │ 57 | │ init Initializes the services and loads the scanner. │ 58 | │ namespaces Lists all the namespaces. │ 59 | │ pods Lists all the pods with namespaces, or lists pods under a │ 60 | │ specified namespace. │ 61 | │ recommend Generates a table of recommended tools from our ranking │ 62 | │ database and their CNCF project status. │ 63 | │ scan Scans the cluster and extracts a list of tools that are │ 64 | │ currently used. │ 65 | ╰──────────────────────────────────────────────────────────────────────────────╯ 66 | ``` 67 | ### 5. Permit KRS to get information on the tools utilized in your cluster by running the given command 68 | 69 | ``` 70 | krs scan 71 | ``` 72 | 73 | ``` 74 | krs scan 75 | 76 | Scanning your cluster... 77 | 78 | Cluster scanned successfully... 79 | 80 | Extracted tools used in cluster... 
81 | 82 | 83 | The cluster is using the following tools: 84 | 85 | +-------------+--------+------------------+---------------+ 86 | | Tool Name | Rank | Category | CNCF Status | 87 | +=============+========+==================+===============+ 88 | | cilium | 1 | Network Policies | graduated | 89 | +-------------+--------+------------------+---------------+ 90 | | hubble | 7 | Security Tools | listed | 91 | +-------------+--------+------------------+---------------+ 92 | 93 | ``` 94 | 95 | ### 6. Get recommendations on possible tools to use in your cluster by running the given command 96 | 97 | ``` 98 | krs recommend 99 | ``` 100 | 101 | ``` 102 | krs recommend 103 | 104 | Our recommended tools for this deployment are: 105 | 106 | +------------------+------------------------+-------------+---------------+ 107 | | Category | Recommendation | Tool Name | CNCF Status | 108 | +==================+========================+=============+===============+ 109 | | Network Policies | Already using the best | cilium | graduated | 110 | +------------------+------------------------+-------------+---------------+ 111 | | Security Tools | Recommended tool | trivy | listed | 112 | +------------------+------------------------+-------------+---------------+ 113 | 114 | ``` 115 | 116 | ### 7. Check the pod and namespace status in your Kubernetes cluster, including errors. 117 | 118 | ``` 119 | krs health 120 | ``` 121 | 122 | ``` 123 | krs health 124 | 125 | Starting interactive terminal... 126 | 127 | 128 | Choose the model provider for healthcheck: 129 | 130 | [1] OpenAI 131 | [2] Huggingface 132 | 133 | >> 1 134 | 135 | Installing necessary libraries.......... 136 | 137 | openai is already installed. 138 | 139 | Enter your OpenAI API key: sk-proj-qxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxP 140 | 141 | Enter the OpenAI model name: gpt-3.5-turbo 142 | API key and model are valid. 143 | 144 | Namespaces in the cluster: 145 | 146 | 1. default 147 | 2. kube-node-lease 148 | 3. 
kube-public 149 | 4. kube-system 150 | 5. portainer 151 | 152 | Which namespace do you want to check the health for? Select a namespace by entering its number: >> 4 153 | 154 | Pods in the namespace kube-system: 155 | 156 | 1. cilium-9lqbq 157 | 2. cilium-ffpct 158 | 3. cilium-pvknr 159 | 4. coredns-85f59d8784-nvr2n 160 | 5. coredns-85f59d8784-p9jcv 161 | 6. cpc-bridge-proxy-c6xzr 162 | 7. cpc-bridge-proxy-p7r4p 163 | 8. cpc-bridge-proxy-tkfrd 164 | 9. csi-do-node-hwxn7 165 | 10. csi-do-node-q27rc 166 | 11. csi-do-node-rn7dm 167 | 12. do-node-agent-6t5ms 168 | 13. do-node-agent-85r8b 169 | 14. do-node-agent-m7bvr 170 | 15. hubble-relay-74686df4df-856pj 171 | 16. hubble-ui-86cc69bddc-xc745 172 | 17. konnectivity-agent-9k8vk 173 | 18. konnectivity-agent-h5fm2 174 | 19. konnectivity-agent-kf4xh 175 | 20. kube-proxy-94945 176 | 21. kube-proxy-qgv4j 177 | 22. kube-proxy-vztzf 178 | 179 | Which pod from kube-system do you want to check the health for? Select a pod by entering its number: >> 1 180 | 181 | Checking status of the pod... 182 | 183 | Extracting logs and events from the pod... 184 | 185 | Logs and events from the pod extracted successfully! 186 | 187 | 188 | Interactive session started. Type 'end chat' to exit from the session! 189 | 190 | >> The log entries provided are empty {}, so there is nothing to analyze. Therefore, I can confirm that 'Everything looks good!' in this case. 191 | 192 | If there were warnings or errors in the log entries, I would have analyzed them thoroughly to identify the root cause. Depending on the specific warnings or errors, potential steps to resolve the issues could include: 193 | 194 | 1. Analyzing the specific error message to understand the problem 195 | 2. Checking Kubernetes resources (e.g., pods, deployments, configmaps) for any misconfigurations 196 | 3. Verifying connectivity to external resources or dependencies 197 | 4. Checking for resource limitations or constraints that could be causing issues 198 | 5. 
Reviewing recent changes in the Kubernetes environment that could have introduced problems 199 | 6. Using Kubernetes troubleshooting tools like kubectl logs, describe, or events to gather more information 200 | 201 | By following these steps and addressing any identified issues, you can resolve warnings or errors in the Kubernetes environment. 202 | 203 | >> Wonderful, anything else to note? 204 | >> In addition to resolving warnings or errors in Kubernetes logs, it's important to regularly monitor and maintain the Kubernetes environment to ensure smooth operation. Some best practices for Kubernetes maintenance include: 205 | 206 | 1. Regularly updating Kubernetes components and cluster nodes to the latest stable versions to benefit from bug fixes and security patches. 207 | 2. Implementing automated backups of critical data and configurations to prevent data loss in case of failures. 208 | 3. Monitoring resource utilization and scaling components as needed to optimize performance and cost efficiency. 209 | 4. Implementing security best practices, such as network policies, RBAC, and pod security policies to protect the cluster from unauthorized access or malicious activities. 210 | 5. Conducting regular health checks and performance tuning to identify and address any bottlenecks or inefficiencies in the cluster. 211 | 6. Developing a disaster recovery plan and testing it periodically to ensure business continuity in case of unexpected events. 212 | 213 | By following these maintenance practices, you can ensure the stability, scalability, and security of your Kubernetes environment for optimal DevOps operations. 214 | 215 | >> Alright, so are we done? 216 | >> Yes, we have covered the analysis of the Kubernetes log entries and discussed best practices for Kubernetes maintenance. If you have any more questions or need further assistance, feel free to ask. Otherwise, we can consider this conversation complete. 217 | 218 | >> Wonderful. Thanks! 
end chat 219 | >> You're welcome! If you have any more questions in the future, feel free to reach out. Have a great day! Goodbye! 220 | 221 | >> end chat 222 | 223 | 224 | ``` 225 | -------------------------------------------------------------------------------- /samples/install-tools.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Add the Kubeshark Helm repository 4 | helm repo add kubeshark https://helm.kubeshark.co 5 | 6 | # Refresh the repository cache and install Kubeshark 7 | helm repo update 8 | helm install kubeshark kubeshark/kubeshark 9 | 10 | ## Installing Portainer 11 | 12 | kubectl create namespace portainer 13 | helm repo add portainer https://portainer.github.io/k8s/ 14 | helm repo update 15 | 16 | # Dry run Portainer installation to see what gets installed 17 | helm install --dry-run --debug portainer -n portainer portainer/portainer 18 | 19 | # Install Portainer 20 | helm upgrade -i -n portainer portainer portainer/portainer 21 | 22 | 23 | echo "Kubeshark and Portainer installed successfully!" 24 | -------------------------------------------------------------------------------- /samples/uninstall-tools.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Uninstall a Helm release; an optional second argument names the release's namespace 4 | uninstall_helm_release() { 5 | local release_name="$1" namespace="${2:-}" 6 | helm uninstall "$release_name" ${namespace:+-n "$namespace"} || true # Suppress errors if release not found 7 | } 8 | 9 | # Update Helm repository cache 10 | helm repo update 11 | 12 | 13 | # Uninstall Kubeshark 14 | uninstall_helm_release kubeshark 15 | 16 | # Uninstall Portainer (installed into the portainer namespace) 17 | uninstall_helm_release portainer portainer 18 | 19 | # Delete the namespaces 20 | kubectl delete ns portainer --ignore-not-found 21 | kubectl delete ns kubeshark --ignore-not-found 22 | 23 | echo "Kubeshark and Portainer uninstalled (if previously installed)." 
24 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | 3 | # Read the requirements.txt file for dependencies 4 | with open('krs/requirements.txt') as f: 5 | requirements = f.read().splitlines() 6 | 7 | setup( 8 | name='krs', 9 | version='0.1.0', 10 | description='Kubernetes Recommendation Service with LLM integration', 11 | author='Abhijeet Mazumdar, Karan Singh & Ajeet Singh Raina', 12 | author_email='abhijeet@kubetools.ca, karan@kubetools.ca, ajeet@kubetools.ca', 13 | url='https://github.com/kubetoolsca/krs', 14 | packages=find_packages(), 15 | include_package_data=True, 16 | install_requires=requirements, 17 | entry_points={ 18 | 'console_scripts': [ 19 | 'krs=krs.krs:app', # Adjust the module and function path as needed 20 | ], 21 | }, 22 | classifiers=[ 23 | 'Programming Language :: Python :: 3', 24 | 'License :: OSI Approved :: MIT License', 25 | 'Operating System :: OS Independent', 26 | ], 27 | python_requires='>=3.8', # The pinned requests==2.32.2 no longer supports Python 3.7 28 | ) 29 | --------------------------------------------------------------------------------
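The dependency-loading step at the top of `setup.py` reads `krs/requirements.txt` line by line, and that file ends with blank lines. A minimal sketch of a more defensive reader follows. It is not part of the repository: `parse_requirements` is a hypothetical helper name, and skipping blank lines and `#` comment lines is an assumption about how the file should be interpreted.

```python
def parse_requirements(text):
    """Return the requirement specifiers found in the text of a requirements file.

    Hypothetical helper (not in the krs repository): blank lines and lines that
    start with '#' are skipped, everything else is kept with whitespace trimmed.
    """
    requirements = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # Ignore padding and comment lines
        requirements.append(line)
    return requirements


if __name__ == "__main__":
    # Sample text shaped like krs/requirements.txt, plus a comment line for illustration
    sample = "typer==0.12.3\nrequests==2.32.2\n\n# tooling\ntabulate==0.9.0\n\n"
    print(parse_requirements(sample))
```

In `setup.py`, such a helper could be applied to the file contents, for example `parse_requirements(f.read())`, so that `install_requires` receives only real specifiers.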